

Reinforcement Learning

Learn how AlphaGo works and build your own version in 4 weeks.

Next cohort starts the week of 3rd October.

Learn to implement the most exciting algorithm of the 21st Century (so far)

Understand all the components you need to build AlphaGo, and unlock one of the most powerful and rapidly advancing technologies: Deep Reinforcement Learning.

A Unique Master's Level Syllabus

Intermediate Reinforcement Learning

Learn about each component of the famous AlphaGo algorithm, and how they fit together. Apply what you learn by writing AIs to solve iconic games, and go head-to-head to compete with the rest of the cohort. Taught by experts from DeepMind, Oxford and Cambridge.


  • 12 Handcrafted Tutorials

  • 4 RL Competitions

  • 8 Interactive coding exercises

  • 4 'Office Hours' with Experts

  • Slack Workspace to ask experts any questions

  • Cohort of Peers to Learn and Compete with


Prerequisites

  • RL Fundamentals

  • Intermediate Python

  • Deep Neural Networks

Missing these prerequisites? Take our Intro to RL course first

Starts 3




Per week for 4 weeks.

Course Organisation

4 Weeks

3rd October to 30th October

Fully remote. Learn from anywhere.

2 live sessions per week:

  • Thurs/Fri: Office Hours (optional, 1 hour)
  • Sunday: Live Competition & Discussion (30 mins)

A new way to learn tech skills

Expert-Crafted Tutorials

Every week starts with 3 tutorials explaining new concepts. Each includes Python coding exercises so you can put what you're learning into practice.

Compete Every Week

Apply what you learn each week in the competition. The code is released on Monday, with the submission deadline the following Sunday afternoon.

Live Competition & Discussion

Discuss how each team's solution works and watch the AIs compete! Afterwards, discuss why the winner won & see the code from the experts.

Course Syllabus

Week 1

Policy Gradients

Write algorithms that learn a policy and select actions without consulting a value function. By the end of Week 1, you’ll understand how to approximate a stochastic policy and how to train a model in practice with policy gradient algorithms. This is the first step on the path from action-value learning to AlphaGo.
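To make the idea concrete, here is a minimal sketch of a policy gradient update, not the course's own code. It assumes a toy multi-armed bandit with Gaussian rewards and a softmax policy over per-action preferences; the function names, the running-average baseline and the hyperparameters are illustrative choices.

```python
import math
import random
from typing import List


def softmax(prefs: List[float]) -> List[float]:
    """Turn action preferences into a stochastic policy."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(
    reward_means: List[float],
    steps: int = 2000,
    lr: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Train a softmax policy with a REINFORCE-style update (toy setting)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_means)
    baseline = 0.0  # running average reward, used to reduce variance
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(reward_means[action], 1.0)
        baseline += (reward - baseline) / t
        for a in range(len(prefs)):
            # Gradient of log pi(action) with respect to preference a
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * (reward - baseline) * grad_log
    return softmax(prefs)


policy = reinforce_bandit([0.0, 1.0])
```

No value function is consulted when choosing an action: the policy is sampled directly, and rewards only scale the gradient of the log-probability of the action taken.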

Week 2

Actor-Critic Methods

First, you’ll learn how TD(λ) balances the bias-variance trade-off in value function updates. In actor-critic methods, the ‘Critic’ estimates the value function, and the ‘Actor’ updates the policy based on the Critic’s estimates. This reduces the high variance that makes plain policy gradient methods unstable. You’ll understand why this works and learn how to build Advantage Actor-Critic (A2C) and how to use Generalised Advantage Estimation.
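As a rough illustration of Generalised Advantage Estimation, the sketch below computes advantages from a list of rewards and Critic value estimates. It assumes a single trajectory with no terminal state in the middle and a `values` list that has one extra bootstrap entry at the end; the function name and default hyperparameters are hypothetical.

```python
from typing import List


def generalised_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Compute GAE advantages for one trajectory.

    `values` must hold len(rewards) + 1 entries: the Critic's estimate for
    each visited state plus a bootstrap value for the state after the last step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error from the Critic
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` gives the one-step TD error (lower variance, more bias), while `lam=1` gives the full return minus the value estimate (less bias, more variance). The Actor would then weight its policy gradient by these advantages.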

Week 3

Monte Carlo Tree Search

This week sees a change in direction, to search and planning methods. With a model of the world, suddenly search and planning methods can be applied to solve Reinforcement Learning problems. Learn about simulation-based search and Monte Carlo Tree Search, the final component of AlphaGo.
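The following is a compact sketch of Monte Carlo Tree Search with UCB1 selection, shown on a toy game chosen for this example rather than taken from the course: single-pile Nim, where players alternately remove 1 to 3 stones and whoever takes the last stone wins. The `Node` class, function names and the exploration constant are assumptions.

```python
import math
import random
from typing import Dict, List, Optional


class Node:
    def __init__(self, stones: int, parent: Optional["Node"] = None) -> None:
        self.stones = stones  # stones left for the player to move
        self.parent = parent
        self.children: Dict[int, "Node"] = {}  # move -> child node
        self.visits = 0
        self.wins = 0.0  # wins for the player who moved INTO this node


def legal_moves(stones: int) -> List[int]:
    return [m for m in (1, 2, 3) if m <= stones]


def rollout(stones: int, rng: random.Random) -> int:
    """Random playout: 1 if the player to move ends up winning, else 0."""
    player = 0
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1 if player == 0 else 0
        player = 1 - player
    return 0  # no stones left: the previous player already took the last one


def mcts_best_move(stones: int, iterations: int = 2000,
                   c: float = 1.4, seed: int = 0) -> int:
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 while the node is fully expanded
        while node.stones > 0 and len(node.children) == len(legal_moves(node.stones)):
            parent_visits = node.visits
            node = max(
                node.children.values(),
                key=lambda ch: ch.wins / ch.visits
                + c * math.sqrt(math.log(parent_visits) / ch.visits),
            )
        # 2. Expansion: add one untried move
        if node.stones > 0:
            untried = [m for m in legal_moves(node.stones) if m not in node.children]
            move = rng.choice(untried)
            child = Node(node.stones - move, parent=node)
            node.children[move] = child
            node = child
        # 3. Simulation: result for the player who moved into `node`
        reward = 1.0 - rollout(node.stones, rng)
        # 4. Backpropagation: flip the perspective at every level
        current: Optional[Node] = node
        while current is not None:
            current.visits += 1
            current.wins += reward
            reward = 1.0 - reward
            current = current.parent
    # Recommend the most visited move at the root
    return max(root.children, key=lambda m: root.children[m].visits)
```

The search needs only a model of the game (`legal_moves` and the state transition), which is why planning becomes possible once such a model is available.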

Week 4

Case Study: AlphaGo

Now we bring it all together to understand AlphaGo from top to bottom. We’ll cover its design, including imitation learning, the value and policy components of AlphaGo, and how Monte Carlo Tree Search is used. By the end, you’ll understand AlphaGo and build a replica (without as much compute).
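One way the policy and value components can steer the tree search is a PUCT-style selection rule, in the spirit of the one published for the AlphaGo family. The sketch below is a simplified illustration, not the course's or DeepMind's implementation; the function name, argument layout and the constant `c_puct` are assumptions.

```python
import math
from typing import Sequence


def puct_select(
    priors: Sequence[float],
    visit_counts: Sequence[int],
    q_values: Sequence[float],
    c_puct: float = 1.5,
) -> int:
    """Pick the action index with the highest Q + exploration bonus.

    priors: policy network probabilities for each action
    visit_counts: how often the search has tried each action
    q_values: mean value estimate for each action from the search so far
    """
    total_visits = sum(visit_counts)

    def score(a: int) -> float:
        # The bonus favours actions the policy likes and the search has rarely tried
        bonus = c_puct * priors[a] * math.sqrt(total_visits) / (1 + visit_counts[a])
        return q_values[a] + bonus

    return max(range(len(priors)), key=score)
```

For example, `puct_select([0.5, 0.5], [10, 0], [0.5, 0.0], c_puct=1.0)` picks the unvisited action at index 1, because its exploration bonus outweighs the other action's higher value estimate.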

Learn, Build & Compete in live AI contests

Online courses are rarely fun. It’s easy to lose motivation and give up.

Delta Academy makes learning RL a blast. In weekly competitions, work as a team to build a game AI and compete against others.

Get up to Speed

Get introduced to new concepts in code through short interactive tutorials that prepare you for the competition at the end of the week.

Team Up

Software is built by teams, not individuals. That's why we encourage you to collaborate in pairs in competitions. Form your dream team: bring a friend, or make new ones!

Strive for Victory

Get competitive. Unlike dull online tutorials, where there’s nothing on the line, find yourself ultra-motivated as you strive for victory!

Ready, Set,


in 8 weeks

Go from RL novice to understanding AlphaGo, the system that beat the World Champion in the game of Go, through our two 4-week courses.

Cutting-Edge Code

Learn PyTorch, the machine learning framework used by researchers and practitioners in industry. All exercises and competition code are written in Python 3 with type hints.

Stuck? Here to help!

Experts are always on hand to immediately answer questions and help you out in the cohort Slack workspace.

Office Hours

Once a week, ask questions in office hours, discuss the content & competition and listen to answers to other cohort members' questions.

What Alumni say

I really enjoyed Delta Academy.

It has the high quality of top universities, the competitive spirit of Kaggle, and all the conveniences of remote working.

Hristo Buyukliev
Senior Data Scientist, TBI Buy

One of the best classes I've ever taken — it is SO FUN. The competitions are thrilling and hilarious. There is a lot of class camaraderie - people answering questions all the time, and the instructors are truly experts.

This class is one-of-a-kind and I would take any course they create without hesitation.

Siddharth Hiregowdara
Product Manager,

Learning by developing games and joining competitions is probably one of my most fun learning experiences. I was so motivated to keep improving my models and learning from peers.

I can still remember during the four weeks of learning, I was so excited to wake up on Sunday mornings to watch the live competition.

Yiqi Wu
Engineering Manager

I've gone through dozens of free reinforcement learning tutorials and while I "learned" RL, I never really



Delta Academy's approach is different. By building functional bots, I can now implement these algorithms confidently and that's something no other tutorial has done.

Kevin Wang
Co-Founder & CTO, Muxable

Where our Alumni work

Harvard University

Interested in joining the cohort?

Join the 4-Week Intermediate Reinforcement Learning cohort starting 3rd October while there are still spaces!

Join Cohort

Frequently asked questions