
Reinforcement Learning

Learn how AlphaGo works and build your own version in 4 weeks.

Next cohort starts the week of 2nd January.

Learn to implement the most exciting algorithm of the 21st Century (so far)

Understand all the components you need to build AlphaGo, and unlock one of the most powerful and rapidly advancing technologies: Deep Reinforcement Learning.

A Unique Master's Level Syllabus

Intermediate Reinforcement Learning

Learn about each component of the famous AlphaGo algorithm, and how they fit together. Apply what you learn by writing AIs to solve iconic games, and go head-to-head to compete with the rest of the cohort. Taught by experts from DeepMind, Oxford and Cambridge.


  • 12 Handcrafted Tutorials
  • 4 RL Competitions
  • 8 Interactive coding exercises
  • 4 'Office Hours' with Experts
  • Slack Workspace to ask experts any questions
  • Cohort of Peers to Learn and Compete with


  • RL Fundamentals
  • Intermediate Python
  • Deep Neural Networks
Missing these prerequisites? Take our Intro to RL course first
Starts 2
Per week for 4 weeks.
Course Organisation

4 Weeks

2nd January to 29th January

Fully remote. Learn from anywhere.
2 live sessions per week:
  • Thurs/Fri: Office Hours (optional, 1 hour)
  • Sunday: Live Competition & Discussion (30 mins)

A new way to learn tech skills

Expert-Crafted Tutorials
Every week starts with 3 tutorials explaining new concepts. Each has Python coding exercises to solve to ensure you can put what you're learning into practice.
Compete Every Week
Apply what you learn each week in the competition. The code is released on Monday, with the submission deadline the following Sunday afternoon.
Live Competition & Discussion
Discuss how each team's solution works and watch the AIs compete! Afterwards, discuss why the winner won & see the code from the experts.

Course Syllabus

Week 1
Policy Gradients
Write algorithms that learn a policy and select actions without consulting a value function. By the end of Week 1, you’ll understand how to approximate a stochastic policy and how to train a model in practice with policy gradient algorithms. This is the first step on the path from action-value learning to AlphaGo.
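For a flavour of what this looks like in code, here is a minimal sketch of a policy gradient update (REINFORCE) on a toy multi-armed bandit. It is not taken from the course materials: the function names, the bandit environment and the hyperparameters are all illustrative assumptions, and it uses plain Python rather than PyTorch to stay self-contained.

```python
import math
import random
from typing import List


def softmax(prefs: List[float]) -> List[float]:
    """Turn action preferences into a stochastic policy (probabilities)."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(
    reward_probs: List[float], steps: int = 5000, lr: float = 0.1, seed: int = 0
) -> List[float]:
    """Learn a softmax policy for a toy bandit (hypothetical environment).

    Each arm pays reward 1 with the given probability, else 0.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_probs)  # policy parameters, one per action
    for _ in range(steps):
        probs = softmax(prefs)
        # Sample an action from the current policy; no value function is used.
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # For a softmax policy, d log pi(action) / d prefs[a]
        # equals 1[a == action] - pi(a).
        for a in range(len(prefs)):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * reward * grad_log_pi
    return softmax(prefs)


policy = reinforce_bandit([0.1, 0.9])
```

After training, `policy` should place most of its probability on the second arm, since that arm pays out more often.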
Week 2
Actor-Critic Methods
First, you’ll learn how TD(λ) balances the bias-variance trade-off in value function updates. In actor-critic methods, the ‘Critic’ estimates the value function, and the ‘Actor’ updates the policy based on the Critic’s estimates. This reduces the high variance that makes plain policy gradient methods unstable. You’ll understand why this works and learn how to build Advantage Actor-Critic (A2C), and how to use Generalised Advantage Estimation.
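As a rough illustration of Generalised Advantage Estimation, the sketch below computes advantages for a single episode from the Critic's value estimates. The function name, argument layout (one extra bootstrap value at the end of `values`) and default hyperparameters are assumptions made for this example, not the course's own code.

```python
from typing import List


def generalised_advantage_estimates(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Compute GAE advantages for one episode.

    `values` holds the Critic's estimate for every visited state plus one
    final bootstrap entry (0.0 if the episode terminated).
    """
    assert len(values) == len(rewards) + 1
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        td_error = rewards[t] + gamma * values[t + 1] - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages


advantages = generalised_advantage_estimates(
    rewards=[1.0, 2.0], values=[0.5, 1.0, 0.0], gamma=0.5, lam=0.5
)
```

Setting `lam=0` recovers the one-step TD error (low variance, more bias), while `lam=1` recovers the full discounted return minus the value estimate (less bias, more variance).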
Week 3
Monte Carlo Tree Search
This week sees a change in direction, to search and planning methods. Once you have a model of the world, you can simulate future moves and plan ahead to solve Reinforcement Learning problems. Learn about simulation-based search and Monte Carlo Tree Search, the final component of AlphaGo.
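To give a sense of the selection step in Monte Carlo Tree Search, here is a minimal sketch of the widely used UCT rule, which picks the child with the best balance of average value and exploration bonus. The `Node` class, function names and exploration constant are hypothetical choices for this example. AlphaGo itself uses a variant of this rule that also weights moves by its policy network.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class Node:
    """Search tree node storing visit statistics (illustrative structure)."""
    visits: int = 0
    total_value: float = 0.0
    children: Dict[Hashable, "Node"] = field(default_factory=dict)


def uct_score(parent: Node, child: Node, c: float = math.sqrt(2)) -> float:
    """UCB1 score: average value plus a bonus for rarely visited children."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_action(parent: Node, c: float = math.sqrt(2)) -> Hashable:
    """Pick the action whose child has the highest UCT score."""
    return max(
        parent.children,
        key=lambda action: uct_score(parent, parent.children[action], c),
    )


root = Node(visits=10)
root.children["a"] = Node(visits=9, total_value=4.5)  # well explored
root.children["b"] = Node(visits=1, total_value=0.4)  # barely explored
chosen = select_action(root)
```

Here the barely explored child `"b"` is chosen despite its lower average value, because its exploration bonus is much larger. With `c=0` the rule becomes purely greedy and picks `"a"`.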
Week 4
Case Study: AlphaGo
Now we bring it all together to understand AlphaGo from top to bottom. We’ll cover its design, including imitation learning, the value and policy components of AlphaGo, and how Monte Carlo Tree Search is used. By the end, you’ll understand AlphaGo & build a replica (without as much compute).

Meet the Instructors

Dr. Matthew Phillips

Google DeepMind
University College London
Matt's research at DeepMind was in Multi-agent Reinforcement Learning: how autonomous agents trade off collaboration and competition. His PhD from University College London is in the Neuroscience of learning and memory, so he deeply understands teaching.

Henry Pulver

University of Cambridge
Five AI
While at Cambridge University studying for a Master's in Machine Learning, Henry wrote his thesis on Reinforcement Learning. He then published papers as a machine learning researcher at the UK's largest autonomous driving startup, Five AI.

Dr. James Rowland

Oxford University
University College London
James completed his PhD in Neural Computation at the University of Oxford, studying the paths information takes through the mammalian brain. He's since worked as a Data Scientist at early-stage startups, implementing machine learning models.

Learn, Build & Compete
in live AI contests

Online courses are rarely fun. It’s easy to lose motivation and give up.

Delta Academy makes learning RL a blast. In weekly competitions, work as a team to build a game AI and compete against others.
Get up to Speed
Get introduced to new concepts in code through short interactive tutorials that prepare you for the competition at the end of the week.
Team Up
Software is built by teams, not individuals. That's why we encourage collaborating in pairs in competitions. Form your dream-team: bring a friend, or make new ones!
Strive for Victory
Get competitive. Unlike dull online tutorials, where there’s nothing on the line, find yourself ultra-motivated as you strive for victory!

Ready, Set,
in 8 weeks

Go from RL novice to understanding AlphaGo, the system that beat the World Champion in the game of Go, through our two 4-week courses.
Cutting-Edge Code
Learn PyTorch, the machine learning framework used by researchers and practitioners in industry. All exercises and competition code are written in Python 3 with type hints.
Stuck? Here to help!
Experts are always on hand to immediately answer questions and help you out in the cohort Slack workspace.
Office Hours
Once a week, ask questions in office hours, discuss the content & competition and listen to answers to other cohort members' questions.

What Alumni say

One of the best classes I've ever taken — it is SO FUN. The competitions are thrilling and hilarious. There is a lot of class camaraderie - people answering questions all the time, and the instructors are truly experts.

This class is one-of-a-kind and I would take any course they create without hesitation.
Siddharth Hiregowdara
Product Manager,
I really enjoyed Delta Academy.

It has the high quality of top universities, the competitive spirit of Kaggle, and all the conveniences of remote working.
Hristo Buyukliev
Senior Data Scientist, TBI Buy
Learning by developing games and joining competitions is probably one of my most fun learning experiences. I was so motivated to keep improving my models and learning from peers.

I can still remember during the four weeks of learning, I was so excited to wake up on Sunday mornings to watch the live competition.
Yiqi Wu
Engineering Manager
I've gone through dozens of free reinforcement learning tutorials and while I "learned" RL, I never really

Delta Academy's approach is different. By building functional bots, I can now implement these algorithms confidently and that's something no other tutorial has done.
Kevin Wang
Co-Founder & CTO, Muxable
Where our Alumni work
Harvard University

Interested in joining the cohort?

Join the 4-Week Intermediate Reinforcement Learning cohort starting 2nd January while there are still spaces!
Join Cohort

Frequently asked questions