The 5 most promising applications of Reinforcement Learning (with code)

Examples of Reinforcement Learning applications, with code to get started

Matthew Phillips

The past decade has seen a boom in AI research breakthroughs, and they are starting to impact the real world.

Much of the excitement has been driven by Reinforcement Learning systems surpassing human world champions at games for the first time. But as the field matures, Reinforcement Learning is increasingly being used and rolled out in real-world applications.

We’ve put together the most promising applications of Reinforcement Learning - including games - and found really useful open source code across different industries to get you started on your Reinforcement Learning journey.


Games

Many of the high-profile advances in Reinforcement Learning and AI have centred on games. This is because games test aspects of human intelligence within a controlled and well-defined environment. Iconic examples include DeepMind conquering Atari games and the game of Go.

Given this run of successful game AIs, games are one of the clearest success stories of Reinforcement Learning.

Delta Academy’s Introduction to Reinforcement Learning course is built around creating AIs that solve games, partly for this reason. The course has experts on hand to guide you, a group of peers to keep you accountable, and live competitions to test you.

For getting up and running on your own, though, the repos provided here should be useful.


Pong

Our first example learns to play Pong - the legendary arcade game. In Pong, at each timestep you have three possible actions: move up, move down, or stay still. The aim is to deflect the ball with your paddle so that it goes past the opponent’s paddle at the other end. It’s a classic game - you’ve probably played it at some point.

DQN, the algorithm DeepMind originally used to surpass human-level play at Pong, is the culmination of Delta Academy’s Introduction to Reinforcement Learning course. We’ve picked out an implementation that uses PyTorch and OpenAI Gym and is particularly well documented.

GitHub repo: Pong Reinforcement Learning
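To make DQN's core update concrete, here is a minimal pure-Python sketch; it is not taken from the repo above. It assumes the Q-values have already been produced by a network (here they are just made-up lists) and shows epsilon-greedy selection over Pong's three actions, plus the one-step target r + gamma * max Q(s', a') that DQN regresses towards. The function names are our own. In the full algorithm the next-state values come from a separate target network, and the squared error is often replaced by a Huber loss.

```python
import random

# Pong's three actions, as described above.
ACTIONS = ("up", "down", "stay")


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step DQN target: r + gamma * max_a' Q(s', a'), or just r at the end of an episode."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def td_loss(q_values, action, target):
    """Squared error between Q(s, a) and the target."""
    return (q_values[action] - target) ** 2


# Usage with made-up Q-values for the current and next state.
q = [0.2, 0.5, 0.1]
action = epsilon_greedy(q, epsilon=0.0)  # epsilon = 0, so this is purely greedy
target = td_target(1.0, [0.3, 0.4, 0.2], done=False, gamma=0.9)
print(ACTIONS[action])                       # down
print(round(target, 2))                      # 1.36
print(round(td_loss(q, action, target), 4))  # 0.7396
```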


Go

DeepMind’s defeat of Lee Sedol, world champion of Go, is one of the iconic achievements of modern AI. It’s the moment at which the world woke up to AI’s potential.

The most important papers for understanding how this works are the original AlphaGo paper and its follow-up, AlphaGo Zero. AlphaGo Zero learned purely by playing against itself, with no prior knowledge of expert human moves, and still outperformed the original.
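AlphaGo Zero's self-play relies on a tree search that balances each move's current value estimate Q against an exploration bonus built from the network's prior P and the visit counts N. The sketch below shows that PUCT selection rule, and the visit-count distribution that becomes the policy's training target, for a single search node with made-up numbers. It is a simplified illustration rather than BetaGo's code; `c_puct = 1.5` is an assumed constant, and the temperature must be positive.

```python
import math


def puct_scores(priors, visit_counts, mean_values, c_puct=1.5):
    """Score each move as Q(s, a) + U(s, a), the PUCT rule from AlphaGo Zero's search."""
    total_visits = sum(visit_counts)
    scores = []
    for p, n, q in zip(priors, visit_counts, mean_values):
        # The exploration bonus shrinks as a move is visited more often.
        u = c_puct * p * math.sqrt(total_visits) / (1 + n)
        scores.append(q + u)
    return scores


def select_move(priors, visit_counts, mean_values, c_puct=1.5):
    """Index of the move the search would explore next."""
    scores = puct_scores(priors, visit_counts, mean_values, c_puct)
    return max(range(len(scores)), key=lambda a: scores[a])


def search_policy(visit_counts, temperature=1.0):
    """Self-play training target: pi(a) proportional to N(s, a) ** (1 / temperature)."""
    powered = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(powered)
    return [x / total for x in powered]


priors = [0.6, 0.3, 0.1]  # network prior P(s, a)
visits = [10, 2, 0]       # visit counts N(s, a)
values = [0.1, 0.4, 0.0]  # mean action values Q(s, a)
print(select_move(priors, visits, values))           # 1
print([round(p, 3) for p in search_policy(visits)])  # [0.833, 0.167, 0.0]
```

Move 0 has the strongest prior, but it has already been explored heavily, so the search spends its next simulation on move 1, whose value estimate is higher.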

It’s the AlphaGo Zero algorithm that is the culmination of Delta Academy’s Intermediate Reinforcement Learning course.

There’s an open-source implementation using the Keras package if you want to give it a try yourself:

GitHub repo: BetaGo


Finance

Behind the scenes, finance is a major adopter of Reinforcement Learning. And it makes sense: you need to take actions in complex environments, and have large datasets of historic data to train on. Needless to say, none of the following is investment advice, but if you want to build your own models for these purposes, we have provided some useful resources to learn from.


Trading bot

Perhaps the most obvious application of RL to finance is selecting which trades to make. Automating trading has the advantages of saving time and being able to trade 24/7. And as we’ve seen, the potential for reinforcement learning systems to learn to take actions in complex settings is vast.

The trading-bot repo below is an example: it uses a Deep Q-Network trained on historical stock data (note: this is a toy model, not used in real trading). It uses example stock prices from Yahoo Finance to train a bot that buys and sells the stock.

GitHub repo: Trading Bot
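One simple way to frame a trading bot as an RL problem is sketched below: the state is a window of recent day-to-day price changes squashed through a sigmoid, the actions are hold, buy and sell, and the reward is the profit realised when a share is sold. This is a hypothetical sketch under those assumptions; `get_state` and `trading_step` are illustrative names, not necessarily how the Trading Bot repo defines its state or reward.

```python
import math

HOLD, BUY, SELL = 0, 1, 2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def get_state(prices, t, window):
    """Hypothetical state: squashed price changes over the last `window` days up to day t.

    Days before the start of the data are padded by repeating the first price.
    """
    start = t - window
    if start >= 0:
        block = prices[start:t + 1]
    else:
        block = [prices[0]] * (-start) + prices[0:t + 1]
    return [sigmoid(block[i + 1] - block[i]) for i in range(window)]


def trading_step(action, price, inventory):
    """Apply an action; the reward is the profit realised when the oldest held share is sold."""
    reward = 0.0
    if action == BUY:
        inventory.append(price)
    elif action == SELL and inventory:
        reward = price - inventory.pop(0)
    return reward


prices = [10.0, 11.0, 10.5, 12.0]
print(get_state(prices, 0, 2))  # [0.5, 0.5]
inventory = []
total = 0.0
for t, action in enumerate([BUY, HOLD, HOLD, SELL]):
    total += trading_step(action, prices[t], inventory)
print(total)  # 2.0
```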

Portfolio Management

Deep Reinforcement Learning can also be applied to Portfolio Management, where a client needs their assets invested to achieve certain financial goals. Given a set of stocks, you must allocate money across them to maximise returns over time.

The repo below implements a solution to this problem using data from the S&P 500 and the cryptocurrency exchange Poloniex, based on a 2017 paper by Jiang and colleagues.

GitHub repo: Deep Portfolio Management RL
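In portfolio management the agent's action is a vector of weights, one per asset. The sketch below roughly follows Jiang and colleagues' framing: asset 0 is cash, the weights come from a softmax over the network's outputs, and the per-period reward is the log of the portfolio's growth factor, ln(y_t . w), where y_t holds each asset's price divided by its previous price. It ignores transaction costs, which the paper does account for, and the function names are our own.

```python
import math


def softmax(logits):
    """Turn raw network scores into portfolio weights that are positive and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def price_relatives(prev_prices, prices):
    """y_t: each asset's price divided by its price in the previous period."""
    return [p / q for p, q in zip(prices, prev_prices)]


def log_return(weights, relatives):
    """One-period reward: ln(y_t . w), ignoring transaction costs."""
    growth = sum(w * y for w, y in zip(weights, relatives))
    return math.log(growth)


# Half in cash (price fixed at 1.0), half in one stock that rises from 100 to 110.
w = softmax([0.0, 0.0])
y = price_relatives([1.0, 100.0], [1.0, 110.0])
print(round(log_return(w, y), 4))  # 0.0488
```

Because log returns add up across periods, maximising their sum is the same as maximising the portfolio's final value.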

Finance Gym

Similar implementations can be found for high-frequency trading and cryptocurrency trading, along with tutorials and research publications, at the repo below.

This repo aims to provide a framework for people to learn the different aspects of RL for finance. It also links to repos for getting started and more advanced financial algorithms to learn from.

GitHub repo: Finance RL Gym

Smart cities

Smart cities use data and other digital technology to manage resources and services effectively in urban areas. Reinforcement learning can be used to optimise a variety of systems in smart cities: for example, controlling the flow of traffic or determining the most efficient routes for garbage collection.

If this sounds far-fetched, it shouldn’t. A similar problem was one of the early real-world applications of reinforcement learning: DeepMind’s reinforcement learning models were used to make Google’s data centres much more efficient.

Smart Cities

Our first repository provides environments that simulate different problems in cities, such as allocating taxis and planning the routes garbage trucks take.

It’s a relatively simple environment that provides a good test bed for trying out RL algorithms in a toy setting. It’s clearly documented and has example solutions, making it a great place to get started training RL agents to tackle smart city problems.

GitHub repo: Smart Cities
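To see how RL can find efficient routes, here is a tabular Q-learning sketch on a tiny, made-up road network (`ROADS` is hypothetical, not the Smart Cities repo's environment). The agent is charged the travel cost of each road it takes, so maximising reward means minimising the total distance to the destination `D`.

```python
import random

# Hypothetical road network: node -> {neighbouring node: travel cost}.
ROADS = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 1, "D": 5},
    "D": {},  # destination (terminal state)
}
GOAL = "D"


def q_learning(roads, goal, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning where the reward for driving a road is minus its cost."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in roads for a in roads[s]}
    starts = [s for s in roads if s != goal]
    for _ in range(episodes):
        state = rng.choice(starts)
        while state != goal:
            actions = list(roads[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            reward = -roads[state][action]
            # Value of the best onward road; 0 once the goal is reached.
            next_best = max((q[(action, a)] for a in roads[action]), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * next_best - q[(state, action)])
            state = action  # in this graph, the chosen road leads to that node
    return q


def greedy_route(q, roads, start, goal):
    """Follow the highest-valued road from each node until reaching the goal."""
    route = [start]
    while route[-1] != goal:
        node = route[-1]
        route.append(max(roads[node], key=lambda a: q[(node, a)]))
    return route


q = q_learning(ROADS, GOAL)
print(greedy_route(q, ROADS, "A", GOAL))  # ['A', 'C', 'B', 'D']
```

The cheapest route from A is A -> C -> B -> D (cost 3), which beats both A -> B -> D (cost 5) and A -> C -> D (cost 6). Q-learning finds it by propagating the cost of later roads back to earlier choices.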

Water Distribution

A related problem is water distribution: how to control pumps to ensure demand is met across a city. The next repo focusses on this problem, and includes both an environment to test in and example solutions. The code is a little challenging to dive into as it’s not clearly documented, but for an intermediate level coder it’s a useful example and environment to learn from.

GitHub repo: RL for Water Distribution
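One way to frame pump control is a reward that penalises both unmet demand and the energy the pumps use. The single-tank model below is a hypothetical simplification: the inflow rate, tank capacity, quadratic energy cost and `energy_weight` are all assumptions, and it is not the environment in the RL for Water Distribution repo.

```python
def water_step(tank_level, pump_speed, demand, capacity=100.0, max_inflow=10.0, energy_weight=0.1):
    """One step of a hypothetical single-tank water system.

    pump_speed is in [0, 1] and scales the inflow. Water beyond capacity is lost.
    Returns the new tank level and the reward.
    """
    inflow = pump_speed * max_inflow
    available = min(tank_level + inflow, capacity)
    supplied = min(available, demand)
    unmet = demand - supplied
    new_level = available - supplied
    energy = pump_speed ** 2  # assumed: pumping harder costs disproportionately more
    reward = -unmet - energy_weight * energy
    return new_level, reward


print(water_step(5.0, 1.0, 12.0))  # demand met, small energy penalty
print(water_step(0.0, 0.0, 4.0))   # pump idle, 4 units of demand unmet
```

The weighting between the two penalty terms is the key design choice: a larger `energy_weight` pushes the agent towards running pumps gently and accepting occasional shortfalls.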

Power distribution

Our final example models power distribution for the purposes of heating buildings in a city.

This is a particularly sophisticated environment: its classes include buildings with energy stores, pumps and heaters. The goal is to coordinate demand across the different buildings, allocating energy resources according to demand so as to maximise efficiency and heat use.

The aim of the lab developing it - the Intelligent Environment Lab - is to provide a framework to compare Reinforcement Learning algorithms for use in smart cities. And, as we’ve seen commonly across these repos, it’s based on an OpenAI Gym environment.

GitHub repo: City Learn
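Since CityLearn, like many of the repos here, follows the OpenAI Gym pattern, it helps to see what that interface looks like. Below is a toy, hypothetical single-building environment with the classic Gym `reset()` and `step()` methods, where `step` returns `(observation, reward, done, info)`. Its demand profile, peak pricing and storage model are invented for illustration and do not reflect CityLearn's actual observations, actions or reward. The newer Gymnasium API also differs slightly, for example returning separate `terminated` and `truncated` flags instead of `done`.

```python
import random


class ToyBuildingEnv:
    """Hypothetical single-building heating environment with a classic Gym-style API.

    Observation: (hour, heat_demand, storage_level).
    Action: a number in [-1, 1]; positive charges the heat store and negative
    discharges it, as a fraction of the store's capacity.
    """

    def __init__(self, capacity=10.0, horizon=24, seed=None):
        self.capacity = capacity
        self.horizon = horizon
        self.rng = random.Random(seed)

    def _sample_demand(self):
        # Invented demand profile: higher in the evening, plus some noise.
        base = 5.0 if 17 <= self.hour <= 22 else 2.0
        return base + self.rng.uniform(0.0, 1.0)

    def reset(self):
        self.hour = 0
        self.storage = 0.0
        self.demand = self._sample_demand()
        return (self.hour, self.demand, self.storage)

    def step(self, action):
        action = max(-1.0, min(1.0, action))
        # Keep the store between empty and full.
        new_storage = max(0.0, min(self.capacity, self.storage + action * self.capacity))
        charged = new_storage - self.storage  # negative when heat is drawn from the store
        self.storage = new_storage
        from_store = max(0.0, -charged)
        # Grid energy covers whatever demand the store did not, plus any charging.
        grid_energy = max(0.0, self.demand - from_store) + max(0.0, charged)
        price = 2.0 if 17 <= self.hour <= 22 else 1.0  # invented peak pricing
        reward = -price * grid_energy
        self.hour += 1
        done = self.hour >= self.horizon
        self.demand = self._sample_demand()
        return (self.hour, self.demand, self.storage), reward, done, {}


# Usage: run one episode with a random policy.
env = ToyBuildingEnv(seed=0)
policy_rng = random.Random(1)
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    obs, reward, done, info = env.step(policy_rng.uniform(-1.0, 1.0))
    total_reward += reward
print(obs[0])  # 24
```

A good policy here would charge the store during cheap hours and discharge it in the evening peak, which is the kind of coordination CityLearn is designed to study at the scale of many buildings.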


Robotics

Robotics is an area that has held a lot of promise for RL. This is because RL is well suited to taking sensory observations and learning to take actions that achieve a goal. These actions can be highly complex, and may need to be learned in new settings, which could be very useful for robots interacting freely with the world.

Let’s consider a robot that receives a reward for each step it takes towards a goal and a penalty for each step away from it. Through trial and error, the robot would learn which actions lead to the highest rewards, and eventually converge on a policy that reaches the goal (this might look like walking, as in this research by Google Brain).
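The reward scheme described above can be written as a small function. This is a sketch under simple assumptions: positions are 2D points, and a move counts as "towards" the goal if it reduces the straight-line distance. The name `progress_reward` and the +1/-1 values are our own choices.

```python
import math


def progress_reward(prev_pos, new_pos, goal, step_reward=1.0, step_penalty=-1.0):
    """Reward a move that gets closer to the goal, penalise one that gets further away."""
    before = math.dist(prev_pos, goal)
    after = math.dist(new_pos, goal)
    if after < before:
        return step_reward
    if after > before:
        return step_penalty
    return 0.0  # no change in distance


goal = (5.0, 0.0)
print(progress_reward((0.0, 0.0), (1.0, 0.0), goal))  # 1.0
print(progress_reward((1.0, 0.0), (0.0, 0.0), goal))  # -1.0
```

A common variation returns `before - after` instead, so the reward also reflects how much progress each step made.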

Fortunately, you don’t need a real robot to get started learning how to write reinforcement learning algorithms that control robots. Simulation environments have been developed that enable you to test your algorithms virtually.


Grasping

Our first example uses PyBullet, a physics simulation environment, together with OpenAI Gym to model a grasping task. This repo is clearly documented, with several example solutions formatted as Jupyter notebooks. It’s a good place to see how to build a solution of intermediate difficulty using widely used packages.

GitHub repo: Kuka RL

Learning to walk

Reinforcement Learning has also been applied to more open ended, playful settings. For example, YouTuber Sentdex created a video explaining how to train small virtual robots to walk towards a target. They provided code to go alongside the video, which is a great place to jump in if you’re more of a beginner - it provides an overview of what the process of getting reinforcement learning algorithms to work really looks like.

YouTube video: Sentdex RL walking


SenseAct

Our final example, SenseAct, is formatted similarly to OpenAI Gym. The repository includes reaching tasks for robot arms with different numbers of joints, and a robot docking task. In the reacher task, your algorithm must learn to control a robot arm reaching towards a target. In the docking task, your algorithm controls a robot that must return to a home base.

This repo includes the environment only - example solutions aren’t provided. It’s therefore best tackled by those with more experience.

GitHub repo: Sense Act
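For a reaching task, a natural dense reward is the negative distance between the arm's fingertip and the target. The sketch below computes the fingertip position of a planar arm with any number of joints using forward kinematics. It is a hypothetical illustration, not SenseAct's reward function, and it assumes each joint angle is measured relative to the previous link.

```python
import math


def end_effector(joint_angles, link_lengths):
    """Forward kinematics for a planar arm with one link per joint."""
    x = y = 0.0
    heading = 0.0
    for angle, length in zip(joint_angles, link_lengths):
        heading += angle  # each angle is relative to the previous link
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y


def reacher_reward(joint_angles, link_lengths, target):
    """Hypothetical dense reward: minus the distance from fingertip to target."""
    return -math.dist(end_effector(joint_angles, link_lengths), target)


# A two-joint arm with unit-length links, fully stretched out along the x-axis.
print(end_effector((0.0, 0.0), (1.0, 1.0)))  # (2.0, 0.0)
```

Because the function accepts any number of joints, the same reward works unchanged for arms of different complexity; only the action space grows.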

Recommendation systems

In a Recommendation System, the goal is to provide the user with the best content or product for their preferences or needs.

This can be formulated as a reinforcement learning problem. The state would be information about the customer or user, the action the recommendation to make, and the reward a signal of whether the user liked it, such as watch time or click-through. Alternatively, the action could be a predicted rating, with a reward given if the predicted rating matches the actual rating.
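The simplest version of this formulation treats each recommendation as a one-step episode, which makes it a contextual bandit rather than the full sequential RL problem. The sketch below keeps a running average click-through rate for each (user segment, item) pair and recommends epsilon-greedily. The segments, items and click probabilities are invented for the example.

```python
import random
from collections import defaultdict


class EpsilonGreedyRecommender:
    """One-step recommender: state = user segment, action = item, reward = 1 for a click, else 0."""

    def __init__(self, items, epsilon=0.1, seed=None):
        self.items = list(items)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # (segment, item) -> average reward so far
        self.count = defaultdict(int)    # (segment, item) -> times recommended

    def recommend(self, segment):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)  # explore
        return max(self.items, key=lambda item: self.value[(segment, item)])  # exploit

    def update(self, segment, item, reward):
        key = (segment, item)
        self.count[key] += 1
        # Incremental mean: V <- V + (r - V) / n
        self.value[key] += (reward - self.value[key]) / self.count[key]


# Usage with a simulated user: invented click probabilities for one segment.
click_prob = {"Alien": 0.8, "Notting Hill": 0.1, "Up": 0.1}
rec = EpsilonGreedyRecommender(click_prob, epsilon=0.1, seed=0)
user_rng = random.Random(1)
for _ in range(2000):
    item = rec.recommend("sci-fi fan")
    clicked = user_rng.random() < click_prob[item]
    rec.update("sci-fi fan", item, 1.0 if clicked else 0.0)
print(max(rec.items, key=lambda i: rec.value[("sci-fi fan", i)]))  # Alien
```

The occasional random recommendation is what lets the system discover new preferences; without it, an early lucky click on a poor item could go uncorrected for a long time.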

In the example repo, this is applied to movie recommendations. The environment is set up using OpenAI’s Gym framework, with the goal of predicting a movie’s rating from information about the movie and the user providing the rating.

By predicting the rating a user will give a movie, we can design algorithms that serve movies the user is likely to rate highly. In the repo, this is achieved using Proximal Policy Optimisation (PPO) with a multilayer perceptron as the neural network.

GitHub repo: Recommendation Gym
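PPO's key idea is a clipped surrogate objective: it compares the probability the new policy assigns to an action with the probability under the old policy, and clips that ratio so a single update cannot move the policy too far. Below is a minimal pure-Python version of the objective for a batch of samples. It is not the repo's code, and `clip_eps = 0.2` is a commonly used value rather than one taken from the repo.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximised; negate it to use as a loss)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# An action that became 1.5x more likely and had a positive advantage:
# the gain is capped at a ratio of 1 + clip_eps.
print(round(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]), 3))  # 1.2
```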