Reinforcement Learning Algorithms and Use Cases

Written by Coursera Staff • Updated on

Reinforcement learning algorithms allow artificial intelligence agents to learn the optimal way to perform a task through trial and error without human intervention. Explore reinforcement learning algorithms such as Q-learning and actor-critic.


Reinforcement learning is a machine learning method in which computers, robots, or other AI models find the best way to accomplish a goal through trial and error, without the need for a computer scientist or other person to show them what to do. Reinforcement learning allows the AI to evaluate its own decisions, assign each one a value based on how well it advances the goal, and ultimately find the highest-value solution.

You can use different reinforcement learning algorithms that utilize various approaches to the learning process, primarily in how they determine the value of actions within the decision-making process and learn through trial and error.

Explore how reinforcement learning algorithms work and examples such as Q-learning, SARSA, and DDPG.

How do reinforcement learning algorithms work? 

Reinforcement learning algorithms learn in a way that might remind you of how humans learn—through trying things and determining whether that attempt was good. If you want to learn how to do something, such as how to play chess, one way to learn is to sit down in front of a chessboard and start playing. As a beginner, you’re guaranteed to make a lot of mistakes. Every time you make a mistake, you can learn from what went wrong and become a stronger player for the next game. After playing many, many games of chess, you’ll start to understand the best way to dominate your opponents, no matter what maneuvers they attempt. 

Reinforcement learning works similarly. The algorithm attempts to accomplish a goal and then evaluates its own performance, adjusting its decision-making process based on the feedback it gives itself about its actions. It uses a system of rewards and penalties to learn the most effective way to achieve a goal, much like how humans learn through trial and error. Chess offers a real-world example of this process: DeepMind developed AlphaZero, an artificial intelligence model that learned to play chess, shogi, and Go through reinforcement learning.

Reinforcement learning uses the Markov decision process, a sequential decision-making process based on mathematics, to evaluate specific actions' immediate and cumulative rewards. The AI model will first explore its environment by trying different actions and considering whether they move the state toward the final goal.

By looking at the immediate and long-term rewards of certain decisions, the AI model can choose the solution with the most value.
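To make the trade-off between immediate and long-term rewards concrete, here is a minimal sketch of a discounted return, the quantity a Markov decision process uses to score a sequence of actions. The reward values and discount factor below are illustrative assumptions, not taken from any real environment:

```python
# Sketch: comparing two action sequences by discounted cumulative reward.
# A discount factor (gamma) below 1 makes rewards received sooner count
# for more than the same rewards received later.

def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, discounting each future step by gamma."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Sequence A pays off immediately; sequence B pays off later but more in total.
immediate = [10, 0, 0, 0]
delayed = [0, 0, 0, 20]

print(discounted_return(immediate))  # 10.0
print(discounted_return(delayed))    # 20 * 0.9**3, roughly 14.58
```

Even after discounting, the delayed sequence scores higher here, so an agent maximizing discounted return would prefer it. With a smaller gamma (say, 0.5), the immediate reward would win instead, which is how the discount factor tunes how far-sighted the agent is.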

Model-based vs. model-free

Reinforcement learning algorithms can be differentiated as model-based or model-free, which describes whether the AI model builds an internal model of its environment or not. In a controlled, unchanging environment, an AI model may build a map or model of its environment to determine the optimum way to navigate the space. For example, a robot that serves drinks at a restaurant may create a map of the area to choose the best path to each table. With the model, the AI can predict the best action without physically navigating the space first. This is a model-based reinforcement learning algorithm. 

In more complex or dynamic environments, a model-free agent learns directly through trial and error because it cannot build a reliable internal model of its surroundings. For example, a self-driving car can't map out the space it will move through because of the variability of other drivers, pedestrians, road conditions, and other factors. The AI must instead learn by trying different actions and seeing what works. In this case, learning typically happens within a virtual environment, so the agent can experiment freely without endangering anyone.

What are the algorithms used in reinforcement learning? 

The most common reinforcement learning algorithms are Q-learning, SARSA, REINFORCE, PPO, TRPO, A2C and A3C, and DDPG. These algorithms differ in how they allow the main components of reinforcement learning to interact (i.e., the agent, environment, policy, and reward). The agent is the AI model, the environment is everything the AI model interacts with, the policy is the programming or instructions the AI model has, and the reward is a score representing the value of an action.

Each reinforcement learning algorithm has a different approach to implementing these four primary components. 

  • Q-learning or Deep Q-Networks (DQNs): Q-learning is a model-free, off-policy algorithm, meaning it can learn the value of the best available action even while following a different, more exploratory policy. The algorithm predicts the reward (Q-value) of each action in each state and uses those predictions to build its own set of rules for achieving the goal, which makes it useful in uncontrolled or unpredictable environments. Combining Q-learning with a neural network that approximates the Q-values gives you a DQN algorithm.

  • SARSA (State, Action, Reward, State, Action): SARSA is a model-free algorithm like Q-learning, but it is on-policy: it updates its value estimates based on the actions it actually takes, including exploratory ones, rather than on the best possible next action.

  • REINFORCE: The REINFORCE algorithm is a policy-gradient method, which means it adjusts its policy directly, using the returns it observes from complete episodes to estimate which actions to make more likely. Because REINFORCE updates the same policy it uses to act, it's considered an on-policy algorithm.

  • Actor-critic and A2C: Actor-critic algorithms use two neural networks, one as the actor to select actions and the other as the critic to evaluate the actions. The actor follows the current policy, and the critic evaluates and adjusts the policy after each iteration. This architecture can help you get the best of both value-based and policy-based algorithms. 

  • Trust-region policy optimization (TRPO): TRPO algorithms help solve a common problem with policy-gradient algorithms: sometimes a policy update is so large that the program won't work as expected. TRPO prevents drastic policy changes by constraining each update to stay within a "trust region" around the current policy.

  • Proximal policy optimization (PPO): PPO is an on-policy algorithm developed as a simpler yet similarly effective alternative to TRPO. It uses a clipped objective function to keep each policy update small and applies those updates in minibatches, avoiding TRPO's more expensive constrained optimization.

  • Deep deterministic policy gradient (DDPG): A DDPG is an algorithm that combines many of the qualities of other algorithms mentioned. It is an off-policy, actor-critic model that uses a value-based critic to learn a deterministic policy, or a policy that is predictable based on the input. 
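The contrast between off-policy Q-learning and on-policy SARSA comes down to a single term in the update rule: Q-learning bootstraps on the best next action, while SARSA bootstraps on the action the agent actually takes next. Here is a minimal tabular sketch on a toy five-state corridor; the environment, learning rate, discount factor, and episode count are all illustrative assumptions:

```python
import random

# Toy corridor: states 0..4, start at 0, reward 1.0 for reaching state 4.
# All parameters here (ALPHA, GAMMA, EPSILON, episode count) are
# illustrative choices, not prescribed values.

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # left, right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, move):
    """Apply a move, clamping to the corridor; reward 1.0 at the goal."""
    next_state = min(max(state + move, 0), GOAL)
    return next_state, (1.0 if next_state == GOAL else 0.0)

def epsilon_greedy(Q, state):
    """Usually pick the highest-value action; explore with prob. EPSILON."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[state][a])

def train(on_policy, episodes=500):
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, action = 0, epsilon_greedy(Q, 0)
        while state != GOAL:
            next_state, reward = step(state, ACTIONS[action])
            next_action = epsilon_greedy(Q, next_state)
            if on_policy:   # SARSA: bootstrap on the action actually taken next
                target = reward + GAMMA * Q[next_state][next_action]
            else:           # Q-learning: bootstrap on the best next action
                target = reward + GAMMA * max(Q[next_state])
            Q[state][action] += ALPHA * (target - Q[state][action])
            state, action = next_state, next_action
    return Q

random.seed(0)
q_learning = train(on_policy=False)
sarsa = train(on_policy=True)
# Both agents should learn to prefer "right" in every non-goal state.
print([q[1] > q[0] for q in q_learning[:GOAL]])
print([q[1] > q[0] for q in sarsa[:GOAL]])
```

In this simple environment both agents converge on the same behavior; the two update rules mainly diverge in stochastic or risky environments, where SARSA's on-policy updates make it account for its own exploration mistakes.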

Use cases for reinforcement learning algorithms in machine learning

You can use reinforcement learning in many different industries for various applications. A few examples of the many ways you could use reinforcement learning include gaming, health care, and self-driving vehicles: 

  • Gaming: Reinforcement learning algorithms can learn to play games, allowing you to play against an opponent who can adapt to your moves. You can also use reinforcement learning algorithms for game testing. 

  • Self-driving cars: You can use reinforcement learning to control a self-driving car that can learn to maneuver in a complex and unpredictable environment. Reinforcement learning allows the AI model to manage complex variables like speed, multiple lanes, and other drivers. 

  • Health care: Health care professionals use reinforcement learning to help guide patient treatment decisions. This technology is called dynamic treatment regimes. 

Learn more about reinforcement learning algorithms on Coursera

Reinforcement learning algorithms allow AI models to learn the best way to accomplish a goal with little to no guidance from a human agent. Consider taking a course online if you'd like to learn more about reinforcement learning algorithms. You can begin today on Coursera with Fundamentals of Reinforcement Learning, offered by the University of Alberta as part of the Reinforcement Learning Specialization.


This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.