
This blog is the first part of a three-part series on the basics of reinforcement learning (RL) and how we can formulate a given problem as a reinforcement learning problem.

The blog is based on my teaching and insights from our book at the University of Oxford. I also wish to thank my co-authors Phil Osborne and Dr Matt Taylor for their feedback on my work.

In this blog, we introduce Reinforcement learning and the idea of an autonomous agent.

In the next blog, we will discuss the RL problem in the context of other similar techniques, specifically multi-armed bandits and contextual bandits.

Finally, we will look at various applications of RL in the context of an autonomous agent.

Thus, across these three blogs we consider RL not as an algorithm in itself, but rather as a mechanism for creating autonomous agents (and their applications).

This series will help you understand the core concepts of reinforcement learning and encourage you to formulate your own problem as an RL problem.

**What is Reinforcement Learning?** – *“Reinforcement learning is a field of artificial intelligence in which a machine learns by trial and error through interaction with an environment. The machine, referred to as an agent, performs actions, and for each valuable action it receives a reward. Reinforcement learning algorithms focus on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).”*

**Understanding with an example . . .**

Let’s go with the most common yet easiest example for understanding the basic concept of reinforcement learning. Think of a new dog and the ways in which you train it. Here, the dog is the **agent** and its surroundings become the **environment.** Now, when you throw a frisbee away, you expect your dog to run after it and bring it back to you. The frisbee having been thrown is the **state**, and whether or not the dog runs after the frisbee is its **action.** If the dog chooses to run after the frisbee (an **action**) and bring it back, you **reward** it with a cookie/biscuit to indicate the positive response. Otherwise, some punishment can be given to indicate the negative response. That’s exactly what happens in reinforcement learning.
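The dog example above can be sketched as a minimal agent-environment loop. Everything here is a hypothetical illustration (the `step` function, the action names, and the averaging agent are all invented for this sketch, not taken from any RL library): the agent first explores by trying actions at random, then exploits what it has learned.

```python
import random

# Toy "fetch" environment: one state, two possible actions.
ACTIONS = ["fetch", "ignore"]

def step(action):
    """Environment's response: +1 (a treat) for fetching, -1 otherwise."""
    return 1 if action == "fetch" else -1

# A naive agent that tries actions and remembers their average rewards.
totals = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}

random.seed(0)
for _ in range(100):
    action = random.choice(ACTIONS)   # explore: pick an action at random
    reward = step(action)             # environment responds with a reward
    totals[action] += reward
    counts[action] += 1

# exploit: prefer the action with the best observed average reward
best = max(ACTIONS, key=lambda a: totals[a] / max(counts[a], 1))
print(best)  # fetch
```

After enough trials, the agent's estimates make "fetch" the clearly better action, which mirrors the dog learning that retrieving the frisbee earns the cookie.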

This interactive method of learning stands on four pillars, also called “The Elements of Reinforcement Learning” –

**Policy** – A policy describes the agent’s behaviour at a given instance: it maps each state to the action the agent takes. In more general terms, it is the strategy the agent follows towards its end goal.

**Reward** – In RL, training the agent is rather like luring it with the bait of reward points. For every right decision the agent makes, it is rewarded with positive points, whereas for every wrong decision, a punishment in the form of negative points is given.

**Value** – While the reward signals what is good immediately, the value function estimates the long-term return: the total reward the agent can expect to accumulate starting from a given state. It indicates whether the current action in a given state will yield, or help yield, the best cumulative reward.

**Model (optional)** – RL can be either model-free or model-based. In model-based reinforcement learning, the agent uses prior knowledge of the environment – a model that predicts how the environment responds to actions – to plan ahead when determining its policy.
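The four elements above can be made concrete with a tiny sketch. This is a hypothetical three-state world invented for illustration (the state names, the reward function, and the discount factor are all assumptions, not from the source): the policy is a plain mapping, the model predicts transitions, and the value of each state is the discounted sum of rewards obtained by following the policy.

```python
# Hypothetical 3-state world to illustrate the four elements of RL.
states = ["start", "middle", "goal"]
actions = ["left", "right"]

# Policy: a mapping from state to action (the agent's strategy).
policy = {"start": "right", "middle": "right"}

# Reward: immediate feedback for taking an action in a state.
def reward(state, action):
    return 1.0 if (state == "middle" and action == "right") else 0.0

# Model (optional): predicted next state for a (state, action) pair.
model = {("start", "right"): "middle", ("middle", "right"): "goal"}

# Value: long-term return from each state under the policy, computed
# by backing up rewards from the goal with a discount factor gamma.
gamma = 0.9
value = {"goal": 0.0}
value["middle"] = reward("middle", "right") + gamma * value["goal"]   # 1.0
value["start"] = reward("start", "right") + gamma * value["middle"]   # 0.9
print(value)
```

Note how "start" has a positive value even though its immediate reward is zero: the value function captures the reward that lies further ahead, which is exactly what distinguishes it from the raw reward signal.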

**Formulating an RL problem . . .**

Reinforcement learning is a general paradigm of interacting, learning, predicting, and decision-making. It can be applied wherever the problem at hand can be treated as a sequential decision-making problem. To do so, we first formulate the problem by defining the environment, the agent, the states, the actions, and the rewards.

A summary of the steps involved – from formulating an RL problem, through modelling it, to finally deploying the system – is given below:

- *Define the RL problem* – Define environment, agent, states, actions, and rewards.
- *Collect data* – Prepare data from interactions with the environment and/or a model/simulator.
- *Feature engineering* – This can probably be a manual task with the domain knowledge.
- *Choose modelling method* – Decide the best representation and model/algorithm. It can be online/offline, on-/off-policy, model-free/model-based, etc.
- *Backtrack and refine* – Iterate and refine the previous steps based on experiments.
- *Deploy and monitor* – Monitor the deployed system.
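The first step, defining the RL problem, can itself be sketched as code. Below is a hypothetical one-dimensional "corridor" environment invented for illustration (the class name, state encoding, and reward are all assumptions); its `reset`/`step` interface loosely follows the convention popularised by libraries such as Gymnasium, making the states, actions, and rewards explicit.

```python
class CorridorEnv:
    """Hypothetical corridor: states 0..length-1; start at 0, goal at the end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Actions: 0 = move left, 1 = move right. Returns (state, reward, done)."""
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + move))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

# A trivial policy ("always move right") interacting with the environment.
env = CorridorEnv()
state = env.reset()
done = False
total = 0.0
while not done:
    state, r, done = env.step(1)
    total += r
print(state, total)  # 4 1.0
```

With the problem defined this explicitly, the later steps (collecting interaction data, choosing a modelling method, refining) all operate on the same states, actions, and rewards.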

**RL framework – Markov Decision Processes (MDPs)**

Typically, reinforcement learning problems are formalized as Markov Decision Processes, which act as a framework for modelling a decision-making situation. They follow the Markov property, i.e. any future state depends only on the current state and is independent of past states, hence the name Markov decision process. Mathematically, an MDP consists of the following elements – a set of states S, a set of actions A, a transition function P(s′ | s, a), a reward function R(s, a), and a discount factor γ –

where the end goal is to estimate the value of a state, V(s), or the value of state-action pairs, Q(s, a), while the agent continuously interacts with the environment.
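A minimal sketch of how V(s) and Q(s, a) fall out of the MDP elements: below is a hypothetical two-state deterministic MDP (the state and action names, transition table, and rewards are all invented for illustration), solved with a few iterations of the standard Bellman optimality update.

```python
import itertools

# Hypothetical deterministic MDP: states, actions, transitions P, rewards R.
states = ["s0", "s1"]
actions = ["stay", "go"]
gamma = 0.9

P = {("s0", "stay"): "s0", ("s0", "go"): "s1",
     ("s1", "stay"): "s1", ("s1", "go"): "s1"}
R = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
     ("s1", "stay"): 0.0, ("s1", "go"): 0.0}

# Value iteration: repeatedly apply V(s) = max_a [ R(s,a) + gamma * V(P(s,a)) ]
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {s: max(R[(s, a)] + gamma * V[P[(s, a)]] for a in actions)
         for s in states}

# State-action values: Q(s,a) = R(s,a) + gamma * V(next state)
Q = {(s, a): R[(s, a)] + gamma * V[P[(s, a)]]
     for s, a in itertools.product(states, actions)}
print(V["s0"], Q[("s0", "go")])  # 1.0 1.0
```

The computed values tell the agent which action is best in each state: here, "go" from s0 is worth 1.0 while "stay" is worth less, so the optimal policy takes "go".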

In the next blog, we will discuss the RL problem in the context of other similar techniques, specifically multi-armed bandits and contextual bandits. This will expand on the problem of using RL to create autonomous agents. In the final part, we will talk about real-world reinforcement learning applications and how one can apply the same in multiple sectors.

About Me (Kajal Singh)

Kajal Singh is a Data Scientist and a Tutor on the Artificial Intelligence – Cloud and Edge Implementations course at the University of Oxford. She is also the co-author of the book “*Applications of Reinforcement Learning to Real-World Data: An educational introduction to the fundamentals of Reinforcement Learning with practical examples on real data*” (2021).


Posted 12 April 2021

