Summary: Reinforcement Learning is at the core of modern AI, particularly robotics and sequential tasks. Although RL has been around for many years, it has become the third leg of the machine learning stool, and it is increasingly important for data scientists to know when and how to implement it.
If you polled a group of data scientists just a few years back about how many machine learning problem types there are, you would almost certainly have gotten a binary response: problem types were clearly divided into supervised and unsupervised.
Today if you asked that same question you are very likely to find that machine learning problem types are divided into three categories: supervised, unsupervised, and reinforcement learning.
While Reinforcement Learning (RL) has been around since at least the 1980s, and before that in the behavioral sciences, its introduction as a major player in machine learning reflects its rising importance in AI.
The key to understanding when to use Reinforcement Learning is this:
What problems fit this description? Robotic control for one and game play for another, both of which have been a central focus of AI over the last few years.
To drastically simplify, RL methods are deployed to address two problem types:
We’ll offer some additional examples to clarify this, but you should immediately perceive the confusion created by introducing this third ML type. The distinction between Supervised and Unsupervised problem types was immediately clear from both the problem definition and the data available.
RL, on the other hand, is defined by the absence of pre-existing data, yet its goals could also be addressed by Supervised or Unsupervised techniques if you first gathered training data. For example, value prediction is clearly also in the realm of Supervised problems, and some Control problems that focus on optimized outcomes can also be answered with Supervised or even Unsupervised techniques.
The concepts underlying RL come from animal behavior studies. One of the most commonly used examples is that of the newborn baby gazelle. Although it is born without any understanding or model of how to use its legs, within minutes it is standing and within 20 minutes it is running. This learning comes from rapidly interacting with its environment, learning which muscle responses are successful, and being rewarded by survival.
RL Basic Concepts
RL is exactly that: a system in which success is learned by interacting with its environment through trial and error. In contrast to supervised and unsupervised learning, which both have data from which to learn, RL makes its own data through experience and determines the ‘champion model’ through trial and error and pure reinforcement. RL agents learn from their own experience, whereas Supervised learners have examples from which to learn.
The basic idea behind RL is simplicity itself.
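As a rough illustration of an agent making its own data through trial and error, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy environment `CoinFlipEnv`, its two actions, and their payoff probabilities are hypothetical and do not come from any particular RL library.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment with two actions; action 1 pays off more often."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # Assumed reward probabilities, unknown to the agent.
        self.payoff_prob = {0: 0.2, 1: 0.8}

    def step(self, action):
        # Reward is 1.0 with the chosen action's payoff probability, else 0.0.
        return 1.0 if self.rng.random() < self.payoff_prob[action] else 0.0


def collect_experience(env, n_steps=1000, seed=1):
    """Act at random and record (action, reward) pairs: the agent's own data."""
    rng = random.Random(seed)
    experience = []
    for _ in range(n_steps):
        action = rng.choice([0, 1])
        reward = env.step(action)
        experience.append((action, reward))
    return experience


def average_reward_by_action(experience):
    """Summarize the collected experience as the mean reward per action."""
    totals, counts = {}, {}
    for action, reward in experience:
        totals[action] = totals.get(action, 0.0) + reward
        counts[action] = counts.get(action, 0) + 1
    return {a: totals[a] / counts[a] for a in totals}


experience = collect_experience(CoinFlipEnv())
averages = average_reward_by_action(experience)
best_action = max(averages, key=averages.get)
print(averages, "best action:", best_action)
```

No training set exists before the loop runs. The only data the agent ever sees are the rewards produced by its own actions, which is the point of contrast with supervised learning.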
Some additional examples of problems where RL is a good choice.
Overlap With Supervised Learning and Some RL Strategies
In the table above, look specifically at the ‘Marketing Team’ and ‘Website Designer’ examples and you should recognize problems where classical A/B testing or a supervised classification model to detect the best prospects has been the historical go-to solution. Both of these would require running tests to gather training data, then modeling and applying the results. Similarly, many of the other problems would yield to supervised or unsupervised modeling if training data were available.
Consider the A/B test for example. The RL practitioner would say “Why wait for training data? Let’s just start showing options (probably more than just A and B) to customers and continue to show the ones that get the best results.”
This is in fact the core of the RL strategy, with one exception. If we allow the agent to settle too quickly on what appears to be the winning solution, it may never explore all the other options (the full extent of the State space). This may result in overfitting or an early hang-up on a local optimum.
To minimize this likelihood, in designing the RL agent we should use an ‘epsilon-greedy’ strategy. The factor epsilon (ε) tells the agent at what rate to continue to look randomly in the State space for a better solution. Set it too high and the system may take too long to optimize. Set it too low and the system may never examine some of the State space. (For more on this see Multi-Armed Bandits and Markov Decision Processes.)
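The trade-off described above can be sketched as a small multi-armed bandit in Python. The exploration rate is written here as `epsilon`, the usual name for it in an epsilon-greedy strategy. The function name, the three option payoff rates, and the number of rounds are all hypothetical values chosen for illustration, not figures from any real campaign.

```python
import random


def epsilon_greedy_bandit(true_rates, epsilon=0.1, n_rounds=5000, seed=0):
    """Show one option per round; explore at rate epsilon, otherwise exploit."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms        # how often each option was shown
    estimates = [0.0] * n_arms   # running mean reward per option
    total_reward = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an option uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the option with the best estimate so far
            # (ties go to the lowest-numbered option).
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Simulated customer response; true_rates are assumed and hidden from the agent.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen option.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward


# Hypothetical success probabilities for three options (A, B, C).
rates = [0.2, 0.5, 0.8]
counts, estimates, total_reward = epsilon_greedy_bandit(rates)
greedy_counts, _, _ = epsilon_greedy_bandit(rates, epsilon=0.0)
print("epsilon=0.1 pulls:", counts)
print("epsilon=0.0 pulls:", greedy_counts)
```

With `epsilon=0.0` this sketch never leaves the first option, because nothing ever prompts it to try the others. That is the "never examine some of the State space" failure in miniature. Raising `epsilon` guarantees every option is tried, at the cost of spending more rounds on options already known to be weaker.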
Four Problems that RL Must Deal With
In the design of your RL project there are four problems you must deal with (Kevin Murphy, 1998; Florentin Woergoetter and Bernd Porr, 2008).
All of these are areas of active research and exploration. As compute speeds rise and costs fall, we can visit increasingly large subsets of the State space and examine more time-delay options. Many of these solutions apply principles from neural nets, backpropagation, evolutionary or annealing-like procedures, and factored structures, all of which reside primarily in the realms of Supervised and Unsupervised learning. This crossover of domains can be confusing, but where and how to apply RL is an increasingly important body of knowledge for data scientists.
Other Articles in this Series
About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist and commercial predictive modeler since 2001. He can be reached at: