Have you heard about AI agents that learn to play computer games on their own and give expert human players tough competition?
A well-known example is DeepMind, whose AlphaGo program defeated the South Korean Go world champion Lee Sedol in 2016. Beyond Go, other AI agents have been developed to play Atari games such as Breakout, Pong, and Space Invaders.
Reinforcement learning differs from supervised learning in not needing labeled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
Note: If you are already familiar with RL and Q-learning concepts, you can move directly to Part 2, which implements Q-learning in R from scratch.
Reinforcement learning (RL)
RL is an area of machine learning concerned with how software agents ought to take actions in an environment to maximize some notion of cumulative reward.
As an example, consider the process of boarding a train, in which the reward is measured by the negative of the total time spent boarding (alternatively, the cost of boarding the train is equal to the boarding time). One strategy is to enter the train doors as soon as they open, minimizing your initial wait time. If the train is crowded, however, your entry will be slow, because passengers leaving the train push against you as you try to board. The total boarding time, or cost, is then:
0 seconds wait time + 15 seconds fight time
On the next day, by random chance (exploration), you decide to wait and let other people depart first. This initially results in a longer wait time. However, less time is spent fighting other passengers. Overall, this path has a higher reward than that of the previous day, since the total boarding time is now:
5 seconds wait time + 0 seconds fight time
Through exploration, despite the initial (patient) action resulting in a larger cost (or negative reward) than in the forceful strategy, the overall cost is lower, thus revealing a more rewarding strategy.
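The arithmetic of the train example can be written out as a small Python sketch. The strategy names and the `reward` and `best_strategy` helpers are made up for illustration; only the wait and fight times come from the example above.

```python
# Costs in seconds for each boarding strategy, taken from the train example.
# The strategy names are illustrative labels, not part of any library.
STRATEGIES = {
    "board_immediately": {"wait": 0, "fight": 15},
    "let_others_exit_first": {"wait": 5, "fight": 0},
}


def reward(strategy: str) -> int:
    """Reward is the negative of the total boarding time (wait + fight)."""
    cost = STRATEGIES[strategy]
    return -(cost["wait"] + cost["fight"])


def best_strategy() -> str:
    """Return the strategy with the highest reward, i.e. the lowest total time."""
    return max(STRATEGIES, key=reward)


if __name__ == "__main__":
    for name in STRATEGIES:
        print(name, reward(name))
    print("best:", best_strategy())
```

The patient strategy only shows up as better once it has been tried at least once, which is why some exploration is needed.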
Many algorithms fall under reinforcement learning. This article focuses on Q-learning, one of the best-known RL algorithms.
Q-learning is an off-policy, model-free RL algorithm based on the well-known Bellman Equation.
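For reference, the Q-learning update is commonly written in the following form. The symbols s_t (current state), a_t (action taken), r_{t+1} (reward received), and s_{t+1} (next state) are standard textbook notation assumed here, not taken from this article.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

The bracketed term is the difference between the new estimate (immediate reward plus the discounted best value of the next state) and the current estimate.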
Alpha (α) - Learning rate (0<α≤1) - The rate at which the Q-values are updated. A high value of alpha (close to 1) means the Q-values change quickly and take fewer iterations to learn. Conversely, a low value of alpha updates the Q-values slowly and takes more iterations to learn.
Gamma (γ) - Discount factor (0≤γ≤1) - Determines how much importance we give to future rewards. A high discount factor (close to 1) captures the long-term effective reward, whereas a discount factor of 0 makes the agent consider only the immediate reward, making it short-sighted (greedy for immediate reward).
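To make the roles of α and γ concrete, here is a minimal sketch of a single Q-value update in Python. The function name `q_update` and its parameter names are assumptions chosen for illustration; the formula is the standard Q-learning update.

```python
def q_update(q_sa, reward, max_q_next, alpha, gamma):
    """Apply one Q-learning update to a single Q-value.

    q_sa       -- current estimate Q(s, a)
    reward     -- immediate reward received after taking action a in state s
    max_q_next -- largest Q-value available in the next state
    alpha      -- learning rate, 0 < alpha <= 1
    gamma      -- discount factor, 0 <= gamma <= 1
    """
    target = reward + gamma * max_q_next   # new estimate of the return
    return q_sa + alpha * (target - q_sa)  # move part of the way toward it


if __name__ == "__main__":
    # Same experience, different hyperparameters.
    print(q_update(0.0, 1.0, 10.0, alpha=0.1, gamma=0.9))  # small step toward 10
    print(q_update(0.0, 1.0, 10.0, alpha=1.0, gamma=0.9))  # jumps straight to 10
    print(q_update(0.0, 1.0, 10.0, alpha=1.0, gamma=0.0))  # ignores the future: 1
```

With α = 1 the old estimate is replaced entirely by the new target, and with γ = 0 the target reduces to the immediate reward.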
Important terms to understand:
This procedural approach can be translated into simple language steps as follows:
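As one possible rendering in code, the usual tabular Q-learning loop (initialize a Q-table, choose an action, observe the reward and next state, update the Q-value, repeat) might look like the Python sketch below. The corridor environment, the `step`, `train`, and `greedy_policy` functions, the ε-greedy exploration rate `epsilon`, and all hyperparameter values are assumptions made for illustration, not the implementation from Part 2.

```python
import random

N_STATES = 5       # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)   # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: move along the corridor, reward 1.0 on reaching the goal."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # 1. initialize the Q-table
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # 2. choose an action: explore with probability epsilon, else exploit
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            # 3. act and observe the reward and the next state
            next_state, reward, done = step(state, action)
            # 4. update the Q-value; no future value beyond a terminal state
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))  # the learned policy should step right everywhere
```

Ties between equal Q-values are broken at random so that the agent does not get stuck walking into the left wall before it has seen any reward.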
Reinforcement learning is an awesome and interesting set of algorithms, but here are a few of the many scenarios where you should not use a reinforcement learning model:
I hope this article has given you an overview of RL and the Q-learning algorithm. If you are still curious and want to see it in action, check out Part 2 of Reinforcement Learning, which implements Q-learning in R from scratch.