This post is the second part of a two-part series. Here, we discuss different sectors where reinforcement learning can be used to solve complex problems efficiently. The post is based on this paper. In the previous post, we studied the basics of reinforcement learning and how to frame a problem as a reinforcement learning problem. In this follow-on post, we look at how real-world reinforcement learning applications can be developed.
In general, to formulate any reinforcement learning problem, we need to define the environment, the agent, states, actions, and rewards. This idea forms the basis for the examples in this post.
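To make these five pieces concrete, here is a minimal sketch of how they fit together in code. The `CorridorEnv` environment, the `RandomAgent`, and the reward values are hypothetical and chosen only for illustration; they do not come from the paper or from any specific library.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    """One step of interaction: state, action, reward, and what followed."""
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


class CorridorEnv:
    """Hypothetical toy environment: walk along a corridor to reach the far end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, any other action moves left.
        move = 1 if action == 1 else -1
        # The position is clipped to the corridor [0, length - 1].
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        # Assumed reward: +1 for reaching the goal, a small cost for every other step.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


class RandomAgent:
    """Placeholder agent that ignores the state and picks an action at random."""

    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.randrange(self.n_actions)


def run_episode(env, agent, max_steps=100):
    """Run the agent in the environment and record every transition."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        trajectory.append(Transition(state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return trajectory
```

Calling `run_episode(CorridorEnv(), RandomAgent(n_actions=2))` returns the list of transitions for one episode. A learning agent would replace `RandomAgent.act` with a policy that improves from those transitions.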
We cover reinforcement learning for recommender systems, data center cooling, finance, transportation, and healthcare.
Algorithms for recommendation systems are constantly evolving, and reinforcement learning techniques play a key part in them. Recommender systems face some unique challenges that reinforcement learning techniques can address. These challenges are
Horizon is Facebook’s open source applied reinforcement learning platform for recommendations. We formulate the illustrations in the figure below in terms of reinforcement learning as –
Cooling is essential for data centers: it keeps temperatures down, and doing it efficiently conserves energy. Reinforcement learning can be applied to data center cooling. Typically, MPC (model predictive control) is used to regulate temperature and airflow by adjusting components in the data center, such as fan speeds, water flow regulators, and air handling units (AHUs). This problem can be formulated as a reinforcement learning problem as follows:
For reinforcement learning purposes, the data center is modelled as a control loop for cooling processes. The figure below illustrates the process.
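As a rough illustration of such a control loop, the sketch below simulates a single cooled zone. Everything in it is an assumption made for the example: the one-line thermal model, the target temperature, the reward weights, and the rule-based `rule_based_policy`, which only stands in for the place where a learned policy would go.

```python
from dataclasses import dataclass


@dataclass
class CoolingState:
    temperature_c: float  # hypothetical air temperature of the zone
    fan_speed: float      # fraction of maximum fan speed, in [0, 1]


def step(state, fan_speed, heat_load_c=1.5, max_cooling_c=3.0):
    """Toy thermal model (an assumption): IT load heats the air, fans cool it."""
    fan_speed = min(max(fan_speed, 0.0), 1.0)
    temperature = state.temperature_c + heat_load_c - max_cooling_c * fan_speed
    return CoolingState(temperature, fan_speed)


def reward(state, target_c=24.0, energy_weight=0.5):
    """Penalise overheating above the target and the energy spent on fans."""
    overheating = max(state.temperature_c - target_c, 0.0)
    return -overheating - energy_weight * state.fan_speed


def rule_based_policy(state, target_c=24.0, gain=0.2):
    """Placeholder for a learned policy: a simple proportional fan-speed rule."""
    return min(max(0.5 + gain * (state.temperature_c - target_c), 0.0), 1.0)


def run_control_loop(initial_temperature_c=30.0, steps=20):
    """Observe the state, choose a fan speed, apply it, and collect the reward."""
    state = CoolingState(initial_temperature_c, 0.0)
    total_reward = 0.0
    for _ in range(steps):
        action = rule_based_policy(state)
        state = step(state, action)
        total_reward += reward(state)
    return state, total_reward
```

The reward here trades off the two goals mentioned above, lower temperatures and lower energy use, through the hypothetical `energy_weight` parameter. A real system would use measured sensor data and a far richer model.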
Having done a good amount of work in the financial sector, I think many of its problems can be modelled as sequential decision problems. Reinforcement learning can be employed for some of these, including option pricing, portfolio optimization, and risk management.
In option pricing, the challenge is determining the right price for the option. To formulate option pricing as a reinforcement learning problem, we again define states, actions, and rewards as below:
When we model option pricing as a reinforcement learning problem, the entire training process depends on learning the state-action-value function.
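The state-action-value function, usually written Q(s, a), estimates the expected cumulative discounted reward of taking action a in state s and acting well afterwards. One common way to learn it is tabular Q-learning, sketched below. This is a generic illustration and not necessarily the method used for option pricing in the paper; the state and action labels, the learning rate `alpha`, and the discount factor `gamma` are all hypothetical.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best action in the next state; zero if the episode has ended.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: unseen state-action pairs start at a value of zero.
q_table = defaultdict(float)
example_actions = ["a0", "a1"]
q_learning_update(q_table, state=0, action="a0", reward=1.0,
                  next_state=1, actions=example_actions)
```

With continuous states, as is typical for asset prices, the table would be replaced by a function approximator, but the update follows the same idea.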
In the transportation sector, reinforcement learning is applied to improve efficiency and reduce cost. Order dispatching in ridesharing systems (for example, Uber) is one of the best applications of RL in transportation. Allocating a driver to a passenger is a complex process that depends on various factors such as demand prediction, route planning, and fleet management. The order dispatching problem has both spatial and temporal components. It can be formulated as a reinforcement learning problem where:
Composition and workflow of the order dispatching simulator. (from Tang et al. (2019))
The model is initialized using historical data. After that, the process is driven by an order dispatch policy learned with reinforcement learning.
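A heavily simplified sketch of that workflow is shown below. It is not the simulator from Tang et al. (2019): the `Order` and `Driver` fields, the one-dimensional zone ids, the fixed `trip_length`, and the `nearest_driver_policy` are all assumptions made for illustration, and the nearest-driver rule only marks the place where a learned dispatch policy would be plugged in.

```python
from dataclasses import dataclass


@dataclass
class Order:
    time: int         # time step at which the order is placed
    origin: int       # pickup zone id
    destination: int  # drop-off zone id
    price: float      # fare, used here as the reward signal


@dataclass
class Driver:
    zone: int            # zone where the driver is (or will be after the trip)
    busy_until: int = 0  # first time step at which the driver is free again


def nearest_driver_policy(order, idle_drivers):
    """Placeholder for a learned dispatch policy: pick the closest idle driver."""
    return min(idle_drivers, key=lambda d: abs(d.zone - order.origin))


def simulate(historical_orders, drivers, horizon,
             policy=nearest_driver_policy, trip_length=2):
    """Replay historical orders and let the policy assign drivers to them."""
    total_reward = 0.0
    served = 0
    for t in range(horizon):
        for order in (o for o in historical_orders if o.time == t):
            idle = [d for d in drivers if d.busy_until <= t]
            if not idle:
                continue  # the order is lost when no driver is free
            driver = policy(order, idle)
            # The chosen driver ends up at the destination once the trip is over.
            driver.zone = order.destination
            driver.busy_until = t + trip_length
            total_reward += order.price
            served += 1
    return served, total_reward
```

Passing a different `policy` function to `simulate` makes it possible to compare dispatch strategies on the same historical orders, which is the main purpose of a simulator of this kind.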
Healthcare is one of the most crucial sectors, with many opportunities and challenges for AI where reinforcement learning could be used. We will discuss some of these below:
In the case of dynamic treatment regimes (DTRs), we could consider:
Another healthcare application for reinforcement learning is the generation of reports from medical images. A medical report comprises specific segments such as the findings, the report's conclusion (main finding and diagnosis), and any secondary information. I leave it to the reader to formulate this case as a reinforcement learning problem.
Hint: In this scenario, a CNN (convolutional neural network) is first used to extract visual features from a set of images and transform them into a context vector. From this context vector, a sentence decoder generates latent topics recurrently. Based on a latent topic, a retrieval policy module generates sentences using either a generation approach or a template. The RL-based retrieval policy integrates prior human knowledge.
Hope you enjoyed reading the blog! For any questions or doubts, please drop a comment.
About Me (Kajal Singh)
Kajal Singh is a Data Scientist and a Tutor at the Artificial Intelligence – Cloud and Edge implementations course at the University of Oxford. She is also the co-author of the book “Applications of Reinforcement Learning to Real-World Data: An educational introduction to the fundamentals of Reinforcement Learning with practical examples on real data (2021)”.