Heating, ventilation, and air conditioning of buildings accounts alone for nearly *40%* *of the global energy demand* **[1]**.

The need for

Energy Savingshas become increasily foundamental to fightClimate Change.We have been working on a cloud-based RL algorithm that can retrofit existing HVAC controls to obtain substantial results.

In the last decade, a new class of controls which relies on Artificial Intelligence have been proposed. In particular, we are going to highlight data-driven controls based on *Reinforcement Learning *(RL), since they showed from the very beginning promising results as HVAC controls **[2]**.

There are two main ways to upgrade with RL the air conditioning systems: to implement RL on new systems or to retrofit the existing ones. The first approach is suitable for manufacturers of Heating and air-conditioning systems, while the latter can be applied to any existing plant that can be controlled remotely.

We designed a Cloud-Based RL algorithm that continuously learns how to optimize power consumption by remotely reading the environmental data and consequently defining the HVAC set-points. The cloud-based solution is suitable to scale to a significant number of buildings.

Our test demonstrated a reduction between 5.4% and 9.4% in primary energy consumption for two different locations, guaranteeing the same thermal comfort of state-of- the-art controls.

Traditionally HVAC systems are controlled by *model-based *(e.g., Model Predictive Control)* *and *rule-based* controls:

The basic MPC concept can be summarized as follows. Suppose that we wish to control a multiple-input, multiple-output process while satisfying inequality constraints on the input and output variables. If a reasonably accurate dynamic model of the process is available, model and current measurements can be used to predict future values of the outputs. Then the appropriate changes in the input variables can be computed based on both predictions and measurements.

In essence, MPC can fit complex thermodynamics and achieve excellent results in terms of energy savings on a single building. Following this train of thought, there is a significant issue: the retrofit application of this kind of models requires to develop a thermo-energetic model for each existing building. Similarly, it is clear that the performance of the model relies on its quality, and having a pretty accurate model is usually expensive. High initial investments are one of the main problems of model-based approaches **[3]**. In the same fashion, for any intervention of energy efficiency on the building, the model has to be rebuilt or tuned, again, with an expensive involvement of a domain expert.

Rule-based modeling is an approach that uses a set of rules that *indirectly* specifies a mathematical model. This methodology is especially effective whenever the rule-set is significantly simpler than the model it implies, in such a way that the model is a repeated manifestation of a limited number of patterns.

RBCs are, thus, state-of-the-art model-free controls that represent an industry standard. A model-free solution can potentially scale up because the absence of a model makes the solution easily applicable to different buildings without the need for a domain expert. The main drawback of RBC is that they are difficult to be optimally tuned because they are not adaptable enough for the intrinsic complexity of the coupled building and plant thermodynamics.

Before introducing the advantages of RL Controls, we are going to talk briefly about RL itself. Using the words of Sutton and Barto **[4]**:

Reinforcement learning is learning what to do — how to map situations to actions — so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics — trial-and-error search and delayed reward — are the two most important distinguishing features of reinforcement learning.

In our case, an RL algorithm interacts directly with the HVAC control system. It adapts continuously to the controlled environment using real-time data collected on-site, without the need to access to a thermo-energetic model of the building. In this way, an RL solution could obtain primary energy savings reducing the operating costs while remaining suitable to a large-scale application.

Therefore, it is desirable to introduce RL controls for large-scale applications on HVAC systems where the operating cost is high, like those in charge of the thermo-regulation of a significant volume.

One of the building use classes where it could be convenient to implement an RL solution is the *supermarket* class. Supermarkets are, by definition, widespread buildings with variable thermal loads and complex occupational patterns that introduce a non-negligible stochastic component from the HVAC control point of view.

We are going to formalize this problem by using the framework of Reinforcement Learning, let us contextualize it in a better way.

In RL, an *agent* interacts with an *environment* and learns the optimal sequence of actions, represented by a policy to reach the desired goal. As reported in **[4]**:

The learner and decision maker is called the agent. The thing it interacts with, comprising everything outside the agent, is called the environment. These interact continually, the agent selecting actions and the environment responding to these actions and presenting new situations to the agent.

In our work, the environment is a supermarket building. A reward expresses the learning goal of the agent: a scalar value returned to the agent, which tells us how the agent is behaving with respect to the learning objective.

*Original post can be viewed here.*

© 2020 TechTarget, Inc. Powered by

Badges | Report an Issue | Privacy Policy | Terms of Service

**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 |
2015-2016 |
2017-2019 |
Book 1 |
Book 2 |
More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardy
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions

## You need to be a member of Data Science Central to add comments!

Join Data Science Central