
Deep-Reinforcenment-learning-for-TCL-control

This is an attempt to implement the RL control method used in https://arxiv.org/pdf/1604.08382.pdf.

We consider the problem of continuous control of aggregated heating systems in households in a price-sensitive environment. The objective of this work is to minimize the total cost of energy consumed by a group of households for heating, while taking into account the residents' comfort level and the price signals. The control is performed under uncertainty about the buildings' mass temperatures and the specific dynamics of each household. We propose a new approach that learns an abstract representation of these hidden features with a convolutional neural network. The control process is based on batch reinforcement learning, which allows the controller to learn and improve its results by interacting with the environment. We assume that the heating systems, also known as thermostatically controlled loads (TCLs), are individually equipped with backup controllers that keep the indoor temperature within a reasonable range. We present and compare two deep reinforcement learning methods for this problem, namely a deep Q-learning network (DQN) and a policy gradient network (PGN). These approaches are model-free and transferable to any kind of aggregate household controller, since they do not depend on the houses' parameters or thermodynamics.

A simulation of a group of 30 TCLs was built using an aggregate model for heterogeneous TCLs with demand response. The convolutional neural network takes a matrix of the historic states of charge (SOC) of the TCLs over the past hours. Other point features, such as the outdoor temperature, the electricity price and the time of day, are fed to a fully connected network that is concatenated with the convolutional network; together they output either an evaluation of each action (DQN) or action probabilities (PGN). The control policy window is a 24-hour episode, and the network weights are updated at the end of each episode for DQN or after each step for PGN. The simulation was first run with random actions to explore the environment and collect a database of (state, action, reward, next state) tuples. This database is used to train the agent offline before it interacts with the environment using an epsilon-greedy strategy.

The main contribution of this work is the inclusion of electricity prices in the agent's decision process, in order to minimize cost and shift consumption toward periods of low energy prices. Another contribution is the comparison between the two main deep reinforcement learning techniques, showing the advantages and weaknesses of each method.
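The following is a minimal sketch (not the repository's actual code) of the hybrid network described above: a convolutional branch over the SOC history matrix concatenated with a fully connected branch for the point features, outputting one Q-value per action, plus epsilon-greedy action selection. The layer sizes, the 24-step history length, the 3 scalar features and the 4-action space are assumptions for illustration and may differ from the implementation in this repository.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical dimensions: 30 TCLs, SOC history over the past 24 time steps,
# 3 point features (outdoor temperature, electricity price, hour of day),
# and a small discrete action set. The real repo may use different sizes.
NUM_TCLS, HISTORY, NUM_SCALARS, NUM_ACTIONS = 30, 24, 3, 4

def build_q_network():
    # Convolutional branch over the matrix of historic SOC values.
    soc_in = layers.Input(shape=(NUM_TCLS, HISTORY, 1), name="soc_history")
    x = layers.Conv2D(16, (3, 3), activation="relu")(soc_in)
    x = layers.Conv2D(32, (3, 3), activation="relu")(x)
    x = layers.Flatten()(x)

    # Fully connected branch for the point features.
    scalars_in = layers.Input(shape=(NUM_SCALARS,), name="scalar_features")
    y = layers.Dense(32, activation="relu")(scalars_in)

    # Concatenate both branches and output one Q-value per action.
    z = layers.Concatenate()([x, y])
    z = layers.Dense(64, activation="relu")(z)
    q_values = layers.Dense(NUM_ACTIONS, name="q_values")(z)

    model = Model(inputs=[soc_in, scalars_in], outputs=q_values)
    model.compile(optimizer="adam", loss="mse")
    return model

if __name__ == "__main__":
    net = build_q_network()
    soc = np.random.rand(1, NUM_TCLS, HISTORY, 1)   # dummy SOC history matrix
    feats = np.random.rand(1, NUM_SCALARS)          # dummy point features

    # Epsilon-greedy action selection, as used once the agent starts
    # interacting with the environment after offline training.
    epsilon = 0.1
    q = net.predict([soc, feats], verbose=0)[0]
    if np.random.rand() < epsilon:
        action = np.random.randint(NUM_ACTIONS)     # explore
    else:
        action = int(np.argmax(q))                  # exploit
    print("Q-values:", q, "chosen action:", action)
```

For the PGN variant, the final layer would instead output a softmax over actions and the weights would be updated after each step rather than at the end of the episode, as described above.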
