# DRL usage example

In this notebook example, we've used Stable Baselines 3 to train and load an agent. However, Sinergym is entirely agnostic to any DRL algorithm (though it does have custom callbacks specifically for SB3) and can be used with any DRL library that interfaces with Gymnasium.

## Training a model

We'll be using the `train_agent.py` script located in the repository root. This script leverages all the capabilities of Sinergym for working with deep reinforcement learning algorithms and sets parameters for everything, allowing us to easily define training options via a JSON file when executing the script.

For more details on how to run `train_agent.py`, please refer to [Train a model](https://ugr-sail.github.io/sinergym/compilation/main/pages/deep-reinforcement-learning.html#train-a-model).

In [1]:
import sys
from datetime import datetime

import gymnasium as gym
import numpy as np
import wandb
from stable_baselines3 import *
from stable_baselines3.common.callbacks import CallbackList
from stable_baselines3.common.logger import HumanOutputFormat
from stable_baselines3.common.logger import Logger as SB3Logger
from stable_baselines3.common.monitor import Monitor

import sinergym
import sinergym.utils.gcloud as gcloud
from sinergym.utils.callbacks import *
from sinergym.utils.constants import *
from sinergym.utils.logger import CSVLogger, WandBOutputFormat
from sinergym.utils.rewards import *
from sinergym.utils.wrappers import *

First, let's set some variables for the execution.

In [2]:
# Environment ID
environment = "Eplus-5zone-mixed-continuous-stochastic-v1"
# Training episodes
episodes = 5
# Name of the experiment
experiment_date = datetime.today().strftime('%Y-%m-%d_%H:%M')
experiment_name = 'SB3_PPO-' + environment + \
    '-episodes-' + str(episodes)
experiment_name += '_' + experiment_date
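
The naming scheme above can be wrapped in a small helper so that training and evaluation runs share one convention. This is only a minimal sketch: `build_experiment_name` and its `prefix` and `when` parameters are hypothetical names introduced for illustration, not part of Sinergym. Passing the timestamp explicitly makes the result reproducible.

```python
from datetime import datetime


def build_experiment_name(prefix, environment, episodes, when=None):
    """Compose '<prefix>-<environment>-episodes-<n>_<timestamp>'.

    Hypothetical helper: it mirrors the string concatenation used in
    this notebook and is not a Sinergym function.
    """
    when = when or datetime.today()
    stamp = when.strftime('%Y-%m-%d_%H:%M')
    return f'{prefix}-{environment}-episodes-{episodes}_{stamp}'


name = build_experiment_name(
    'SB3_PPO',
    'Eplus-5zone-mixed-continuous-stochastic-v1',
    5,
    when=datetime(2024, 4, 25, 14, 5),
)
print(name)
```

With the date fixed to the one of the run shown in this notebook, the helper reproduces the environment name that appears in the log output.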

This experiment can be combined with [Weights & Biases](https://ugr-sail.github.io/sinergym/compilation/main/pages/deep-reinforcement-learning.html#weights-and-biases-structure) to manage all extracted information. With *wandb*, you can track and visualize the entire DRL training process in real time, record hyperparameters and experiment details, save artifacts such as models and *sinergym* output, and compare different executions.

In [3]:
# Create wandb.config object in order to log all experiment params
experiment_params = {
    'sinergym-version': sinergym.__version__,
    'python-version': sys.version
}
experiment_params.update({'environment':environment,
                          'episodes':episodes,
                          'algorithm':'SB3_PPO'})

# Get wandb init params (you have to specify your own project and entity)
wandb_params = {"project": 'sinergym',
                "entity": 'alex_ugr'}
# Init wandb entry
run = wandb.init(
    name=experiment_name + '_' + wandb.util.generate_id(),
    config=experiment_params,
    ** wandb_params
)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: alex_ugr. Use `wandb login --relogin` to force relogin


Now, we're ready to create the Gym environment. We'll use the previously defined environment ID, but remember that you can [change the default environment configuration](https://ugr-sail.github.io/sinergym/compilation/main/pages/notebooks/change_environment.html#Changing-an-environment-registered-in-Sinergym). We'll also create an `eval_env` for evaluation episodes. Passing `env_name` replaces the default environment name with the experiment name.

In [4]:
env = gym.make(environment, env_name=experiment_name)
eval_env = gym.make(environment, env_name=experiment_name + '_EVALUATION')

[ENVIRONMENT] (INFO) : Creating Gymnasium environment... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05]
[MODELING] (INFO) : Experiment working directory created [/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05-res35041]
[MODELING] (INFO) : Model Config is correct.
[MODELING] (INFO) : Updated building model with whole Output:Variable available names
[MODELING] (INFO) : Updated building model with whole Output:Meter available names
[MODELING] (INFO) : runperiod established: {'start_day': 1, 'start_month': 1, 'start_year': 1991, 'end_day': 31, 'end_month': 12, 'end_year': 1991, 'start_weekday': 0, 'n_steps_per_hour': 4}
[MODELING] (INFO) : Episode length (seconds): 31536000.0
[MODELING] (INFO) : timestep size (seconds): 900.0
[MODELING] (INFO) : timesteps per episode: 35040

We can also add wrappers to the environment. We'll use observation and action normalization wrappers and a logger (all extensions of `gym.Wrapper`). Normalization is highly recommended for DRL algorithms with continuous action spaces, and the logger monitors environment interactions and saves the data to CSV files. The generated files will also be stored as *wandb* artifacts.

In [5]:
env = NormalizeObservation(env)
env = NormalizeAction(env)
env = LoggerWrapper(env)

eval_env = NormalizeObservation(eval_env)
eval_env = NormalizeAction(eval_env)
eval_env = LoggerWrapper(eval_env)

[WRAPPER NormalizeObservation] (INFO) : Wrapper initialized.
[WRAPPER NormalizeAction] (INFO) : New normalized action Space: Box(-1.0, 1.0, (2,), float32)
[WRAPPER NormalizeAction] (INFO) : Wrapper initialized
[WRAPPER LoggerWrapper] (INFO) : Wrapper initialized.
[WRAPPER NormalizeObservation] (INFO) : Wrapper initialized.
[WRAPPER NormalizeAction] (INFO) : New normalized action Space: Box(-1.0, 1.0, (2,), float32)
[WRAPPER NormalizeAction] (INFO) : Wrapper initialized
[WRAPPER LoggerWrapper] (INFO) : Wrapper initialized.
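
To make the effect of action normalization concrete, the sketch below rescales a normalized action in [-1, 1] linearly onto a real setpoint range. It is an illustration only, not the wrapper's actual code: `denormalize_action` is a hypothetical name, and the two setpoint ranges are assumptions inferred from the `action_network/` and `action_simulation/` values printed in this notebook's training output.

```python
def denormalize_action(value, low, high):
    """Map a normalized action in [-1, 1] onto the real range [low, high]."""
    return low + (value + 1.0) * (high - low) / 2.0


# Setpoint ranges inferred from the training logs in this notebook
# (assumption: they are not read from the environment here).
HEATING_RANGE = (12.0, 23.25)
COOLING_RANGE = (23.25, 30.0)

# Normalized actions taken from one logged training step
heating = denormalize_action(-0.65428823, *HEATING_RANGE)
cooling = denormalize_action(0.13925427, *COOLING_RANGE)
print(round(heating, 4), round(cooling, 4))
```

For that step the training output reports 13.944629 and 27.094984 under `action_simulation/`, which agrees with this linear mapping up to floating-point precision.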


At this point, the environment is set up and ready to use. We'll create our learning model (Stable Baselines 3 PPO), but any other algorithm can be used.

In [6]:
model = PPO('MlpPolicy', env, verbose=1)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.


Evaluation runs the current model for a set number of episodes to determine whether it is the best version so far at that stage of training. The generated output will also be stored on the *wandb* server. We'll use `LoggerEvalCallback` to print and save the best evaluated model during training.

In [7]:
callbacks = []

# Set up Evaluation and saving best model
eval_callback = LoggerEvalCallback(
    eval_env,
    best_model_save_path=eval_env.get_wrapper_attr('workspace_path') +
    '/best_model/',
    log_path=eval_env.get_wrapper_attr('workspace_path') +
    '/best_model/',
    eval_freq=(eval_env.get_wrapper_attr('timestep_per_episode') - 1) * 2 - 1,
    deterministic=True,
    render=False,
    n_eval_episodes=1)
callbacks.append(eval_callback)
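
The "keep the best model" idea behind an evaluation callback can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used by Stable Baselines 3 or Sinergym: `BestModelTracker` is a hypothetical class, and the rewards are the two evaluation results printed in this notebook's training output.

```python
class BestModelTracker:
    """Remember the highest mean evaluation reward seen so far."""

    def __init__(self):
        self.best_mean_reward = float('-inf')
        self.best_timestep = None

    def update(self, timestep, episode_rewards):
        """Return True when this evaluation beats every previous one."""
        mean_reward = sum(episode_rewards) / len(episode_rewards)
        if mean_reward > self.best_mean_reward:
            self.best_mean_reward = mean_reward
            self.best_timestep = timestep
            return True  # a real callback would save the model here
        return False


tracker = BestModelTracker()
# One evaluation episode per call, matching n_eval_episodes=1
improved = [
    tracker.update(70077, [-7873.80]),
    tracker.update(140154, [-7683.78]),
]
print(improved, tracker.best_timestep)
```

Because the second evaluation has a higher (less negative) reward than the first, both calls count as improvements and the tracker ends up pointing at the later timestep.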


To track the entire training process in *wandb*, we need to create a callback with a compatible wandb output format (which calls the *wandb* log method in the learning algorithm process).

In [8]:
# wandb logger and setting in SB3
logger = SB3Logger(
    folder=None,
    output_formats=[
        HumanOutputFormat(
            sys.stdout,
            max_length=120),
        WandBOutputFormat()])
model.set_logger(logger)
# Append callback
log_callback = LoggerCallback()
callbacks.append(log_callback)


callback = CallbackList(callbacks)

The total number of training timesteps is the number of episodes multiplied by the number of timesteps in each episode.

In [9]:
timesteps = episodes * (env.get_wrapper_attr('timestep_per_episode') - 1)
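
As a worked example of this arithmetic, the sketch below plugs in the values reported in this notebook (35040 timesteps per episode, 5 episodes) and also derives the evaluation frequency used by the evaluation callback. The variable names are illustrative, and the evaluation points assume that an evaluation is triggered every `eval_freq` training steps in a single environment.

```python
# Values taken from this notebook's configuration and log output
timesteps_per_episode = 35040   # 365 days * 24 hours * 4 steps per hour
episodes = 5

total_timesteps = episodes * (timesteps_per_episode - 1)
eval_freq = (timesteps_per_episode - 1) * 2 - 1

# Training steps at which an evaluation would be triggered
eval_points = list(range(eval_freq, total_timesteps + 1, eval_freq))

print(total_timesteps, eval_freq, eval_points)
```

The two evaluation points obtained this way, 70077 and 140154, are the same `num_timesteps` values that appear in the `Eval` lines of the training output.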

Now, it's time to train the model with the previously defined callbacks. This may take a few minutes, depending on your computer.

In [10]:
model.learn(
    total_timesteps=timesteps,
    callback=callback,
    log_interval=100)

#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05] [Episode 1]
#----------------------------------------------------------------------------------------------#
[MODELING] (INFO) : Episode directory created [/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05-res35041/Eplus-env-sub_run1]
[MODELING] (INFO) : Weather file USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3.epw used.
[MODELING] (INFO) : Adapting weather to building model. [USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3.epw]
[ENVIRONMENT] (INFO) : Saving episode output path... [/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05-res35041/Eplus-env-sub_run1/output]
[SIMULATOR] (INFO) : Running EnergyPlus with args: ['-w', '/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05-res35041/Eplus-env-sub_run1/USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3_Random_1.0_0.0_0.001.epw', '-d', '/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05-res35041/Eplus-env-sub_run1/output', '/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05-res35041/Eplus-env-sub_run1/5ZoneAutoDXVAV.epJSON']
[ENVIRONMENT] (INFO) : Episode 1 started.
[SIMULATOR] (INFO) : handlers initialized.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-

--------------------------------------------------
| action_network/                 |              |
|    Cooling_Setpoint_RL          | -1.0         |
|    Heating_Setpoint_RL          | -1.0         |
| action_simulation/              |              |
|    Cooling_Setpoint_RL          | 23.25        |
|    Heating_Setpoint_RL          | 12.0         |
| normalized_observation/         |              |
|    HVAC_electricity_demand_rate | -0.814899    |
|    air_humidity                 | 1.3925622    |
|    air_temperature              | -0.73331887  |
|    clg_setpoint                 | -0.88596326  |
|    co2_emission                 | 0.0          |
|    day_of_month                 | 2.1552622    |
|    diffuse_solar_radiation      | -0.5900505   |
|    direct_solar_radiation       | -0.44649348  |
|    hour                         | -1.280776    |
|    htg_setpoint                 | -1.3434962   |
|    month        

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 2) if logger is active
------------------------------------------------------
| action_network/                     |              |
|    Cooling_Setpoint_RL              | 1.0          |
|    Heating_Setpoint_RL              | 1.0          |
| action_simulation/                  |              |
|    Cooling_Setpoint_RL              | 30.0         |
|    Heating_Setpoint_RL              | 23.25        |
| episode/                            |              |
|    comfort_violation_time(%)        | 34.1         |
|    cumulative_abs_comfort_penalty   | -1.33e+04    |
|    cumulative_abs_energy_penalty    | -9.31e+07    |


-------------------------------------------------
| action_network/                 |             |
|    Cooling_Setpoint_RL          | 0.13925427  |
|    Heating_Setpoint_RL          | -0.65428823 |
| action_simulation/              |             |
|    Cooling_Setpoint_RL          | 27.094984   |
|    Heating_Setpoint_RL          | 13.944629   |
| normalized_observation/         |             |
|    HVAC_electricity_demand_rate | 0.8283593   |
|    air_humidity                 | -1.058512   |
|    air_temperature              | 0.25035265  |
|    clg_setpoint                 | -1.2106607  |
|    co2_emission                 | 0.0         |
|    day_of_month                 | -1.5466638  |
|    diffuse_solar_radiation      | -0.41049454 |
|    direct_solar_radiation       | -0.39536366 |
|    hour                         | 0.50671417  |
|    htg_setpoint                 | 1.3934845   |
|    month                        | -1.5898544  |
|    outdoor_humidity             | -1.6140932  |


[SIMULATOR] (INFO) : handlers initialized.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05_EVALUATION]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 1) if logger is active
Progress: |*--------------------------------------------------------------------------------------------------| 1%

Progress: |***************************************************************************************************| 99%
[WRAPPER LoggerWrapper] (INFO) : End of episode, recording summary (progress.csv) if logger is active
[ENVIRONMENT] (INFO) : Environment closed. [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05_EVALUATION]
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05_EVALUATION]
#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05] [Episode 3]
#----------------------------------------------------------------------------------------------#
[MODELING] (INFO) : Episode directory created [/workspaces/sinergym/examples/Eplus

[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 3) if logger is active
Eval num_timesteps=70077, episode_reward=-7873.80 +/- 0.00
Episode length: 35040.00 +/- 0.00
--------------------------------------------------------
| action_network/                        |             |
|    Cooling_Setpoint_RL                 | -1.0        |
|    Heating_Setpoint_RL                 | -0.3610608  |
| action_simulation/                     |             |
|    Cooling_Setpoint_RL                 | 23.25       |
|    Heating_Setpoint_RL                 | 15.594033   |
| eval/                                  |             |
|    comfort_violation(%)                | 21.4        |
|    cumulative_absolute_comfort_penalty | -5.82e+03   |
|    cumulative_absolute_energy_penalty  | -9

--------------------------------------------------
| action_network/                 |              |
|    Cooling_Setpoint_RL          | 0.0042271614 |
|    Heating_Setpoint_RL          | 0.61073136   |
| action_simulation/              |              |
|    Cooling_Setpoint_RL          | 26.639267    |
|    Heating_Setpoint_RL          | 21.060364    |
| normalized_observation/         |              |
|    HVAC_electricity_demand_rate | 0.43898392   |
|    air_humidity                 | -1.1601273   |
|    air_temperature              | -1.5489684   |
|    clg_setpoint                 | -0.9137452   |
|    co2_emission                 | 0.0          |
|    day_of_month                 | -1.5546337   |
|    diffuse_solar_radiation      | -0.74571544  |
|    direct_solar_radiation       | -0.6143348   |
|    hour                         | -0.7939818   |
|    htg_setpoint                 | -0.2531272   |
|    month        

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 4) if logger is active
-----------------------------------------------------
| action_network/                     |             |
|    Cooling_Setpoint_RL              | 0.9940889   |
|    Heating_Setpoint_RL              | -0.5075133  |
| action_simulation/                  |             |
|    Cooling_Setpoint_RL              | 29.98005    |
|    Heating_Setpoint_RL              | 14.770238   |
| episode/                            |             |
|    comfort_violation_time(%)        | 19.7        |
|    cumulative_abs_comfort_penalty   | -9.99e+03   |
|    cumulative_abs_energy_penalty    | -2.12e+08   |
|    cumula

-------------------------------------------------
| action_network/                 |             |
|    Cooling_Setpoint_RL          | 1.0         |
|    Heating_Setpoint_RL          | -1.0        |
| action_simulation/              |             |
|    Cooling_Setpoint_RL          | 30.0        |
|    Heating_Setpoint_RL          | 12.0        |
| normalized_observation/         |             |
|    HVAC_electricity_demand_rate | -0.21371971 |
|    air_humidity                 | -0.7919203  |
|    air_temperature              | -1.5395818  |
|    clg_setpoint                 | 1.2232087   |
|    co2_emission                 | 0.0         |
|    day_of_month                 | -1.6710285  |
|    diffuse_solar_radiation      | -0.74607044 |
|    direct_solar_radiation       | -0.6145723  |
|    hour                         | 1.228204    |
|    htg_setpoint                 | -1.4021665  |
|    month                        | -1.6005225  |
|    outdoor_humidity             | -1.9605635  |


[SIMULATOR] (INFO) : Running EnergyPlus with args: ['-w', '/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05_EVALUATION-res35041/Eplus-env-sub_run2/USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3_Random_1.0_0.0_0.001.epw', '-d', '/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05_EVALUATION-res35041/Eplus-env-sub_run2/output', '/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05_EVALUATION-res35041/Eplus-env-sub_run2/5ZoneAutoDXVAV.epJSON']
[ENVIRONMENT] (INFO) : Episode 2 started.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05_EVALUATION][

Progress: |***************************************************************************************************| 99%
[WRAPPER LoggerWrapper] (INFO) : End of episode, recording summary (progress.csv) if logger is active
[ENVIRONMENT] (INFO) : Environment closed. [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05_EVALUATION]
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05_EVALUATION]
#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05] [Episode 5]
#----------------------------------------------------------------------------------------------#
[MODELING] (INFO) : Episode directory created [/workspaces/sinergym/examples/Eplus

[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 5) if logger is active
Eval num_timesteps=140154, episode_reward=-7683.78 +/- 0.00
Episode length: 35040.00 +/- 0.00
--------------------------------------------------------
| action_network/                        |             |
|    Cooling_Setpoint_RL                 | 0.38579977  |
|    Heating_Setpoint_RL                 | 0.2952625   |
| action_simulation/                     |             |
|    Cooling_Setpoint_RL                 | 27.927074   |
|    Heating_Setpoint_RL                 | 19.28585    |
| eval/                                  |             |
|    comfort_violation(%)                | 21.3        |
|    cumulative_absolute_comfort_penalty | -5.41e+03   |
|    cumulative_absolute_energy_penalty  | -

-------------------------------------------------
| action_network/                 |             |
|    Cooling_Setpoint_RL          | -1.0        |
|    Heating_Setpoint_RL          | 0.106759846 |
| action_simulation/              |             |
|    Cooling_Setpoint_RL          | 23.25       |
|    Heating_Setpoint_RL          | 18.225525   |
| normalized_observation/         |             |
|    HVAC_electricity_demand_rate | -0.20701692 |
|    air_humidity                 | -0.9421152  |
|    air_temperature              | 0.40453672  |
|    clg_setpoint                 | -1.0988007  |
|    co2_emission                 | 0.0         |
|    day_of_month                 | -1.6724828  |
|    diffuse_solar_radiation      | 1.1782382   |
|    direct_solar_radiation       | 0.30694818  |
|    hour                         | -0.07192415 |
|    htg_setpoint                 | -0.70636636 |
|    month                        | -1.6018252  |
|    outdoor_humidity             | -1.6473752  |


[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 6) if logger is active
-----------------------------------------------------
| action_network/                     |             |
|    Cooling_Setpoint_RL              | 1.0         |
|    Heating_Setpoint_RL              | 1.0         |
| action_simulation/                  |             |
|    Cooling_Setpoint_RL              | 30.0        |
|    Heating_Setpoint_RL              | 23.25       |
| episode/                            |             |
|    comfort_violation_time(%)        | 15.6        |
|    cumulative_abs_comfort_penalty   | -6.12e+03   |
|    cumulative_abs_energy_penalty    | -2.17e+08   |
|    cumula

-------------------------------------------------
| action_network/                 |             |
|    Cooling_Setpoint_RL          | -0.30490056 |
|    Heating_Setpoint_RL          | 0.36055952  |
| action_simulation/              |             |
|    Cooling_Setpoint_RL          | 25.59596    |
|    Heating_Setpoint_RL          | 19.653147   |
| normalized_observation/         |             |
|    HVAC_electricity_demand_rate | 4.112525    |
|    air_humidity                 | -1.130085   |
|    air_temperature              | -1.2624166  |
|    clg_setpoint                 | 0.42736623  |
|    co2_emission                 | 0.0         |
|    day_of_month                 | -1.5579975  |
|    diffuse_solar_radiation      | -0.7460236  |
|    direct_solar_radiation       | -0.614521   |
|    hour                         | -1.372267   |
|    htg_setpoint                 | 1.2547963   |
|    month                        |

<stable_baselines3.ppo.ppo.PPO at 0x7efe0a39d0f0>

Now we save the current model (the version at the end of training) in the experiment working directory. We also keep the mean and variance calibrated by the observation normalization wrapper so that they can be reused during model evaluation. These values are also written to a txt file in the Sinergym training output, so they can be consulted later. See the `NormalizeObservation` wrapper documentation for more information.

In [11]:
# Save the final model inside the experiment working directory
model.save(env.get_wrapper_attr('workspace_path') + '/' + experiment_name)
# Save observation normalization calibration
if hasattr(env, 'mean') and hasattr(env, 'var'):
    training_mean = env.get_wrapper_attr('mean')
    training_var = env.get_wrapper_attr('var')


And as always, remember to close the environment.

In [12]:
env.close()

[WRAPPER LoggerWrapper] (INFO) : End of episode, recording summary (progress.csv) if logger is active
[ENVIRONMENT] (INFO) : Environment closed. [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05]
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05]


Progress: |***************************************************************************************************| 99%


Finally, we need to upload all Sinergym output as *wandb* artifacts. This output includes all sinergym_output (and LoggerWrapper CSV files) and models generated during training and evaluation episodes.

In [13]:
artifact = wandb.Artifact(
        name="experiment1",
        type="training")
artifact.add_dir(
        env.get_wrapper_attr('workspace_path'),
        name='training_output/')
artifact.add_dir(
    eval_env.get_wrapper_attr('workspace_path'),
    name='evaluation_output/')
run.log_artifact(artifact)

# wandb has finished
run.finish()

wandb: Adding directory to artifact (/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05-res35041)... Done. 0.2s
wandb: Adding directory to artifact (/workspaces/sinergym/examples/Eplus-env-SB3_PPO-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:05_EVALUATION-res35041)... Done. 0.1s


Run history:

| Metric | History |
|---|---|
| action_network/Cooling_Setpoint_RL | ▅▁▆█████▇▁▁▅▆█▁▁▁█▄█▃▁█▅▃█▁▁▁▁▂▆▅▆▁▅▅▅▃█ |
| action_network/Heating_Setpoint_RL | ▁▃▂▂▆▄▅▆▃▆▁▂▁▄▁▁█▁▂▄▇▆█▂▇▇▄▄▇█▄▄▅▁▇███▆▇ |
| action_simulation/Cooling_Setpoint_RL | ▅▁▆█████▇▁▁▅▆█▁▁▁█▄█▃▁█▅▃█▁▁▁▁▂▆▅▆▁▅▅▅▃█ |
| action_simulation/Heating_Setpoint_RL | ▁▃▂▂▆▄▅▆▃▆▁▂▁▄▁▁█▁▂▄▇▆█▂▇▇▄▄▇█▄▄▅▁▇███▆▇ |
| episode/comfort_violation_time(%) | █▃▁ |
| episode/cumulative_abs_comfort_penalty | ▁▄█ |
| episode/cumulative_abs_energy_penalty | █▁▁ |
| episode/cumulative_power_demand | ▁██ |
| episode/cumulative_reward | █▁▄ |
| episode/cumulative_reward_comfort_term | ▁▄█ |

Run summary:

| Metric | Value |
|---|---|
| action_network/Cooling_Setpoint_RL | -1.0 |
| action_network/Heating_Setpoint_RL | -0.08584 |
| action_simulation/Cooling_Setpoint_RL | 23.25 |
| action_simulation/Heating_Setpoint_RL | 17.14215 |
| episode/comfort_violation_time(%) | 15.59142 |
| episode/cumulative_abs_comfort_penalty | -6121.24092 |
| episode/cumulative_abs_energy_penalty | -216526173.95324 |
| episode/cumulative_power_demand | 216526173.95324 |
| episode/cumulative_reward | -13886.92969 |
| episode/cumulative_reward_comfort_term | -3060.62046 |
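
The summary values can be cross-checked against each other. The sketch below assumes a linear reward that weights an energy term and a comfort term; the weight 0.5 and the energy scale 1e-4 are assumptions chosen because they are consistent with the logged numbers. They are not stated anywhere in this notebook, so treat this as a plausibility check rather than a description of the environment's reward function.

```python
# Values copied from the run summary above
comfort_penalty = -6121.24092        # episode/cumulative_abs_comfort_penalty
energy_penalty = -216526173.95324    # episode/cumulative_abs_energy_penalty
logged_comfort_term = -3060.62046    # episode/cumulative_reward_comfort_term
logged_reward = -13886.92969         # episode/cumulative_reward

# Assumed weights (not stated in this notebook)
ENERGY_WEIGHT = 0.5
LAMBDA_ENERGY = 1e-4
LAMBDA_COMFORT = 1.0

comfort_term = (1 - ENERGY_WEIGHT) * LAMBDA_COMFORT * comfort_penalty
energy_term = ENERGY_WEIGHT * LAMBDA_ENERGY * energy_penalty
reward = comfort_term + energy_term

print(comfort_term, energy_term, reward)
```

Under these assumptions the recomputed comfort term matches the logged one, and the recomputed total differs from the logged cumulative reward only by a small rounding residue.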


All experiment results are stored locally, but you can also view the execution in *wandb*:

- When you check your projects, you'll see the execution allocated:

![wandb_projects1](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_projects1.png?raw=true)

- The training experiment's tracked hyperparameters:

![wandb_training_hyperparameters](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_training_hyperparameters.png?raw=true)

- Registered artifacts (if evaluation is enabled, the best model is also registered):

![wandb_training_artifact](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_training_artifact.png?raw=true)

- Real-time visualization of metrics:

![wandb_training_charts](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_training_charts.png?raw=true)

## Loading a model

We'll be using the `load_agent.py` script located in the repository root. This script leverages all the capabilities of Sinergym for working with loaded deep reinforcement learning models and sets parameters for everything, allowing us to easily define load options via a JSON file when executing the script.

For more details on how to run `load_agent.py`, please refer to [Load a trained model](https://ugr-sail.github.io/sinergym/compilation/main/pages/deep-reinforcement-learning.html#load-a-trained-model).

First, we'll define the Sinergym environment ID where we want to test the loaded agent and the name of the evaluation experiment.

In [14]:
# Environment ID
environment = "Eplus-5zone-mixed-continuous-stochastic-v1"
# Episodes
episodes = 5
# Evaluation name
evaluation_date = datetime.today().strftime('%Y-%m-%d_%H:%M')
evaluation_name = 'SB3_PPO-EVAL-' + environment + \
    '-episodes-' + str(episodes)
evaluation_name += '_' + evaluation_date

We can also use *wandb* here. We can allocate this evaluation of a loaded model to a different project to avoid merging experiments.

In [15]:

# Create wandb.config object in order to log all experiment params
experiment_params = {
    'sinergym-version': sinergym.__version__,
    'python-version': sys.version
}
experiment_params.update({'environment':environment,
                          'episodes':episodes,
                          'algorithm':'SB3_PPO'})

# Get wandb init params (you have to specify your own project and entity)
wandb_params = {"project": 'sinergym_evaluations',
                "entity": 'alex_ugr'}
# Init wandb entry
run = wandb.init(
    name=evaluation_name + '_' + wandb.util.generate_id(),
    config=experiment_params,
    ** wandb_params
)

We'll create the Gym environment, but it's **important to wrap the environment with the same wrappers used during training**. We can use the evaluation experiment name to rename the environment.

**Note**: If you are loading a pre-trained model and using the observation space normalization wrapper, use the mean and variance values calibrated during training for a fair evaluation. The next cell does this. Sinergym automatically writes these mean and variance values to txt files in the training output, so you can consult them later. You can pass the values as a list or NumPy array, or set the txt file path directly in the corresponding constructor argument. It is also important to deactivate the calibration update during evaluation. Check the wrapper documentation for more information.

In [16]:
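# training_mean and training_var must be defined beforehand with the calibration
# values obtained during training (list/NumPy array values or txt file paths)
env = gym.make(environment, env_name=evaluation_name)
env = NormalizeObservation(env, mean=training_mean, var=training_var, automatic_update=False)
env = NormalizeAction(env)
env = LoggerWrapper(env)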

[ENVIRONMENT] (INFO) : Creating Gymnasium environment... [SB3_PPO-EVAL-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:12]
[MODELING] (INFO) : Experiment working directory created [/workspaces/sinergym/examples/Eplus-env-SB3_PPO-EVAL-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:12-res35041]
[MODELING] (INFO) : Model Config is correct.
[MODELING] (INFO) : Updated building model with whole Output:Variable available names
[MODELING] (INFO) : Updated building model with whole Output:Meter available names
[MODELING] (INFO) : runperiod established: {'start_day': 1, 'start_month': 1, 'start_year': 1991, 'end_day': 31, 'end_month': 12, 'end_year': 1991, 'start_weekday': 0, 'n_steps_per_hour': 4}
[MODELING] (INFO) : Episode length (seconds): 31536000.0
[MODELING] (INFO) : timestep size (seconds): 900.0
[MODELING] (INFO) : timesteps per episode: 35040
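
The `training_mean` and `training_var` values passed to `NormalizeObservation` above have to come from the training run. As a minimal sketch of the idea, and not Sinergym's own implementation, the snippet below reads calibration values from plain-text files and applies them as fixed statistics. The file names `mean.txt` and `var.txt`, the helper functions, the `epsilon` term and the sample numbers are all assumptions made for illustration.

```python
import math
import tempfile
from pathlib import Path


def read_values(path):
    """Read whitespace-separated floats from a plain-text file."""
    return [float(token) for token in Path(path).read_text().split()]


def normalize(obs, mean, var, epsilon=1e-8):
    """Scale an observation with fixed statistics; nothing is updated here."""
    return [(o - m) / math.sqrt(v + epsilon) for o, m, v in zip(obs, mean, var)]


# Hypothetical calibration files standing in for the ones saved during training.
with tempfile.TemporaryDirectory() as tmp:
    mean_path = Path(tmp) / "mean.txt"
    var_path = Path(tmp) / "var.txt"
    mean_path.write_text("20.0 50.0\n")
    var_path.write_text("4.0 100.0\n")

    training_mean = read_values(mean_path)
    training_var = read_values(var_path)

# Made-up observation with two variables.
normalized = normalize([22.0, 40.0], training_mean, training_var)
print(normalized)  # approximately [1.0, -1.0]
```

Because the statistics stay fixed, every evaluation episode is scaled exactly as the agent saw observations at the end of training, which is the reason for disabling the automatic update.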

We'll load the Stable Baselines 3 PPO model. In this example it is downloaded from a *wandb* artifact produced by a previous training experiment; a model file stored on the local machine can be loaded in the same way by pointing `model_path` at it.

In [18]:
# get wandb artifact path (to load model)
load_artifact_entity = 'alex_ugr'
load_artifact_project = 'sinergym'
load_artifact_name = 'experiment1'
load_artifact_tag = 'latest'
load_artifact_model_path = 'evaluation_output/best_model/model.zip'
wandb_path = load_artifact_entity + '/' + load_artifact_project + \
    '/' + load_artifact_name + ':' + load_artifact_tag
# Download artifact
artifact = run.use_artifact(wandb_path)
artifact.get_path(load_artifact_model_path).download('.')
# Set model path to local wandb file downloaded
model_path = './' + load_artifact_model_path
# Load with the same algorithm class used for training (PPO in this experiment)
model = PPO.load(model_path)
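
The artifact reference assembled in the cell above follows the pattern `entity/project/name:tag`. As a small sketch, the string concatenation could be wrapped in a helper that also rejects empty or malformed parts. The function `artifact_reference` is hypothetical and is not part of *wandb* or Sinergym.

```python
def artifact_reference(entity, project, name, tag="latest"):
    """Build an 'entity/project/name:tag' reference string (hypothetical helper)."""
    parts = (("entity", entity), ("project", project), ("name", name), ("tag", tag))
    for label, value in parts:
        # Separators inside a part would produce an ambiguous reference.
        if not value or "/" in value or ":" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"{entity}/{project}/{name}:{tag}"


wandb_path = artifact_reference("alex_ugr", "sinergym", "experiment1")
print(wandb_path)  # alex_ugr/sinergym/experiment1:latest
```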

As you can see, the *wandb* model we want to load can come from an artifact of a different entity or project than the one we're using to register the evaluation of the loaded model, as long as it's accessible.
The next step is to use the model to predict actions and interact with the environment to collect data for model evaluation.

In [17]:
for i in range(episodes):
    obs, info = env.reset()
    rewards = []
    truncated = terminated = False
    current_month = 0
    while not (terminated or truncated):
        a, _ = model.predict(obs)
        obs, reward, terminated, truncated, info = env.step(a)
        rewards.append(reward)
        if info['month'] != current_month:
            current_month = info['month']
            print(info['month'], sum(rewards))
    print(
        'Episode ',
        i,
        'Mean reward: ',
        np.mean(rewards),
        'Cumulative reward: ',
        sum(rewards))
env.close()

#----------------------------------------------------------------------------------------------#
[ENVIRONMENT] (INFO) : Starting a new episode... [SB3_PPO-EVAL-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:12] [Episode 1]
#----------------------------------------------------------------------------------------------#
[MODELING] (INFO) : Episode directory created [/workspaces/sinergym/examples/Eplus-env-SB3_PPO-EVAL-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:12-res35041/Eplus-env-sub_run1]
[MODELING] (INFO) : Weather file USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3.epw used.
[MODELING] (INFO) : Adapting weather to building model. [USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3.epw]
[ENVIRONMENT] (INFO) : Saving episode output path... [/workspaces/sinergym/examples/Eplus-env-SB3_PPO-EVAL-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:12-res35041/Eplus-env-sub_r
[SIMULATOR] (INFO) : handlers initialized.
[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-EVAL-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:12]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 1) if logger is active
1 -1.8158328975728835
2 -825.6461114436989
3 -1583.674482071334
4 -2114.65310683112
5 -2518.8444798274136
6 -2921.5352530776304
7 -3482.58416838421
8 -4116.343134653357
9 -4691.630422285105
10 -5247.38708141432

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-EVAL-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:12]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 2) if logger is active
1 -1.8579781542716776
2 -849.6974282789123
3 -1617.9671499241917
4 -2156.5752483040997
5 -2552.019842192101
6 -2952.356589942441
7 -3548.4779550246153
8 -4170.818276142834
9 -4726.033888119112
10 -5269.103372582413

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-EVAL-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:12]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 3) if logger is active
1 -1.8406577642818436
2 -858.8870751040168
3 -1640.2572212169603
4 -2173.3615273516752
5 -2567.4942835104775
6 -2968.6280354362952
7 -3538.5972325183766
8 -4171.6446925544315
9 -4735.831685025527
10 -5284.723659082678

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-EVAL-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:12]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 4) if logger is active
1 -1.802620036649992
2 -865.0126318855628
3 -1634.1087758950055
4 -2166.249354525793
5 -2564.899154157984
6 -2954.509030649346
7 -3534.052090008717
8 -4181.1340755771425
9 -4753.69884696025
10 -5287.422183004328

[SIMULATOR] (INFO) : handlers are ready.
[SIMULATOR] (INFO) : System is ready.
[WRAPPER NormalizeObservation] (INFO) : Saving normalization calibration data... [SB3_PPO-EVAL-Eplus-5zone-mixed-continuous-stochastic-v1-episodes-5_2024-04-25_14:12]
[WRAPPER LoggerWrapper] (INFO) : Creating monitor.csv for current episode (episode 5) if logger is active
1 -1.9032154002097952
2 -861.1198148661161
3 -1636.334966684399
4 -2173.085233547737
5 -2565.576821878193
6 -2966.3778237083693
7 -3554.5977618101956
8 -4197.099320556758
9 -4779.131939877218
10 -5329.1488627496865
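
The loop above reports each episode separately. When comparing loaded models, it can help to aggregate the episodes into a single summary. The sketch below is one possible way to do that, assuming the step rewards of each episode were kept in a list of lists. The helper `summarize_episodes` and the reward values are made up for illustration and do not come from the run above.

```python
from statistics import mean, stdev


def summarize_episodes(episode_rewards):
    """Return per-episode cumulative rewards plus their mean and sample std."""
    cumulative = [sum(rewards) for rewards in episode_rewards]
    # The sample standard deviation needs at least two episodes.
    spread = stdev(cumulative) if len(cumulative) > 1 else 0.0
    return {"cumulative": cumulative, "mean": mean(cumulative), "std": spread}


# Made-up step rewards for three short episodes (illustrative only).
episode_rewards = [
    [-1.0, -2.0, -3.0],
    [-2.0, -2.0, -4.0],
    [-1.5, -2.5, -3.0],
]
summary = summarize_episodes(episode_rewards)
print(summary)
```

In the evaluation loop, this would mean appending each episode's `rewards` list to an outer list before `env.reset()` is called again.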

Finally, we'll register the evaluation data in *wandb* as an artifact for preservation.

In [None]:
artifact = wandb.Artifact(
    name="evaluation1",
    type="evaluating")
artifact.add_dir(
    env.experiment_path,
    name='evaluation_output/')

run.log_artifact(artifact)

# wandb has finished
run.finish()

[34m[1mwandb[0m: Adding directory to artifact (/workspaces/sinergym/examples/Eplus-env-SB3_DQN-EVAL-Eplus-5zone-mixed-discrete-stochastic-v1-episodes-5_2023-08-03_13:36-res1)... Done. 0.1s


The results from the loaded model are stored locally, but you can also view the execution in *wandb*:

- When you check the wandb project list, you'll see that the sinergym_evaluations project has a new run:

![wandb_project2](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_project2.png?raw=true)

- The tracked hyperparameters of the evaluation experiment, together with the training artifact used to load the model:

![wandb_evaluating_hyperparameters](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_evaluating_hyperparameters.png?raw=true)

- The registered artifact with the Sinergym output (including the CSV files generated by the `LoggerWrapper`):

![wandb_evaluating_artifact](https://github.com/ugr-sail/sinergym/blob/main/images/wandb_evaluating_artifact.png?raw=true)