# Reinforcement Learning control strategies for Electric Vehicles fleet Virtual Power Plants
Thesis based on the development of an RL agent that manages a VPP through EV charging stations in a household environment. The main optimization objectives of the VPP are valley filling, peak shaving and a zero resulting load over time. The main actions taken to reach these objectives are storing renewable energy and pushing power into the grid at times of high demand. The Virtual Power Plant environment is built on the ELVIS (Electric Vehicles Infrastructure Simulator) open library from DAI-Labor (https://github.com/dailab/elvis). The thesis code is currently available at: (https://github.com/francescomaldonato/RL_VPP_Thesis)
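The three objectives can be read as statistics of the resulting (net) load, i.e. household consumption plus EV charging minus renewable generation: peak shaving lowers the maximum, valley filling raises the minimum, and a zero resulting load pushes the mean and the spread towards 0. The sketch below is only an illustration of that reading; `load_objectives` and the two profiles are hypothetical and are not taken from the thesis code.

```python
from statistics import mean, pstdev

def load_objectives(net_load_kw):
    """Summarise a resulting-load profile (kW, one value per time step).

    Peak shaving aims to lower `peak`, valley filling aims to raise `valley`,
    and a zero resulting load drives both `mean` and `std` towards 0.
    """
    return {
        "peak": max(net_load_kw),
        "valley": min(net_load_kw),
        "mean": mean(net_load_kw),
        "std": pstdev(net_load_kw),
    }

# Hypothetical profiles with the same total energy: before control, and after
# storing part of the renewable surplus (negative load) and releasing it at the peak.
uncontrolled = [-6.0, -4.0, 2.0, 8.0]
controlled = [-2.0, -1.0, 1.0, 2.0]

print(load_objectives(uncontrolled))
print(load_objectives(controlled))
```

Both profiles sum to zero, so only the shape changes: the controlled one has a lower peak, a higher valley and a smaller standard deviation.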

Author: Francesco Maldonato

## VPP agent trainer Notebook for the StableBaselines3 model (A2C)

Installing required packages and dependencies

In [None]:
%%capture
!pip install py-elvis==0.2.1
!pip install pyyaml==5.4
!pip install plotly==5.9.0
!pip install -U kaleido==0.2.1

!pip install stable-baselines3[extra]==1.6.1
!pip install sb3-contrib==1.6.1
!pip install gym==0.20.0
!pip install -q wandb==0.13.4

In [None]:
#Cloning repository and changing directory
!git clone https://github.com/francescomaldonato/RL_VPP_Thesis.git
%cd RL_VPP_Thesis/
%ls

In [None]:
import yaml
import torch
from VPP_environment import VPPEnv, VPP_Scenario_config
from elvis.config import ScenarioConfig
import os
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3 import A2C #A2C supports the Dict observation space of the custom environment through MultiInputPolicy
from sb3_contrib.common.maskable.utils import get_action_masks
import stable_baselines3 as sb3
from stable_baselines3.common.env_checker import check_env
import random

#Check if cuda device is available for training
print("Torch-Cuda available device:", torch.cuda.is_available())
sb3.get_system_info() #prints the system info itself (print_info=True by default)
!wandb --version

Torch-Cuda available device: True
OS: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic #1 SMP Fri Aug 26 08:44:51 UTC 2022
Python: 3.7.14
Stable-Baselines3: 1.6.1
PyTorch: 1.12.1+cu113
GPU Enabled: True
Numpy: 1.21.6
Gym: 0.21.0

wandb, version 0.13.3


In [None]:
# Ensure deterministic behavior
torch.backends.cudnn.deterministic = True
random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
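The cell above seeds Python's `random` and PyTorch. NumPy is not seeded, and the model itself also accepts a `seed` argument (`A2C(..., seed=0)`) that SB3 uses to seed its random generators and, where supported, the environment. A dependency-tolerant sketch of a single seeding helper is shown below; `seed_everything` is a hypothetical name, NumPy and PyTorch are only seeded when importable, and disabling `cudnn.benchmark` is an extra assumption not present in the original cell.

```python
import os
import random

def seed_everything(seed: int) -> None:
    """Seed the random generators used in this notebook (a sketch, not thesis code)."""
    random.seed(seed)
    # PYTHONHASHSEED only affects Python processes started after this point
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

seed_everything(0)
first = [random.random() for _ in range(3)]
seed_everything(0)
second = [random.random() for _ in range(3)]
assert first == second  # same seed, same sequence
```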

In [None]:
#Loading paths for input data
current_folder = ''
VPP_data_input_path = current_folder + 'data/data_training/environment_table/' + 'Environment_data_2019.csv'
elvis_input_folder = current_folder + 'data/config_builder/'

case = 'wohnblock_household_simulation_adaptive.yaml' #(loaded by default, 20 EV arrivals per week with 50% average battery SOC)

#To try different simulation parameters, uncomment one of the lines below
#case = 'wohnblock_household_simulation_adaptive_10.yaml' #(10 EV arrivals per week with 50% average battery SOC)
#case = 'wohnblock_household_simulation_adaptive_15.yaml' #(15 EV arrivals per week with 50% average battery SOC)
#case = 'wohnblock_household_simulation_adaptive_25.yaml' #(25 EV arrivals per week with 50% average battery SOC)
#case = 'wohnblock_household_simulation_adaptive_30.yaml' #(30 EV arrivals per week with 50% average battery SOC)
#case = 'wohnblock_household_simulation_adaptive_35.yaml' #(35 EV arrivals per week with 50% average battery SOC)

with open(elvis_input_folder + case, 'r') as file:
    yaml_str = yaml.full_load(file)

elvis_config_file = ScenarioConfig.from_yaml(yaml_str)
VPP_config_file = VPP_Scenario_config(yaml_str)

print(elvis_config_file)
print(VPP_config_file)

Vehicle types: <generator object ScenarioConfig.__str__.<locals>.<genexpr> at 0x7fd94608dbd0>
Mean parking time: 23.9
Std deviation of parking time: 1
Mean value of the SOC distribution: 0.5
Std deviation of the SOC distribution: 0.1
Max parking time: 24
Number of charging events per week: 20
Vehicles are disconnected only depending on their parking time
Queue length: 0
Opening hours: None
Scheduling policy: Uncontrolled

{'start_date': '2022-01-01T00:00:00', 'end_date': '2023-01-01T00:00:00', 'resolution': '0:15:00', 'num_households': 4, 'solar_power': 16, 'wind_power': 12, 'EV_types': [{'battery': {'capacity': 100, 'efficiency': 1, 'max_charge_power': 150, 'min_charge_power': 0}, 'brand': 'Tesla', 'model': 'Model S', 'probability': 1}], 'charging_stations_n': 4, 'EVs_n': 20, 'EVs_n_max': 1044, 'mean_park': 23.9, 'std_deviation_park': 1, 'EVs_mean_soc': 50.0, 'EVs_std_deviation_soc': 10.0, 'EV_load_max': 44, 'EV_load_rated': 14.8, 'EV_load_min': 1, 'houseRWload_max': 10, 'av_max_energy
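Two sizes of the simulation can be cross-checked against this configuration. With 20 charging events per week over the 2022 simulation year, about 20 × 365 / 7 ≈ 1043 events are expected, consistent with the last `Charging event: 1043` printed by the environment below; with a 15-minute resolution the year spans 365 × 96 = 35,040 time slots. The arithmetic below assumes events are spread uniformly per 7 days and one environment step per resolution slot; `simulation_sizes` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def simulation_sizes(start, end, resolution_min, events_per_week):
    """Return (number of time steps, expected charging events) for a simulation window."""
    span = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    steps = span // timedelta(minutes=resolution_min)  # integer number of slots
    events = events_per_week * span.days / 7
    return steps, events

steps, events = simulation_sizes("2022-01-01T00:00:00", "2023-01-01T00:00:00", 15, 20)
print(steps)          # 35040
print(round(events))  # 1043
```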

In [None]:
#Environment initialization
env = VPPEnv(VPP_data_input_path, elvis_config_file, VPP_config_file)
env.plot_ELVIS_data()

Charging event: 1, Arrival time: 2022-01-01 12:15:00, Parking_time: 24, Leaving_time: 2022-01-02 12:15:00, SOC: 0.4976989760065436, SOC target: 1.0, Connected car: Tesla, Model S 
 ... 
 Charging event: 1043, Arrival time: 2022-12-31 14:00:00, Parking_time: 24, Leaving_time: 2023-01-01 14:00:00, SOC: 0.5099958004276397, SOC target: 1.0, Connected car: Tesla, Model S 

-DATASET: House&RW_energy_sum=kWh  -34117.7 , over-consume=kWh  1556.25 , under-consume=kWh  -35673.95 , Total_cost=€  -1196.64 , overcost=€  97.86
- ELVIS.Simulation (Av.EV_SOC=  50.0 %):
 Sum_Energy=kWh  7830.72 , over-consume=kWh  31616.34 , under-consume=kWh  23785.62 , Total_cost=€  470.46 , overcost=€  1335.08 , Charging_events=  1043 
- Exp.VPP_goals: Energy_consumed=kWh 0, Av.load=kW 0, Std.load=kW 0, Total_cost=€ 0 , Av.EV_en_left=kWh  84.2
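The summary lines split a signed energy sum into over-consumption (energy drawn from the grid) and under-consumption (surplus pushed to the grid). The printed figures are consistent with `Sum_Energy = over-consume + under-consume` when the under-consumption is negative, e.g. 1556.25 + (-35673.95) = -34117.70 kWh for the dataset line, while the ELVIS line prints the under-consumption as a magnitude (31616.34 - 23785.62 = 7830.72 kWh). A minimal sketch of that accounting, assuming a load profile in kW sampled every 15 minutes, is shown below; `energy_balance` is a hypothetical helper, not the environment's own method.

```python
def energy_balance(load_kw, step_hours=0.25):
    """Split a signed load profile into imported and exported energy (kWh).

    Positive load = consumption from the grid (over-consume),
    negative load = surplus pushed to the grid (under-consume, kept negative).
    """
    energies = [p * step_hours for p in load_kw]
    over = sum(e for e in energies if e > 0)
    under = sum(e for e in energies if e < 0)
    return {"sum": over + under, "over": over, "under": under}

balance = energy_balance([4.0, 2.0, -8.0, -2.0])
print(balance)  # {'sum': -1.0, 'over': 1.5, 'under': -2.5}
```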


In [None]:
#Function to check custom environment and output additional warnings if needed
check_env(env)
env.plot_reward_functions()

- ELVIS.Simulation (Av.EV_SOC=  50.0 %):
 Sum_Energy=kWh  8940.12 , over-consume=kWh  32460.23 , under-consume=kWh  23520.11 , Total_cost=€  538.97 , overcost=€  1390.81 , Av.EV_en_left=kWh  100.0 , Charging_events=  1043 
- Exp.VPP_goals: Energy_consumed=kWh 0, Av.load=kW 0, Std.load=kW 0, Total_cost=€ 0 , Av.EV_en_left=kWh  84.2
Simulating VPP....
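`check_env` checks that the environment follows the Gym API pinned here (gym 0.21): `reset()` returns an observation and `step(action)` returns `(obs, reward, done, info)`, with observations that match the declared spaces; a `Dict` observation space is why `MultiInputPolicy` is used. The plain-Python toy below mirrors only that call contract, as a reading aid for the training and test loops. `ToyVPPEnv`, its single aggregated storage and its `-|resulting load|` reward are illustrative assumptions, not how `VPPEnv` is implemented, and since it declares no Gym spaces it would not pass `check_env` itself.

```python
class ToyVPPEnv:
    """Toy stand-in with the gym 0.21 reset()/step() call signatures (no gym spaces)."""

    def __init__(self, net_load_kw, capacity_kwh=10.0, step_hours=0.25):
        self.net_load_kw = list(net_load_kw)  # household load minus renewables, per step
        self.capacity_kwh = capacity_kwh      # hypothetical aggregated EV storage
        self.step_hours = step_hours          # 15-minute resolution

    def _obs(self):
        # Dict observation, the kind that requires MultiInputPolicy in SB3
        last = len(self.net_load_kw) - 1
        return {"load_kw": self.net_load_kw[min(self.t, last)], "stored_kwh": self.stored_kwh}

    def reset(self):
        self.t = 0
        self.stored_kwh = 0.0
        return self._obs()

    def step(self, action_kw):
        # Positive action charges the storage, negative action pushes power to the grid
        energy = action_kw * self.step_hours
        energy = max(-self.stored_kwh, min(energy, self.capacity_kwh - self.stored_kwh))
        self.stored_kwh += energy
        resulting_kw = self.net_load_kw[self.t] + energy / self.step_hours
        reward = -abs(resulting_kw)  # best case: zero resulting load
        self.t += 1
        done = self.t >= len(self.net_load_kw)
        return self._obs(), reward, done, {"resulting_kw": resulting_kw}


def run_cancel_policy(toy_env):
    """Roll out one episode with a naive policy that cancels the current load."""
    obs, done, score = toy_env.reset(), False, 0.0
    while not done:
        obs, reward, done, info = toy_env.step(-obs["load_kw"])
        score += reward
    return score

toy_env = ToyVPPEnv([-4.0, -4.0, 4.0, 4.0])
print(run_cancel_policy(toy_env), toy_env.stored_kwh)  # 0.0 0.0
```

The surplus of the first two steps is stored and released in the last two, so the resulting load stays at zero; with an empty storage (e.g. `ToyVPPEnv([4.0])`) the peak cannot be shaved and the reward is -4.0.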


In [None]:
A2C_path = "trained_models/A2C_models/"

#In Colab, run the lines below as they are:
os.environ['WANDB_DISABLE_CODE'] = 'true'
os.environ['WANDB_NOTEBOOK_NAME'] = 'Agent_trainer_notebooks/A2C_VPP_agent_trainer.ipynb'
wandb.login(relogin=True)

#In a local notebook, you can log in from the shell instead by uncommenting below:
#your_wandb_login_code = '0123456789abcdefghijklmnopqrstwxyzàèìòù0' #placeholder showing the key length (40 characters)
#!wandb login {your_wandb_login_code}

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc

True
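Instead of pasting the key into a cell, it can be supplied through the `WANDB_API_KEY` environment variable, which wandb reads automatically; `wandb.login(key=...)` also accepts it directly. The helper below only validates such a key before use. The 40-character check matches the placeholder above and the length of wandb cloud API keys, but `read_wandb_key` is a hypothetical name and the check is an assumption, not part of the wandb API.

```python
import os

def read_wandb_key(env=os.environ):
    """Return the wandb API key from the environment, or None if absent or malformed."""
    key = env.get("WANDB_API_KEY", "").strip()
    if len(key) != 40:
        return None
    return key

# Usage in the notebook (requires the wandb package):
# key = read_wandb_key()
# if key is not None:
#     wandb.login(key=key)
```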

In [None]:
#wandb model configuration (logged with the run; only the entries passed to A2C below affect training)
config = {
    "policy_type": "MultiInputPolicy",
    "n_steps": 8760,
    "batch_size": 8760, #not an A2C argument, logged for reference only
    "total_timesteps": 1000000,
    "learning_rate": 0.0007145030954379823,
    "gamma": 0.9159078953021682,
    "gae_lambda": 0.8,
    #"clip_range": 0.4, #PPO-only parameter
    "ent_coef": 1.5005326968113368e-7,
    "vf_coef": 0.011059086790668691,
    "ortho_init": True,
    "activation_fn": torch.nn.Tanh,
    "optimizer_class": torch.optim.RMSprop, #not passed to the policy; with use_rms_prop=False A2C uses the default Adam optimizer
    "net_arch": [64, dict(pi=[256, 256], vf=[256, 256])], #one shared 64-unit layer, then separate 256-256 policy and value networks
    "use_rms_prop": False,
    "normalize_advantage": True,
    "max_grad_norm": 0.7,
}
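A2C in Stable-Baselines3 collects `n_steps` transitions per rollout and, with `gae_lambda` below 1, computes advantages with Generalized Advantage Estimation (GAE) discounted by `gamma`; with `normalize_advantage=True` the advantages are then standardized before the policy update. The pure-Python sketch below writes out the standard GAE recursion for a single environment, using the `gamma` and `gae_lambda` values from the config. It follows the textbook formula rather than SB3's internal `RolloutBuffer` code, `compute_gae` is a hypothetical name, and the normalization step is left out.

```python
def compute_gae(rewards, values, last_value, dones, gamma, gae_lambda):
    """Generalized Advantage Estimation for one rollout of a single environment.

    rewards[t], values[t]: reward and value estimate at step t
    last_value: value estimate of the state reached after the last step
    dones[t]: True if the episode ended after step t (no bootstrapping across it)
    Returns (advantages, returns) with returns = advantages + values.
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step temporal-difference error
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors
        gae = delta + gamma * gae_lambda * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns

config_gamma, config_lambda = 0.9159078953021682, 0.8
adv, ret = compute_gae([1.0, 0.0, 2.0], [0.5, 0.4, 1.0], 0.0, [False, False, True],
                       config_gamma, config_lambda)
print(adv)
```

With `gae_lambda=1` and zero value estimates the advantages reduce to the discounted reward-to-go; with `gae_lambda=0` they reduce to one-step TD errors.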

#wandb.tensorboard.patch(root_logdir="log_path")
run = wandb.init(
    project="RL_VPP_Thesis",
    #entity="user_avocado",
    config=config,
    sync_tensorboard=True,  # auto-upload sb3's tensorboard metrics
    monitor_gym=False,  # auto-upload the videos of agents playing the game
    save_code=False # optional
)

In [None]:
#ENVIRONMENT WRAPPING
monitored_env = Monitor(env)
#Vectorized environment wrapper (the factory returns the monitored environment)
X_env = DummyVecEnv([lambda: monitored_env])

#Sync custom tensorboard patch
#wandb.tensorboard.patch(root_logdir=wandb.run.dir, pytorch=True)
tensorboard_log_path = "wandb/tensorboard_log/"

#model = A2C(config["policy_type"], X_env, verbose=1)
policy_kwargs =  dict(
            ortho_init = config["ortho_init"],
            net_arch = config["net_arch"],
            activation_fn = config["activation_fn"],
            #optimizer_class = config["optimizer_class"]
        )

#model definition
model = A2C(config["policy_type"], X_env,
                    learning_rate = config["learning_rate"],
                    n_steps = config["n_steps"],
                    #batch_size = batch_size,
                    #n_epochs = config["n_epochs"],
                    gamma = config["gamma"],
                    gae_lambda = config["gae_lambda"],
                    #clip_range = config["clip_range"],
                    ent_coef = config["ent_coef"],
                    vf_coef = config["vf_coef"],
                    normalize_advantage = config["normalize_advantage"],
                    max_grad_norm = config["max_grad_norm"],
                    use_rms_prop = config["use_rms_prop"],
                    #create_eval_env = False,
                    policy_kwargs = policy_kwargs,
                    verbose=0,
                    #tensorboard_log= os.path.join(tensorboard_log_path,f'A2C_{run.id}_1')
                    tensorboard_log = tensorboard_log_path
                    )

#wandb.watch(model)
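`DummyVecEnv` takes a list of zero-argument factories and calls each of them once inside its constructor. That is why rebinding the same name, as in `X_env = DummyVecEnv([lambda: X_env])`, still hands the `Monitor` to the vectorized env: the lambda runs before the assignment completes. Python lambdas look names up when they are called, not when they are defined, so the same pattern breaks as soon as a factory is called later. The plain-Python sketch below shows both cases with a hypothetical `FakeVecEnv` stand-in; binding the wrapped env to its own name, or to a default argument, avoids the pitfall.

```python
class FakeVecEnv:
    """Hypothetical stand-in for DummyVecEnv: calls every factory once, at construction."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

monitored = "Monitor(env)"

# Works: the factory is evaluated inside the constructor, before x_env is rebound.
x_env = monitored
x_env = FakeVecEnv([lambda: x_env])
print(x_env.envs)  # ['Monitor(env)']

# Breaks: a deferred call sees the current binding of y_env.
y_env = monitored
factory = lambda: y_env
y_env = "something else"
print(factory())  # something else

# Safer: capture the wrapped env explicitly through a default argument.
z_env = FakeVecEnv([lambda env=monitored: env])
```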

In [None]:
#%%time

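model.learn(total_timesteps=config["total_timesteps"],
    tb_log_name=f'A2C_{run.id}', #TensorBoard logs go to wandb/tensorboard_log/A2C_{run.id}_1, the folder synced after training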
    callback=WandbCallback(
        gradient_save_freq=10000,
        #model_save_path=f"trained_models/{run.id}",
        verbose=1)
    )

- ELVIS.Simulation (Av.EV_SOC=  50.0 %):
 Sum_Energy=kWh  7561.53 , over-consume=kWh  31275.28 , under-consume=kWh  23713.75 , Total_cost=€  425.01 , overcost=€  1297.81 , Av.EV_en_left=kWh  100.0 , Charging_events=  1043 
- Exp.VPP_goals: Energy_consumed=kWh 0, Av.load=kW 0, Std.load=kW 0, Total_cost=€ 0 , Av.EV_en_left=kWh  84.2
Simulating VPP....
- VPP.Simulation results
 LOAD_INFO: Sum_Energy=KWh  -20206.79 , over-consume=KWh  3841.14 , under-consume=KWh  24047.93 , Total_cost=€  -713.41 , Overcost=€  164.34 
 EV_INFO: Av.EV_energy_leaving=kWh  63.5 , Std.EV_energy_leaving=kWh  13.67 , EV_departures =  1039 , EV_queue_left =  0
SCORE:  Cumulative_reward= 93896.59 - Step_rewars (load_t= 84677.03, EVs_energy_t= 21905.39)
 - Final_rewards (EVs_energy= 5784.83, Overconsume= -4196.62, Underconsume= -15140.92, Overcost= 866.87)
- ELVIS.Simulation (Av.EV_SOC=  50.0 %):
 Sum_Energy=kWh  8524.32 , over-consume=kWh  32970.09 , under-consume=kWh  24445.77 , Total_cost=€  511.58 , overcost=€  

<stable_baselines3.a2c.a2c.A2C at 0x7fd92eeb0050>
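For a fresh `learn()` call, SB3 writes TensorBoard logs to `<tensorboard_log>/<tb_log_name>_<N>`, where `N` is one more than the highest existing index for that name, and `model.save()` appends `.zip` when the path has no suffix. So `tb_log_name='A2C'` would put the first run in `wandb/tensorboard_log/A2C_1`, while `tb_log_name=f'A2C_{run.id}'` puts it in `A2C_{run.id}_1`, the folder passed to `wandb sync` in the next cell. The helpers below re-implement these naming rules in plain Python to predict the paths; `next_tb_run_dir` and `saved_model_file` are hypothetical names, and the logic is a sketch of SB3's behaviour, not its code.

```python
import os

def next_tb_run_dir(tensorboard_log, tb_log_name):
    """Predict the log directory SB3 would create for the next run with this name."""
    max_run_id = 0
    if os.path.isdir(tensorboard_log):
        for entry in os.listdir(tensorboard_log):
            # Split "A2C_3" into ("A2C", "3"); "A2C_abc123_1" has prefix "A2C_abc123"
            prefix, _, suffix = entry.rpartition("_")
            if prefix == tb_log_name and suffix.isdigit():
                max_run_id = max(max_run_id, int(suffix))
    return os.path.join(tensorboard_log, f"{tb_log_name}_{max_run_id + 1}")

def saved_model_file(path):
    """Return the file name a save to `path` produces ('.zip' added if no suffix)."""
    return path if os.path.splitext(path)[1] else path + ".zip"

print(saved_model_file("trained_models/A2C_models/model_A2C_abc123"))
```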

In [None]:
!wandb sync wandb/tensorboard_log/A2C_{run.id}_1
#wandb.save(f"model.{run.id}")
model.save(current_folder + A2C_path + f"model_A2C_{run.id}")
model.save(os.path.join(wandb.run.dir, f"model_A2C_{run.id}"))
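#SB3 appends ".zip" to the saved file, so upload that exact file from the run directory
wandb.save(os.path.join(wandb.run.dir, f"model_A2C_{run.id}.zip"), base_path=wandb.run.dir)
#wandb.save(f'wandb/tensorboard_log/A2C_{run.id}_1')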

In [None]:
#EVALUATION of the trained model
cumulative_reward, std_reward = evaluate_policy(model, X_env, n_eval_episodes=1, render=False)
print(f"Average reward: {cumulative_reward}, St.dev: {std_reward}")

- ELVIS.Simulation (Av.EV_SOC=  50.0 %):
 Sum_Energy=kWh  9237.92 , over-consume=kWh  32394.11 , under-consume=kWh  23156.18 , Total_cost=€  480.7 , overcost=€  1335.54 , Av.EV_en_left=kWh  100.0 , Charging_events=  1043 
- Exp.VPP_goals: Energy_consumed=kWh 0, Av.load=kW 0, Std.load=kW 0, Total_cost=€ 0 , Av.EV_en_left=kWh  84.2
Simulating VPP....
- VPP.Simulation results
 LOAD_INFO: Sum_Energy=KWh  -16280.14 , over-consume=KWh  2717.55 , under-consume=KWh  18997.7 , Total_cost=€  -554.68 , Overcost=€  116.55 
 EV_INFO: Av.EV_energy_leaving=kWh  66.62 , Std.EV_energy_leaving=kWh  33.67 , EV_departures =  1043 , EV_queue_left =  0
SCORE:  Cumulative_reward= 147378.99 - Step_rewars (load_t= 176281.09, EVs_energy_t= -27520.82)
 - Final_rewards (EVs_energy= 9439.78, Overconsume= -2607.77, Underconsume= -11632.79, Overcost= 3419.51)
- ELVIS.Simulation (Av.EV_SOC=  50.0 %):
 Sum_Energy=kWh  9201.06 , over-consume=kWh  32987.92 , under-consume=kWh  23786.86 , Total_cost=€  538.78 , overcost=
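`evaluate_policy` returns the mean and the standard deviation of the per-episode returns, computed with NumPy's population standard deviation. With `n_eval_episodes=1` the printed St.dev is therefore always 0 and the "average" is a single episode return, such as the cumulative reward of 147378.99 printed by the evaluation above. A dependency-free sketch of that aggregation is below; `summarize_returns` is a hypothetical name and the second list of returns is made up.

```python
from statistics import mean, pstdev

def summarize_returns(episode_returns):
    """Mean and population standard deviation of episode returns."""
    return mean(episode_returns), pstdev(episode_returns)

print(summarize_returns([147378.99]))            # (147378.99, 0.0)
print(summarize_returns([100.0, 120.0, 140.0]))  # mean 120.0, std about 16.33
```

A non-zero spread only appears with several evaluation episodes, e.g. `evaluate_policy(model, X_env, n_eval_episodes=5)`, at the cost of one full simulated year per episode.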

## VPP Simulation testing the trained model

In [None]:
#TEST Model
episodes = 1
for episode in range(1, episodes+1):
    obs = env.reset()
    done = False
    score = 0
    while not done:
        # Action masks are still queried from the environment, but A2C cannot use them
        # (masked prediction requires sb3-contrib's MaskablePPO)
        action_masks = get_action_masks(env)
        # The A2C policy is not recurrent, so no LSTM state or episode-start flags are needed;
        # deterministic=True uses the mode of the action distribution instead of sampling
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))

VPP_table = env.VPP_table

- ELVIS.Simulation (Av.EV_SOC=  50.0 %):
 Sum_Energy=kWh  8242.33 , over-consume=kWh  31589.29 , under-consume=kWh  23346.96 , Total_cost=€  462.62 , overcost=€  1321.56 , Av.EV_en_left=kWh  100.0 , Charging_events=  1043 
- Exp.VPP_goals: Energy_consumed=kWh 0, Av.load=kW 0, Std.load=kW 0, Total_cost=€ 0 , Av.EV_en_left=kWh  84.2
Simulating VPP....
- VPP.Simulation results
 LOAD_INFO: Sum_Energy=KWh  -16884.7 , over-consume=KWh  2548.87 , under-consume=KWh  19433.57 , Total_cost=€  -566.78 , Overcost=€  115.35 
 EV_INFO: Av.EV_energy_leaving=kWh  66.68 , Std.EV_energy_leaving=kWh  33.15 , EV_departures =  1044 , EV_queue_left =  0
SCORE:  Cumulative_reward= 147207.64 - Step_rewars (load_t= 173476.33, EVs_energy_t= -24855.37)
 - Final_rewards (EVs_energy= 9504.46, Overconsume= -2477.06, Underconsume= -11857.15, Overcost= 3416.42)
Episode:1 Score:147207.6366034435


In [None]:
env.plot_VPP_energies()

In [None]:
env.plot_Elvis_results()

In [None]:
env.plot_VPP_results()

In [None]:
env.plot_VPP_supply_demand()

In [None]:
env.plot_rewards_stats()

In [None]:
env.plot_rewards_results()

In [None]:
env.plot_VPP_Elvis_comparison()

In [None]:
env.plot_actions_kpi()

In [None]:
env.plot_EVs_kpi()

In [None]:
env.plot_load_kpi()

In [None]:
env.plot_yearly_load_log()

In [None]:
#plot_VPP_input_data = env.plot_VPP_input_data()
#plot_VPP_input_data.show()

In [None]:
#TODO: log the VPP results plot to wandb as an artifact

env.close()
run.finish()
wandb.finish()