It is strongly suggested that this tutorial be run in its own environment (e.g. conda or pyenv), as it requires dependencies not needed by the rest of ACN-Portal.

If running in a new environment, such as Google Colab, run this first.

In [1]:
# !git clone https://github.com/zach401/acnportal.git
# !pip install acnportal/.[gym]
# !pip install stable-baselines

# ACN-Sim Tutorial: Lesson X
## Training and Running a Reinforcement Learning Agent on ACN-Sim
### by Sunash Sharma
#### Last updated: 02/26/2020

In this lesson we will learn how to train a reinforcement learning (RL) agent and run it using OpenAI Gym environments that wrap ACN-Sim. For this example we will be using the stable-baselines proximal policy optimization (PPO2) algorithm. As such, running this tutorial requires the stable-baselines package.

For this lesson, we will use a simple example. Imagine we have a single charger, with EVs plugging in over a set duration. Each EV has a random arrival and departure time and requests an amount of energy that gives it an initial laxity of $d/2$, where $d$ is the staying duration (i.e. we could charge at half the maximum rate for the entire stay and deliver all the energy requested). First, let's make some functions to generate Simulation instances that simulate this scenario. We'll start by defining a function which generates random plugins for a single EVSE.

In [2]:
import random
from acnportal.acnsim import models
from acnportal.acnsim import events

def random_plugin(num, time_limit, evse, laxity_ratio=1/2,
                  max_rate=32, voltage=208, period=1):
    """ Returns a list of num random plugin events occurring anytime from time 0
    to time_limit. Each plugin has a random arrival and departure under the time
    limit, and a satisfiable requested energy assuming no other cars plugged in.
    Each EV has initial laxity equal to half the staying duration unless
    otherwise specified.
    
    The plugins occur for a single EVSE, whose maximal rate and voltage are
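    assumed to be 32 A and 208 V, respectively, unless otherwise specified.

    Args:
        num (int): Number of plugin events to generate.
        time_limit (int): Latest period at which an arrival or departure
            may occur.
        evse (str): Station ID of the EVSE at which all EVs plug in.
        laxity_ratio (float): Fraction of the energy deliverable at max_rate
            over the stay that each EV requests. The default of 1/2 gives an
            initial laxity of half the staying duration.
        max_rate (float): Maximum charging rate of the EVSE, in A.
        voltage (float): Voltage of the EVSE, in V.
        period (int): Length of one simulation period, in minutes.

    Returns:
        List[PluginEvent]: The num generated plugin events, in order of
            arrival time.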
    """
    out_event_lst = [None] * num
    # Draw 2 * num distinct times in [0, time_limit]. random.sample raises a
    # ValueError (rather than looping forever) if too few times are available.
    times = sorted(random.sample(range(time_limit + 1), 2 * num))
    battery = models.Battery(100, 0, 100)
    for i in range(num):
        arrival_time = times[2*i]
        departure_time = times[2*i+1]
        requested_energy = (
            (departure_time - arrival_time) / (60 / period) 
            * max_rate * voltage / (1 / laxity_ratio)
        )
        ev = models.EV(arrival_time, departure_time, requested_energy,
                       evse, f'rs-{evse}-{i}', battery)
        out_event_lst[i] = events.PluginEvent(arrival_time, ev)
    return out_event_lst
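
As a quick check of the energy formula in `random_plugin`, the sketch below recomputes the requested energy for one EV and the resulting initial laxity. The helper names `requested_energy` and `initial_laxity` are hypothetical and not part of ACN-Sim. Laxity is taken here to mean the staying time minus the time needed to deliver the request at the maximum rate, which is an assumption about the intended definition.

```python
def requested_energy(arrival, departure, laxity_ratio=1/2,
                     max_rate=32, voltage=208, period=1):
    """Same expression as in random_plugin (hypothetical standalone helper)."""
    return ((departure - arrival) / (60 / period)
            * max_rate * voltage / (1 / laxity_ratio))


def initial_laxity(arrival, departure, energy,
                   max_rate=32, voltage=208, period=1):
    """Periods of slack: stay length minus periods needed at max_rate."""
    periods_at_max_rate = energy / (max_rate * voltage) * (60 / period)
    return (departure - arrival) - periods_at_max_rate


# An EV that stays for 60 one-minute periods.
energy = requested_energy(10, 70)
laxity = initial_laxity(10, 70, energy)
print(energy, laxity)
```

With the default `laxity_ratio` of 1/2, the 60-period stay yields a laxity of 30 periods, matching the $d/2$ described above.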

Since the above event generation is stochastic, we'll want to completely rebuild the simulation each time the environment is reset, so that the next simulation has a new event queue. As such, we will define a simulation generating function.

In [3]:
import pytz
from acnportal import acnsim
from datetime import datetime
from acnportal import algorithms
from copy import deepcopy

def interface_generating_function():
    """
    Initializes a simulation with random events on a 1 phase, 1
    constraint ACN (simple_acn), with 1 EVSE
    """
    timezone = pytz.timezone('America/Los_Angeles')
    start = timezone.localize(datetime(2018, 9, 5))
    period = 1

    # Make random event queue
    cn = acnsim.sites.simple_acn(['EVSE-001'],
                                 aggregate_cap=20*208/1000)
    event_list = []
    for station_id in cn.station_ids:
        event_list.extend(random_plugin(10, 100, station_id))
    event_queue = events.EventQueue(event_list)

    # For training, this algorithm isn't run. So, we need not provide any
    # arguments.
    schrl = algorithms.GymAlgorithm()

    # Simulation to be wrapped
    _ = acnsim.Simulator(deepcopy(cn), schrl, deepcopy(event_queue),
                         start, period=period, verbose=False)
    return schrl.interface
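
The reason for wrapping construction in a function can be illustrated without ACN-Sim. The sketch below is a toy stand-in, not the `RebuildingSimEnv` implementation: `ToyRebuildingEnv` and `toy_generating_function` are hypothetical names, and the "simulation" is reduced to a sorted list of random event times. The point is only that each `reset` calls the generating function again, so every episode starts from a freshly sampled schedule.

```python
import random


def toy_generating_function(num=3, time_limit=100):
    """Hypothetical generator: 2 * num distinct, sorted event times."""
    return sorted(random.sample(range(time_limit + 1), 2 * num))


class ToyRebuildingEnv:
    """Toy environment that rebuilds its state on every reset."""

    def __init__(self, generating_function):
        # Store the function itself, not its result, so it can be re-run.
        self._generating_function = generating_function
        self.state = None

    def reset(self):
        # A new random schedule is generated for each episode.
        self.state = self._generating_function()
        return self.state


env = ToyRebuildingEnv(toy_generating_function)
first_episode = env.reset()
second_episode = env.reset()
print(first_episode)
print(second_episode)
```

Passing a pre-built object instead of a function would make every episode replay the same events, which is what the generating function avoids.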

ACN-Sim gym environments wrap an interface to an ACN-Sim Simulation. These environments allow for customizable observations, reward functions, and actions through the CustomSimEnv class, and for rebuilding through the RebuildingSimEnv class (the RebuildingSimEnv class extends the CustomSimEnv class, and so has all the customizability of the latter). As an example, let's make a rebuilding simulation environment with the following characteristics:

- Observations:
    - Arrival times of all currently plugged-in EVs.
    - Departure times of all currently plugged-in EVs.
    - Remaining demand of all currently plugged-in EVs.
    - Constraint matrix of the network.
    - Limiting constraint magnitudes of the network.
    - Current timestep of the simulation
- Action:
    - A zero-centered array of pilot signals. A 0 entry in the array corresponds to a charging rate of 16 A.
- Rewards:
    - A negative reward for each amp of violation of individual EVSE constraints.
    - A negative reward for each amp of pilot signal delivered to an EVSE with no EV plugged in.
    - A negative reward for each amp of network constraint violation.
    - A positive charging reward for each amp of charge delivered if the above penalties are all 0.

The observations, actions, and rewards listed here are all already encoded in the `gym_acnsim` package; see the package documentation for more details. Broadly, each observation object has space and observation generating functions. Each action is an object with space and schedule generating functions. Each reward is a function of the environment, outputting a number. The environment described here is generated by the make_rebuilding_default_env function from the `gym_acnsim` package; see the code there for more details. The `gym_acnsim` package provides `'default-rebuilding-acnsim-v0'`, a registered gym environment that provides this functionality. To make this environment, we need to pass the `interface_generating_function` we defined earlier as a keyword argument.
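
To make the action and reward descriptions concrete, here is a minimal single-EVSE sketch. It is not the `gym_acnsim` implementation: the function names, the penalty of one unit per amp, and the 20 A network limit (chosen to match the `aggregate_cap` used earlier) are all assumptions made for illustration.

```python
def action_to_pilot(action, max_rate=32.0):
    """Shift a zero-centered action so that 0 maps to half of max_rate."""
    return action + max_rate / 2


def toy_reward(pilot, ev_plugged_in, max_rate=32.0, network_limit=20.0):
    """Hypothetical reward for one EVSE, in 'amps' of penalty or charge."""
    # Amps outside the EVSE's allowed range [0, max_rate].
    evse_violation = max(0.0, pilot - max_rate) + max(0.0, -pilot)
    # Amps sent to an EVSE with no EV plugged in.
    unplugged_violation = max(0.0, pilot) if not ev_plugged_in else 0.0
    # Amps above the network-level limit.
    network_violation = max(0.0, pilot - network_limit)

    penalty = evse_violation + unplugged_violation + network_violation
    if penalty > 0:
        return -penalty
    # Charging reward only applies when all penalties are zero.
    return pilot


print(action_to_pilot(0.0))           # action 0 corresponds to 16 A
print(toy_reward(16.0, True))         # feasible: positive charging reward
print(toy_reward(24.0, True))         # 4 A over the network limit
print(toy_reward(8.0, False))         # pilot sent to an empty EVSE
```

In this sketch the charging reward assumes the EV accepts the full pilot signal; a real simulation would use the charge actually delivered.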

In [4]:
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv
from acnportal.acnsim import gym_acnsim
import gym
from gym.wrappers import FlattenObservation

In [5]:
env = DummyVecEnv([
    lambda: FlattenObservation(
        gym.make('default-rebuilding-acnsim-v0',
                 interface_generating_function=interface_generating_function)
    )
])
model = PPO2('MlpPolicy', env, verbose=2).learn(10000)





TypeError: float() argument must be a string or a number, not 'Box'