# SimEnv Demo

This notebook demonstrates the functionality of `SimEnv`, which creates an environment for simulating power-electronics-driven microgrids with ad-hoc data generation.
It can be used to train and test reinforcement learning agents (e.g., from 
https://juliareinforcementlearning.org/).
These agents can learn to handle different control tasks and can be compared to classical control approaches.

The dynamic behaviour of the environment is simulated using linear state-space systems.
It interacts step-wise with the agent/controller as shown in the figure below.
Based on the input/action `u` at timestep `k`, the state `x` is calculated.

  
![](figures/RL_env.png "")
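The step-wise update of a linear state-space system can be illustrated with a minimal sketch. The matrices `A` and `B` and the helper names `next_state` and `rollout` are hypothetical and chosen only for illustration; they are not taken from the Dare package, which derives its own system matrices from the electrical components.

```julia
# Hypothetical 2-state, 1-input discrete-time system (illustrative values only)
A = [0.9 0.1;
     0.0 0.8]
B = reshape([0.0, 0.5], 2, 1)

# One step of x_{k+1} = A * x_k + B * u_k
next_state(x, u) = A * x + B * u

# Apply the same input u for n_steps steps, starting from x0
function rollout(x0, u, n_steps)
    x = x0
    for _ in 1:n_steps
        x = next_state(x, u)
    end
    return x
end

x3 = rollout([0.0, 0.0], [1.0], 3)
```

In each iteration the new state depends only on the previous state and the applied input, which is the interaction pattern the figure describes.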


To use the Dare toolbox, the Dare package has to be loaded:

In [1]:
using Dare

## Simplest initialisation
The easiest way to initialize an environment is as follows:

In [2]:
env = SimEnv(num_sources = 2, num_loads = 1)


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.4, running with linear solver MUMPS 5.4.1.

Number of nonzeros in equality constraint Jacobian...:       64
Number of nonzeros in inequality constraint Jacobian.:       18
Number of nonzeros in Lagrangian Hessian.............:      407

Total number of variables............................:       16
                     variables with only lower bounds:        0
                variables with lower and upper bounds:       16
                     variables with only upper bounds:        0
Total number of equality constraints.................:        6
Total number of inequality co

┌ Info: 3 Current limits set to 1000 A - please define in nc.parameters -> source -> i_limit! What???
└ @ Dare /home/webbah/Dokumente/GIT/dare/src/env.jl:335
┌ Info: 3 Voltage limits set to 1.05*nc.parameters[grid][v_rms] - please define in nc.parameters -> source -> v_limit! Whatt???
└ @ Dare /home/webbah/Dokumente/GIT/dare/src/env.jl:336


# SimEnv

## Traits

| Trait Type        |                                            Value |
|:----------------- | ------------------------------------------------:|
| NumAgentStyle     |          ReinforcementLearningBase.SingleAgent() |
| DynamicStyle      |           ReinforcementLearningBase.Sequential() |
| InformationStyle  | ReinforcementLearningBase.ImperfectInformation() |
| ChanceStyle       |           ReinforcementLearningBase.Stochastic() |
| RewardStyle       |           ReinforcementLearningBase.StepReward() |
| UtilityStyle      |           ReinforcementLearningBase.GeneralSum() |
| ActionStyle       |     ReinforcementLearningBase.MinimalActionSet() |
| StateStyle        |     ReinforcementLearningBase.Observation{Any}() |
| DefaultStateStyle |     ReinforcementLearningBase.Observation{Any}() |

## Is Environment Terminated?

No

## State Space

`ReinforcementLearningBase.Space{Vector{IntervalSets.ClosedInterval{Float64}}}(IntervalSets.ClosedInterval{Float64}[-1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0])`

## Action Space

`ReinforcementLearningBase.Space{Vector{IntervalSets.ClosedInterval{Float64}}}(IntervalSets.ClosedInterval{Float64}[-1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0])`

## Current State

```
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```


This creates an environment consisting of an electrical power grid with two sources (`num_sources = 2`), each of which could be, for example, an inverter fed by a PV plant, supplying one load (`num_loads = 1`).
A simple example is shown in the figure below, where a load (an electric car to be charged) is supplied by two sources (inverters, fed by a PV plant and a wind turbine) via two transmission lines.

![](figures/ExampleGrid1.png "")

For better visibility, the shaded electrical circuit in the background is displayed as a single-phase diagram.
(By default a three-phase four-wire system is created.)
Parameters that are not defined during the initialization of the env (connections between the different sources and loads, parameters of the electric components, ...) are drawn randomly, while a few are set to fixed default values.
One of the latter is, for example, the step size `ts`. After the initialization, a step-wise interaction with the environment is possible.
As can be seen in the first picture, an action can be selected and the env can be executed with it.
Based on that action `u_k` and the internal state-space system (defined depending on the electric components; for more information about the ordinary differential equations, see NodeConstructor_DEMO.ipynb), the system is evolved for one timestep and the new states `x_k+1` of the system are calculated.
First, the current state of the environment is checked:


In [3]:
env.state

30-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 ⋮
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

If the state is not zero at the beginning but should be, the reset method can be used, which sets the state to the internally defined `x0` (which consists of zeros per default).
If we do not want to start from zero, `x0` could be set in the initialisation of the env or, as done here, overwritten afterwards and applied with a reset:

In [4]:
using ReinforcementLearning
env.x0 = 0.1 * ones(length(env.state_space))
reset!(env)
env.state

30-element Vector{Float64}:
 0.1
 0.1
 0.1
 0.1
 0.1
 0.1
 0.1
 0.1
 0.1
 0.1
 ⋮
 0.1
 0.1
 0.1
 0.1
 0.1
 0.1
 0.1
 0.1
 0.1

To interact with the env, we first have to find out how many actions are needed. For this, the length of the action space can be checked:




In [5]:
n_a = length(env.action_space)

6

The six actions requested by the environment belong to the two sources. Since the env produces a three-phase system per default, one action per phase per source is needed, which gives 2 x 3 = 6 actions.
To excite the env with an action, the following command can be used:

In [6]:
env([0.2, 0.2, 0.2, 0.3, 0.3, 0.3])

30-element Vector{Float64}:
 -0.13639092171526726
  3.224892495980788e-5
 -3.880694037558475e-12
 -0.36594994587855945
 -2.8379915813107303e-12
  7.775182074065927e-6
 -5.968296088921655e-5
 -4.52327260887284e-5
  0.09020609305217389
  0.10000014785009308
  ⋮
  3.224892495980788e-5
 -3.880718183691352e-12
 -0.36594994587855945
 -2.8379674351778556e-12
  7.775182074065927e-6
 -5.968296088921655e-5
 -4.52327260887284e-5
  0.09020609305217388
  0.10000014785009308

Here, the first source got an action of `0.2` on all three phases, while the second source got an action of `0.3` on all three phases.
As can be seen, the states have changed from 0.1 to different values.
To get a little more intuition about the different states, the `state_ids` can be investigated:

In [7]:
env.state_ids

30-element Vector{String}:
 "source1_i_L1_a"
 "source1_v_C_filt_a"
 "source1_v_C_cables_a"
 "source2_i_L1_a"
 "source2_v_C_cables_a"
 "cable1_i_L_a"
 "cable2_i_L_a"
 "cable3_i_L_a"
 "load1_v_C_total_a"
 "load1_i_L_a"
 ⋮
 "source1_v_C_filt_c"
 "source1_v_C_cables_c"
 "source2_i_L1_c"
 "source2_v_C_cables_c"
 "cable1_i_L_c"
 "cable2_i_L_c"
 "cable3_i_L_c"
 "load1_v_C_total_c"
 "load1_i_L_c"

The labels define to which source the state belongs and what it represents.
For example, the first state is called `"source1_i_L1_a"`. This tells us that it belongs to the first source (in the picture above, the PV plant) and represents the current `i` through the inductor `L1` of phase `a`.
This information can be used, for example, to control the current through the filter inductance (or to learn this control task).
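As a minimal sketch of how these labels could be used, the snippet below looks up state indices by name. The label list is copied from the first ten entries of the output above; the variable names and the regular expression are illustrative assumptions, not part of the Dare API.

```julia
# First ten labels as printed by env.state_ids above (phase a only)
state_ids = ["source1_i_L1_a", "source1_v_C_filt_a", "source1_v_C_cables_a",
             "source2_i_L1_a", "source2_v_C_cables_a",
             "cable1_i_L_a", "cable2_i_L_a", "cable3_i_L_a",
             "load1_v_C_total_a", "load1_i_L_a"]

# Index of one specific state
idx_i_L1 = findfirst(==("source1_i_L1_a"), state_ids)

# Indices of the filter inductor currents of all sources
idx_source_currents = findall(id -> occursin(r"^source\d+_i_L1_", id), state_ids)
```

With such indices, the corresponding entries of the state vector can be picked out, for instance as the measured quantity of a current controller.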

In [8]:
env.state_space

Space{Vector{IntervalSets.ClosedInterval{Float64}}}(IntervalSets.ClosedInterval{Float64}[-1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0  …  -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0])

Since the state space of the env ranges from -1.0 to 1.0, the current through the filter inductor in the example is normalized by the maximal current allowed to flow through the inductor.
If this parameter is not defined, it is set per default based on the filter design carried out in the env.

All (technical) parameters needed for the simulation are defined in the parameter dict (For more detailed information see NodeConstructor_DEMO.ipynb).
It can be investigated by:

In [9]:
env.nc.parameters


Dict{Any, Any} with 4 entries:
  "source" => Any[Dict{Any, Any}("L1"=>0.00195161, "C"=>2.69715e-5, "mode"=>"Dr…
  "grid"   => Dict{Any, Any}("f_grid"=>50, "Δfmax"=>0.005, "fs"=>10000.0, "proc…
  "load"   => Any[Dict{Any, Any}("Z"=>1.12003e-13-4.94869e-5im, "C"=>64.322, "L…
  "cable"  => Any[Dict{Any, Any}("Cb"=>5.06381e-11, "Lb"=>5.39443e-7, "Rb"=>0.0…

The limit of the filter inductor current can be found using:

In [10]:
env.nc.parameters["source"][1]["i_limit"]

121.76152238967438

This returns the current limit (belonging to the inductor) of source one.
The voltage limit used for normalization depends on the filter capacitance.
The same concept holds for the cables and loads (their parametrization can be found in the parameter dict, too).
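The relation between a normalized state and its physical value can be sketched as follows, assuming the normalization is a plain division by the limit, as the description above suggests. The helper names `to_physical` and `to_normalized` are hypothetical; the numbers are the first state after the single step above and the `i_limit` of source one.

```julia
i_limit = 121.76152238967438          # current limit of source 1 (from the parameter dict)
x_norm  = -0.13639092171526726        # normalized state "source1_i_L1_a" after one step

# Assumed normalization: x_norm = x / limit
to_physical(x_norm, limit) = x_norm * limit
to_normalized(x, limit)    = x / limit

i_L1 = to_physical(x_norm, i_limit)   # inductor current in A
```

A normalized value of magnitude 1.0 corresponds to the state reaching its limit.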

Since the action space is defined in the range -1.0 to 1.0, the actions are "normalized", too, by the DC-link voltage of the specific source.
In the simulation the chosen action is multiplied by half of the DC-link voltage (and can be interpreted as a modulation index in an electrical engineering context).
The DC-link voltage can be found in (or set via) the parameter dict, too:

In [11]:
env.nc.parameters["source"][1]["vdc"]

800
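The scaling of an action by half of the DC-link voltage can be written as a small sketch. The function name `action_to_voltage` is hypothetical; only the factor `vdc / 2` is taken from the description above.

```julia
vdc = 800   # DC-link voltage of source 1 in V (from the parameter dict)

# Average-model voltage applied to one phase for a normalized action in [-1, 1]
action_to_voltage(action, vdc) = action * vdc / 2

v_phase = action_to_voltage(0.2, vdc)   # voltage for the action 0.2 used above
```

For `vdc = 800`, the actions `-1.0` and `1.0` therefore correspond to -400 V and 400 V.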

## Interact with the Env
To interact with the env, the function described above (`env(action)`) can be called in a loop and the state can be logged during this process:

In [12]:
env.x0 = zeros(length(env.state_space))   # set initial values back to zero

# run 3 steps
for _ in 1:3
    env([0.2, 0.2, 0.2, 0.3, 0.3, 0.3])
end

env.state   # print state

30-element Vector{Float64}:
 0.09859605862117994
 0.00010290244997245628
 0.00010555178441328813
 0.10315537667280711
 0.0003260113086840901
 0.24601784060569806
 0.24143832812484953
 0.2245522168461066
 0.0002674243459521656
 0.00010000038428713425
 ⋮
 0.00011296674819405862
 0.00011694223364493259
 0.15216060654613744
 0.00034912717178715765
 0.36806529291830975
 0.3599567047439114
 0.3366486409901768
 0.0002674672687043454
 0.00010000036668461791
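Logging the state in every step, as mentioned above, could look like the following sketch. `ToyEnv` is a hypothetical stand-in with made-up first-order dynamics, used only so the snippet runs on its own; with a `SimEnv`, the loop body would call `env(action)` in the same way.

```julia
# Hypothetical stand-in environment that, like SimEnv, is callable with an action
mutable struct ToyEnv
    state::Vector{Float64}
end

function (env::ToyEnv)(action)
    env.state = 0.5 .* env.state .+ action   # made-up dynamics for illustration
    return env.state
end

# Run n_steps steps with a constant action and collect a copy of each state
function run_and_log(env, action, n_steps)
    history = Vector{Vector{Float64}}()
    for _ in 1:n_steps
        push!(history, copy(env(action)))
    end
    return history
end

history = run_and_log(ToyEnv(zeros(2)), [0.2, 0.3], 3)
```

Copying the state before storing it avoids every log entry pointing to the same array if the environment updates its state in place.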

The Dare toolbox provides a more advanced method to run an experiment with a specific number of steps and multiple episodes.
It is based on the `run` command provided by the ReinforcementLearning.jl (https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/master/src/ReinforcementLearningCore/src/core/run.jl) toolbox and can therefore be used to train RL agents.

To examine this functionality in more detail the example environment is reduced to a setting with 1 source and 1 load as shown in the figure below:

![](figures/ExampleGrid2.png "")

Shown is a 3-phase inverter with an LC filter supplying a load via a cable (model parameters not shown).
To get that specific setting with the correct filter type, the parameter dict is defined beforehand and handed over to the env.
(The IGBTs are not simulated. Since average models are used, the action is multiplied by half of `vdc`.)

Instead of `num_sources` and `num_loads`, the parameter dict and the connectivity matrix CM are now used. An entry of CM defines whether there is a connection between two nodes (e.g., source <-> load), in which case the entry is nonzero, or no connection, in which case the entry is `0`. For more information about the CM matrix see NodeConstructor_DEMO.ipynb.
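The rule "nonzero entry means connection" can be sketched with a small helper that lists all connected node pairs. The function name `connections` is hypothetical, and only the upper triangle of the matrix is inspected, on the assumption that each connection appears there once.

```julia
# Connectivity matrix for one source (node 1) and one load (node 2)
CM = [ 0. 1.
      -1. 0.]

# Collect all node pairs (i, j) with i < j whose entry is nonzero
function connections(CM)
    n = size(CM, 1)
    return [(i, j) for i in 1:n for j in (i + 1):n if CM[i, j] != 0]
end

pairs = connections(CM)   # node pairs that are connected
```

For the matrix above this yields the single pair of node 1 and node 2, i.e. the source is connected to the load.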

In [13]:
CM = [0. 1.
    -1. 0.]

parameters = Dict{Any, Any}(
        "source" => Any[
                        Dict{Any, Any}("pwr" => 200e3, "control_type" => "classic", "mode" => "Swing", "fltr" => "LC"),
                        ],
        "load"   => Any[
                        Dict{Any, Any}("impedance" => "R", "R" => 2),
                        ],
        "cable"   => Any[
                        Dict{Any, Any}("R" => 1e-3, "L" => 1e-4, "C" => 1e-4, "i_limit" => 10e8,),
                        ],
        "grid" => Dict{Any, Any}("fs"=>1e4, "phase"=>3, "v_rms"=>230, "f_grid" => 50, "ramp_end"=>0.0)
    )


env = SimEnv(CM = CM, parameters = parameters)

┌ Info: 3 Voltage limits set to 1.05*nc.parameters[grid][v_rms] - please define in nc.parameters -> source -> v_limit! Whatt???
└ @ Dare /home/webbah/Dokumente/GIT/dare/src/env.jl:336


# SimEnv

## Traits

| Trait Type        |                  Value |
|:----------------- | ----------------------:|
| NumAgentStyle     |          SingleAgent() |
| DynamicStyle      |           Sequential() |
| InformationStyle  | ImperfectInformation() |
| ChanceStyle       |           Stochastic() |
| RewardStyle       |           StepReward() |
| UtilityStyle      |           GeneralSum() |
| ActionStyle       |     MinimalActionSet() |
| StateStyle        |     Observation{Any}() |
| DefaultStateStyle |     Observation{Any}() |

## Is Environment Terminated?

No

## State Space

`Space{Vector{IntervalSets.ClosedInterval{Float64}}}(IntervalSets.ClosedInterval{Float64}[-1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0, -1.0..1.0])`

## Action Space

`Space{Vector{IntervalSets.ClosedInterval{Float64}}}(IntervalSets.ClosedInterval{Float64}[-1.0..1.0, -1.0..1.0, -1.0..1.0])`

## Current State

```
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```
