# Visual data exploration and extraction

This notebook reproduces the rest of the process, i.e. visual data exploration, data extraction, and formatting for the paper.

We do not provide the training and testing part in the reproducibility package, because of the size of the data and the computation budget needed to rerun the experiments. All the results of these experiments are in the file `aggregated_all.csv`, which must be extracted from `data/aggregated_all.7z`.

## Import the packages

In [None]:
import make_hiplot as mh
%matplotlib inline

## Unzip the aggregated_all.csv

In [None]:
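from pathlib import Path
import py7zr

# Extract the archive into a temporary directory
tmp_dir = Path("tmp")
tmp_dir.mkdir(exist_ok=True)
with py7zr.SevenZipFile("data/aggregated_all.7z", mode="r") as archive:
    archive.extractall(path=str(tmp_dir))

# Move the CSV next to the notebook, then remove the temporary directory
source = tmp_dir / "aggregated_all.csv"
destination = Path("aggregated_all.csv")
source.replace(destination)
tmp_dir.rmdir()  # only succeeds if the directory is empty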

## Create the directories for the results

In [None]:
from pathlib import Path
p1 = Path("network_selections/")
p2 = Path("generated_tables/")
p3 = Path("generated_figures/")
for p in [p1,p2,p3]:
    p.mkdir(exist_ok=True)

## Create the dataframe mean_aggregated_all.csv and the html mean_by_checkpoint.html

In [None]:
mh.create_hiplot()

Open the generated HTML file "mean_by_checkpoint.html" in a web browser (Chrome is preferred). The columns are of two types:
* the parameters used, on the left: ["nb_neurons", "nb_layers", "algo", "pid_rates", "thrust", "p", "training_windgust_magnitude_max", "test_windgust_magnitude_max", "training_saturation_motor", "test_saturation_motor", "nof_training_iterations"]
* the associated performances, on the right: ["OK rising t.", "OK off.", "OK overshoot", "avg rising t.", "avg off.", "avg overshoot", "max rising t.", "max off.", "max overshoot"].
*****
Here is how to use the parameters in the hiplot:
* "algo" is the algorithm used: "sac", "ddpg", "ppo" or "td3". To select only sac, you also need to set "pid_rates" to "None".
* "p" is the boolean that defines the presence of "p" in the observable states.
* "thrust" is the boolean that defines the presence of "thrust" in the observable states.
* Here are the different observation spaces available:
     * for a 3D observation_space: {"p": False, "thrust": False}
     * for a 6D observation_space: {"p": True, "thrust": False}
     * for a 7D observation_space: {"p": True, "thrust": True}
* "pid_rates" is the PID used. It can take the following values:
     * "None" for any RL algo,
     * "pid_rates_original" for pid1,
     * "pid_rates_better" for pid2.
* For both the training and the test parameters:
     * "windgust_magnitude_max" should be set to 10 for windgust mode and to 1 for nominal mode (1 is the default value; the windgust is not taken into account at all in nominal mode)
     * "saturation_motor" should be set to 0.8 for saturation mode and to 1 for nominal mode
* "nof_training_iterations" is the number of training iterations the RL controller completed before being evaluated.
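
The same selection can also be expressed in code. The sketch below is not part of the reproducibility package: it assumes each row is a dictionary keyed by the parameter names listed above, with booleans for "p" and "thrust" and the string "None" for "pid_rates". The names `OBSERVATION_SPACES`, `matches` and `make_row` are hypothetical, and the toy rows are invented for illustration.

```python
# Observation spaces described above, keyed by their dimension.
OBSERVATION_SPACES = {
    3: {"p": False, "thrust": False},
    6: {"p": True, "thrust": False},
    7: {"p": True, "thrust": True},
}


def matches(row, algo, obs_dim, windgust=False, saturation=False):
    """Return True if a row is an RL run with the requested settings.

    Assumption: training and test conditions use the same mode.
    """
    space = OBSERVATION_SPACES[obs_dim]
    wind = 10 if windgust else 1      # 10 in windgust mode, 1 in nominal mode
    sat = 0.8 if saturation else 1    # 0.8 in saturation mode, 1 in nominal mode
    expected = {
        "algo": algo,
        "pid_rates": "None",          # "None" selects RL controllers only
        "p": space["p"],
        "thrust": space["thrust"],
        "training_windgust_magnitude_max": wind,
        "test_windgust_magnitude_max": wind,
        "training_saturation_motor": sat,
        "test_saturation_motor": sat,
    }
    return all(row[key] == value for key, value in expected.items())


def make_row(algo, p, thrust, wind=1, sat=1):
    """Build a toy row (illustration only, not real results)."""
    return {
        "algo": algo, "pid_rates": "None", "p": p, "thrust": thrust,
        "training_windgust_magnitude_max": wind,
        "test_windgust_magnitude_max": wind,
        "training_saturation_motor": sat,
        "test_saturation_motor": sat,
    }


toy_rows = [
    make_row("sac", p=True, thrust=False),            # sac, 6D, nominal
    make_row("sac", p=True, thrust=True),             # sac, 7D, nominal
    make_row("ddpg", p=True, thrust=False),           # ddpg, 6D, nominal
    make_row("sac", p=True, thrust=False, wind=10),   # sac, 6D, windgust
]

# Indices of the toy rows that are sac runs with a 6D observation space in nominal mode.
sac_6d_nominal = [i for i, row in enumerate(toy_rows) if matches(row, "sac", 6)]
```

In the hiplot itself, the equivalent is to restrict each of these axes to the corresponding value.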
***
For the performances, there are three types of metrics:
* rising t. is the rising time
    * OK rising t. is the percentage of cases in which the signal reaches the 5% tube around the query
    * avg rising t. is the average time taken to reach the tube, over the cases in which it was reached
    * max rising t. is the maximum time taken to reach the tube, over the cases in which it was reached
* off. is the offset
    * OK off. is the percentage of cases in which the signal stays within the 10% tube around the query after the stabilisation time
    * avg off. is the average of the maximum differences between the signal and the query after the stabilisation time
    * max off. is the maximum of the maximum differences between the signal and the query after the stabilisation time
* overshoot is the overshoot
    * OK overshoot is the percentage of cases in which the signal stays within the 10% tube around the query before the stabilisation time
    * avg overshoot is the average of the maximum differences between the signal and the query before the stabilisation time
    * max overshoot is the maximum of the maximum differences between the signal and the query before the stabilisation time
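
To make the rising time metrics concrete, here is a minimal sketch of how the three values could be computed. It is an illustration under stated assumptions, not the code used for the paper: the 5% tube is taken to mean `abs(signal - query) <= 0.05 * abs(query)`, the rising time is taken to be the first sample inside that tube, and each case is one recorded signal. The function names and the toy signals are hypothetical.

```python
def rising_time(times, signal, query, tolerance=0.05):
    """Return the first time at which the signal is inside the tube, or None."""
    half_width = tolerance * abs(query)
    for t, value in zip(times, signal):
        if abs(value - query) <= half_width:
            return t
    return None  # the signal never reached the tube


def summarise_rising_times(rising_times):
    """Return (OK percentage, average, maximum) over a list of rising times.

    Entries equal to None are cases in which the tube was never reached;
    they count against the OK percentage and are ignored by avg and max.
    """
    reached = [t for t in rising_times if t is not None]
    ok = 100.0 * len(reached) / len(rising_times)
    avg = sum(reached) / len(reached) if reached else None
    worst = max(reached) if reached else None
    return ok, avg, worst


# Toy signals, invented for illustration: four cases tracking a query of 100.
times = [0.0, 0.1, 0.2, 0.3]
query = 100.0
signals = [
    [0.0, 60.0, 96.0, 100.0],   # enters the tube at t = 0.2
    [0.0, 97.0, 101.0, 100.0],  # enters the tube at t = 0.1
    [0.0, 30.0, 50.0, 70.0],    # never enters the tube
    [0.0, 50.0, 80.0, 99.0],    # enters the tube at t = 0.3
]

per_case = [rising_time(times, s, query) for s in signals]
ok_rising, avg_rising, max_rising = summarise_rising_times(per_case)
```

The offset and overshoot metrics follow the same pattern, with the maximum difference to the query taken after or before the stabilisation time and a 10% tube instead of a 5% one.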
    

## Import the code to generate the tables and the figures

In [None]:
import generation_tables as gt
import gen_nn_sel as gns
import generation_figures as gf

## Network selections
Each function has the same name as the file it generates.

In [None]:
gns.architecture_sac_perfo()
gns.ddpg_best()
gns.ppo_best()
gns.td3_best()
gns.sac_best()
gns.pid_best()
gns.ddpg_sac_saturation_test_saturation()
gns.ddpg_sac_windgust_test_nominal()
gns.ddpg_sac_windgust_test_windgust()
gns.ddpg_saturation_test_nominal()
gns.nominal_test_saturation()
gns.nominal_test_windgust()
gns.pid_test_saturation()
gns.pid_test_windgust()
gns.sac_3D()
gns.sac_6D()
gns.sac_7D()
gns.sac_32_32()
gns.sac_saturation_test_nominal()




## Generation of the tables

In [None]:
gt.table1()
gt.table2()
gt.table3()
gt.table4()
gt.table5()
gt.table6()
gt.table7()

## Generation of the figures

In [None]:
gf.figure3()
gf.figure4()
gf.figure5()
gf.figure6()