# 1 - Startup 

Code repo: https://github.com/tuanpham96/DeepLearningSystems-Fall2021/tree/main/HW2

Source code is in `src` and output files are in `output`. The `_conf.yml` files record each experiment's configuration and variations, and the `_stat.csv` files record the metrics (accuracy, loss, running time) logged during training.
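
As a rough illustration of the output naming, the sketch below splits a path such as `output/exp1-actfun_model-0_stat.csv` (the form that appears in the `stat_file` and `model_file` columns later in this notebook) into its parts. The regular expression and the helper `parse_output_name` are assumptions made for this example and are not part of `src`. The exact naming of the `_conf.yml` files is not shown here, so they are left out.

```python
import re
from pathlib import PurePosixPath

# Pattern inferred from paths like output/exp1-actfun_model-0_stat.csv
# and output/exp1-actfun_model-0_model.pt (an assumption, not repo code).
OUTPUT_NAME = re.compile(
    r"^(?P<exp>.+)_model-(?P<model_id>\d+)_(?P<kind>stat|model)\.(?:csv|pt)$"
)


def parse_output_name(path):
    """Return (experiment name, model id, file kind), or None if no match."""
    match = OUTPUT_NAME.match(PurePosixPath(path).name)
    if match is None:
        return None
    return match["exp"], int(match["model_id"]), match["kind"]


print(parse_output_name("output/exp1-actfun_model-0_stat.csv"))
```

A helper like this can be handy for grouping files by experiment before loading them, for example together with `glob.glob("output/*_stat.csv")`.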

Experiments were run separately from their visualization; see the visualization for the discussion of the results.

Sources consulted:

- [HW 2 colab notebook](https://colab.research.google.com/drive/1kyFRtM70oZ28ERx4ey5Qd4aLw_AR7gRx?usp=sharing)

## 1.1. Initialization

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
%cd "/content/drive/MyDrive/Courses/Fall 2021/dlsys/DeepLearningSystems-Fall2021/HW2"

/content/drive/MyDrive/Courses/Fall 2021/dlsys/DeepLearningSystems-Fall2021/HW2


## 1.2. Download CIFAR10 data 



Download CIFAR-10 and CIFAR-10.1 v6. Run this cell only once.

In [None]:
%%bash

# Skip the download if the archive is already present.
# Exit with status 0 so that re-running the cell is not reported as a failure.
if [ -f 'data/cifar-10-binary.tar.gz' ]; then
    ls data
    exit 0
fi

# Create the data directory if it does not exist yet.
mkdir -p data
cd data
wget "https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz" -O "cifar-10-binary.tar.gz"
wget "https://github.com/modestyachts/CIFAR-10.1/raw/master/datasets/cifar10.1_v6_data.npy" -O "cifar10.1_v6_data.npy"
wget "https://github.com/modestyachts/CIFAR-10.1/raw/master/datasets/cifar10.1_v6_labels.npy" -O "cifar10.1_v6_labels.npy"
tar -xvzf "cifar-10-binary.tar.gz"
cd ..


cifar10.1_v6_data.npy
cifar10.1_v6_labels.npy
cifar-10-batches-bin
cifar-10-binary.tar.gz
test1-1_stat.csv
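
The "run only once" guard above checks for a single archive. A slightly more thorough check, sketched here in Python, reports which of the three downloaded files are still missing. The helper name `missing_files` and the `EXPECTED` list are assumptions for this example; the file names themselves come from the `wget` commands above.

```python
from pathlib import Path

# File names taken from the wget commands in the download cell.
EXPECTED = [
    "cifar-10-binary.tar.gz",
    "cifar10.1_v6_data.npy",
    "cifar10.1_v6_labels.npy",
]


def missing_files(data_dir, expected=EXPECTED):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


# An empty list means nothing needs to be downloaded again.
print(missing_files("data"))
```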


## 1.3. Requirements 

In [None]:
# Only needed in an environment that does not already have PyTorch or plotly

# pytorch packages setup: 
# !pip3 install --no-cache-dir torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html

# plotly for plotting 
# !pip install plotly

# 2 - Experiments

## 2.0. Initialization

In [None]:
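import os, time, glob
import pandas as pd
import numpy as np
import torch  # imported explicitly because torch.device is used below
from tqdm.notebook import tqdm

# The star import supplies get_data_loaders, ModelArchConfig, OptimAndSched and run_experiment
from src.experiment_routines import *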

In [None]:
# common variables
BATCH_SIZE = 64
EPOCHS = 15
SEED = 3456

In [None]:
data_loaders = get_data_loaders(batch_size = BATCH_SIZE)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
mod_arch_conf = ModelArchConfig()
mod_arch_conf.print_opts()

{'activation_functions': ['sigmoid', 'tanh', 'relu', 'elu', 'lrelu', 'silu'],
 'batchnorm_layer_locs': ['none', 'conv', 'fc', 'all'],
 'batchnorm_locs_rel_to_act': ['before', 'after'],
 'skip_connections': [0, 1, 2]}
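
For reference, the six activation names listed above can be written out as plain scalar functions. This is only a sketch of the standard definitions. The `alpha=1.0` for ELU and `negative_slope=0.01` for leaky ReLU are PyTorch's defaults and are assumed here, since the values used in `src` are not shown in this notebook.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


def elu(x, alpha=1.0):  # alpha=1.0 is an assumed default
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def lrelu(x, negative_slope=0.01):  # negative_slope=0.01 is an assumed default
    return x if x >= 0 else negative_slope * x


def silu(x):
    # SiLU (also called swish): x * sigmoid(x)
    return x * sigmoid(x)


# Keys mirror the names printed by print_opts().
ACTIVATIONS = {
    "sigmoid": sigmoid, "tanh": tanh, "relu": relu,
    "elu": elu, "lrelu": lrelu, "silu": silu,
}

for name, fn in ACTIVATIONS.items():
    print(name, fn(-1.0), fn(1.0))
```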


In [None]:
optimsched_conf = OptimAndSched()
optimsched_conf.print_opts()

{'optimizers': ['sgd', 'adam'],
 'schedulers': ['const', 'step', 'exp', 'reduce_on_plateau', 'anneal']}
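
To make the scheduler names more concrete, the sketch below gives closed-form learning rates for four of them as a function of the epoch. It assumes that `anneal` means cosine annealing and picks illustrative hyperparameters (`step_size`, `gamma`, `t_max`); neither assumption is confirmed by this notebook. `reduce_on_plateau` is omitted because it depends on the observed validation metric rather than on the epoch alone.

```python
import math


def const_lr(base_lr, epoch):
    # Constant schedule: the learning rate never changes.
    return base_lr


def step_lr(base_lr, epoch, step_size=5, gamma=0.1):
    # Multiply by gamma once every step_size epochs (illustrative values).
    return base_lr * gamma ** (epoch // step_size)


def exp_lr(base_lr, epoch, gamma=0.9):
    # Multiply by gamma after every epoch (illustrative value).
    return base_lr * gamma ** epoch


def cosine_anneal_lr(base_lr, epoch, t_max=15, eta_min=0.0):
    # Cosine annealing from base_lr at epoch 0 down to eta_min at epoch t_max.
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


for epoch in (0, 5, 10, 15):
    print(epoch, step_lr(1e-3, epoch), exp_lr(1e-3, epoch), cosine_anneal_lr(1e-3, epoch))
```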


## 2.1. Experiment 1: Activation functions

In [None]:
exp_config = dict(
    name            = 'exp1-actfun',
    param   = dict(
        act         = mod_arch_conf.act_opts,
    ),
    const = dict( 
        skip        = 0, 
        bn_layer    = 'none',
        bn_act      = 'before',
        optim       = 'adam', 
        sched       = 'const',
        lr          = 1e-3, 
        num_epoch   = EPOCHS,
        seed        = SEED
    )
)

In [None]:
run_experiment(exp_config, device, data_loaders, out_path='output')

exp1-actfun-0 (vary: sigmoid) 	 || EPOCH: 01/15 | iter = 47.6s | total = 0.79m || TRAIN: acc = 19.4 | loss = 2.243 | time = 44.1s || VALID: acc = 27.4 | loss = 2.173 | time = 3.4s || 
exp1-actfun-0 (vary: sigmoid) 	 || EPOCH: 02/15 | iter = 47.1s | total = 1.58m || TRAIN: acc = 27.1 | loss = 2.177 | time = 43.7s || VALID: acc = 29.6 | loss = 2.152 | time = 3.4s || 
exp1-actfun-0 (vary: sigmoid) 	 || EPOCH: 03/15 | iter = 46.8s | total = 2.36m || TRAIN: acc = 31.2 | loss = 2.139 | time = 43.3s || VALID: acc = 35.3 | loss = 2.101 | time = 3.5s || 
exp1-actfun-0 (vary: sigmoid) 	 || EPOCH: 04/15 | iter = 46.9s | total = 3.14m || TRAIN: acc = 34.3 | loss = 2.111 | time = 43.5s || VALID: acc = 36.8 | loss = 2.088 | time = 3.5s || 
exp1-actfun-0 (vary: sigmoid) 	 || EPOCH: 05/15 | iter = 46.7s | total = 3.92m || TRAIN: acc = 36.3 | loss = 2.092 | time = 43.2s || VALID: acc = 39.9 | loss = 2.060 | time = 3.5s || 
exp1-actfun-0 (vary: sigmoid) 	 || EPOCH: 06/15 | iter = 46.7s | total = 4.70m |

exp1-actfun-1 (vary: tanh) 	 || EPOCH: 01/15 | iter = 48.0s | total = 0.80m || TRAIN: acc = 39.3 | loss = 2.064 | time = 44.5s || VALID: acc = 50.5 | loss = 1.955 | time = 3.5s || 
exp1-actfun-1 (vary: tanh) 	 || EPOCH: 02/15 | iter = 47.8s | total = 1.60m || TRAIN: acc = 51.7 | loss = 1.941 | time = 44.4s || VALID: acc = 55.9 | loss = 1.898 | time = 3.4s || 
exp1-actfun-1 (vary: tanh) 	 || EPOCH: 03/15 | iter = 47.7s | total = 2.39m || TRAIN: acc = 56.0 | loss = 1.900 | time = 44.2s || VALID: acc = 60.1 | loss = 1.858 | time = 3.5s || 
exp1-actfun-1 (vary: tanh) 	 || EPOCH: 04/15 | iter = 47.7s | total = 3.19m || TRAIN: acc = 58.2 | loss = 1.877 | time = 44.3s || VALID: acc = 61.7 | loss = 1.843 | time = 3.4s || 
exp1-actfun-1 (vary: tanh) 	 || EPOCH: 05/15 | iter = 47.7s | total = 3.98m || TRAIN: acc = 60.0 | loss = 1.858 | time = 44.2s || VALID: acc = 61.3 | loss = 1.845 | time = 3.4s || 
exp1-actfun-1 (vary: tanh) 	 || EPOCH: 06/15 | iter = 48.1s | total = 4.78m || TRAIN: acc = 61.

exp1-actfun-2 (vary: relu) 	 || EPOCH: 01/15 | iter = 48.0s | total = 0.80m || TRAIN: acc = 34.0 | loss = 2.112 | time = 44.6s || VALID: acc = 45.3 | loss = 2.004 | time = 3.5s || 
exp1-actfun-2 (vary: relu) 	 || EPOCH: 02/15 | iter = 47.9s | total = 1.60m || TRAIN: acc = 46.3 | loss = 1.995 | time = 44.4s || VALID: acc = 48.3 | loss = 1.975 | time = 3.5s || 
exp1-actfun-2 (vary: relu) 	 || EPOCH: 03/15 | iter = 48.0s | total = 2.40m || TRAIN: acc = 50.5 | loss = 1.952 | time = 44.4s || VALID: acc = 55.1 | loss = 1.907 | time = 3.5s || 
exp1-actfun-2 (vary: relu) 	 || EPOCH: 04/15 | iter = 48.3s | total = 3.20m || TRAIN: acc = 54.0 | loss = 1.918 | time = 44.7s || VALID: acc = 56.6 | loss = 1.890 | time = 3.5s || 
exp1-actfun-2 (vary: relu) 	 || EPOCH: 05/15 | iter = 48.4s | total = 4.01m || TRAIN: acc = 56.3 | loss = 1.896 | time = 44.8s || VALID: acc = 59.2 | loss = 1.866 | time = 3.6s || 
exp1-actfun-2 (vary: relu) 	 || EPOCH: 06/15 | iter = 48.3s | total = 4.81m || TRAIN: acc = 57.

exp1-actfun-3 (vary: elu) 	 || EPOCH: 01/15 | iter = 48.5s | total = 0.81m || TRAIN: acc = 38.8 | loss = 2.067 | time = 45.0s || VALID: acc = 49.8 | loss = 1.959 | time = 3.5s || 
exp1-actfun-3 (vary: elu) 	 || EPOCH: 02/15 | iter = 48.5s | total = 1.62m || TRAIN: acc = 50.0 | loss = 1.957 | time = 45.1s || VALID: acc = 54.0 | loss = 1.915 | time = 3.4s || 
exp1-actfun-3 (vary: elu) 	 || EPOCH: 03/15 | iter = 48.3s | total = 2.42m || TRAIN: acc = 54.0 | loss = 1.918 | time = 44.9s || VALID: acc = 58.3 | loss = 1.876 | time = 3.4s || 
exp1-actfun-3 (vary: elu) 	 || EPOCH: 04/15 | iter = 48.3s | total = 3.23m || TRAIN: acc = 56.4 | loss = 1.893 | time = 44.7s || VALID: acc = 58.0 | loss = 1.879 | time = 3.6s || 
exp1-actfun-3 (vary: elu) 	 || EPOCH: 05/15 | iter = 48.7s | total = 4.04m || TRAIN: acc = 57.6 | loss = 1.881 | time = 45.1s || VALID: acc = 61.3 | loss = 1.845 | time = 3.6s || 
exp1-actfun-3 (vary: elu) 	 || EPOCH: 06/15 | iter = 48.8s | total = 4.85m || TRAIN: acc = 59.5 | lo

exp1-actfun-4 (vary: lrelu) 	 || EPOCH: 01/15 | iter = 48.7s | total = 0.81m || TRAIN: acc = 34.4 | loss = 2.110 | time = 45.1s || VALID: acc = 44.9 | loss = 2.005 | time = 3.6s || 
exp1-actfun-4 (vary: lrelu) 	 || EPOCH: 02/15 | iter = 48.1s | total = 1.61m || TRAIN: acc = 46.5 | loss = 1.991 | time = 44.6s || VALID: acc = 50.7 | loss = 1.952 | time = 3.5s || 
exp1-actfun-4 (vary: lrelu) 	 || EPOCH: 03/15 | iter = 48.7s | total = 2.43m || TRAIN: acc = 50.7 | loss = 1.951 | time = 45.2s || VALID: acc = 53.0 | loss = 1.928 | time = 3.6s || 
exp1-actfun-4 (vary: lrelu) 	 || EPOCH: 04/15 | iter = 48.4s | total = 3.23m || TRAIN: acc = 54.4 | loss = 1.915 | time = 44.8s || VALID: acc = 56.9 | loss = 1.888 | time = 3.6s || 
exp1-actfun-4 (vary: lrelu) 	 || EPOCH: 05/15 | iter = 48.4s | total = 4.04m || TRAIN: acc = 56.3 | loss = 1.895 | time = 44.8s || VALID: acc = 59.5 | loss = 1.862 | time = 3.6s || 
exp1-actfun-4 (vary: lrelu) 	 || EPOCH: 06/15 | iter = 48.8s | total = 4.85m || TRAIN: acc

exp1-actfun-5 (vary: silu) 	 || EPOCH: 01/15 | iter = 48.4s | total = 0.81m || TRAIN: acc = 36.2 | loss = 2.094 | time = 44.9s || VALID: acc = 45.4 | loss = 2.002 | time = 3.6s || 
exp1-actfun-5 (vary: silu) 	 || EPOCH: 02/15 | iter = 48.5s | total = 1.62m || TRAIN: acc = 48.1 | loss = 1.978 | time = 44.9s || VALID: acc = 53.8 | loss = 1.923 | time = 3.6s || 
exp1-actfun-5 (vary: silu) 	 || EPOCH: 03/15 | iter = 49.1s | total = 2.43m || TRAIN: acc = 53.0 | loss = 1.929 | time = 45.5s || VALID: acc = 56.5 | loss = 1.892 | time = 3.6s || 
exp1-actfun-5 (vary: silu) 	 || EPOCH: 04/15 | iter = 49.3s | total = 3.26m || TRAIN: acc = 56.2 | loss = 1.898 | time = 45.7s || VALID: acc = 58.1 | loss = 1.878 | time = 3.6s || 
exp1-actfun-5 (vary: silu) 	 || EPOCH: 05/15 | iter = 49.7s | total = 4.08m || TRAIN: acc = 58.1 | loss = 1.878 | time = 46.0s || VALID: acc = 61.1 | loss = 1.848 | time = 3.7s || 
exp1-actfun-5 (vary: silu) 	 || EPOCH: 06/15 | iter = 49.2s | total = 4.90m || TRAIN: acc = 59.

Unnamed: 0,exp_begin,model_id,epoch,train_loss,train_acc,train_time,valid_loss,valid_acc,valid_time,test_loss,test_acc,test_time,exp_name,skip,bn_layer,bn_act,optim,sched,lr,num_epoch,seed,act,stat_file,model_file
0,04:10:21,0,1,2.242718,19.435342,44.100960,2.173099,27.358678,3.449552,2.068355,38.623047,0.775935,exp1-actfun,0,none,before,adam,const,0.001,15,3456,sigmoid,output/exp1-actfun_model-0_stat.csv,output/exp1-actfun_model-0_model.pt
1,04:10:21,0,2,2.176533,27.137948,43.689262,2.151777,29.637739,3.399743,2.068355,38.623047,0.775935,exp1-actfun,0,none,before,adam,const,0.001,15,3456,sigmoid,output/exp1-actfun_model-0_stat.csv,output/exp1-actfun_model-0_model.pt
2,04:10:21,0,3,2.139449,31.222027,43.344910,2.100513,35.320462,3.457334,2.068355,38.623047,0.775935,exp1-actfun,0,none,before,adam,const,0.001,15,3456,sigmoid,output/exp1-actfun_model-0_stat.csv,output/exp1-actfun_model-0_model.pt
3,04:10:21,0,4,2.110794,34.345029,43.466002,2.088031,36.813296,3.481153,2.068355,38.623047,0.775935,exp1-actfun,0,none,before,adam,const,0.001,15,3456,sigmoid,output/exp1-actfun_model-0_stat.csv,output/exp1-actfun_model-0_model.pt
4,04:10:21,0,5,2.091779,36.263187,43.234859,2.060115,39.868631,3.462404,2.068355,38.623047,0.775935,exp1-actfun,0,none,before,adam,const,0.001,15,3456,sigmoid,output/exp1-actfun_model-0_stat.csv,output/exp1-actfun_model-0_model.pt
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85,05:10:33,5,11,1.802030,65.768862,44.671666,1.785134,67.316879,3.561624,1.918414,53.759766,0.800328,exp1-actfun,0,none,before,adam,const,0.001,15,3456,silu,output/exp1-actfun_model-5_stat.csv,output/exp1-actfun_model-5_model.pt
86,05:10:33,5,12,1.797749,66.190457,44.653393,1.788647,67.048169,3.565570,1.918414,53.759766,0.800328,exp1-actfun,0,none,before,adam,const,0.001,15,3456,silu,output/exp1-actfun_model-5_stat.csv,output/exp1-actfun_model-5_model.pt
87,05:10:33,5,13,1.793544,66.673993,44.607698,1.782266,67.615446,3.436088,1.918414,53.759766,0.800328,exp1-actfun,0,none,before,adam,const,0.001,15,3456,silu,output/exp1-actfun_model-5_stat.csv,output/exp1-actfun_model-5_model.pt
88,05:10:33,5,14,1.791745,66.731937,44.728545,1.769822,69.008758,3.514384,1.918414,53.759766,0.800328,exp1-actfun,0,none,before,adam,const,0.001,15,3456,silu,output/exp1-actfun_model-5_stat.csv,output/exp1-actfun_model-5_model.pt
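
One simple way to summarize such a table is to pick, for each model, the epoch with the highest validation accuracy. The sketch below does this with the standard `csv` module on a handful of rows copied from the output above (only four columns are kept). The helper `best_epoch_per_model` is a name made up for this example; with the full files, the same logic could be applied to each `_stat.csv`.

```python
import csv
import io

# A few rows copied from the table above, reduced to four columns.
SAMPLE = """model_id,epoch,valid_acc,act
0,1,27.358678,sigmoid
0,5,39.868631,sigmoid
5,11,67.316879,silu
5,14,69.008758,silu
"""


def best_epoch_per_model(rows):
    """Map model_id -> (epoch, valid_acc) for the best validation accuracy."""
    best = {}
    for row in rows:
        model_id = int(row["model_id"])
        acc = float(row["valid_acc"])
        if model_id not in best or acc > best[model_id][1]:
            best[model_id] = (int(row["epoch"]), acc)
    return best


best = best_epoch_per_model(csv.DictReader(io.StringIO(SAMPLE)))
print(best)
```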


## 2.2. Experiment 2: Skip connection

In [None]:
exp_config = dict(
    name            = 'exp2-skip',
    param   = dict(
        skip        = [1, 2],
        act         = ['tanh', 'silu']
    ),
    const = dict( 
        bn_layer    = 'none',
        bn_act      = 'before',
        optim       = 'adam', 
        sched       = 'const',
        lr          = 1e-3, 
        num_epoch   = EPOCHS,
        seed        = SEED
    )
)
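
Judging only from the labels in the training logs (`tanh-1`, `tanh-2`, `silu-1`, `silu-2` for this experiment), `run_experiment` appears to take the Cartesian product of the `param` lists, with the last key varying slowest. The sketch below reproduces that ordering. `expand_grid` and `make_label` are hypothetical helpers written for illustration, not functions from `src`, and the real implementation may differ.

```python
from itertools import product


def expand_grid(param):
    """Expand a dict of option lists into one dict per combination.

    Keys are reversed so that the last key in `param` varies slowest,
    which matches the order of the labels seen in the logs (an inference).
    """
    keys = list(param)[::-1]
    return [dict(zip(keys, values)) for values in product(*(param[k] for k in keys))]


def make_label(combo):
    # Join the values in iteration order, e.g. {'act': 'tanh', 'skip': 1} -> 'tanh-1'.
    return "-".join(str(value) for value in combo.values())


grid = expand_grid(dict(skip=[1, 2], act=["tanh", "silu"]))
labels = [make_label(combo) for combo in grid]
print(labels)
```

The same helper applied to `dict(optim=['sgd', 'adam'], lr=[1e-3, 1e-1])` gives `0.001-sgd`, `0.001-adam`, `0.1-sgd`, `0.1-adam`, the order in which the optimizer experiment logs its runs.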

In [None]:
run_experiment(exp_config, device, data_loaders, out_path='output')

exp2-skip-0 (vary: tanh-1) 	 || EPOCH: 01/15 | iter = 49.6s | total = 0.83m || TRAIN: acc = 35.3 | loss = 2.101 | time = 46.1s || VALID: acc = 45.1 | loss = 2.004 | time = 3.6s || 
exp2-skip-0 (vary: tanh-1) 	 || EPOCH: 02/15 | iter = 49.2s | total = 1.65m || TRAIN: acc = 47.4 | loss = 1.983 | time = 45.7s || VALID: acc = 54.0 | loss = 1.919 | time = 3.5s || 
exp2-skip-0 (vary: tanh-1) 	 || EPOCH: 03/15 | iter = 49.2s | total = 2.47m || TRAIN: acc = 52.2 | loss = 1.935 | time = 45.6s || VALID: acc = 54.2 | loss = 1.913 | time = 3.6s || 
exp2-skip-0 (vary: tanh-1) 	 || EPOCH: 04/15 | iter = 49.3s | total = 3.29m || TRAIN: acc = 55.3 | loss = 1.905 | time = 45.7s || VALID: acc = 57.7 | loss = 1.882 | time = 3.6s || 
exp2-skip-0 (vary: tanh-1) 	 || EPOCH: 05/15 | iter = 49.6s | total = 4.11m || TRAIN: acc = 57.3 | loss = 1.885 | time = 46.1s || VALID: acc = 60.6 | loss = 1.854 | time = 3.5s || 
exp2-skip-0 (vary: tanh-1) 	 || EPOCH: 06/15 | iter = 49.6s | total = 4.94m || TRAIN: acc = 58.

exp2-skip-1 (vary: tanh-2) 	 || EPOCH: 01/15 | iter = 48.9s | total = 0.82m || TRAIN: acc = 38.5 | loss = 2.070 | time = 45.4s || VALID: acc = 48.2 | loss = 1.974 | time = 3.6s || 
exp2-skip-1 (vary: tanh-2) 	 || EPOCH: 02/15 | iter = 48.9s | total = 1.63m || TRAIN: acc = 49.9 | loss = 1.959 | time = 45.4s || VALID: acc = 52.7 | loss = 1.928 | time = 3.6s || 
exp2-skip-1 (vary: tanh-2) 	 || EPOCH: 03/15 | iter = 48.3s | total = 2.44m || TRAIN: acc = 53.8 | loss = 1.920 | time = 44.7s || VALID: acc = 56.8 | loss = 1.891 | time = 3.6s || 
exp2-skip-1 (vary: tanh-2) 	 || EPOCH: 04/15 | iter = 48.5s | total = 3.24m || TRAIN: acc = 56.4 | loss = 1.894 | time = 45.0s || VALID: acc = 56.1 | loss = 1.895 | time = 3.5s || 
exp2-skip-1 (vary: tanh-2) 	 || EPOCH: 05/15 | iter = 48.8s | total = 4.06m || TRAIN: acc = 58.3 | loss = 1.876 | time = 45.2s || VALID: acc = 59.7 | loss = 1.862 | time = 3.6s || 
exp2-skip-1 (vary: tanh-2) 	 || EPOCH: 06/15 | iter = 48.9s | total = 4.87m || TRAIN: acc = 59.

exp2-skip-2 (vary: silu-1) 	 || EPOCH: 01/15 | iter = 49.4s | total = 0.82m || TRAIN: acc = 35.7 | loss = 2.096 | time = 45.8s || VALID: acc = 45.2 | loss = 2.002 | time = 3.6s || 
exp2-skip-2 (vary: silu-1) 	 || EPOCH: 02/15 | iter = 49.4s | total = 1.65m || TRAIN: acc = 47.0 | loss = 1.988 | time = 45.9s || VALID: acc = 53.2 | loss = 1.926 | time = 3.6s || 
exp2-skip-2 (vary: silu-1) 	 || EPOCH: 03/15 | iter = 49.8s | total = 2.48m || TRAIN: acc = 51.7 | loss = 1.941 | time = 46.2s || VALID: acc = 54.4 | loss = 1.912 | time = 3.6s || 
exp2-skip-2 (vary: silu-1) 	 || EPOCH: 04/15 | iter = 49.0s | total = 3.29m || TRAIN: acc = 55.0 | loss = 1.909 | time = 45.4s || VALID: acc = 58.4 | loss = 1.876 | time = 3.6s || 
exp2-skip-2 (vary: silu-1) 	 || EPOCH: 05/15 | iter = 49.6s | total = 4.12m || TRAIN: acc = 57.0 | loss = 1.889 | time = 46.1s || VALID: acc = 60.4 | loss = 1.857 | time = 3.6s || 
exp2-skip-2 (vary: silu-1) 	 || EPOCH: 06/15 | iter = 50.0s | total = 4.95m || TRAIN: acc = 58.

exp2-skip-3 (vary: silu-2) 	 || EPOCH: 01/15 | iter = 48.7s | total = 0.81m || TRAIN: acc = 37.2 | loss = 2.082 | time = 45.2s || VALID: acc = 46.5 | loss = 1.991 | time = 3.6s || 
exp2-skip-3 (vary: silu-2) 	 || EPOCH: 02/15 | iter = 49.2s | total = 1.63m || TRAIN: acc = 46.9 | loss = 1.988 | time = 45.7s || VALID: acc = 52.3 | loss = 1.934 | time = 3.5s || 
exp2-skip-3 (vary: silu-2) 	 || EPOCH: 03/15 | iter = 49.6s | total = 2.46m || TRAIN: acc = 51.6 | loss = 1.941 | time = 46.0s || VALID: acc = 54.6 | loss = 1.911 | time = 3.6s || 
exp2-skip-3 (vary: silu-2) 	 || EPOCH: 04/15 | iter = 49.7s | total = 3.29m || TRAIN: acc = 55.0 | loss = 1.910 | time = 46.1s || VALID: acc = 58.6 | loss = 1.872 | time = 3.6s || 
exp2-skip-3 (vary: silu-2) 	 || EPOCH: 05/15 | iter = 49.2s | total = 4.11m || TRAIN: acc = 57.4 | loss = 1.884 | time = 45.7s || VALID: acc = 60.0 | loss = 1.858 | time = 3.5s || 
exp2-skip-3 (vary: silu-2) 	 || EPOCH: 06/15 | iter = 49.4s | total = 4.93m || TRAIN: acc = 59.

Unnamed: 0,exp_begin,model_id,epoch,train_loss,train_acc,train_time,valid_loss,valid_acc,valid_time,test_loss,test_acc,test_time,exp_name,bn_layer,bn_act,optim,sched,lr,num_epoch,seed,act,skip,stat_file,model_file
0,05:31:45,0,1,2.10097,35.256154,46.051029,2.003989,45.123408,3.572781,1.929146,52.587891,0.806079,exp2-skip,none,before,adam,const,0.001,15,3456,tanh,1,output/exp2-skip_model-0_stat.csv,output/exp2-skip_model-0_model.pt
1,05:31:45,0,2,1.983118,47.378517,45.651103,1.918733,54.010748,3.547631,1.929146,52.587891,0.806079,exp2-skip,none,before,adam,const,0.001,15,3456,tanh,1,output/exp2-skip_model-0_stat.csv,output/exp2-skip_model-0_model.pt
2,05:31:45,0,3,1.935375,52.181905,45.5854,1.913165,54.23965,3.603044,1.929146,52.587891,0.806079,exp2-skip,none,before,adam,const,0.001,15,3456,tanh,1,output/exp2-skip_model-0_stat.csv,output/exp2-skip_model-0_model.pt
3,05:31:45,0,4,1.905214,55.304907,45.661315,1.881747,57.732882,3.620283,1.929146,52.587891,0.806079,exp2-skip,none,before,adam,const,0.001,15,3456,tanh,1,output/exp2-skip_model-0_stat.csv,output/exp2-skip_model-0_model.pt
4,05:31:45,0,5,1.884898,57.330962,46.127845,1.854208,60.57922,3.474415,1.929146,52.587891,0.806079,exp2-skip,none,before,adam,const,0.001,15,3456,tanh,1,output/exp2-skip_model-0_stat.csv,output/exp2-skip_model-0_model.pt
5,05:31:45,0,6,1.871721,58.621723,45.948558,1.854941,60.280653,3.604928,1.929146,52.587891,0.806079,exp2-skip,none,before,adam,const,0.001,15,3456,tanh,1,output/exp2-skip_model-0_stat.csv,output/exp2-skip_model-0_model.pt
6,05:31:45,0,7,1.861663,59.586797,45.954837,1.841099,61.853105,3.618554,1.929146,52.587891,0.806079,exp2-skip,none,before,adam,const,0.001,15,3456,tanh,1,output/exp2-skip_model-0_stat.csv,output/exp2-skip_model-0_model.pt
7,05:31:45,0,8,1.849806,60.809623,46.039577,1.820779,63.833599,3.543272,1.929146,52.587891,0.806079,exp2-skip,none,before,adam,const,0.001,15,3456,tanh,1,output/exp2-skip_model-0_stat.csv,output/exp2-skip_model-0_model.pt
8,05:31:45,0,9,1.839147,62.032449,45.812149,1.824366,63.335987,3.578202,1.929146,52.587891,0.806079,exp2-skip,none,before,adam,const,0.001,15,3456,tanh,1,output/exp2-skip_model-0_stat.csv,output/exp2-skip_model-0_model.pt
9,05:31:45,0,10,1.838853,61.912564,46.124181,1.819942,63.933121,3.587903,1.929146,52.587891,0.806079,exp2-skip,none,before,adam,const,0.001,15,3456,tanh,1,output/exp2-skip_model-0_stat.csv,output/exp2-skip_model-0_model.pt


## 2.3. Experiment 3: Batch normalization

In [None]:
exp_config = dict(
    name            = 'exp3-bn',
    param   = dict(
        bn_layer    = ['conv', 'fc', 'all'],
        bn_act      = mod_arch_conf.bn_loc2act_opts
    ),
    const = dict( 
        act         = 'silu', 
        skip        = 2,
        optim       = 'adam', 
        sched       = 'const',
        lr          = 1e-3, 
        num_epoch   = EPOCHS,
        seed        = SEED
    )
)

In [None]:
run_experiment(exp_config, device, data_loaders, out_path='output')

exp3-bn-0 (vary: before-conv) 	 || EPOCH: 01/15 | iter = 51.2s | total = 0.85m || TRAIN: acc = 43.3 | loss = 2.026 | time = 47.6s || VALID: acc = 54.5 | loss = 1.914 | time = 3.6s || 
exp3-bn-0 (vary: before-conv) 	 || EPOCH: 02/15 | iter = 51.2s | total = 1.71m || TRAIN: acc = 54.4 | loss = 1.916 | time = 47.6s || VALID: acc = 57.8 | loss = 1.882 | time = 3.6s || 
exp3-bn-0 (vary: before-conv) 	 || EPOCH: 03/15 | iter = 51.5s | total = 2.57m || TRAIN: acc = 58.3 | loss = 1.878 | time = 47.9s || VALID: acc = 62.1 | loss = 1.839 | time = 3.6s || 
exp3-bn-0 (vary: before-conv) 	 || EPOCH: 04/15 | iter = 51.4s | total = 3.42m || TRAIN: acc = 61.0 | loss = 1.850 | time = 47.7s || VALID: acc = 64.2 | loss = 1.817 | time = 3.6s || 
exp3-bn-0 (vary: before-conv) 	 || EPOCH: 05/15 | iter = 51.3s | total = 4.28m || TRAIN: acc = 62.8 | loss = 1.833 | time = 47.6s || VALID: acc = 66.3 | loss = 1.800 | time = 3.7s || 
exp3-bn-0 (vary: before-conv) 	 || EPOCH: 06/15 | iter = 51.3s | total = 5.13m |

exp3-bn-1 (vary: before-fc) 	 || EPOCH: 01/15 | iter = 49.3s | total = 0.82m || TRAIN: acc = 37.3 | loss = 2.082 | time = 45.7s || VALID: acc = 46.0 | loss = 1.993 | time = 3.6s || 
exp3-bn-1 (vary: before-fc) 	 || EPOCH: 02/15 | iter = 49.6s | total = 1.65m || TRAIN: acc = 47.1 | loss = 1.986 | time = 46.0s || VALID: acc = 49.8 | loss = 1.958 | time = 3.6s || 
exp3-bn-1 (vary: before-fc) 	 || EPOCH: 03/15 | iter = 49.6s | total = 2.48m || TRAIN: acc = 52.1 | loss = 1.937 | time = 45.9s || VALID: acc = 56.4 | loss = 1.896 | time = 3.7s || 
exp3-bn-1 (vary: before-fc) 	 || EPOCH: 04/15 | iter = 49.2s | total = 3.30m || TRAIN: acc = 55.1 | loss = 1.907 | time = 45.7s || VALID: acc = 58.1 | loss = 1.879 | time = 3.5s || 
exp3-bn-1 (vary: before-fc) 	 || EPOCH: 05/15 | iter = 49.3s | total = 4.12m || TRAIN: acc = 57.8 | loss = 1.880 | time = 45.6s || VALID: acc = 61.4 | loss = 1.846 | time = 3.7s || 
exp3-bn-1 (vary: before-fc) 	 || EPOCH: 06/15 | iter = 49.4s | total = 4.94m || TRAIN: acc

exp3-bn-2 (vary: before-all) 	 || EPOCH: 01/15 | iter = 51.5s | total = 0.86m || TRAIN: acc = 43.1 | loss = 2.028 | time = 47.9s || VALID: acc = 54.6 | loss = 1.915 | time = 3.6s || 
exp3-bn-2 (vary: before-all) 	 || EPOCH: 02/15 | iter = 51.5s | total = 1.72m || TRAIN: acc = 54.2 | loss = 1.919 | time = 47.8s || VALID: acc = 58.9 | loss = 1.872 | time = 3.7s || 
exp3-bn-2 (vary: before-all) 	 || EPOCH: 03/15 | iter = 51.6s | total = 2.58m || TRAIN: acc = 57.8 | loss = 1.883 | time = 47.9s || VALID: acc = 60.9 | loss = 1.849 | time = 3.7s || 
exp3-bn-2 (vary: before-all) 	 || EPOCH: 04/15 | iter = 51.2s | total = 3.43m || TRAIN: acc = 60.9 | loss = 1.852 | time = 47.6s || VALID: acc = 63.0 | loss = 1.829 | time = 3.5s || 
exp3-bn-2 (vary: before-all) 	 || EPOCH: 05/15 | iter = 51.7s | total = 4.29m || TRAIN: acc = 62.3 | loss = 1.837 | time = 48.0s || VALID: acc = 63.9 | loss = 1.821 | time = 3.7s || 
exp3-bn-2 (vary: before-all) 	 || EPOCH: 06/15 | iter = 51.5s | total = 5.15m || TRAI

exp3-bn-3 (vary: after-conv) 	 || EPOCH: 01/15 | iter = 52.3s | total = 0.87m || TRAIN: acc = 40.0 | loss = 2.054 | time = 48.6s || VALID: acc = 49.7 | loss = 1.959 | time = 3.7s || 
exp3-bn-3 (vary: after-conv) 	 || EPOCH: 02/15 | iter = 51.9s | total = 1.74m || TRAIN: acc = 50.1 | loss = 1.957 | time = 48.2s || VALID: acc = 55.8 | loss = 1.901 | time = 3.8s || 
exp3-bn-3 (vary: after-conv) 	 || EPOCH: 03/15 | iter = 51.8s | total = 2.60m || TRAIN: acc = 54.1 | loss = 1.917 | time = 48.1s || VALID: acc = 59.7 | loss = 1.861 | time = 3.7s || 
exp3-bn-3 (vary: after-conv) 	 || EPOCH: 04/15 | iter = 51.5s | total = 3.46m || TRAIN: acc = 57.2 | loss = 1.885 | time = 47.9s || VALID: acc = 61.6 | loss = 1.841 | time = 3.7s || 
exp3-bn-3 (vary: after-conv) 	 || EPOCH: 05/15 | iter = 51.6s | total = 4.32m || TRAIN: acc = 59.5 | loss = 1.864 | time = 47.9s || VALID: acc = 63.5 | loss = 1.822 | time = 3.6s || 
exp3-bn-3 (vary: after-conv) 	 || EPOCH: 06/15 | iter = 51.2s | total = 5.17m || TRAI

exp3-bn-4 (vary: after-fc) 	 || EPOCH: 01/15 | iter = 49.1s | total = 0.82m || TRAIN: acc = 37.3 | loss = 2.082 | time = 45.6s || VALID: acc = 46.1 | loss = 1.993 | time = 3.5s || 
exp3-bn-4 (vary: after-fc) 	 || EPOCH: 02/15 | iter = 48.5s | total = 1.63m || TRAIN: acc = 47.3 | loss = 1.984 | time = 45.0s || VALID: acc = 51.9 | loss = 1.940 | time = 3.5s || 
exp3-bn-4 (vary: after-fc) 	 || EPOCH: 03/15 | iter = 48.9s | total = 2.44m || TRAIN: acc = 52.0 | loss = 1.938 | time = 45.2s || VALID: acc = 56.3 | loss = 1.896 | time = 3.6s || 
exp3-bn-4 (vary: after-fc) 	 || EPOCH: 04/15 | iter = 49.0s | total = 3.26m || TRAIN: acc = 55.4 | loss = 1.905 | time = 45.4s || VALID: acc = 58.6 | loss = 1.872 | time = 3.6s || 
exp3-bn-4 (vary: after-fc) 	 || EPOCH: 05/15 | iter = 48.6s | total = 4.07m || TRAIN: acc = 58.1 | loss = 1.879 | time = 45.1s || VALID: acc = 60.6 | loss = 1.853 | time = 3.5s || 
exp3-bn-4 (vary: after-fc) 	 || EPOCH: 06/15 | iter = 48.8s | total = 4.88m || TRAIN: acc = 60.

exp3-bn-5 (vary: after-all) 	 || EPOCH: 01/15 | iter = 51.0s | total = 0.85m || TRAIN: acc = 39.6 | loss = 2.058 | time = 47.4s || VALID: acc = 50.5 | loss = 1.952 | time = 3.6s || 
exp3-bn-5 (vary: after-all) 	 || EPOCH: 02/15 | iter = 50.9s | total = 1.70m || TRAIN: acc = 49.7 | loss = 1.961 | time = 47.2s || VALID: acc = 55.0 | loss = 1.909 | time = 3.7s || 
exp3-bn-5 (vary: after-all) 	 || EPOCH: 03/15 | iter = 50.4s | total = 2.54m || TRAIN: acc = 54.3 | loss = 1.916 | time = 46.7s || VALID: acc = 59.3 | loss = 1.867 | time = 3.7s || 
exp3-bn-5 (vary: after-all) 	 || EPOCH: 04/15 | iter = 51.2s | total = 3.39m || TRAIN: acc = 57.5 | loss = 1.883 | time = 47.5s || VALID: acc = 61.6 | loss = 1.844 | time = 3.7s || 
exp3-bn-5 (vary: after-all) 	 || EPOCH: 05/15 | iter = 50.6s | total = 4.24m || TRAIN: acc = 59.0 | loss = 1.868 | time = 46.9s || VALID: acc = 63.2 | loss = 1.828 | time = 3.7s || 
exp3-bn-5 (vary: after-all) 	 || EPOCH: 06/15 | iter = 50.4s | total = 5.08m || TRAIN: acc

Unnamed: 0,exp_begin,model_id,epoch,train_loss,train_acc,train_time,valid_loss,valid_acc,valid_time,test_loss,test_acc,test_time,exp_name,act,skip,optim,sched,lr,num_epoch,seed,bn_act,bn_layer,stat_file,model_file
0,06:21:04,0,1,2.026274,43.260470,47.559724,1.914375,54.488455,3.590279,1.867223,59.130859,0.838452,exp3-bn,silu,2,adam,const,0.001,15,3456,before,conv,output/exp3-bn_model-0_stat.csv,output/exp3-bn_model-0_model.pt
1,06:21:04,0,2,1.916066,54.419757,47.574446,1.881937,57.842357,3.643978,1.867223,59.130859,0.838452,exp3-bn,silu,2,adam,const,0.001,15,3456,before,conv,output/exp3-bn_model-0_stat.csv,output/exp3-bn_model-0_model.pt
2,06:21:04,0,3,1.877779,58.250080,47.903779,1.839086,62.091959,3.636797,1.867223,59.130859,0.838452,exp3-bn,silu,2,adam,const,0.001,15,3456,before,conv,output/exp3-bn_model-0_stat.csv,output/exp3-bn_model-0_model.pt
3,06:21:04,0,4,1.850166,61.007433,47.719651,1.817442,64.171975,3.645831,1.867223,59.130859,0.838452,exp3-bn,silu,2,adam,const,0.001,15,3456,before,conv,output/exp3-bn_model-0_stat.csv,output/exp3-bn_model-0_model.pt
4,06:21:04,0,5,1.832777,62.769741,47.638752,1.800287,66.281847,3.672426,1.867223,59.130859,0.838452,exp3-bn,silu,2,adam,const,0.001,15,3456,before,conv,output/exp3-bn_model-0_stat.csv,output/exp3-bn_model-0_model.pt
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85,07:24:15,5,11,1.806366,65.233376,46.842077,1.800062,65.963376,3.600542,1.893835,56.542969,0.826534,exp3-bn,silu,2,adam,const,0.001,15,3456,after,all,output/exp3-bn_model-5_stat.csv,output/exp3-bn_model-5_model.pt
86,07:24:15,5,12,1.801118,65.838795,46.733866,1.772858,68.789809,3.635389,1.893835,56.542969,0.826534,exp3-bn,silu,2,adam,const,0.001,15,3456,after,all,output/exp3-bn_model-5_stat.csv,output/exp3-bn_model-5_model.pt
87,07:24:15,5,13,1.792333,66.644022,46.856124,1.768272,69.058519,3.566513,1.893835,56.542969,0.826534,exp3-bn,silu,2,adam,const,0.001,15,3456,after,all,output/exp3-bn_model-5_stat.csv,output/exp3-bn_model-5_model.pt
88,07:24:15,5,14,1.791468,66.733935,46.887515,1.756448,70.312500,3.599561,1.893835,56.542969,0.826534,exp3-bn,silu,2,adam,const,0.001,15,3456,after,all,output/exp3-bn_model-5_stat.csv,output/exp3-bn_model-5_model.pt


## 2.4. Experiment 4: Optimizer

In [None]:
exp_config = dict(
    name            = 'exp4-optim',
    param   = dict(
        optim       = optimsched_conf.optim_opts,
        lr          = [1e-3, 1e-1]
    ),
    const = dict( 
        act         = 'silu',
        skip        = 2,
        bn_layer    = 'all', 
        bn_act      = 'before',
        sched       = 'const',
        num_epoch   = EPOCHS,
        seed        = SEED
    )
)

In [None]:
run_experiment(exp_config, device, data_loaders, out_path='output')

exp4-optim-0 (vary: 0.001-sgd) 	 || EPOCH: 01/15 | iter = 47.7s | total = 0.79m || TRAIN: acc = 9.3 | loss = 2.319 | time = 44.1s || VALID: acc = 11.9 | loss = 2.300 | time = 3.6s || 
exp4-optim-0 (vary: 0.001-sgd) 	 || EPOCH: 02/15 | iter = 48.3s | total = 1.60m || TRAIN: acc = 12.5 | loss = 2.298 | time = 44.7s || VALID: acc = 15.5 | loss = 2.277 | time = 3.6s || 
exp4-optim-0 (vary: 0.001-sgd) 	 || EPOCH: 03/15 | iter = 48.8s | total = 2.41m || TRAIN: acc = 14.9 | loss = 2.281 | time = 45.1s || VALID: acc = 18.7 | loss = 2.257 | time = 3.7s || 
exp4-optim-0 (vary: 0.001-sgd) 	 || EPOCH: 04/15 | iter = 48.8s | total = 3.22m || TRAIN: acc = 17.2 | loss = 2.266 | time = 45.1s || VALID: acc = 21.5 | loss = 2.243 | time = 3.6s || 
exp4-optim-0 (vary: 0.001-sgd) 	 || EPOCH: 05/15 | iter = 48.8s | total = 4.04m || TRAIN: acc = 18.9 | loss = 2.254 | time = 45.3s || VALID: acc = 22.5 | loss = 2.233 | time = 3.6s || 
exp4-optim-0 (vary: 0.001-sgd) 	 || EPOCH: 06/15 | iter = 48.6s | total = 4.

exp4-optim-1 (vary: 0.001-adam) 	 || EPOCH: 01/15 | iter = 51.0s | total = 0.85m || TRAIN: acc = 43.3 | loss = 2.027 | time = 47.3s || VALID: acc = 54.5 | loss = 1.917 | time = 3.7s || 
exp4-optim-1 (vary: 0.001-adam) 	 || EPOCH: 02/15 | iter = 50.4s | total = 1.69m || TRAIN: acc = 54.2 | loss = 1.919 | time = 46.8s || VALID: acc = 59.3 | loss = 1.873 | time = 3.6s || 
exp4-optim-1 (vary: 0.001-adam) 	 || EPOCH: 03/15 | iter = 50.0s | total = 2.52m || TRAIN: acc = 57.8 | loss = 1.882 | time = 46.5s || VALID: acc = 62.2 | loss = 1.839 | time = 3.5s || 
exp4-optim-1 (vary: 0.001-adam) 	 || EPOCH: 04/15 | iter = 50.2s | total = 3.36m || TRAIN: acc = 60.6 | loss = 1.855 | time = 46.7s || VALID: acc = 63.9 | loss = 1.822 | time = 3.5s || 
exp4-optim-1 (vary: 0.001-adam) 	 || EPOCH: 05/15 | iter = 50.1s | total = 4.20m || TRAIN: acc = 62.4 | loss = 1.837 | time = 46.4s || VALID: acc = 64.6 | loss = 1.813 | time = 3.7s || 
exp4-optim-1 (vary: 0.001-adam) 	 || EPOCH: 06/15 | iter = 50.5s | tot

exp4-optim-2 (vary: 0.1-sgd) 	 || EPOCH: 01/15 | iter = 49.4s | total = 0.82m || TRAIN: acc = 35.4 | loss = 2.106 | time = 45.8s || VALID: acc = 47.3 | loss = 1.991 | time = 3.6s || 
exp4-optim-2 (vary: 0.1-sgd) 	 || EPOCH: 02/15 | iter = 49.4s | total = 1.65m || TRAIN: acc = 47.2 | loss = 1.992 | time = 45.6s || VALID: acc = 52.5 | loss = 1.940 | time = 3.8s || 
exp4-optim-2 (vary: 0.1-sgd) 	 || EPOCH: 03/15 | iter = 49.2s | total = 2.47m || TRAIN: acc = 52.1 | loss = 1.943 | time = 45.6s || VALID: acc = 56.6 | loss = 1.895 | time = 3.6s || 
exp4-optim-2 (vary: 0.1-sgd) 	 || EPOCH: 04/15 | iter = 49.0s | total = 3.28m || TRAIN: acc = 55.8 | loss = 1.906 | time = 45.4s || VALID: acc = 58.0 | loss = 1.883 | time = 3.7s || 
exp4-optim-2 (vary: 0.1-sgd) 	 || EPOCH: 05/15 | iter = 49.6s | total = 4.11m || TRAIN: acc = 58.2 | loss = 1.882 | time = 46.0s || VALID: acc = 59.5 | loss = 1.867 | time = 3.7s || 
exp4-optim-2 (vary: 0.1-sgd) 	 || EPOCH: 06/15 | iter = 49.3s | total = 4.93m || TRAI

exp4-optim-3 (vary: 0.1-adam) 	 || EPOCH: 01/15 | iter = 51.7s | total = 0.86m || TRAIN: acc = 14.3 | loss = 2.318 | time = 48.1s || VALID: acc = 14.4 | loss = 2.317 | time = 3.7s || 
exp4-optim-3 (vary: 0.1-adam) 	 || EPOCH: 02/15 | iter = 51.6s | total = 1.72m || TRAIN: acc = 13.1 | loss = 2.330 | time = 48.0s || VALID: acc = 13.3 | loss = 2.328 | time = 3.6s || 
exp4-optim-3 (vary: 0.1-adam) 	 || EPOCH: 03/15 | iter = 51.3s | total = 2.58m || TRAIN: acc = 12.3 | loss = 2.338 | time = 47.6s || VALID: acc = 13.5 | loss = 2.326 | time = 3.7s || 
exp4-optim-3 (vary: 0.1-adam) 	 || EPOCH: 04/15 | iter = 51.5s | total = 3.44m || TRAIN: acc = 12.4 | loss = 2.337 | time = 47.8s || VALID: acc = 14.4 | loss = 2.317 | time = 3.7s || 
exp4-optim-3 (vary: 0.1-adam) 	 || EPOCH: 05/15 | iter = 51.5s | total = 4.29m || TRAIN: acc = 12.4 | loss = 2.338 | time = 47.8s || VALID: acc = 13.7 | loss = 2.324 | time = 3.7s || 
exp4-optim-3 (vary: 0.1-adam) 	 || EPOCH: 06/15 | iter = 51.6s | total = 5.15m |

Unnamed: 0,exp_begin,model_id,epoch,train_loss,train_acc,train_time,valid_loss,valid_acc,valid_time,test_loss,test_acc,test_time,exp_name,act,skip,bn_layer,bn_act,sched,num_epoch,seed,lr,optim,stat_file,model_file
0,00:30:19,0,1,2.319151,9.331042,44.097884,2.300377,11.87301,3.557032,2.187573,26.269531,0.804541,exp4-optim,silu,2,all,before,const,15,3456,0.001,sgd,output/exp4-optim_model-0_stat.csv,output/exp4-optim_model-0_model.pt
1,00:30:19,0,2,2.29779,12.511988,44.670603,2.276769,15.465764,3.606107,2.187573,26.269531,0.804541,exp4-optim,silu,2,all,before,const,15,3456,0.001,sgd,output/exp4-optim_model-0_stat.csv,output/exp4-optim_model-0_model.pt
2,00:30:19,0,3,2.280821,14.8957,45.077822,2.257245,18.730096,3.682173,2.187573,26.269531,0.804541,exp4-optim,silu,2,all,before,const,15,3456,0.001,sgd,output/exp4-optim_model-0_stat.csv,output/exp4-optim_model-0_model.pt
3,00:30:19,0,4,2.266023,17.201487,45.127819,2.243041,21.457006,3.644417,2.187573,26.269531,0.804541,exp4-optim,silu,2,all,before,const,15,3456,0.001,sgd,output/exp4-optim_model-0_stat.csv,output/exp4-optim_model-0_model.pt
4,00:30:19,0,5,2.253658,18.943814,45.275253,2.232728,22.482086,3.567362,2.187573,26.269531,0.804541,exp4-optim,silu,2,all,before,const,15,3456,0.001,sgd,output/exp4-optim_model-0_stat.csv,output/exp4-optim_model-0_model.pt
5,00:30:19,0,6,2.244743,19.956841,44.956711,2.218734,24.104299,3.635164,2.187573,26.269531,0.804541,exp4-optim,silu,2,all,before,const,15,3456,0.001,sgd,output/exp4-optim_model-0_stat.csv,output/exp4-optim_model-0_model.pt
6,00:30:19,0,7,2.231909,21.924952,45.322627,2.207244,25.716561,3.579876,2.187573,26.269531,0.804541,exp4-optim,silu,2,all,before,const,15,3456,0.001,sgd,output/exp4-optim_model-0_stat.csv,output/exp4-optim_model-0_model.pt
7,00:30:19,0,8,2.221902,23.035886,45.094291,2.196266,27.040207,3.566835,2.187573,26.269531,0.804541,exp4-optim,silu,2,all,before,const,15,3456,0.001,sgd,output/exp4-optim_model-0_stat.csv,output/exp4-optim_model-0_model.pt
8,00:30:19,0,9,2.210205,24.660326,45.317766,2.17899,28.702229,3.576454,2.187573,26.269531,0.804541,exp4-optim,silu,2,all,before,const,15,3456,0.001,sgd,output/exp4-optim_model-0_stat.csv,output/exp4-optim_model-0_model.pt
9,00:30:19,0,10,2.199905,25.851183,45.282645,2.168786,30.04578,3.6545,2.187573,26.269531,0.804541,exp4-optim,silu,2,all,before,const,15,3456,0.001,sgd,output/exp4-optim_model-0_stat.csv,output/exp4-optim_model-0_model.pt


## 2.5. Experiment 5: Schedulers

Increase the number of epochs to observe the effect of the schedulers, since a scheduler usually acts on a longer time scale.

Increase the batch size to 128 to save around 3 seconds per epoch.

Many online sources say that a scheduler is unnecessary with Adam, because Adam already adapts the learning rate internally, and that SGD probably benefits more from one (the paper on cosine annealing with warm restarts was, I believe, using SGD). However, I wanted to see whether schedulers bring any benefit with Adam, since SGD performed poorly in the previous experiment. Note that the runs recorded below apply the schedulers to SGD (`lr = 1e-1`) and use Adam with a constant learning rate (`lr = 1e-3`) as the baseline.

https://spell.ml/blog/lr-schedulers-and-adaptive-optimizers-YHmwMhAAACYADm6F
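
For reference, most of the schedules compared in this experiment can be written as plain functions of the epoch. The sketch below is illustrative only: it assumes `anneal` refers to cosine annealing with warm restarts, and the names `step_size`, `gamma`, `t_0`, and `eta_min` with their values are assumptions, not the settings in `optimsched_conf`. `reduce_on_plateau` is left out because it reacts to the validation loss rather than to the epoch alone.

```python
import math

BASE_LR = 0.1  # SGD learning rate used in this experiment


def const_lr(epoch, base_lr=BASE_LR):
    """Constant schedule: the learning rate never changes."""
    return base_lr


def step_lr(epoch, base_lr=BASE_LR, step_size=20, gamma=0.1):
    """Step decay: multiply by gamma every step_size epochs (values assumed)."""
    return base_lr * gamma ** (epoch // step_size)


def exp_lr(epoch, base_lr=BASE_LR, gamma=0.95):
    """Exponential decay: multiply by gamma every epoch (value assumed)."""
    return base_lr * gamma ** epoch


def cosine_warm_restart_lr(epoch, base_lr=BASE_LR, t_0=20, eta_min=0.0):
    """Cosine annealing that restarts every t_0 epochs (values assumed)."""
    t_cur = epoch % t_0
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_0))


if __name__ == "__main__":
    for epoch in (0, 10, 20, 59):
        print(
            f"epoch {epoch:2d} | const {const_lr(epoch):.4f} | "
            f"step {step_lr(epoch):.4f} | exp {exp_lr(epoch):.4f} | "
            f"anneal {cosine_warm_restart_lr(epoch):.4f}"
        )
```

With only 15 epochs, a step or cosine schedule with a period of this length would barely change the learning rate, which is one reason to train longer here.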


In [None]:
EPOCHS = 60
BATCH_SIZE = 128
data_loaders = get_data_loaders(batch_size = BATCH_SIZE)

In [None]:
exp_config = dict(
    name            = 'exp5-sgdsched',
    param   = dict(
        sched       = optimsched_conf.sched_opts
    ),
    const = dict( 
        act         = 'silu', 
        skip        = 2,
        bn_layer    = 'all', 
        bn_act      = 'before',
        optim       = 'sgd',
        lr          = 1e-1, 
        num_epoch   = EPOCHS,
        seed        = SEED
    )
)

run_experiment(exp_config, device, data_loaders, out_path='output')

exp5-sgdsched-0 (vary: const) 	 || EPOCH: 01/60 | iter = 45.2s | total = 0.75m || TRAIN: acc = 30.9 | loss = 2.148 | time = 41.9s || VALID: acc = 44.0 | loss = 2.031 | time = 3.4s || 
exp5-sgdsched-0 (vary: const) 	 || EPOCH: 02/60 | iter = 45.7s | total = 1.52m || TRAIN: acc = 42.8 | loss = 2.035 | time = 42.4s || VALID: acc = 48.9 | loss = 1.980 | time = 3.3s || 
exp5-sgdsched-0 (vary: const) 	 || EPOCH: 03/60 | iter = 45.3s | total = 2.27m || TRAIN: acc = 47.8 | loss = 1.987 | time = 42.1s || VALID: acc = 51.8 | loss = 1.951 | time = 3.3s || 
exp5-sgdsched-0 (vary: const) 	 || EPOCH: 04/60 | iter = 45.7s | total = 3.03m || TRAIN: acc = 51.4 | loss = 1.952 | time = 42.4s || VALID: acc = 56.2 | loss = 1.902 | time = 3.3s || 
exp5-sgdsched-0 (vary: const) 	 || EPOCH: 05/60 | iter = 45.4s | total = 3.79m || TRAIN: acc = 54.6 | loss = 1.921 | time = 42.1s || VALID: acc = 54.4 | loss = 1.917 | time = 3.3s || 
exp5-sgdsched-0 (vary: const) 	 || EPOCH: 06/60 | iter = 45.5s | total = 4.55m |

exp5-sgdsched-1 (vary: step) 	 || EPOCH: 01/60 | iter = 45.6s | total = 0.76m || TRAIN: acc = 30.8 | loss = 2.149 | time = 42.2s || VALID: acc = 43.6 | loss = 2.033 | time = 3.3s || 
exp5-sgdsched-1 (vary: step) 	 || EPOCH: 02/60 | iter = 45.8s | total = 1.52m || TRAIN: acc = 42.8 | loss = 2.035 | time = 42.4s || VALID: acc = 49.2 | loss = 1.975 | time = 3.3s || 
exp5-sgdsched-1 (vary: step) 	 || EPOCH: 03/60 | iter = 45.9s | total = 2.29m || TRAIN: acc = 47.9 | loss = 1.986 | time = 42.4s || VALID: acc = 52.6 | loss = 1.943 | time = 3.5s || 
exp5-sgdsched-1 (vary: step) 	 || EPOCH: 04/60 | iter = 46.1s | total = 3.06m || TRAIN: acc = 51.3 | loss = 1.951 | time = 42.7s || VALID: acc = 55.4 | loss = 1.913 | time = 3.4s || 
exp5-sgdsched-1 (vary: step) 	 || EPOCH: 05/60 | iter = 45.7s | total = 3.82m || TRAIN: acc = 54.4 | loss = 1.922 | time = 42.4s || VALID: acc = 56.8 | loss = 1.895 | time = 3.2s || 
exp5-sgdsched-1 (vary: step) 	 || EPOCH: 06/60 | iter = 45.9s | total = 4.58m || TRAI

exp5-sgdsched-2 (vary: exp) 	 || EPOCH: 01/60 | iter = 46.1s | total = 0.77m || TRAIN: acc = 31.0 | loss = 2.148 | time = 42.7s || VALID: acc = 44.0 | loss = 2.031 | time = 3.3s || 
exp5-sgdsched-2 (vary: exp) 	 || EPOCH: 02/60 | iter = 45.9s | total = 1.53m || TRAIN: acc = 43.0 | loss = 2.034 | time = 42.6s || VALID: acc = 48.3 | loss = 1.984 | time = 3.3s || 
exp5-sgdsched-2 (vary: exp) 	 || EPOCH: 03/60 | iter = 45.9s | total = 2.30m || TRAIN: acc = 47.8 | loss = 1.986 | time = 42.6s || VALID: acc = 52.6 | loss = 1.938 | time = 3.3s || 
exp5-sgdsched-2 (vary: exp) 	 || EPOCH: 04/60 | iter = 45.9s | total = 3.06m || TRAIN: acc = 51.6 | loss = 1.951 | time = 42.6s || VALID: acc = 55.6 | loss = 1.908 | time = 3.3s || 
exp5-sgdsched-2 (vary: exp) 	 || EPOCH: 05/60 | iter = 45.5s | total = 3.82m || TRAIN: acc = 54.3 | loss = 1.923 | time = 42.2s || VALID: acc = 57.1 | loss = 1.893 | time = 3.3s || 
exp5-sgdsched-2 (vary: exp) 	 || EPOCH: 06/60 | iter = 46.0s | total = 4.59m || TRAIN: acc

In [None]:
# Resume the experiment after an interruption. The last model (exp) had both its
# stat and model files saved, but its final progress lines were not printed.
exp_config = dict(
    name            = 'exp5-sgdsched',
    param   = dict(
        sched       = optimsched_conf.sched_opts
    ),
    const = dict( 
        act         = 'silu', 
        skip        = 2,
        bn_layer    = 'all', 
        bn_act      = 'before',
        optim       = 'sgd',
        lr          = 1e-1, 
        num_epoch   = EPOCHS,
        seed        = SEED
    )
)

run_experiment(exp_config, device, data_loaders, out_path='output', 
               restart_exp=False)

exp5-sgdsched-3 (vary: reduce_on_plateau) 	 || EPOCH: 01/60 | iter = 44.9s | total = 0.75m || TRAIN: acc = 30.9 | loss = 2.148 | time = 41.7s || VALID: acc = 43.6 | loss = 2.033 | time = 3.2s || 
exp5-sgdsched-3 (vary: reduce_on_plateau) 	 || EPOCH: 02/60 | iter = 45.2s | total = 1.50m || TRAIN: acc = 43.0 | loss = 2.035 | time = 41.9s || VALID: acc = 48.4 | loss = 1.986 | time = 3.3s || 
exp5-sgdsched-3 (vary: reduce_on_plateau) 	 || EPOCH: 03/60 | iter = 45.3s | total = 2.26m || TRAIN: acc = 47.9 | loss = 1.986 | time = 42.1s || VALID: acc = 50.8 | loss = 1.954 | time = 3.2s || 
exp5-sgdsched-3 (vary: reduce_on_plateau) 	 || EPOCH: 04/60 | iter = 45.0s | total = 3.01m || TRAIN: acc = 51.4 | loss = 1.951 | time = 41.8s || VALID: acc = 55.5 | loss = 1.907 | time = 3.2s || 
exp5-sgdsched-3 (vary: reduce_on_plateau) 	 || EPOCH: 05/60 | iter = 45.3s | total = 3.76m || TRAIN: acc = 54.5 | loss = 1.922 | time = 42.0s || VALID: acc = 57.6 | loss = 1.892 | time = 3.3s || 
exp5-sgdsched-3 (var

exp5-sgdsched-4 (vary: anneal) 	 || EPOCH: 01/60 | iter = 45.6s | total = 0.76m || TRAIN: acc = 30.8 | loss = 2.149 | time = 42.3s || VALID: acc = 43.4 | loss = 2.036 | time = 3.3s || 
exp5-sgdsched-4 (vary: anneal) 	 || EPOCH: 02/60 | iter = 45.4s | total = 1.52m || TRAIN: acc = 43.0 | loss = 2.035 | time = 42.2s || VALID: acc = 48.5 | loss = 1.981 | time = 3.3s || 
exp5-sgdsched-4 (vary: anneal) 	 || EPOCH: 03/60 | iter = 45.4s | total = 2.27m || TRAIN: acc = 47.8 | loss = 1.987 | time = 42.1s || VALID: acc = 52.3 | loss = 1.945 | time = 3.3s || 
exp5-sgdsched-4 (vary: anneal) 	 || EPOCH: 04/60 | iter = 45.5s | total = 3.03m || TRAIN: acc = 51.1 | loss = 1.954 | time = 42.2s || VALID: acc = 55.6 | loss = 1.909 | time = 3.3s || 
exp5-sgdsched-4 (vary: anneal) 	 || EPOCH: 05/60 | iter = 45.3s | total = 3.79m || TRAIN: acc = 53.6 | loss = 1.931 | time = 42.1s || VALID: acc = 56.1 | loss = 1.905 | time = 3.3s || 
exp5-sgdsched-4 (vary: anneal) 	 || EPOCH: 06/60 | iter = 45.5s | total = 4

Unnamed: 0,exp_begin,model_id,epoch,train_loss,train_acc,train_time,valid_loss,valid_acc,valid_time,test_loss,test_acc,test_time,exp_name,act,skip,bn_layer,bn_act,optim,lr,num_epoch,seed,sched,stat_file,model_file
0,02:45:54,0,1,2.147925,30.892743,41.873250,2.031472,43.957674,3.362828,1.837847,62.568359,0.759630,exp5-sgdsched,silu,2,all,before,sgd,0.1,60,3456,const,output/exp5-sgdsched_model-0_stat.csv,output/exp5-sgdsched_model-0_model.pt
1,02:45:54,0,2,2.035267,42.833280,42.413561,1.980490,48.931962,3.265342,1.837847,62.568359,0.759630,exp5-sgdsched,silu,2,all,before,sgd,0.1,60,3456,const,output/exp5-sgdsched_model-0_stat.csv,output/exp5-sgdsched_model-0_model.pt
2,02:45:54,0,3,1.986746,47.824488,42.066447,1.950785,51.770174,3.280089,1.837847,62.568359,0.759630,exp5-sgdsched,silu,2,all,before,sgd,0.1,60,3456,const,output/exp5-sgdsched_model-0_stat.csv,output/exp5-sgdsched_model-0_model.pt
3,02:45:54,0,4,1.951734,51.383871,42.413907,1.902116,56.240111,3.257626,1.837847,62.568359,0.759630,exp5-sgdsched,silu,2,all,before,sgd,0.1,60,3456,const,output/exp5-sgdsched_model-0_stat.csv,output/exp5-sgdsched_model-0_model.pt
4,02:45:54,0,5,1.921464,54.602781,42.137986,1.916794,54.430380,3.302676,1.837847,62.568359,0.759630,exp5-sgdsched,silu,2,all,before,sgd,0.1,60,3456,const,output/exp5-sgdsched_model-0_stat.csv,output/exp5-sgdsched_model-0_model.pt
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
295,06:11:46,4,56,1.723793,74.120045,42.580025,1.712535,74.950554,3.293361,1.844863,61.445312,0.741636,exp5-sgdsched,silu,2,all,before,sgd,0.1,60,3456,anneal,output/exp5-sgdsched_model-4_stat.csv,output/exp5-sgdsched_model-4_model.pt
296,06:11:46,4,57,1.721523,74.248322,42.864576,1.710830,75.039557,3.334551,1.844863,61.445312,0.741636,exp5-sgdsched,silu,2,all,before,sgd,0.1,60,3456,anneal,output/exp5-sgdsched_model-4_stat.csv,output/exp5-sgdsched_model-4_model.pt
297,06:11:46,4,58,1.722359,74.089674,42.918209,1.709222,75.454905,3.340949,1.844863,61.445312,0.741636,exp5-sgdsched,silu,2,all,before,sgd,0.1,60,3456,anneal,output/exp5-sgdsched_model-4_stat.csv,output/exp5-sgdsched_model-4_model.pt
298,06:11:46,4,59,1.720756,74.368207,42.495395,1.709994,75.059335,3.369193,1.844863,61.445312,0.741636,exp5-sgdsched,silu,2,all,before,sgd,0.1,60,3456,anneal,output/exp5-sgdsched_model-4_stat.csv,output/exp5-sgdsched_model-4_model.pt


In [None]:
# run adam with constant LR for comparison
exp_config = dict(
    name            = 'exp5-adamconst',
    param   = dict(
        sched       = ['const']
    ),
    const = dict( 
        act         = 'silu', 
        skip        = 2,
        bn_layer    = 'all', 
        bn_act      = 'before',
        optim       = 'adam',
        lr          = 1e-3, 
        num_epoch   = EPOCHS,
        seed        = SEED
    )
)
run_experiment(exp_config, device, data_loaders, out_path='output')


exp5-adamconst-0 (vary: const) 	 || EPOCH: 01/60 | iter = 47.0s | total = 0.78m || TRAIN: acc = 41.6 | loss = 2.043 | time = 43.6s || VALID: acc = 52.3 | loss = 1.938 | time = 3.3s || 
exp5-adamconst-0 (vary: const) 	 || EPOCH: 02/60 | iter = 46.8s | total = 1.56m || TRAIN: acc = 52.9 | loss = 1.932 | time = 43.5s || VALID: acc = 59.1 | loss = 1.875 | time = 3.3s || 
exp5-adamconst-0 (vary: const) 	 || EPOCH: 03/60 | iter = 46.8s | total = 2.34m || TRAIN: acc = 57.1 | loss = 1.890 | time = 43.5s || VALID: acc = 61.3 | loss = 1.851 | time = 3.4s || 
exp5-adamconst-0 (vary: const) 	 || EPOCH: 04/60 | iter = 46.7s | total = 3.12m || TRAIN: acc = 60.2 | loss = 1.860 | time = 43.3s || VALID: acc = 63.6 | loss = 1.826 | time = 3.4s || 
exp5-adamconst-0 (vary: const) 	 || EPOCH: 05/60 | iter = 46.9s | total = 3.90m || TRAIN: acc = 62.4 | loss = 1.838 | time = 43.5s || VALID: acc = 66.1 | loss = 1.802 | time = 3.4s || 
exp5-adamconst-0 (vary: const) 	 || EPOCH: 06/60 | iter = 47.0s | total = 4

Unnamed: 0,exp_begin,model_id,epoch,train_loss,train_acc,train_time,valid_loss,valid_acc,valid_time,test_loss,test_acc,test_time,exp_name,act,skip,bn_layer,bn_act,optim,lr,num_epoch,seed,sched,stat_file,model_file
0,06:57:22,0,1,2.043378,41.564498,43.614804,1.937715,52.274525,3.345185,1.803288,65.957031,0.729932,exp5-adamconst,silu,2,all,before,adam,0.001,60,3456,const,output/exp5-adamconst_model-0_stat.csv,output/exp5-adamconst_model-0_model.pt
1,06:57:22,0,2,1.932327,52.904012,43.505244,1.875161,59.068434,3.309614,1.803288,65.957031,0.729932,exp5-adamconst,silu,2,all,before,adam,0.001,60,3456,const,output/exp5-adamconst_model-0_stat.csv,output/exp5-adamconst_model-0_model.pt
2,06:57:22,0,3,1.890165,57.091592,43.486237,1.851314,61.273734,3.356297,1.803288,65.957031,0.729932,exp5-adamconst,silu,2,all,before,adam,0.001,60,3456,const,output/exp5-adamconst_model-0_stat.csv,output/exp5-adamconst_model-0_model.pt
3,06:57:22,0,4,1.86015,60.179827,43.315954,1.82607,63.617484,3.375034,1.803288,65.957031,0.729932,exp5-adamconst,silu,2,all,before,adam,0.001,60,3456,const,output/exp5-adamconst_model-0_stat.csv,output/exp5-adamconst_model-0_model.pt
4,06:57:22,0,5,1.837748,62.432465,43.499665,1.801703,66.089794,3.376854,1.803288,65.957031,0.729932,exp5-adamconst,silu,2,all,before,adam,0.001,60,3456,const,output/exp5-adamconst_model-0_stat.csv,output/exp5-adamconst_model-0_model.pt
5,06:57:22,0,6,1.821816,63.936221,43.663832,1.796311,66.505142,3.349619,1.803288,65.957031,0.729932,exp5-adamconst,silu,2,all,before,adam,0.001,60,3456,const,output/exp5-adamconst_model-0_stat.csv,output/exp5-adamconst_model-0_model.pt
6,06:57:22,0,7,1.81304,64.85414,43.411788,1.792002,67.009494,3.363091,1.803288,65.957031,0.729932,exp5-adamconst,silu,2,all,before,adam,0.001,60,3456,const,output/exp5-adamconst_model-0_stat.csv,output/exp5-adamconst_model-0_model.pt
7,06:57:22,0,8,1.802175,65.855978,43.541416,1.773769,68.571994,3.39477,1.803288,65.957031,0.729932,exp5-adamconst,silu,2,all,before,adam,0.001,60,3456,const,output/exp5-adamconst_model-0_stat.csv,output/exp5-adamconst_model-0_model.pt
8,06:57:22,0,9,1.790269,67.15673,43.669981,1.765489,69.748813,3.382493,1.803288,65.957031,0.729932,exp5-adamconst,silu,2,all,before,adam,0.001,60,3456,const,output/exp5-adamconst_model-0_stat.csv,output/exp5-adamconst_model-0_model.pt
9,06:57:22,0,10,1.783094,67.837676,43.439249,1.759057,70.193829,3.342194,1.803288,65.957031,0.729932,exp5-adamconst,silu,2,all,before,adam,0.001,60,3456,const,output/exp5-adamconst_model-0_stat.csv,output/exp5-adamconst_model-0_model.pt


# 3 - Visualization

In [None]:
import pandas as pd  # pd.concat is used in the cells below
import plotly.express as px
from src.plot_utils import *

In [None]:
%%bash 
echo '+ Experiment configuration files:'
ls output/*exp_conf.yml
echo '-------------------------------'
echo '+ Aggregated stat data files:'
ls output/*exp_stat.csv

+ Experiment configuration files:
output/exp1-actfun_exp_conf.yml
output/exp2-skip_exp_conf.yml
output/exp3-bn_exp_conf.yml
output/exp4-optim_exp_conf.yml
output/exp5-adamconst_exp_conf.yml
output/exp5-sgdsched_exp_conf.yml
-------------------------------
+ Aggregated stat data files:
output/exp1-actfun_exp_stat.csv
output/exp2-skip_exp_stat.csv
output/exp3-bn_exp_stat.csv
output/exp4-optim_exp_stat.csv
output/exp5-adamconst_exp_stat.csv
output/exp5-sgdsched_exp_stat.csv
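
As a quick check on an aggregated stat file before plotting, the per-model best validation epoch can be pulled out with the standard library alone. This is a minimal sketch, not part of `src`: the helper `best_valid_epoch` is hypothetical, and the in-memory excerpt contains only a few rows and columns copied from the `exp5-sgdsched` output shown earlier, so the result describes the excerpt, not the full run.

```python
import csv
import io

# Excerpt (model_id, epoch, valid_acc, sched) copied from the exp5-sgdsched
# output shown earlier; this is NOT the full file.
EXCERPT = """model_id,epoch,valid_acc,sched
0,1,43.957674,const
0,2,48.931962,const
0,3,51.770174,const
4,57,75.039557,anneal
4,58,75.454905,anneal
4,59,75.059335,anneal
"""


def best_valid_epoch(rows):
    """Return {(model_id, sched): (epoch, valid_acc)} for the best valid_acc."""
    best = {}
    for row in rows:
        key = (int(row["model_id"]), row["sched"])
        acc = float(row["valid_acc"])
        if key not in best or acc > best[key][1]:
            best[key] = (int(row["epoch"]), acc)
    return best


# For a real file, replace io.StringIO(EXCERPT) with
# open("output/exp5-sgdsched_exp_stat.csv", newline="").
best = best_valid_epoch(csv.DictReader(io.StringIO(EXCERPT)))

if __name__ == "__main__":
    for (model_id, sched), (epoch, acc) in sorted(best.items()):
        print(f"model {model_id} ({sched}): best valid acc {acc:.2f} at epoch {epoch}")
```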


## 3.1. Experiment 1: Activation functions

In [None]:
!cat output/exp1-actfun_exp_conf.yml

const:
  bn_act: before
  bn_layer: none
  lr: 0.001
  num_epoch: 15
  optim: adam
  sched: const
  seed: 3456
  skip: 0
name: exp1-actfun
param:
  act:
  - sigmoid
  - tanh
  - relu
  - elu
  - lrelu
  - silu


In [None]:
exp_file_pref = 'output/exp1-actfun'

df1, conf, param_keys = load_data(exp_file_pref)
construct_expvar(df1, param_keys)

plot_benchmark(df1, 
               main_title = 'Variation of activation functions',
               variation_label =  '-'.join(param_keys))

## 3.2. Experiment 2: Skip connections

In [None]:
!cat output/exp2-skip_exp_conf.yml

const:
  bn_act: before
  bn_layer: none
  lr: 0.001
  num_epoch: 15
  optim: adam
  sched: const
  seed: 3456
name: exp2-skip
param:
  act:
  - tanh
  - silu
  skip:
  - 1
  - 2


In [None]:
df1 = df1.query('act == "silu" | act == "tanh"') # exp1 runs have skip = 0; keep the two best activations as the no-skip baseline
df2, conf, param_keys = load_data('output/exp2-skip')
df2 = pd.concat([df1, df2], ignore_index=True)

construct_expvar(df2, param_keys)

plot_benchmark(df2, 
               main_title = 'Variation of skip connections with two best activation functions',
               variation_label =  '-'.join(param_keys))


## 3.3. Experiment 3: Batch-normalization

In [None]:
!cat output/exp3-bn_exp_conf.yml

const:
  act: silu
  lr: 0.001
  num_epoch: 15
  optim: adam
  sched: const
  seed: 3456
  skip: 2
name: exp3-bn
param:
  bn_act:
  - before
  - after
  bn_layer:
  - conv
  - fc
  - all


In [None]:
df2 = df2.query('skip == 2 & act == "silu"') # baseline without batch norm: skip = 2 with act = silu, taken from df2
df3, conf, param_keys = load_data('output/exp3-bn')
df3 = pd.concat([df3, df2], ignore_index=True)
construct_expvar(df3, param_keys)
df3.replace({'exp_var': '.*none'}, {'exp_var': 'none'}, regex=True, inplace=True)

plot_benchmark(df3, 
               main_title = 'Variation of batch normalization configuration',
               variation_label =  '-'.join(param_keys),
               layout_args = dict(width=1600, height = 900))

## 3.4. Experiment 4: Optimizer

In [None]:
!cat output/exp4-optim_exp_conf.yml

const:
  act: silu
  bn_act: before
  bn_layer: all
  num_epoch: 15
  sched: const
  seed: 3456
  skip: 2
name: exp4-optim
param:
  lr:
  - 0.001
  - 0.1
  optim:
  - sgd
  - adam
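
The four run labels in the training log (`0.001-sgd`, `0.001-adam`, `0.1-sgd`, `0.1-adam`) are consistent with a Cartesian product over the `param` lists, with `lr` varying slowest. A minimal sketch of such an expansion follows; `expand_grid` is a hypothetical helper, and whether `run_experiment` in `src` builds its runs this way is an assumption.

```python
from itertools import product

# Values copied from exp4-optim_exp_conf.yml
param = {"lr": [0.001, 0.1], "optim": ["sgd", "adam"]}
const = {
    "act": "silu",
    "bn_act": "before",
    "bn_layer": "all",
    "num_epoch": 15,
    "sched": "const",
    "seed": 3456,
    "skip": 2,
}


def expand_grid(param, const):
    """Hypothetical helper: one (model_id, label, config) per combination."""
    keys = list(param)
    runs = []
    for model_id, values in enumerate(product(*(param[k] for k in keys))):
        cfg = dict(const)              # shared settings
        cfg.update(zip(keys, values))  # settings varied in this run
        label = "-".join(str(v) for v in values)
        runs.append((model_id, label, cfg))
    return runs


runs = expand_grid(param, const)

if __name__ == "__main__":
    for model_id, label, cfg in runs:
        print(f"exp4-optim-{model_id} (vary: {label}) lr={cfg['lr']} optim={cfg['optim']}")
```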


In [None]:
df4, conf, param_keys = load_data('output/exp4-optim')
construct_expvar(df4, param_keys)

plot_benchmark(df4, 
               main_title = 'Variation of optimizer and learning rate',
               variation_label =  '-'.join(param_keys))

## 3.5. Experiment 5: Schedulers

In [None]:
!cat output/exp5-adamconst_exp_conf.yml

const:
  act: silu
  bn_act: before
  bn_layer: all
  lr: 0.001
  num_epoch: 60
  optim: adam
  seed: 3456
  skip: 2
name: exp5-adamconst
param:
  sched:
  - const


In [None]:
!cat output/exp5-sgdsched_exp_conf.yml

const:
  act: silu
  bn_act: before
  bn_layer: all
  lr: 0.1
  num_epoch: 60
  optim: sgd
  seed: 3456
  skip: 2
name: exp5-sgdsched
param:
  sched:
  - const
  - step
  - exp
  - reduce_on_plateau
  - anneal


In [None]:
df5_adam, _, _ = load_data('output/exp5-adamconst') # baseline const from adam
df5_main, conf, _ = load_data('output/exp5-sgdsched') # main variations in sgd
df5 = pd.concat([df5_adam, df5_main], ignore_index=True)
df5.replace({'sched': 'reduce_on_plateau'}, {'sched': 'rop'}, inplace=True)

param_keys = ['optim', 'sched']
construct_expvar(df5, param_keys)

plot_benchmark(df5, 
               main_title = 'Variation of learning-rate schedulers',
               variation_label =  '-'.join(param_keys),
               layout_args = dict(width=1600, height = 900))