# Training Demo

In this notebook we will run the training script for the work [*Unsupervised Change Detection of Extreme Events Using ML On-Board*](http://arxiv.org/abs/2111.02995). This work was conducted at the [FDL Europe 2021](https://fdleurope.org/fdl-europe-2021) research accelerator program.

**These instructions are meant to work on your local machine** (we don't use the Google Colab environment)

*Note that in practice training takes a long time, so this notebook should serve only as an illustrative demo.*

## 1 Preparation

- Get the dataset (for this demo we also provide a tiny subset of the training dataset, see below)

- For better visualizations, log in to Weights & Biases with: `wandb init`
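
When no W&B login is available, a training run started from a notebook cell can stop at an interactive W&B prompt. The following is a minimal sketch of one way to avoid that, assuming that credentials (if any) are supplied through the `WANDB_API_KEY` environment variable. `select_wandb_mode` is a hypothetical helper and not part of the repository; `WANDB_MODE` is the environment variable W&B reads to decide between online and offline logging.

```python
import os


def select_wandb_mode(env):
    """Choose a W&B mode so that a run never waits for interactive input.

    Hypothetical helper: an existing WANDB_MODE is respected; otherwise
    "online" is used when WANDB_API_KEY is set and "offline" when it is not.
    """
    if env.get("WANDB_MODE"):
        return env["WANDB_MODE"]
    mode = "online" if env.get("WANDB_API_KEY") else "offline"
    env["WANDB_MODE"] = mode
    return mode


# Commands started with "!" from this kernel inherit os.environ.
print("W&B mode:", select_wandb_mode(os.environ))
```

A key stored by an earlier `wandb login` is not detected by this check; in that case set `WANDB_MODE=online` yourself. Runs logged offline can be uploaded later with `wandb sync`.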



In [3]:
!pwd

/home/lucap/l46/l46-project/RaVAEn-master


## 2 Libraries

**Run these:**

```
make requirements
conda activate ravaen_env
conda install nb_conda
jupyter notebook
# start this notebook
```

In [4]:
!pip install --quiet --upgrade gdown

In [5]:
!conda info | grep 'active environment'

     active environment : ravaen_env


In [6]:
!nvidia-smi

Sun Dec 24 13:45:12 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.04              Driver Version: 546.17       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3060 ...    On  | 00000000:01:00.0  On |                  N/A |
| N/A   37C    P8              10W /  80W |    494MiB /  6144MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
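
The `nvidia-smi` table confirms that the driver sees the GPU, but not that the installed PyTorch build ships kernels for it. The sketch below compares the GPU's compute capability with the architectures the build was compiled for. `gpu_supported` is a hypothetical helper, and its rule (same major version, build minor version not newer than the GPU's) is a simplification that ignores PTX entries such as `compute_37`.

```python
import re


def gpu_supported(capability, arch_list):
    """Return True if some compiled sm_XY matches the GPU's major version
    and is not newer than the GPU's minor version (simplified rule)."""
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)", arch)
        if match is None:
            continue  # skip PTX entries such as "compute_37"
        arch_major, arch_minor = divmod(int(match.group(1)), 10)
        if arch_major == major and arch_minor <= minor:
            return True
    return False


try:
    import torch
except ImportError:  # allows the helper to be used without PyTorch installed
    torch = None

if torch is not None and torch.cuda.is_available():
    capability = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("capability:", capability, "supported:", gpu_supported(capability, archs))
else:
    print("PyTorch with CUDA is not available in this environment.")
```

If the check reports an unsupported GPU, installing a PyTorch build compiled for a newer CUDA toolkit is the usual remedy; see https://pytorch.org/get-started/locally/.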
                                                         

In [4]:
# The official training dataset is much larger; for the purpose of this demo, we provide a small subset:
!gdown https://drive.google.com/uc?id=1rl3Clf0c7HlXnlPXO837Pjr2iCjwak0Y -O train_minisubset.zip
!unzip -q train_minisubset.zip
!rm train_minisubset.zip

Downloading...
From (uriginal): https://drive.google.com/uc?id=1rl3Clf0c7HlXnlPXO837Pjr2iCjwak0Y
From (redirected): https://drive.google.com/uc?id=1rl3Clf0c7HlXnlPXO837Pjr2iCjwak0Y&confirm=t&uuid=8b072281-51b9-4cfd-86b2-ea2bf201629d
To: /home/lucap/l46/l46-project/RaVAEn-master/notebooks/train_minisubset.zip
100%|████████████████████████████████████████| 658M/658M [00:30<00:00, 21.7MB/s]
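
After unpacking, it can be worth checking that the dataset folder exists and is not empty before starting a long run. This is a minimal sketch, assuming the archive extracts to a `train_minisubset` folder in the current directory (the folder name used later for `dataset_root_folder`). `summarize_folder` is a hypothetical helper, and nothing is assumed about the file types inside.

```python
from collections import Counter
from pathlib import Path


def summarize_folder(root):
    """Count the files below root, grouped by lower-cased file extension."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {root}")
    return Counter(
        path.suffix.lower() or "<none>"
        for path in root.rglob("*")
        if path.is_file()
    )


dataset_dir = Path("train_minisubset")  # assumed extraction folder
if dataset_dir.is_dir():
    for suffix, count in sorted(summarize_folder(dataset_dir).items()):
        print(f"{suffix}: {count} files")
else:
    print(f"{dataset_dir} not found; run the download cell first.")
```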


**Change working directory to RaVAEn-master**

In [9]:
import os
os.chdir('/home/lucap/l46/l46-project/RaVAEn-master')
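
The absolute path above is specific to one machine. As a sketch of a more portable alternative, the repository root can be found by walking up from the current directory until a folder containing `config/config.yaml` appears. `find_repo_root` is a hypothetical helper, and using `config/config.yaml` as the marker file is an assumption based on the path this notebook edits next.

```python
import os
from pathlib import Path


def find_repo_root(start, marker="config/config.yaml"):
    """Return the closest directory at or above start that contains marker."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains {marker}")


try:
    os.chdir(find_repo_root(Path.cwd()))
    print("Working directory:", Path.cwd())
except FileNotFoundError as err:
    print(err)
```

Running the training module from any other directory fails with `No module named 'scripts'`, because `python3 -m scripts.train_model` resolves the `scripts` package relative to the working directory.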

**Edit the paths in config/config.yaml**

```
log_dir: "/home/<USER>/results"
cache_dir: "/home/<USER>/cache"
```
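
The `<USER>` placeholder stands for your own home directory. The sketch below prints the two lines for the current user so they can be pasted into the config file. `output_dirs` is a hypothetical helper; whether the training script creates missing directories itself is not stated here, so `create=True` is offered only as a precaution.

```python
from pathlib import Path


def output_dirs(base, create=False):
    """Return the log and cache directories below base, optionally creating them."""
    base = Path(base).expanduser()
    dirs = {"log_dir": base / "results", "cache_dir": base / "cache"}
    if create:
        for path in dirs.values():
            path.mkdir(parents=True, exist_ok=True)
    return dirs


# Print the lines in the same form as config/config.yaml expects them.
for key, path in output_dirs(Path.home()).items():
    print(f'{key}: "{path}"')
```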

In [10]:
!cat config/config.yaml
"""
Fill in:
log_dir: "/home/<USER>/results"
cache_dir: "/home/<USER>/cache"
"""
pass

---
entity: "mlpayloads"

log_dir: "/home/lucap/l46/l46-project/RaVAEn-master/outputs/results"
cache_dir: "/home/lucap/l46/l46-project/RaVAEn-master/outputs/cache"


In [18]:
# ===== Parameters to adjust =====
epochs = 100
dataset_root_folder = "/home/lucap/l46/l46-project/RaVAEn-master/notebooks/train_minisubset"
dataset="alpha_multiscene_tiny" # demo subset; for the full training dataset we would use: dataset="alpha_multiscene"

name="VAE_128small" # note "small" uses these settings > module.model_cls_args.latent_dim=128 module.model_cls_args.extra_depth_on_scale=0 module.model_cls_args.hidden_channels=[16,32,64]

# ===== Parameters to keep the same ======
training="simple_vae"
module="deeper_vae"

# ========================================

!python3 -m scripts.train_model +dataset=$dataset ++dataset.root_folder="{dataset_root_folder}" \
         +normalisation=log_scale +channels=high_res +training=$training +module=$module +project=train_VAE_128small +name="{name}" \
         module.model_cls_args.latent_dim=128 module.model_cls_args.extra_depth_on_scale=0 module.model_cls_args.hidden_channels=[16,32,64] \
         training.epochs=$epochs


Global seed set to 42

LATENT SPACE size: 128
  rank_zero_warn(
ModelCheckpoint(save_last=True, save_top_k=-1, monitor=None) will duplicate the last checkpoint saved.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
Using native 16bit precision.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

[34m[1mwandb[0m: (1) Create a W&B account
[34m[1mwandb[0m: (2) Use an existing W&B account
[34m[1mwandb[0m: (3) Don't visualize my results
[34m[1mwandb[0m: Enter your choice: ^C
Traceback (most recent call last):
  File "/home/lucap/anaconda3/envs/ravaen_env/lib/python3.9/runpy.py", line 197, in _run_m

In [None]:
# The same command with all variables expanded (drop the leading "!" to run it in a terminal):
!python3 -m scripts.train_model +dataset="alpha_multiscene_tiny" ++dataset.root_folder="/home/lucap/l46/l46-project/RaVAEn-master/notebooks/train_minisubset" \
         +normalisation=log_scale +channels=high_res +training="simple_vae" +module="deeper_vae" +project=train_VAE_128small +name="VAE_128small" \
         module.model_cls_args.latent_dim=128 module.model_cls_args.extra_depth_on_scale=0 module.model_cls_args.hidden_channels=[16,32,64] \
         training.epochs=100
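
The same command can also be assembled as an argument list in Python, which sidesteps shell quoting of values such as `[16,32,64]`. This is a sketch only: `build_train_command` is a hypothetical helper, the dataset path is a placeholder, and the overrides are copied from the command above rather than taken from any documented interface of the script.

```python
import shlex
import sys


def build_train_command(dataset, dataset_root_folder, name, epochs,
                        training="simple_vae", module="deeper_vae"):
    """Build the argument list for the training module (hypothetical helper)."""
    return [
        sys.executable, "-m", "scripts.train_model",
        f"+dataset={dataset}",
        f"++dataset.root_folder={dataset_root_folder}",
        "+normalisation=log_scale",
        "+channels=high_res",
        f"+training={training}",
        f"+module={module}",
        "+project=train_VAE_128small",
        f"+name={name}",
        "module.model_cls_args.latent_dim=128",
        "module.model_cls_args.extra_depth_on_scale=0",
        "module.model_cls_args.hidden_channels=[16,32,64]",
        f"training.epochs={epochs}",
    ]


cmd = build_train_command(
    dataset="alpha_multiscene_tiny",
    dataset_root_folder="/home/<USER>/RaVAEn-master/notebooks/train_minisubset",  # placeholder
    name="VAE_128small",
    epochs=100,
)
print(shlex.join(cmd))  # shell-quoted form, for inspection or copy-paste
# To launch it from the repository root: subprocess.run(cmd, check=True)
```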

### More advanced settings:

List the available options with `--help`, then look at the individual configuration files for details.

In [10]:
!python3 -m scripts.train_model --help

/home/lucap/anaconda3/envs/ravaen_env/bin/python3: Error while finding module specification for 'scripts.train_model' (ModuleNotFoundError: No module named 'scripts')


In [None]:
# To see the detailed options for "training" (da, simple_ae, simple_vae):
!cat config/training/simple_vae.yaml
# For example, to set the number of epochs, add this to the main command:
# training.epochs=1

---
gpus: -1
epochs: 400
grad_batches: 1
distr_backend: 'dp'
use_amp: true # ... true = 16 precision / false = 32 precision

# The check_val_every_n_epoch and val_check_interval settings overlap, see:
#     https://github.com/PyTorchLightning/pytorch-lightning/issues/6385
val_check_interval: 0.2  # either int to check after that many batches or float to check that fraction of epoch
check_val_every_n_epoch: 1 

fast_dev_run: false

num_workers: 16

batch_size_train: 256
batch_size_valid: 256
batch_size_test: 256

lr: 0.001
weight_decay: 0.0
# scheduler_gamma: 0.95

# auto_batch_size: 'binsearch'
#auto_lr: 'lr'
