# Training Demo

In this notebook we will run the training script for the work [*Unsupervised Change Detection of Extreme Events Using ML On-Board*](http://arxiv.org/abs/2111.02995). This work was conducted at the [FDL Europe 2021](https://fdleurope.org/fdl-europe-2021) research accelerator program.

**These instructions are meant to be run on your local machine** (we don't use the Google Colab environment).

*Note that in practice training takes a long time, so this notebook should serve only as an introductory demo.*

## 1 Preparation

- Get the dataset (for this demo we also provide a tiny subset of the training dataset, see below).

- For better visualizations, log in to Weights & Biases with: `wandb init`



## 2 Libraries

**Run these:**

```
make requirements
conda activate ravaen_env
conda install nb_conda
jupyter notebook
# start this notebook
```

In [None]:
!pip install --quiet --upgrade gdown

In [None]:
!conda info | grep 'active environment'

     active environment : ravaen_env
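
The same check can be done from Python instead of the shell. This is a minimal sketch that assumes conda exports the `CONDA_DEFAULT_ENV` variable when an environment is activated; the helper name `active_conda_env` is introduced here for illustration only.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset."""
    return environ.get("CONDA_DEFAULT_ENV")


expected = "ravaen_env"  # environment name used in this notebook
current = active_conda_env()

if current == expected:
    print(f"OK: running inside {expected!r}")
else:
    print(f"Warning: active environment is {current!r}, expected {expected!r}")
```

If the warning appears, restart Jupyter from a shell where `conda activate ravaen_env` has been run.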


In [None]:
!nvidia-smi

Tue Mar  1 20:35:33 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   48C    P8     3W /  N/A |    256MiB /  7982MiB |     10%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+---------------------------------------------------------------------------

In [None]:
# The official training dataset is much larger; for this demo we provide a small subset:
!gdown https://drive.google.com/uc?id=1rl3Clf0c7HlXnlPXO837Pjr2iCjwak0Y -O train_minisubset.zip
!unzip -q train_minisubset.zip
!rm train_minisubset.zip

**Edit the paths in config/config.yaml**

```
log_dir: "/home/<USER>/results"
cache_dir: "/home/<USER>/cache"
```
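
The two directories may not exist yet. As a rough sketch (not part of the repository), the snippet below creates `results` and `cache` folders under a base directory and prints the matching lines for `config/config.yaml`. The helper names `make_output_dirs` and `as_yaml_lines` are hypothetical, and the demo uses a temporary directory as the base so that it has no lasting side effects.

```python
import tempfile
from pathlib import Path


def make_output_dirs(base_dir):
    """Create the results and cache folders under base_dir and return their paths."""
    base = Path(base_dir).expanduser().resolve()
    paths = {"log_dir": base / "results", "cache_dir": base / "cache"}
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths


def as_yaml_lines(paths):
    """Format the paths as the two lines expected in config/config.yaml."""
    return "\n".join(f'{key}: "{path}"' for key, path in paths.items())


# Demo with a throwaway base directory; replace it with your own base folder.
with tempfile.TemporaryDirectory() as tmp:
    paths = make_output_dirs(tmp)
    snippet = as_yaml_lines(paths)
    print(snippet)
```

Copy the printed lines into `config/config.yaml` after pointing the base directory at a location you want to keep.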

In [None]:
!cat config/config.yaml
"""
Fill in:
log_dir: "/home/<USER>/results"
cache_dir: "/home/<USER>/cache"
"""
pass

---
entity: "mlpayloads"

log_dir: "/home/vitek/fdl_tmp/results"
cache_dir: "/home/vitek/fdl_tmp/cache"



In [None]:
# ===== Parameters to adjust =====
epochs = 100
dataset_root_folder = "<where we downloaded the data>/train_minisubset"
dataset="alpha_multiscene_tiny" # used for this demo; for the full training dataset use dataset="alpha_multiscene"

name="VAE_128small" # note: "small" refers to these settings: module.model_cls_args.latent_dim=128 module.model_cls_args.extra_depth_on_scale=0 module.model_cls_args.hidden_channels=[16,32,64]

# ===== Parameters to keep the same ======
training="simple_vae"
module="deeper_vae"

# ========================================

!python3 -m scripts.train_model +dataset=$dataset ++dataset.root_folder="{dataset_root_folder}" \
         +normalisation=log_scale +channels=high_res +training=$training +module=$module +project=train_VAE_128small +name="{name}" \
         module.model_cls_args.latent_dim=128 module.model_cls_args.extra_depth_on_scale=0 module.model_cls_args.hidden_channels=[16,32,64] \
         training.epochs=$epochs


Global seed set to 42

LATENT SPACE size: 128
  rank_zero_warn(
ModelCheckpoint(save_last=True, save_top_k=-1, monitor=None) will duplicate the last checkpoint saved.
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
Using native 16bit precision.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
wandb: Currently logged in as: mlpayloads (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.12.11 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.10
wandb: Syncing run src.models.module.Module/train_VAE_128small
wandb: ⭐️ View project at https://wandb.ai/mlpayloads/train_VAE_128small
wandb: 🚀 View run at https://wandb.ai/mlpayloads/train_VAE_128small/runs/3leewg5y
wandb: Run data is saved locally in /home/vitek/fdl_tmp/results/wandb/run-202203

### More advanced settings:

See the available options by running the script with `--help`, then look at the individual configuration files.

In [None]:
!python3 -m scripts.train_model --help

train_model is powered by Hydra.

== Configuration groups ==
Compose your configuration from those groups (group=option)

channels: all, high_res, high_res_phisat2overlap, rgb, rgb_nir, rgb_nir_b11, rgb_nir_b11_b12_landsat, rgb_nir_b12
dataset: alpha_multiscene, alpha_multiscene_tiny, alpha_singlescene, dataloader_test, eval, fire, fires, floods_evaluation, hurricanes, landslides, landslides_2, oilspills, preliminary, preliminary_da, preliminary_multiscene, preliminary_sequential, preliminary_sequential_bigger, preliminary_sequential_bigger_9k, preliminary_sequential_bigger_multiEval, preliminary_sequential_bigger_multiEval_Germany, samples_for_gui, temporal_analysis, volcanos
evaluation: ae_base, ae_fewer, vae_base, vae_da, vae_da_8px, vae_fewer, vae_paper
module: deeper_ae, deeper_ae_bigger_latent, deeper_vae, grx, simple_ae, simple_ae_with_linear, simple_vae
normalisation: log_scale, none
training: da, simple_ae, simple_vae
transform: eval_da, eval_da_8px, eval_nda, eval_
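
The configuration groups listed by `--help` are combined on the command line through Hydra overrides. As a rough sketch, the snippet below assembles the same kind of command as the training cell earlier in this notebook. It relies on Hydra's override grammar (`+key=value` adds a new entry, `++key=value` adds or overrides, plain `key=value` overrides an existing entry). The helper `build_train_command` and its argument names are hypothetical and not part of the repository, and the dataset path is a placeholder.

```python
import shlex


def build_train_command(added, forced, overrides):
    """Assemble the argument list for scripts.train_model from Hydra overrides."""
    args = ["python3", "-m", "scripts.train_model"]
    args += [f"+{key}={value}" for key, value in added.items()]       # add new entries
    args += [f"++{key}={value}" for key, value in forced.items()]     # add or override
    args += [f"{key}={value}" for key, value in overrides.items()]    # override existing
    return args


command = build_train_command(
    added={
        "dataset": "alpha_multiscene_tiny",
        "normalisation": "log_scale",
        "channels": "high_res",
        "training": "simple_vae",
        "module": "deeper_vae",
        "project": "train_VAE_128small",
        "name": "VAE_128small",
    },
    # Placeholder path: point this at the extracted train_minisubset folder.
    forced={"dataset.root_folder": "/path/to/train_minisubset"},
    overrides={
        "module.model_cls_args.latent_dim": 128,
        "module.model_cls_args.extra_depth_on_scale": 0,
        "module.model_cls_args.hidden_channels": "[16,32,64]",
        "training.epochs": 100,
    },
)

# shlex.join quotes arguments such as the bracketed list for safe use in a shell.
print(shlex.join(command))
```

Building the command as a list keeps each override as a single argument, which avoids shell quoting surprises with values such as `[16,32,64]`.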

In [None]:
# To see the detailed options for "training: da, simple_ae, simple_vae":
!cat config/training/simple_vae.yaml
# For example, we could then set the number of epochs by adding this to the main command:
# training.epochs=1

---
gpus: -1
epochs: 400
grad_batches: 1
distr_backend: 'dp'
use_amp: true # ... true = 16 precision / false = 32 precision

# The check_val_every_n_epoch and val_check_interval settings overlap, see:
#     https://github.com/PyTorchLightning/pytorch-lightning/issues/6385
val_check_interval: 0.2  # either in to check after that many batches or float to check that fraction of epoch
check_val_every_n_epoch: 1 

fast_dev_run: false

num_workers: 16

batch_size_train: 256
batch_size_valid: 256
batch_size_test: 256

lr: 0.001
weight_decay: 0.0
# scheduler_gamma: 0.95

# auto_batch_size: 'binsearch'
#auto_lr: 'lr'
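
To get a feeling for `val_check_interval: 0.2`, the sketch below estimates how often validation would run within one epoch. It assumes the convention described in the config comment (a float is a fraction of the training epoch, an int is a number of training batches). The dataset size of 10,000 samples is a made-up number, and the rounding only approximates what the trainer does internally.

```python
import math


def batches_between_validations(num_train_batches, val_check_interval):
    """Return how many training batches pass between two validation runs."""
    if isinstance(val_check_interval, float):
        if not 0.0 < val_check_interval <= 1.0:
            raise ValueError("a float interval must lie in (0, 1]")
        # Fraction of an epoch, rounded down but never below one batch.
        return max(1, int(num_train_batches * val_check_interval))
    # An int is interpreted directly as a number of batches.
    return int(val_check_interval)


num_samples = 10_000      # hypothetical dataset size, for illustration only
batch_size_train = 256    # value from config/training/simple_vae.yaml
num_train_batches = math.ceil(num_samples / batch_size_train)

every = batches_between_validations(num_train_batches, 0.2)
runs_per_epoch = num_train_batches // every

print(f"{num_train_batches} batches per epoch, validation every {every} batches")
print(f"about {runs_per_epoch} validation runs per epoch")
```

With these assumed numbers, a value of 0.2 corresponds to roughly five validation runs per epoch.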
