# Getting started with Clara Train SDK V4.0 PyTorch using MONAI 
Clara Train SDK lets researchers train AI models using configuration files. 
It is simple to use, modular, and flexible, allowing researchers to focus on innovation 
while leaving acceleration and performance issues to NVIDIA's engineers. 

Clara Train SDK consists of different modules, as shown below: 
<br><img src="screenShots/TrainBlock.png" alt="Drawing" style="height: 600px;"/><br>
   
By the end of this notebook you will:
1. Understand components of [Medical Model ARchive (MMAR)](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v4.0/nvmidl/mmar.html)
2. Know how to configure the training config JSON to train a CNN
3. Train a CNN with single and multiple GPUs
4. Fine-tune a model
5. Export a model 
6. Perform inference on a testing dataset 


## Prerequisites
- NVIDIA GPU with 8 GB of memory   


## Resources
You can watch the free GTC 2021 talks covering Clara Train SDK:
- [Clara Train 4.0 - 101 Getting Started [SE2688]](https://gtc21.event.nvidia.com/media/Clara%20Train%204.0%20-%20101%20Getting%20Started%20%5BSE2688%5D/1_0qgfrql2)
- [Clara Train 4.0 - 201 Federated Learning [SE3208]](https://gtc21.event.nvidia.com/media/Clara%20Train%204.0%20-%20201%20Federated%20Learning%20%5BSE3208%5D/1_m48t6b3y)
- [What’s New in Clara Train 4.0 [D3114]](https://gtc21.event.nvidia.com/media/What%E2%80%99s%20New%20in%20Clara%20Train%204.0%20%5BD3114%5D/1_umvjidt2)
- [Take Medical AI from Concept to Production using Clara Imaging [S32482]](https://gtc21.event.nvidia.com/media/Take%20Medical%20AI%20from%20Concept%20to%20Production%20using%20Clara%20Imaging%20%20%5BS32482%5D/1_6bvnvyg7)
- [Federated Learning for Medical AI [S32530]](https://gtc21.event.nvidia.com/media/Federated%20Learning%20for%20Medical%20AI%20%5BS32530%5D/1_z26u15uk)
- [Get Started Now on Medical Imaging AI with Clara Train on Google Cloud Platform [S32518]](https://gtc21.event.nvidia.com/media/Get%20Started%20Now%20on%20Medical%20Imaging%20AI%20with%20Clara%20Train%20on%20Google%20Cloud%20Platform%20%5BS32518%5D/1_2yjdekmi)
- [Automate 3D Medical Imaging Segmentation with AutoML and Neural Architecture Search [S32083]](https://gtc21.event.nvidia.com/media/Automate%203D%20Medical%20Imaging%20Segmentation%20with%20AutoML%20and%20Neural%20Architecture%20Search%20%5BS32083%5D/1_r5swh2jn)
- [A Platform for Rapid Development and Clinical Translation of ML Models for Applications in Radiology at UCSF [S31619]](https://gtc21.event.nvidia.com/media/A%20Platform%20for%20Rapid%20Development%20and%20Clinical%20Translation%20of%20ML%20Models%20for%20Applications%20in%20Radiology%20at%20UCSF%20%5BS31619%5D/1_oz8qop5a)


# 1. Background

Clara Train is built on a component-based architecture that uses components from [MONAI](https://monai.io/). 
MONAI’s [training workflows](https://docs.monai.io/en/latest/highlights.html#workflows) 
are based on [PyTorch Ignite’s engine](https://pytorch.org/ignite/engine.html). 
Below is a list of the different components used:
- Training Data Pipeline
- Validation Data Pipeline
- [Applications](https://docs.monai.io/en/latest/apps.html)
- [Transforms](https://docs.monai.io/en/latest/transforms.html)
- [Data](https://docs.monai.io/en/latest/data.html)
- [Engines](https://docs.monai.io/en/latest/engines.html)
- [Inference methods](https://docs.monai.io/en/latest/inferers.html)
- [Event handlers](https://docs.monai.io/en/latest/handlers.html)
- [Network architectures](https://docs.monai.io/en/latest/networks.html)
- [Loss functions](https://docs.monai.io/en/latest/losses.html)
- [Optimizers](https://docs.monai.io/en/latest/optimizers.html)
- [Metrics](https://docs.monai.io/en/latest/metrics.html)
- [Visualizations](https://docs.monai.io/en/latest/visualize.html)
- [Utilities](https://docs.monai.io/en/latest/utils.html)   


# Let's get started
Before we get started, let's check that an NVIDIA GPU is available in the Docker container by running the cell below.

In [21]:
# The following command should list all available GPUs
!nvidia-smi

Wed Aug  4 11:33:24 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   40C    P0    42W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

The next cell defines helper functions that we will use throughout the notebook.

In [2]:
MMAR_ROOT="/claraDevDay/GettingStarted/"
print ("setting MMAR_ROOT=",MMAR_ROOT)
%ls $MMAR_ROOT

!chmod 777 $MMAR_ROOT/commands/*
def printFile(filePath,lnSt,lnEnd):
    print ("showing ",str(lnEnd-lnSt)," lines from file ",filePath, "starting at line",str(lnSt))
    !< $filePath head -n "$lnEnd" | tail -n +"$lnSt"
 

setting MMAR_ROOT= /claraDevDay/GettingStarted/
BYOC.ipynb            commands/  custom/  eval/    resources/
GettingStarted.ipynb  config/    docs/    models/  screenShots/


# 2. Medical Model ARchive (MMAR)
Clara Train SDK uses the [Medical Model ARchive (MMAR)](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v4.0/nvmidl/mmar.html). 
The MMAR defines a standard structure for organizing all artifacts produced during the model development life cycle. 
The basic idea of Clara Train SDK is to drive training from a config file.
 

**We recommend opening [config_train_Unet.json](config/config_train_Unet.json) and configuring your screen as shown below**
<br><img src="screenShots/MMAR.png" alt="Drawing" style="height: 400px;"/><br>


You can download sample models for different problems from [NGC](https://ngc.nvidia.com/catalog/models?orderBy=modifiedDESC&pageNumber=0&query=clara&quickFilter=&filters=) <br> 
All MMARs follow the structure used in this notebook. If you navigate to the parent folder, it should contain the following subdirectories:
```
./GettingStarted 
├── commands
├── config
├── docs
├── eval
├── models
└── resources
```

* `commands` contains a number of ready-to-run scripts for:
    - training
    - training with multiple GPUs
    - fine-tuning
    - fine-tuning with multiple GPUs
    - validation
    - validation with multiple GPUs
    - inference (testing)
    - exporting models in TensorRT Inference Server format
* `config` contains configuration files (in JSON format) for each of training, 
validation, and deployment for [AI-assisted annotation](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v4.0/aiaa/index.html) 
(_Note:_ these configuration files are used in the scripts under the `commands` folder)
* `docs` contains local documentation for the model, but for a more complete view it is recommended you visit the NGC model page
* `eval` is used as the output directory for model evaluation (by default)
* `models` is where the PyTorch checkpoint model is stored, and the corresponding graph definition files.
* `resources` currently contains the logger configuration in `log.config` file
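
A quick way to confirm that a folder follows this layout is to check for the six subdirectories listed above. The sketch below is a minimal check and not part of Clara Train; the helper name `missing_mmar_dirs` is hypothetical.

```python
from pathlib import Path

# Subdirectories an MMAR is expected to contain (taken from the tree above).
MMAR_SUBDIRS = ("commands", "config", "docs", "eval", "models", "resources")


def missing_mmar_dirs(mmar_root):
    """Return the expected MMAR subdirectories that are absent under mmar_root."""
    root = Path(mmar_root)
    return [name for name in MMAR_SUBDIRS if not (root / name).is_dir()]
```

Calling `missing_mmar_dirs(MMAR_ROOT)` in this notebook should return an empty list when the folder is a complete MMAR.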

Some of the most important files you will need to understand to configure and use Clara Train SDK are:

1. `environment.json`, which sets important common parameters: 
    * `DATA_ROOT`: the root folder containing the data used to train, validate, or test
    * `DATASET_JSON`: the path to a JSON-formatted data list file 
    * `MMAR_CKPT_DIR`: the path where the PyTorch checkpoint files reside
    * `MMAR_EVAL_OUTPUT_PATH`: the path where evaluation metrics for the neural network are written during training, validation, and inference
    * `PROCESSING_TASK`: the type of processing task the neural net is intended to perform (currently limited to `annotation`, `segmentation`, `classification`)
    * `PRETRAIN_WEIGHTS_FILE` (_optional_): the location of the pre-trained weights file; if the file does not exist and is needed, 
    the training program will download it from a predefined URL


In [3]:
printFile(MMAR_ROOT+"/config/environment.json",0,30)


showing  30  lines from file  /claraDevDay/GettingStarted//config/environment.json starting at line 0
{
    "DATA_ROOT": "/PyTorch/NoteBooks/Data/sampleData/",
    "DATASET_JSON": "/PyTorch/NoteBooks/Data/sampleData/dataset.json",
    "PROCESSING_TASK": "segmentation",
    "MMAR_EVAL_OUTPUT_PATH": "eval",
    "MMAR_CKPT_DIR": "models",
    "MMAR_CKPT": "models/model.pt",
    "MMAR_TORCHSCRIPT": "models/model.ts"
}
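
The training config refers to these values through placeholders such as `"{DATASET_JSON}"` and `"{DATA_ROOT}"`. The sketch below illustrates the idea of placeholder substitution in plain Python. It is an assumption about the general mechanism, not Clara Train's actual implementation, and the `substitute` helper is hypothetical.

```python
import json
import re

# A few values copied from the environment.json printed above.
ENVIRONMENT = json.loads("""
{
    "DATA_ROOT": "/PyTorch/NoteBooks/Data/sampleData/",
    "DATASET_JSON": "/PyTorch/NoteBooks/Data/sampleData/dataset.json",
    "MMAR_CKPT_DIR": "models"
}
""")

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def substitute(value, variables):
    """Recursively replace "{NAME}" placeholders in strings with variables[NAME]."""
    if isinstance(value, dict):
        return {k: substitute(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, variables) for v in value]
    if isinstance(value, str):
        whole = _PLACEHOLDER.fullmatch(value)
        if whole and whole.group(1) in variables:
            # A string that is only a placeholder keeps the variable's own type.
            return variables[whole.group(1)]
        # Unknown placeholders are left untouched (an assumption of this sketch).
        return _PLACEHOLDER.sub(
            lambda m: str(variables.get(m.group(1), m.group(0))), value
        )
    return value


dataset_section = {
    "data_list_file_path": "{DATASET_JSON}",
    "data_file_base_dir": "{DATA_ROOT}",
}
resolved = substitute(dataset_section, ENVIRONMENT)
print(resolved)
```

Keeping paths in `environment.json` and referring to them by name means the same training config can be reused on another machine by editing a single file.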


# 3. Config.json Main Concepts 


`config_train.json` contains all the parameters necessary to define the neural net, 
how it is trained (training hyper-parameters, loss, etc.), 
and the pre- and post-transformation functions needed to modify and/or augment the data before it is fed to the neural net. 
The complete documentation on the training configuration is laid out 
[here](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v4.0/nvmidl/appendix/configuration.html#training-configuration).
The configuration file defines all training-related settings. 
This is where researchers will spend most of their time.

Let's take some time to examine each part of it.  


## 3.1. Global configurations 

In [4]:
confFile=MMAR_ROOT+"/config/config_train_Unet.json"
printFile(confFile,0,9)


showing  9  lines from file  /claraDevDay/GettingStarted//config/config_train_Unet.json starting at line 0
{
  "epochs": 2,
  "multi_gpu": false,
  "amp": true,
  "num_interval_per_valid": 1,
  "learning_rate": 2e-4,
  "tf32": false,
  "determinism": {
    "random_seed": 0


## 3.2. Training configurations section 
This section includes:
1. Loss functions:
```
    "loss": {
      "name": "DiceLoss",
      "args":{
        "to_onehot_y": true,
        "softmax": true
      }
    },
```

2. Optimizer
```
    "optimizer": {
      "name": "Adam",
      "args": {
        "lr": "{learning_rate}"
      }
    },
```


3. Learning rate scheduler
```
    "lr_scheduler": {
      "name": "StepLR",
      "args": {
        "step_size": 5000,
        "gamma": 0.1
      }
    },
```


4. Network architecture
```
    "model": {
      "name": "UNet",
      "args": {
        "dimensions": 3,
        "in_channels": 1,
        "out_channels": 2,
        "channels": [16, 32, 64, 128, 256],
        "strides": [2, 2, 2, 2],
        "num_res_units": 2,
        "norm": "batch"
      }
    },
```


5. Pre-transforms
    1. Loading transformations
    ```
        {
        "name": "LoadImaged",
        "args": {
          "keys": ["image", "label"]
        }
      },
    ```
    2. Ensure channel first Transformation
    ```
      {
        "name": "EnsureChannelFirstd",
        "args": {
          "keys": ["image", "label"]
        }
      },    
    ```
    3. Resample Transformation
    ```
      {
        "name": "Spacingd",
        "args": {
            "keys": ["image", "label"],
            "pixdim": [1.0, 1.0, 1.0],
            "mode":["bilinear", "nearest"]
        }
      },    
    ```
    4. Intensity Transforms
    ```
      {
        "name": "ScaleIntensityRanged",
        "args": {
          "keys": "image",
          "a_min": -57,
          "a_max": 164,
          "b_min": 0.0,
          "b_max": 1.0,
          "clip": true
        }
      },    
    ```
    5. Cropping transformations
    ```
      {
        "name": "RandCropByPosNegLabeld",
        "args": {
          "keys": ["image", "label"],
          "label_key": "label",
          "spatial_size": [96, 96, 96],
          "pos": 1,
          "neg": 1,
          "num_samples": 4,
          "image_key": "image",
          "image_threshold": 0
        }
      },    
    ```
    6. Deformable transformations
    ```
    ```
    7. Augmentation Transforms
    ```
      {
        "name": "RandShiftIntensityd",
        "args": {
          "keys": "image",
          "offsets": 0.1,
          "prob": 0.5
        }
      },    
    ```
    8. Special transforms 
    ```
      {
        "name": "ToTensord",
        "args": {
          "keys": ["image", "label"]
        }
      }    
    ```


6. Dataset to use 
```
    "dataset": {
      "name": "CacheDataset",
      "data_list_file_path": "{DATASET_JSON}",
      "data_file_base_dir": "{DATA_ROOT}",
      "data_list_key": "training",
      "args": {
        "cache_num": 4,
        "cache_rate": 1.0,
        "num_workers": 2
      }
    },
```


7. DataLoader
```
    "dataloader": {
      "name": "DataLoader",
      "args": {
        "batch_size": 2,
        "shuffle": true,
        "num_workers": 4
      }
    },
```


8. Inferer
```
    "inferer": {
      "name": "SimpleInferer"
    },
```


9. Handlers
There can be many handlers, such as:
1. CheckpointLoader
2. LrScheduleHandler
3. ValidationHandler
4. CheckpointSaver
5. StatsHandler
6. TensorBoardStatsHandler
```
    "handlers": [
      {
        "name": "CheckpointLoader",
        "disabled": "{dont_load_ckpt_model}",
        "args": {
          "load_path": "{MMAR_CKPT}",
          "load_dict": ["model"]
        }
      },
      {
        "name": "LrScheduleHandler",
        "args": {
          "print_lr": true
        }
      },
      {
        "name": "ValidationHandler",
        "args": {
          "interval": "{num_interval_per_valid}",
          "epoch_level": true
        }
      },
      {
        "name": "CheckpointSaver",
        "rank": 0,
        "args": {
          "save_dir": "{MMAR_CKPT_DIR}",
          "save_dict": ["model", "optimizer", "lr_scheduler"],
          "save_final": true,
          "save_interval": 5
        }
      },
      {
        "name": "StatsHandler",
        "rank": 0,
        "args": {
          "tag_name": "train_loss",
          "output_transform": "lambda x: x['loss']"
        }
      },
      {
        "name": "TensorBoardStatsHandler",
        "rank": 0,
        "args": {
          "log_dir": "{MMAR_CKPT_DIR}",
          "tag_name": "train_loss",
          "output_transform": "lambda x: x['loss']"
        }
      }
    ],
```


10. Post transforms
    1. Activations 
    2. Change to oneHot 
```
```

11. Metric
```
```
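
To make the intensity transform above concrete, the sketch below reproduces the arithmetic of `ScaleIntensityRanged` for a single value, using the `a_min`, `a_max`, `b_min`, `b_max`, and `clip` settings from the config. It assumes the transform applies the linear mapping described in the MONAI documentation; the function name `scale_intensity_range` is only for illustration and works on plain floats rather than tensors.

```python
def scale_intensity_range(x, a_min=-57.0, a_max=164.0, b_min=0.0, b_max=1.0, clip=True):
    """Linearly map [a_min, a_max] onto [b_min, b_max], optionally clipping."""
    y = (x - a_min) / (a_max - a_min) * (b_max - b_min) + b_min
    if clip:
        # Values outside [a_min, a_max] are clamped to the output range.
        y = min(max(y, b_min), b_max)
    return y


for value in (-1000.0, -57.0, 53.5, 164.0, 1000.0):
    print(value, "->", scale_intensity_range(value))
```

With `clip` enabled, every voxel outside the chosen intensity window collapses to the edge of the output range, which is why the window should bracket the tissue of interest.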


## 3.3. Validation config 
This section contains subsections very similar to the ones in the training section, including:
1. Metric 
2. Pre-transforms. Since these transforms are usually a subset of the pre-transforms in the training section, 
we can point to them by name with an alias, for example `"ref": "LoadImaged"`. 
If two transforms share the same name, such as `ScaleByResolution`, 
we can give each one an alias, as in `"name": "ScaleByResolution#ScaleImg"`, 
and then refer to it in the validation section as `ScaleImg`. 
3. Image pipeline
4. Inference

In [5]:
printFile(confFile,214,250)


showing  36  lines from file  /claraDevDay/GettingStarted//config/config_train_Unet.json starting at line 214
      "args": {
        "max_epochs": "{epochs}"
      }
    }
  },
  "validate": {
    "pre_transforms": [
      {
        "ref": "LoadImaged"
      },
      {
        "ref": "Spacingd"
      },
      {
        "ref": "EnsureChannelFirstd"
      },
      {
        "ref": "ScaleIntensityRanged"
      },
      {
        "ref": "CropForegroundd"
      },
      {
        "ref": "ToTensord"
      }
    ],
    "dataset": {
      "name": "CacheDataset",
      "data_list_file_path": "{DATASET_JSON}",
      "data_file_base_dir": "{DATA_ROOT}",
      "data_list_key": "validation",
      "args": {
        "cache_num": 4,
        "cache_rate": 1.0,
        "num_workers": 2
      }
    },
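
The `ref` entries in the validation section can be pictured as a lookup by name (or alias) into the training pre-transforms. The sketch below is a conceptual illustration only, assuming the `Name#Alias` convention described above; `resolve_refs` is a hypothetical helper, not a Clara Train function.

```python
def resolve_refs(train_transforms, validate_transforms):
    """Replace {"ref": key} entries with the matching training transform.

    A training transform is addressable by its alias when its name has the
    form "Name#Alias", otherwise by its plain name.
    """
    by_key = {}
    for t in train_transforms:
        name, _, alias = t["name"].partition("#")
        by_key[alias or name] = {**t, "name": name}
    return [by_key[t["ref"]] if "ref" in t else t for t in validate_transforms]


train = [
    {"name": "LoadImaged", "args": {"keys": ["image", "label"]}},
    {"name": "ToTensord", "args": {"keys": ["image", "label"]}},
]
validate = [{"ref": "LoadImaged"}, {"ref": "ToTensord"}]
resolved = resolve_refs(train, validate)
print([t["name"] for t in resolved])
```

Reusing transforms by reference keeps training and validation preprocessing in sync: a change to the arguments in the training section carries over to validation without editing two places.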


# 4. Training your first Network

## 4.1 Start TensorBoard 
Before or while the network is training, 
you can monitor its accuracy using TensorBoard inside JupyterLab, as shown below: 
 <br><img src="screenShots/TensorBoard.png" alt="Drawing" style="height: 300px;"/><br>


## 4.2 Training script 
We have renamed `train.sh` to `train_W_Config.sh` because we modified it to accept the name of the config to use as a parameter.

Let's take a look at `train_W_Config.sh` by executing the following cell.

In [6]:
printFile(MMAR_ROOT+"/commands/train_W_Config.sh",0,30)

showing  30  lines from file  /claraDevDay/GettingStarted//commands/train_W_Config.sh starting at line 0
#!/usr/bin/env bash

# SPDX-License-Identifier: Apache-2.0

clear
echo running cmd $0 $1 $2 $3
CONFIG_FILE_NAME=$1
GPU2USE=$2

my_dir="$(dirname "$0")"
. $my_dir/set_env.sh
echo "MMAR_ROOT set to $MMAR_ROOT"

CONFIG_FILE=config/$CONFIG_FILE_NAME
ENVIRONMENT_FILE=config/environment.json

########################################### check on arguments
if [[ -z  "$CONFIG_FILE_NAME"  ]] ;then
   echo Need to pass in config.json
   exit
fi
if [[ -z  $GPU2USE  ]] ;then
   GPU2USE=0
fi
export CUDA_VISIBLE_DEVICES=$GPU2USE
echo ------------------------------------
MMAR_CKPT_DIR=models/${CONFIG_FILE_NAME::-5} #remove .json from file name
if [ -d "$MMAR_ROOT/$MMAR_CKPT_DIR" ]; then
    echo deleting dir "$MMAR_ROOT/$MMAR_CKPT_DIR"
    rm -r "$MMAR_ROOT/$MMAR_CKPT_DIR"
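
The script derives the checkpoint directory by stripping the last five characters (`.json`) from the config file name. The same rule can be sketched in Python; the helper name `ckpt_dir_for` is hypothetical and only mirrors the `MMAR_CKPT_DIR=models/${CONFIG_FILE_NAME::-5}` line above.

```python
from pathlib import PurePosixPath


def ckpt_dir_for(config_file_name):
    """Return models/<config file name without its extension>."""
    return str(PurePosixPath("models") / PurePosixPath(config_file_name).stem)


print(ckpt_dir_for("config_train_Unet.json"))
```

Using `stem` removes only the final extension, whereas the shell slice always drops exactly five characters regardless of what they are.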


## 4.3 Start training
Now that we have our training configuration, to start training simply run `train_W_Config.sh` as below. 
Please keep in mind that we have set up dummy data with one file to train a dummy network quickly (we only train for 2 epochs). 
Please see the exercises on how to easily switch data and train a real segmentation network.


In [8]:
! $MMAR_ROOT/commands/train_W_Config.sh config_train_Unet.json

running cmd /claraDevDay/GettingStarted//commands/train_W_Config.sh config_train_Unet.json
PYTHONPATH is :/opt/nvidia/medical:/opt/nvidia:/opt/nvidia:/claraDevDay/GettingStarted/commands/../custom
MMAR_ROOT set to /claraDevDay/GettingStarted/commands/..
------------------------------------
deleting dir /claraDevDay/GettingStarted/commands/../models/config_train_Unet
saving models to created dir /claraDevDay/GettingStarted/commands/../models/config_train_Unet
------------------------------------
2021-08-04 11:23:58,661 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmpeft8hq71
2021-08-04 11:23:58,662 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmpeft8hq71/_remote_module_non_sriptable.py
Loading dataset: 100%|████████████████████████████| 4/4 [00:11<00:00,  2.82s/it]
Loading dataset: 100%|████████████████████████████| 4/4 [00:01<00:00,  3.38it/s]
Num Epochs:  2
Use GPU:  True
Multi GPU:  False
Automatic Mixed Precision:  En

Now let's look at the `models` directory, which includes our models and the TensorBoard files. 

In [9]:
! ls -la $MMAR_ROOT/models/config_train_Unet
!echo ---------------------------------------
!echo Display content of train_stats.json
! cat $MMAR_ROOT/models/config_train_Unet/train_stats.json

total 75244
drwxr-xr-x 2 root root     4096 Aug  4 11:24 .
drwxr-xr-x 3 1000 1000     4096 Aug  4 11:23 ..
-rw-r--r-- 1 root root     6815 Aug  4 11:23 config_train_Unet.json
-rw-r--r-- 1 root root     7844 Aug  4 11:24 config_train_Unet.json.log
-rw-r--r-- 1 root root      710 Aug  4 11:24 events.out.tfevents.1628076254.0220adf0bef4.472.0
-rw-r--r-- 1 root root 57750164 Aug  4 11:24 final_model.pt
-rw-r--r-- 1 root root 19263078 Aug  4 11:24 model.pt
-rw-r--r-- 1 root root      117 Aug  4 11:24 train_stats.json
---------------------------------------
Display content of train_stats.json
{"total_epochs": 2, "total_iterations": 4, "best_validation_metric": 0.04933180287480354, "best_validation_epoch": 2}
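
The `train_stats.json` file is plain JSON, so it is easy to inspect from Python. The sketch below parses the exact content printed above from an inline string so that it runs anywhere; in the notebook you could read the file from the `models/config_train_Unet` folder instead.

```python
import json

# Content of train_stats.json as printed above.
raw = (
    '{"total_epochs": 2, "total_iterations": 4, '
    '"best_validation_metric": 0.04933180287480354, "best_validation_epoch": 2}'
)
stats = json.loads(raw)

iterations_per_epoch = stats["total_iterations"] / stats["total_epochs"]
summary = (
    f"best metric {stats['best_validation_metric']:.4f} "
    f"at epoch {stats['best_validation_epoch']} "
    f"({iterations_per_epoch:.0f} iterations per epoch)"
)
print(summary)
```

The low metric value is expected here, since the network was trained for only two epochs on dummy data.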


# 5. Export Model

To export the model we simply run `export.sh`, which will: 
- Create a TorchScript (`.ts`) file

This optimized model will be used by the Triton server in AIAA and Clara Deploy.


In [10]:
! $MMAR_ROOT/commands/export.sh

PYTHONPATH is :/opt/nvidia/medical:/opt/nvidia:/opt/nvidia:/claraDevDay/GettingStarted/commands/../custom
MMAR_ROOT set to /claraDevDay/GettingStarted/commands/..
2021-08-04 11:25:03,224 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmph6s7ffqq
2021-08-04 11:25:03,225 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmph6s7ffqq/_remote_module_non_sriptable.py
Exported model has been tested with TorchScript, and the result looks good!




Let's check what was created in the folder. 
After running the cell below you should see `model.ts`.


In [11]:
!ls -la $MMAR_ROOT/models/config_train_Unet/*.ts

-rw-r--r-- 1 root root 19350033 Aug  4 11:25 /claraDevDay/GettingStarted//models/config_train_Unet/model.ts



# 6. Validation 
Now that we have trained our model, we would like to run evaluation to get some statistics and also run inference to see the resulting output.


## 6.1 Validate with single GPU 
To run evaluation on your validation dataset, run `validate.sh`. 
This evaluates the model on the validation dataset and places the results in `MMAR_EVAL_OUTPUT_PATH`, as configured in the [environment.json](config/environment.json) 
file (the default is the `eval` folder). 
The evaluation reports the min, max, and mean of the metric specified in the config_validation file.


In [12]:
! $MMAR_ROOT/commands/validate.sh


running cmd /claraDevDay/GettingStarted//commands/validate.sh
PYTHONPATH is :/opt/nvidia/medical:/opt/nvidia:/opt/nvidia:/claraDevDay/GettingStarted/commands/../custom
MMAR_ROOT set to /claraDevDay/GettingStarted/commands/..
------------------------------------
2021-08-04 11:25:19,799 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmpw0grr3ah
2021-08-04 11:25:19,800 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmpw0grr3ah/_remote_module_non_sriptable.py
loaded TorchScript model from path: /claraDevDay/GettingStarted/commands/../models/config_train_Unet/model.ts
Use GPU:  True
Multi GPU:  False
Automatic Mixed Precision:  Enabled
Determinism Evaluation:  Disabled
cuDNN BenchMark:  False
CUDA Matmul Allow TF32:  True
cuDNN Allow TF32:  True
Model:  <class 'torch.jit._script.RecursiveScriptModule'>
Dataset:  <class 'monai.data.dataset.Dataset'>
DataLoader:  <class 'monai.data.dataloader.DataLoader'>
Validate Transform #1: <cl

You can also run `validate_ckpt.sh`, which loads the model from the checkpoint instead of the TorchScript (`.ts`) file.

In [13]:
! $MMAR_ROOT/commands/validate_ckpt.sh


running cmd /claraDevDay/GettingStarted//commands/validate_ckpt.sh
PYTHONPATH is :/opt/nvidia/medical:/opt/nvidia:/opt/nvidia:/claraDevDay/GettingStarted/commands/../custom
MMAR_ROOT set to /claraDevDay/GettingStarted/commands/..
------------------------------------
2021-08-04 11:25:46,969 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmpp7brj0hi
2021-08-04 11:25:46,970 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmpp7brj0hi/_remote_module_non_sriptable.py
loaded model config from checkpoint: /claraDevDay/GettingStarted/commands/../models/config_train_Unet/model.pt
Use GPU:  True
Multi GPU:  False
Automatic Mixed Precision:  Enabled
Determinism Evaluation:  Disabled
cuDNN BenchMark:  False
CUDA Matmul Allow TF32:  True
cuDNN Allow TF32:  True
Model:  <class 'monai.networks.nets.unet.UNet'>
Dataset:  <class 'monai.data.dataset.Dataset'>
DataLoader:  <class 'monai.data.dataloader.DataLoader'>
Validate Transform #1: <class 

## 6.2 Validate with multiple GPUs 
You can also use multiple GPUs for validation with `validate_multi_gpu.sh`. 

In [14]:
!$MMAR_ROOT/commands/validate_multi_gpu.sh 

running cmd /claraDevDay/GettingStarted//commands/validate_multi_gpu.sh
PYTHONPATH is :/opt/nvidia/medical:/opt/nvidia:/opt/nvidia:/claraDevDay/GettingStarted/commands/../custom
MMAR_ROOT set to /claraDevDay/GettingStarted/commands/..
------------------------------------
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
2021-08-04 11:26:14,229 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmp8vsu52wr
2021-08-04 11:26:14,229 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmp6rb37f7h
2021-08-04 11:26:14,230 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmp8vsu52wr/_remote_module_non_sriptable.py
2021-08-04 11:26:14,230 - torch.distributed.nn.jit.

Similarly, you can run `validate_multi_gpu_ckpt.sh`, which loads the model from the checkpoint instead of the TorchScript (`.ts`) file.

In [15]:
! $MMAR_ROOT/commands/validate_multi_gpu_ckpt.sh


running cmd /claraDevDay/GettingStarted//commands/validate_multi_gpu_ckpt.sh
PYTHONPATH is :/opt/nvidia/medical:/opt/nvidia:/opt/nvidia:/claraDevDay/GettingStarted/commands/../custom
MMAR_ROOT set to /claraDevDay/GettingStarted/commands/..
------------------------------------
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
2021-08-04 11:26:21,682 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmpm6cvou1x
2021-08-04 11:26:21,683 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmpm6cvou1x/_remote_module_non_sriptable.py
2021-08-04 11:26:21,693 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmp08doaku3
2021-08-04 11:26:21,693 - torch.distributed.nn

## 6.3 Check Validation results 
Now let's look at the results in the folder by running the cell below. 

In [17]:
!ls -la $MMAR_ROOT/eval
for fName in ["metrics.csv","val_mean_dice_raw.csv","val_mean_dice_summary.csv"]:
    print("---------------------------------------")
    print("Display content of ",fName)
    ! cat $MMAR_ROOT/eval/$fName


total 24
drwxrwxr-x  3 1000 1000 4096 Aug  4 11:25 .
drwxrwxr-x 11 1000 1000 4096 Aug  4 11:26 ..
-rw-rw-r--  1 1000 1000    0 Aug  4 08:28 .gitignore
-rw-r--r--  1 root root   62 Aug  4 11:26 metrics.csv
drwxr-xr-x  2 root root 4096 Aug  4 11:25 spleen_8
-rw-r--r--  1 root root  325 Aug  4 11:26 val_mean_dice_raw.csv
-rw-r--r--  1 root root  136 Aug  4 11:26 val_mean_dice_summary.csv
---------------------------------------
Display content of  metrics.csv
val_mean_dice	0.06937698274850845
val_acc	0.46444148176841094
---------------------------------------
Display content of  val_mean_dice_raw.csv
filename	class0	mean
/claraDevDay/Data/sampleData/imagesTr/spleen_8.nii.gz	0.06937719	0.06937719
/claraDevDay/Data/sampleData/imagesTr/spleen_8.nii.gz	0.06937674	0.06937674
/claraDevDay/Data/sampleData/imagesTr/spleen_8.nii.gz	0.06937729	0.06937729
/claraDevDay/Data/sampleData/imagesTr/spleen_8.nii.gz	0.06937672	0.06937672
---------------------------------------
Display content of  val_mean_di
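
The result files appear to be tab-separated, so they can be loaded with the standard `csv` module. The sketch below parses the rows of `val_mean_dice_raw.csv` exactly as printed above (inlined as a string so it runs without the file) and averages the per-case Dice; the tab delimiter is an assumption based on how the output is displayed.

```python
import csv
import io
from statistics import mean

# Rows of val_mean_dice_raw.csv as printed above (tab-separated).
raw = (
    "filename\tclass0\tmean\n"
    "/claraDevDay/Data/sampleData/imagesTr/spleen_8.nii.gz\t0.06937719\t0.06937719\n"
    "/claraDevDay/Data/sampleData/imagesTr/spleen_8.nii.gz\t0.06937674\t0.06937674\n"
    "/claraDevDay/Data/sampleData/imagesTr/spleen_8.nii.gz\t0.06937729\t0.06937729\n"
    "/claraDevDay/Data/sampleData/imagesTr/spleen_8.nii.gz\t0.06937672\t0.06937672\n"
)

rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))
overall = mean(float(r["mean"]) for r in rows)
print(f"{len(rows)} rows, mean Dice {overall:.6f}")
```

The average of the per-case values agrees with the `val_mean_dice` entry in `metrics.csv` to the precision shown.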

# 7. Inference  

To run inference on the validation or test dataset, run `infer.sh`. 
This runs prediction on the validation dataset and places the output in `MMAR_EVAL_OUTPUT_PATH`, as configured in the 
[environment.json](config/environment.json) file (the default is the `eval` folder).


In [18]:
! $MMAR_ROOT/commands/infer.sh

running cmd /claraDevDay/GettingStarted//commands/infer.sh
PYTHONPATH is :/opt/nvidia/medical:/opt/nvidia:/opt/nvidia:/claraDevDay/GettingStarted/commands/../custom
MMAR_ROOT set to /claraDevDay/GettingStarted/commands/..
------------------------------------
2021-08-04 11:26:58,542 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmp7c8v5tcz
2021-08-04 11:26:58,543 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmp7c8v5tcz/_remote_module_non_sriptable.py
loaded TorchScript model from path: /claraDevDay/GettingStarted/commands/../models/config_train_Unet/model.ts
Use GPU:  True
Multi GPU:  False
Automatic Mixed Precision:  Enabled
Determinism Evaluation:  Disabled
cuDNN BenchMark:  False
CUDA Matmul Allow TF32:  True
cuDNN Allow TF32:  True
Model:  <class 'torch.jit._script.RecursiveScriptModule'>
Dataset:  <class 'monai.data.dataset.Dataset'>
DataLoader:  <class 'monai.data.dataloader.DataLoader'>
Validate Transform #1: <class

Now let's look at the results in the folder.

In [19]:
! ls -la $MMAR_ROOT/eval/

total 24
drwxrwxr-x  3 1000 1000 4096 Aug  4 11:25 .
drwxrwxr-x 11 1000 1000 4096 Aug  4 11:26 ..
-rw-rw-r--  1 1000 1000    0 Aug  4 08:28 .gitignore
-rw-r--r--  1 root root   62 Aug  4 11:26 metrics.csv
drwxr-xr-x  2 root root 4096 Aug  4 11:25 spleen_8
-rw-r--r--  1 root root  325 Aug  4 11:26 val_mean_dice_raw.csv
-rw-r--r--  1 root root  136 Aug  4 11:26 val_mean_dice_summary.csv


In [20]:
! ls -la $MMAR_ROOT/eval/spleen_8


total 3204
drwxr-xr-x 2 root root    4096 Aug  4 11:25 .
drwxrwxr-x 3 1000 1000    4096 Aug  4 11:25 ..
-rw-r--r-- 1 root root 3269759 Aug  4 11:27 spleen_8_seg.nii.gz


# 8. Multi-GPU Training
Clara Train aims to simplify scaling out and using all available GPUs. 
Using the same config we already used for training, we can simply invoke `train_multi_gpu.sh` to train on multiple GPUs. 
The main difference between `train.sh` and `train_multi_gpu.sh` is a handful of launch parameters:

train.sh | train_multi_gpu.sh  
 --- | --- 
python3 -u -m medl.apps.train \\<br>-m MMAR_ROOT \\<br>-c CONFIG_FILE \\<br>-e ENVIRONMENT_FILE \\<br>--write_train_stats \\<br>--set \\<br> print_conf=True | python -m torch.distributed.launch\\<br> --nproc_per_node=2 --nnodes=1 --node_rank=0 \\<br> --master_addr="localhost" --master_port=1234 \\<br>-m medl.apps.train \\<br>-m MMAR_ROOT \\<br>-c CONFIG_FILE \\<br>-e ENVIRONMENT_FILE \\<br> --write_train_stats \\<br> --set \\<br> print_conf=True \\<br> multi_gpu=True \\<br> learning_rate= 2e-4
 
Let's examine the `train_multi_gpu.sh` script by running the cell below. 

In [None]:
printFile(MMAR_ROOT+"/commands/train_multi_gpu.sh",0,50)

Let's give it a try and run the cell below to train on 2 GPUs.

In [None]:
! $MMAR_ROOT/commands/train_multi_gpu.sh
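
To see which part of the multi-GPU launch command depends on the GPU count, the sketch below assembles the command from the table above as an argument list. It only mirrors the flags shown in that table; the function name `multi_gpu_train_cmd` is hypothetical, and the actual `train_multi_gpu.sh` may set additional options.

```python
def multi_gpu_train_cmd(num_gpus, mmar_root, config_file, environment_file,
                        learning_rate="2e-4", master_port=1234):
    """Assemble the launch command from the table above as an argument list."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}", "--nnodes=1", "--node_rank=0",
        "--master_addr=localhost", f"--master_port={master_port}",
        "-m", "medl.apps.train",
        "-m", mmar_root,
        "-c", config_file,
        "-e", environment_file,
        "--write_train_stats",
        "--set",
        "print_conf=True",
        "multi_gpu=True",
        f"learning_rate={learning_rate}",
    ]


cmd = multi_gpu_train_cmd(
    4, "/claraDevDay/GettingStarted", "config/config_train_Unet.json", "config/environment.json"
)
print(" ".join(cmd))
```

Only `--nproc_per_node` changes with the number of GPUs, which is the value to edit when trying 3 or 4 GPUs.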


# 9. Training Vs FineTune
`train.sh` and `finetune.sh` are nearly identical and use the same config file. 
The only difference is that `finetune.sh` enables loading of the checkpoint through the `disabled` flag of the `CheckpointLoader` handler, shown below. 

_Note_: When `dont_load_ckpt_model` resolves to true, the `CheckpointLoader` is disabled and the network trains from scratch; 
otherwise the weights are loaded from the checkpoint at `MMAR_CKPT` before training continues.
```
      {
        "name": "CheckpointLoader",
        "disabled": "{dont_load_ckpt_model}",
        "args": {
          "load_path": "{MMAR_CKPT}",
          "load_dict": ["model"]
        }
      },
```
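
Conceptually, a `disabled` entry removes a handler from the list before training starts. The sketch below illustrates that idea with the handler-level `disabled` key used in the handlers list of section 3.2. It is an assumption about the behaviour, not Clara Train's code, and `active_handlers` is a hypothetical helper.

```python
def active_handlers(handlers, variables):
    """Return the handlers whose "disabled" entry does not resolve to true."""
    def is_disabled(handler):
        flag = handler.get("disabled", False)
        if isinstance(flag, str) and flag.startswith("{") and flag.endswith("}"):
            # Look up the placeholder; unknown names count as "not disabled"
            # (an assumption of this sketch).
            flag = variables.get(flag[1:-1], False)
        if isinstance(flag, str):
            flag = flag.strip().lower() == "true"
        return bool(flag)

    return [h for h in handlers if not is_disabled(h)]


handlers = [
    {"name": "CheckpointLoader", "disabled": "{dont_load_ckpt_model}"},
    {"name": "StatsHandler"},
]
train_from_scratch = active_handlers(handlers, {"dont_load_ckpt_model": True})
fine_tune = active_handlers(handlers, {"dont_load_ckpt_model": False})
print([h["name"] for h in train_from_scratch], [h["name"] for h in fine_tune])
```

Driving the switch from a single variable is what lets both scripts share one config file.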


# 10. Profiling
NVIDIA provides multiple tools for profiling your training in order to eliminate bottlenecks. 


## 10.1 Profiling with DLProf 
DLProf is a simple tool that analyzes a regular training run and then displays the results in TensorBoard.
It also provides recommendations on how to improve performance. 

The cell below uses the DLProf tool. 
We will use the same config as above, except that it has `amp` set to `false`, to see the DLProf analysis.  

In [None]:
!$MMAR_ROOT/commands/debug_dlprof.sh config_train_Unet_NoAMP.json

You then need to run TensorBoard manually (not through JupyterLab) using:
```
cd /claraDevDay/GettingStarted/models/config_train_Unet_NoAMP_debug
tensorboard --logdir ./dlprof --port 5000
```
Recall that we mapped port 3031 to 5000 by default for AIAA in the `docker-compose.yml` file; 
we reuse that mapping here for simplicity. 
Now if you navigate to `<yourip>:3031` you should see the DLProf tool as below. 
This analysis shows the GPUs you have along with improvements you can make to train faster. 
For example, this run shows multiple operations that would be accelerated by AMP.
To test this, you can rerun the profiling with AMP enabled in the configuration. 

<br><img src="screenShots/Dlprof.png" alt="Drawing" style="height: 400px;"/><br>


## 10.2 Profile your model with Nsight Systems
Nsight Systems is a more advanced profiling tool.
 

To train faster, you need to analyze your training loop and check for bottlenecks.
1. Download Nsight Systems to your local machine from https://developer.nvidia.com/nsight-systems 
2. Open Nsight Systems locally (outside the Docker container)
3. Load the files from `<local path>/GettingStarted/models/config_train_Unet_NoAMP_debug`

You should see an analysis like the one below: 
<br><img src="screenShots/Nsight1FromDlprof.png" alt="Drawing" style="height: 400px;"/><br>

# Next:

### 1. Load model into AIAA
Here we show how you can quickly load the model we trained above into AIAA. 
First, run the [AIAA Notebook](../AIAA/AIAA.ipynb) to start the server.
Section 3.1 in the AIAA notebook shows how to load a trained model into the AIAA server. 


### 2. Bring your own Components
To take full advantage of Clara Train SDK you should write your own components. 
Please go to the [BYOC notebook](BYOC.ipynb) for examples.  


# Exercise:
Now that you are familiar with Clara Train, you can try to: 
1. Explore different options of Clara Train by changing or creating a config file and running training: 
    1. Model architecture: AHNet, UNet, SegResNet 
    2. Losses
    3. Transformations 

Hint: to train SegResNet you can use the configuration `config_train_segresnet.json`, which only changes the network section.
You can train by running the cell below.     

In [None]:
!$MMAR_ROOT/commands/train_W_Config.sh config_train_segresnet.json


2. Train on real spleen data. For this you should:
    1. Download the spleen dataset by running the [download](../Data/DownloadDecathlonDataSet.ipynb) notebook
    2. Switch the dataset file in [environment.json](config/environment.json)
    3. Rerun `train.sh`


3. Experiment with multi-GPU training by changing the number of GPUs to train on from 2 to 3 or 4. 
You should edit [train_multi_gpu.sh](commands/train_multi_gpu.sh) and then rerun the script. 
