# Self-supervised training on 1 GPU with VISSL

## Generate custom dataset

Note: (1) do not rename the "train" and "val" folders; (2) put all your images into the "train/label1" and "train/label2" folders; how the images are divided between the two folders does not matter.

```
path/to/your/dataset
├── train
│   ├── label1/
│   │   ├── images1.jpg
│   │   ├── images2.jpg
│   │
│   └── label2/
│       ├── images1.jpg
│       ├── images2.jpg
│
└── val (leave it empty)
    ├── label1/
    │   ├── images1.jpg
    │   ├── images2.jpg
    │
    └── label2/
        ├── images1.jpg
        ├── images2.jpg
```
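
Before training, it can help to confirm that the folders match the layout above. The following is a minimal sketch using only the Python standard library; the function names and the set of accepted image suffixes are assumptions made for illustration and are not part of VISSL.

```python
from pathlib import Path

# Assumed set of image suffixes; extend it if your dataset uses other formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_label(split_dir):
    """Return {label_folder_name: number_of_images} for one split directory."""
    split_dir = Path(split_dir)
    counts = {}
    for label_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[label_dir.name] = sum(
            1
            for f in label_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def check_dataset_layout(dataset_root):
    """Check that 'train' and 'val' exist and report image counts per label."""
    dataset_root = Path(dataset_root)
    report = {}
    for split in ("train", "val"):
        split_dir = dataset_root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"Missing required folder: {split_dir}")
        report[split] = count_images_per_label(split_dir)
    return report
```

Calling `check_dataset_layout("path/to/your/dataset")` returns a nested dictionary such as `{"train": {...}, "val": {...}}`, and raises `FileNotFoundError` if either folder is missing.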

## Load custom dataset

(1) Modify the custom dataset path in the **"tools/run_distributed_engines.py"** file;

(2) Add the project root path to the **"tools/run_distributed_engines.py"** file.

In [None]:
# (1) Modify the custom dataset paths in the code below, which goes in the "tools/run_distributed_engines.py" file.

from vissl.data.dataset_catalog import VisslDatasetCatalog

train_path="${YOUR_PROJECT_ROOT}/data/pretrain/train"
val_path="${YOUR_PROJECT_ROOT}/data/pretrain/val"
VisslDatasetCatalog.register_data(name="Sewer_ML", data_dict={"train": train_path, "test": val_path})

In [None]:
# (2) Add the project root path with the code below, which also goes in the "tools/run_distributed_engines.py" file.

import sys

sys.path.append('${YOUR_PROJECT_ROOT}')
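
Python does not expand `${YOUR_PROJECT_ROOT}` inside a string literal, so the placeholder in the two snippets above has to be replaced by hand. As one possible alternative, the sketch below derives both dataset paths from a single variable. The helper name `build_data_dict` and the example root `/path/to/project` are hypothetical; the folder names `data/pretrain/train` and `data/pretrain/val` are taken from the snippet above.

```python
from pathlib import Path


def build_data_dict(project_root):
    """Build the data_dict that is passed to VisslDatasetCatalog.register_data."""
    pretrain_dir = Path(project_root).expanduser() / "data" / "pretrain"
    return {
        "train": str(pretrain_dir / "train"),
        "test": str(pretrain_dir / "val"),
    }


# Hypothetical project location; replace it with your own.
PROJECT_ROOT = "/path/to/project"
data_dict = build_data_dict(PROJECT_ROOT)

# In tools/run_distributed_engines.py this would then be used as:
# VisslDatasetCatalog.register_data(name="Sewer_ML", data_dict=data_dict)
```

The same `PROJECT_ROOT` variable can also be passed to `sys.path.append`, so the root is defined in only one place.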

## SwAV

Steps:

(a) Start from a ResNet101 pretrained on the ImageNet-1k dataset (1k categories, 1.2 million images). The weights can be downloaded from:

https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-101.pkl

(b) Modify the hyperparameters in the **"pretrain/swav/XXX.yaml"** file if needed, e.g., data augmentation;

(c) Modify the hyperparameters in the command below, e.g., batch size, epochs, output path;

(d) Train the full model (all layers) on the custom dataset.

#### ResNet101 with ImageNet weights

In [None]:
# SwAV
# backbone: RN101
# Pretrained on ImageNet, then fine-tune all layers (FTAL)

!python tools/run_distributed_engines.py \
  hydra.verbose=true \
  config=pretrain/swav/swav_1_gpu_resnet101.yaml \
  config.DATA.TRAIN.DATASET_NAMES=[Sewer_ML] \
  config.DATA.TRAIN.DATA_SOURCES=[disk_folder] \
  config.DATA.TRAIN.DATA_PATHS="${YOUR_PROJECT_ROOT}/data/pretrain/train" \
  config.OPTIMIZER.num_epochs=100 \
  config.CHECKPOINT.DIR="${YOUR_PROJECT_ROOT}/checkpoints/swav_weights/SW_RN101/vissl" \
  config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=true \
  config.WEIGHTS_INIT.PARAMS_FILE="${YOUR_PROJECT_ROOT}/checkpoints/pretrained_model/R-101.pkl" \
  config.WEIGHTS_INIT.APPEND_PREFIX="trunk._feature_blocks."
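
The `config.WEIGHTS_INIT.APPEND_PREFIX` override in the command above exists because parameter names in an external checkpoint usually differ from the names the VISSL model expects. The sketch below is not VISSL's own code; it only illustrates the idea of prepending a prefix to every parameter name. The function name and the toy keys are hypothetical, and the keys in a real checkpoint may look different.

```python
def append_prefix(state_dict, prefix):
    """Return a copy of state_dict with prefix prepended to every key."""
    return {prefix + key: value for key, value in state_dict.items()}


# Toy stand-in for a checkpoint; real checkpoints map names to weight arrays.
toy_state_dict = {"conv1.weight": 0, "layer1.0.conv1.weight": 1}

renamed = append_prefix(toy_state_dict, "trunk._feature_blocks.")
for key in renamed:
    print(key)
```

If loading fails with missing or unexpected keys, comparing the key names in the checkpoint against the names in the model is a reasonable first check.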


## Training logs, checkpoints, metrics (optional)

VISSL dumps model checkpoints in the checkpoint directory specified by the user via `config.CHECKPOINT.DIR`. In the example above, that is `${YOUR_PROJECT_ROOT}/checkpoints/swav_weights/SW_RN101/vissl`.

The directory contains:
- model checkpoints, saved as `.torch` files after every epoch,
- the training log `log.txt`, which holds the full stdout saved to a file,
- `metrics.json`, which stores the metric values if your training calculated any metrics,
- `tb_logs`, which holds the TensorBoard events.
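
To pick up the most recent checkpoint from that directory, a small helper can be enough. This is a minimal sketch that relies only on the `.torch` suffix mentioned above and on file modification times; it makes no assumption about VISSL's checkpoint file names, and the function names are hypothetical.

```python
from pathlib import Path


def list_checkpoints(checkpoint_dir):
    """Return the .torch files in checkpoint_dir, oldest first."""
    files = Path(checkpoint_dir).glob("*.torch")
    return sorted(files, key=lambda p: p.stat().st_mtime)


def latest_checkpoint(checkpoint_dir):
    """Return the most recently written .torch file, or None if there is none."""
    checkpoints = list_checkpoints(checkpoint_dir)
    return checkpoints[-1] if checkpoints else None
```

For example, `latest_checkpoint("path/to/checkpoint/dir")` returns a `Path` to the newest `.torch` file, or `None` when training has not written a checkpoint yet.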

## Visualizing Tensorboard Logs (optional)

If you have set `config.HOOKS.TENSORBOARD_SETUP.USE_TENSORBOARD=true`, the TensorBoard events are written to the `tb_logs/` directory inside the checkpoint directory. You can visualize them in TensorBoard as follows:

In [None]:
# Look at training curves in TensorBoard (replace the path below with the tb_logs directory of your own run):
%reload_ext tensorboard
%tensorboard --logdir /scratch/tyldzl/PythonProject/sewer_defects_SSL/checkpoints/train_weights/Self_train_bbox/SimCLR_50_epochs/vissl/tb_logs/