# StyleGAN2: training a model from scratch

References:
- this notebook: https://github.com/woctezuma/steam-stylegan2
- the original StyleGAN2 repository: https://github.com/NVlabs/stylegan2
- my fork of StyleGAN2: https://github.com/woctezuma/stylegan2


## Machine specifications

### Request more RAM (once per session)

To get access to more memory, crash the Google Colab session once.

References:
-   https://github.com/googlecolab/colabtools/issues/253#issuecomment-551056637
-   https://colab.research.google.com/drive/1dBN-wwYUngLYVt985wGs_OKPlK_ANB9D

In [0]:
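# Uncomment to crash the session on purpose: the list doubles in size at each
# iteration until the RAM is exhausted.
# d = ['0']
# while True:
#     d += d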
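To check whether the crash actually gave the session more memory, one can query the total physical RAM of the virtual machine before and after. This is a minimal sketch. It assumes a Linux machine, as on Colab, where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`. The helper name `total_ram_gib` is hypothetical.

```python
import os


def total_ram_gib() -> float:
    """Return the total physical memory of this (Linux) machine, in GiB."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    num_pages = os.sysconf("SC_PHYS_PAGES")  # number of physical memory pages
    return page_size * num_pages / 1024 ** 3


print("Total RAM: {:.1f} GiB".format(total_ram_gib()))
```

This reports the memory of the machine, not Colab's quota. Comparing the value before and after the crash is the point.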

### Check the GPU

In [0]:
!nvidia-smi -L

GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-7929352f-92c7-317b-d430-5469f9aeaca8)


### Switch to Tensorflow 1.x

The default TensorFlow version in Colab switched from 1.x to 2.x on the 27th of March, 2020.

Reference: https://colab.research.google.com/notebooks/tensorflow_version.ipynb

We switch to version 1.x to avoid the following error, since the `tensorflow.contrib` module was removed in TensorFlow 2.x:
> `ModuleNotFoundError: No module named 'tensorflow.contrib'`

In [0]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


## Installing StyleGAN2

### My fork of the official StyleGAN2 implementation

In [0]:
%cd /content/

/content


Clone my fork:

In [0]:
!rm -rf stylegan2/
!git clone https://github.com/woctezuma/stylegan2.git

Cloning into 'stylegan2'...
remote: Enumerating objects: 120, done.
remote: Total 120 (delta 0), reused 0 (delta 0), pack-reused 120
Receiving objects: 100% (120/120), 583.49 KiB | 13.89 MiB/s, done.
Resolving deltas: 100% (53/53), done.


In [0]:
pushd

['/content']

In [0]:
%cd stylegan2/

/content/stylegan2


Switch to one of my branches, designed for Google Colab:
-   `google-colab` to save every 2 ticks,
-   `google-colab-save-every-tick` to save every tick.

NB: One tick is expected to take about 15 minutes. However, if you are unlucky with the Colab machine you are allotted, one tick can take about 1 hour.

In [0]:
!git checkout google-colab-save-every-tick

Branch 'google-colab-save-every-tick' set up to track remote branch 'google-colab-save-every-tick' from 'origin'.
Switched to a new branch 'google-colab-save-every-tick'


In [0]:
!nvcc test_nvcc.cu -o test_nvcc -run

CPU says hello.
GPU says hello.


In [0]:
popd

/content
popd -> /content
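The `pushd`/`popd` pair above works in three steps. It remembers the current directory, moves into `stylegan2/` to compile and run the CUDA test, and then comes back to `/content`. In plain Python, the same pattern can be written as a context manager. This is a minimal sketch, and the helper name `pushd` is hypothetical. On Python 3.11+, `contextlib.chdir` provides the same behaviour.

```python
import os
from contextlib import contextmanager


@contextmanager
def pushd(path):
    """Temporarily change the working directory, like the `pushd`/`popd` magics."""
    previous = os.getcwd()  # remember where we came from (the "push")
    os.chdir(path)
    try:
        yield previous
    finally:
        os.chdir(previous)  # always go back, even if an error occurred (the "pop")
```

For instance, `with pushd('/content/stylegan2/'): ...` would run the enclosed commands inside the repository and return to the original directory afterwards.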


## Mounting Google Drive

In [0]:
!pip install Google-Colab-Transfer



In [0]:
import colab_transfer

colab_path = colab_transfer.get_path_to_home_of_local_machine()
drive_path = colab_transfer.get_path_to_home_of_google_drive()

print('Disk of the virtual machine: {}'.format(colab_path))
print('Google Drive: {}'.format(drive_path))

Disk of the virtual machine: /content/
Google Drive: /content/drive/My Drive/


In [0]:
colab_transfer.mount_google_drive()

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


## Data parameters

In [0]:
folder_name = 'datasets/steam/'

## Preparing datasets (once)

### Importing data from Google Drive

Data consists of 14,035 vertical Steam banners, resized from 300x450 to 256x256 resolution.

In [0]:
# colab_transfer.copy_file(file_name='256x256.zip')

In [0]:
# !unzip 256x256.zip -d /content/
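If the `unzip` binary were unavailable, the same extraction can be done with Python's standard library. This is a minimal sketch. The archive name `256x256.zip` and the `/content/` destination come from the cell above, while the helper name `unzip` is hypothetical.

```python
import zipfile


def unzip(archive_path, destination="/content/"):
    """Extract every member of `archive_path` into `destination`, like `unzip -d`."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(destination)  # creates intermediate folders as needed
        return archive.namelist()        # names of the extracted members
```

Usage: `unzip('256x256.zip')`.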

### Preparing data for StyleGAN2

In [0]:
# !python stylegan2/dataset_tool.py create_from_images '/content/datasets/steam' '/content/256x256'
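To my understanding, `dataset_tool.py create_from_images` rejects certain images:

- images that are not square,
- images whose side is not a power of two,
- images that are neither RGB nor grayscale.

It also expects all images to share the resolution of the first one. This explains why the 300x450 banners had to be resized to 256x256. The following sketch mirrors those checks in plain Python so that image shapes can be validated before running the tool. It is not the tool's actual code, and the function name is hypothetical.

```python
def check_stylegan2_image_shape(width, height, channels):
    """Return the reasons why an image of this shape would be rejected (empty if OK)."""
    problems = []
    if width != height:
        problems.append("image must be square")
    # A positive integer n is a power of two iff it has exactly one bit set.
    if width <= 0 or width & (width - 1) != 0:
        problems.append("side length must be a power of two")
    if channels not in (1, 3):
        problems.append("image must be RGB (3 channels) or grayscale (1 channel)")
    return problems
```

For a vertical Steam banner (300 pixels wide, 450 pixels tall, RGB), both the "square" and the "power of two" checks fail. After resizing to 256x256, the list of problems is empty.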

### Archive prepared datasets to Google Drive

In [0]:
# colab_transfer.copy_folder(folder_name,
#                            source=colab_path,
#                            destination=drive_path)

## Importing prepared datasets from Google Drive

In [0]:
colab_transfer.copy_folder(folder_name,
                           source=drive_path,
                           destination=colab_path)
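For readers without `Google-Colab-Transfer`, a copy with the same inputs can be sketched with `shutil`. This is not the library's implementation, only a plain-Python equivalent under the assumption that `source` and `destination` are the two home folders printed earlier. The helper name `copy_folder_plain` is hypothetical.

```python
import os
import shutil


def copy_folder_plain(folder_name, source, destination):
    """Copy `source/folder_name` to `destination/folder_name`, merging into existing folders."""
    src = os.path.join(source, folder_name)
    dst = os.path.join(destination, folder_name)
    # dirs_exist_ok (Python 3.8+) lets the copy succeed even if `dst` already exists.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Usage: `copy_folder_plain('datasets/steam/', source='/content/drive/My Drive/', destination='/content/')`.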

## Training networks

There is no need to edit `training/training_loop.py`, thanks to automatic resuming from the latest snapshot, implemented in my fork.

Without this feature, one would have to manually edit the file from within the Google Colab session every time training is resumed!
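The idea behind automatic resuming can be sketched as follows. Scan the results folder for the snapshot with the largest `kimg`. Then hand its path and `kimg` to the training loop, which in the official code accepts `resume_pkl` and `resume_kimg` arguments. This is a minimal sketch, not the code used in the fork. It assumes snapshots follow the `network-snapshot-<kimg>.pkl` naming of the official training loop, and the function name is hypothetical.

```python
import glob
import os
import re

# Snapshot files are assumed to be named e.g. "network-snapshot-000120.pkl".
SNAPSHOT_PATTERN = re.compile(r"network-snapshot-(\d+)\.pkl$")


def find_latest_snapshot(result_dir):
    """Return (path, kimg) of the most advanced snapshot under result_dir, or (None, 0)."""
    latest_path, latest_kimg = None, 0
    # "**" with recursive=True also searches every run sub-folder (00000-..., 00001-..., etc.).
    pattern = os.path.join(result_dir, "**", "network-snapshot-*.pkl")
    for path in glob.glob(pattern, recursive=True):
        match = SNAPSHOT_PATTERN.search(os.path.basename(path))
        if match is None:
            continue
        kimg = int(match.group(1))
        if latest_path is None or kimg > latest_kimg:
            latest_path, latest_kimg = path, kimg
    return latest_path, latest_kimg
```

Usage: `find_latest_snapshot('/content/drive/My Drive/results')`. If no snapshot is found, training would simply start from scratch.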

### Train with the official StyleGAN2 implementation

Our Steam dataset consists of ~14k images, which is of the same order of magnitude as the [FFHQ dataset](https://github.com/NVlabs/ffhq-dataset) (70k images, i.e. 5 times larger).

Therefore, the parameters used for our data are inspired by the ones described in the StyleGAN2 README for the FFHQ dataset:
- `--mirror-augment=true`: data augmentation with horizontal mirroring,
- `--total-kimg=5000`: during training on our Steam data, StyleGAN2 will be shown 5 times fewer images than during training on the FFHQ data (the default value used for FFHQ is 25 million images: `--total-kimg=25000`). **Caveat:** this is an arbitrary decision, as there is no good rule of thumb! The right value depends on the difficulty of the task: the more complex the task, the longer training is **needed** (e.g. generating game banners vs. human faces, or 256x256 vs. 1024x1024 resolution). It does not depend solely on the size of the training dataset: the more diverse the available data, the longer training is **possible** without over-fitting the training dataset.
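`--total-kimg` is counted in thousands of images, and progress is reported in ticks. So the wall-clock duration of training can be roughly estimated from the tick duration mentioned above (about 15 minutes, or about 1 hour on a slow machine). This is a back-of-the-envelope sketch. `kimg_per_tick` is an assumption. It should be read from the training log, where, as far as I know, each tick line reports the cumulative `kimg`.

```python
def estimate_training_hours(total_kimg, kimg_per_tick, minutes_per_tick):
    """Rough wall-clock estimate: number of ticks times the duration of one tick."""
    num_ticks = total_kimg / kimg_per_tick    # e.g. 5000 kimg / 8 kimg per tick = 625 ticks
    return num_ticks * minutes_per_tick / 60  # minutes -> hours


# Hypothetical tick size of 8 kimg: read the actual value from your training log.
for minutes in (15, 60):
    hours = estimate_training_hours(total_kimg=5000, kimg_per_tick=8, minutes_per_tick=minutes)
    print("{} min/tick -> {:.1f} hours".format(minutes, hours))
```

With this hypothetical tick size and 15 minutes per tick, training would take about 156 hours (roughly 6.5 days) spread over many Colab sessions. This is why automatic resuming matters.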

Model snapshots are directly saved to Google Drive (`--result-dir='/content/drive/My Drive/results'`).

In [0]:
!python stylegan2/run_training.py --config=config-e --metrics=none \
   --data-dir='/content/datasets' --dataset=steam \
   --mirror-augment=true \
   --total-kimg=5000 \
   --result-dir='/content/drive/My Drive/results'
