<a href="https://colab.research.google.com/github/ZKTKZ/thdne/blob/master/StyleGAN2_Tazik_25GB_RAM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This document will give you step-by-step instructions on training a GAN to make infinite images of an anime girl of your choice.

This would not be possible without the work of many before me -- most notably Gwern, whose pre-trained StyleGAN 2 model is the basis for our transfer learning, and who has also written an in-depth guide on his site; a CSDN user, whose Colab-specific experiences and code samples were helpful; and nagadomi, for his anime face cropper.

My original contribution is a color distance computer to filter undesirable Pixiv data. For characters with sufficient Danbooru images, this is not necessary; but for others, being able to draw on the Pixiv dataset is essential. In my case, Pixiv yielded 1500+ images, of which *hundreds* (25-50%) were not relevant; and the color distance script helped filter it down.

This project was my induction into deep learning. I've learnt to parse papers and debug TensorFlow. Of course, this is only the beginning -- to gain a proper, first-principles understanding of the field, I have begun to re-implement important DL papers.

# Scraping
Two sources:

1) Danbooru (https://github.com/Bionus/imgbrd-grabber)

2) Pixiv (https://github.com/Redcxx/Pikax)

In [None]:
from pikax import Pikax, settings

pixiv = Pikax(settings.username, settings.password)

results = pixiv.search(keyword='早坂愛')  # search

pixiv.download(results)  # download


See https://github.com/Redcxx/Pikax for instructions on setting up your `username` & `password`.
Next, run Dupeguru (https://github.com/arsenetar/dupeguru/) on your downloaded images. 

Danbooru consists primarily of high-tier images from Pixiv, so the two sources overlap; this step removes the duplicates.
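
As a rough illustration of what deduplication involves, here is a minimal sketch that groups byte-identical files by hash. This is a much simpler technique than Dupeguru's fuzzy picture matching and will miss resized or re-encoded copies; the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder):
    """Group files under `folder` whose bytes are identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates` on the download folder returns a list of groups; keeping the first path in each group and deleting the rest removes the exact copies.
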
Now that we have our data, on to processing!

# Cropping

In [None]:
# https://github.com/nagadomi/lbpcascade_animeface/blob/master/examples/detect.py

import imutils
import cv2
import sys
import os.path

def detect(abs_filename, cascade_file = "../lbpcascade_animeface.xml"):#, mode="display"):
    if not os.path.isfile(cascade_file):
        raise RuntimeError("%s: not found" % cascade_file)

    cascade = cv2.CascadeClassifier(cascade_file)
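    image = cv2.imread(abs_filename, cv2.IMREAD_COLOR)
    if image is None:  # cv2.imread returns None instead of raising on unreadable files
        raise RuntimeError("%s: could not be read as an image" % abs_filename)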
    #height, width, channels = image.shape
    #image = image[0: int(h/2), 0: w]
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)
    
    faces = cascade.detectMultiScale(gray,
                                     # detector options
                                     scaleFactor = 1.05,
                                     minNeighbors = 5,
                                     minSize = (250, 250)
                                     #,maxSize = (int(0.4*w), int(0.4*h))
                                     )
    tag = 0
    filename = os.path.basename(abs_filename)
    for (x, y, w, h) in faces:

        #cv2.rectangle(image, (int(x*0.85), int(y*0.1)), (x + int(w*1.5), y + h), (255, 0, 0), 50)
        cropped = image[int(y*0.2): y + int(h*0.825), int(x*0.95): x + int(w*1.225)]
        #cv2.imshow("AnimeFaceDetect", imutils.resize(cropped, width=1080, height=1366))        cv2.waitKey(0)
        # splitext also handles extensions that are not 3 characters long (e.g. .jpeg)
        stem, ext = os.path.splitext(filename)
        cv2.imwrite('%s_%d%s' % (stem, tag, ext), cropped)
        tag += 1

if len(sys.argv) != 2:
    sys.stderr.write("usage: detect.py <abs_filename>\n")
    sys.exit(-1)
detect(sys.argv[1])


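You may want to modify the parameters. I crop the images rather selectively, to minimize background noise. `scaleFactor` sets how much the image is shrunk between successive detection passes, so a value closer to 1 means more scales are searched: more detections, but also more false positives and a slower run. `minNeighbors` is the number of overlapping candidate detections required before a face is kept, and `minSize` is the smallest face, in pixels, that will be reported.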
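
To get a feel for how quickly the number of searched scales grows as `scaleFactor` approaches 1, here is a back-of-the-envelope sketch. It is an approximation, not OpenCV's exact internal count: it only counts the window sizes between `minSize` and the image side, each one `scaleFactor` times the last. The function name and the 1000-pixel image side are hypothetical.

```python
import math


def approx_scale_count(image_side, min_size, scale_factor):
    """Roughly count window sizes from min_size up to image_side,
    where each size is scale_factor times the previous one."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    if min_size > image_side:
        return 0
    steps = math.log(image_side / min_size) / math.log(scale_factor)
    return int(math.floor(steps)) + 1


for factor in (1.05, 1.1, 1.3):
    count = approx_scale_count(image_side=1000, min_size=250, scale_factor=factor)
    print(factor, count)
```

Under these assumptions, going from 1.3 to 1.05 multiplies the number of passes several times over, which is why small values are both more thorough and slower.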

## Upscaling

In [None]:
!cat /usr/local/cuda/version.txt
!wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
!sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
!apt-get update
!apt-get install -y libvulkan-dev

%cd /content/
!git clone https://github.com/nihui/waifu2x-ncnn-vulkan.git
# use %cd rather than !cd: each ! command runs in its own subshell, so !cd does not persist
%cd waifu2x-ncnn-vulkan/
!git submodule update --init --recursive
!wget https://github.com/nihui/waifu2x-ncnn-vulkan/releases/download/20200606/waifu2x-ncnn-vulkan-20200606-linux.zip
!unzip waifu2x-ncnn-vulkan-20200606-linux.zip
%cd waifu2x-ncnn-vulkan-20200606-linux



In [None]:
# upload all files first ->
%mkdir 2x
!for img in *.??g; do ./waifu2x-ncnn-vulkan -i $img -o 2x/${img%.*}_2x.png; done

# copying in gdrive
#!gsutil -m cp -r hand_tuned_larger/ '/content/drive/My Drive/twist_moe/hand_tuned_larger_2x/'

# Cleaning

After obtaining the cropped images, I run the shell scripts in this Git repository, courtesy of Gwern. The one change I made is to preserve the JPGs at 100% quality, as I have a small dataset.

After changing the directory parameter, run them in the following order:

delete -> convert -> resize -> final



I downloaded *all* images of Hayasaka from Pixiv. Unlike Danbooru, Pixiv does not have proper image tags. So, to separate images of Hayasaka from images we don't want (black & white, other characters), I devised the following script, which calculates the distance of the dominant color in the image from RGB yellow (255, 255, 0).

In [None]:
import os
from colorthief import ColorThief

for root, dirs, files in os.walk('/home/tazik/Nextcloud/code/lbpcascade_animeface/examples/datasets/hand_tuned_larger_2x/'):
    for name in files:
        color_thief = ColorThief(os.path.join(root, name))
        # The dominant color is the first entry of a small palette
        palette = color_thief.get_palette(color_count=2, quality=1)
        rgb = palette[0]
        # Squared Euclidean distance from RGB yellow (255, 255, 0)
        distance = pow(rgb[0]-255, 2) + pow(rgb[1]-255, 2) + pow(rgb[2], 2)
        print(name, distance)
        # Zero-pad so names sort numerically, and keep the original name so that
        # two images with the same distance do not overwrite each other
        new_name = os.path.join(root, "%06d_%s" % (distance, name))
        os.rename(os.path.join(root, name), new_name)


The script renames images according to "yellowness", making it easy to eliminate non-matching images.
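
The distance itself is easy to check by hand. Below is a small sketch of the same squared-distance idea applied to a few made-up dominant colors, with a cutoff to split them into keep and reject piles. The sample colors, the `threshold` value, and the function names are all hypothetical; a suitable cutoff would have to be found by looking at your own data.

```python
YELLOW = (255, 255, 0)


def squared_distance(rgb, target=YELLOW):
    """Squared Euclidean distance between two RGB triples."""
    return sum((channel - goal) ** 2 for channel, goal in zip(rgb, target))


def split_by_yellowness(dominant_colors, threshold):
    """Partition {filename: rgb} into (keep, reject) lists by distance to yellow."""
    keep, reject = [], []
    for name, rgb in sorted(dominant_colors.items()):
        (keep if squared_distance(rgb) <= threshold else reject).append(name)
    return keep, reject


# Hypothetical dominant colors, standing in for what ColorThief might report.
samples = {
    "blonde.png": (230, 200, 90),
    "greyscale.png": (128, 128, 128),
    "night_scene.png": (20, 30, 80),
}
keep, reject = split_by_yellowness(samples, threshold=20000)
print(keep, reject)
```

Note that plain RGB distance is a crude measure of perceived color difference, which is one reason a manual pass over the sorted images is still worthwhile.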


Now, it's time to use Colab. We use the discriminator of the pre-trained anime face StyleGAN2 model to rank our pre-processed images. This helps with filtering, as higher ranked images tend to be lower quality. This trick is also courtesy of Gwern.

In [None]:
#CSDN blog
#https://translate.googleusercontent.com/translate_c?depth=1&pto=aue&rurl=translate.google.com&sl=auto&sp=nmt4&tl=en&u=https://blog.csdn.net/DLW__/article/details/104222546&usg=ALkJrhjWEtjIz8Yklx8uSjuFQuv7O9bPnA

import os
import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib
#import config
import sys

!pip install googledrivedownloader
from google_drive_downloader import GoogleDriveDownloader as gdd

import pretrained_networks

# StyleGAN2 Danbooru Portrait
url = 'https://drive.google.com/open?id=1WNQELgHnaqMTq3TlrnDaVkyrAH8Zrjez'
#'https://drive.google.com/open?id=1BHeqOZ58WZ-vACR2MJkh1ZVbJK2B-Kle'
model_id = url.replace('https://drive.google.com/open?id=', '')

network_pkl = '/content/models/model_%s.pkl' % model_id#(hashlib.md5(model_id.encode()).hexdigest())
gdd.download_file_from_google_drive(file_id=model_id,
                                    dest_path=network_pkl)


# If the download fails with 'Google Drive download quota exceeded', you can try downloading manually from your own Google Drive account
# network_pkl = "/content/drive/My Drive/GAN/stylegan2-ffhq-config-f.pkl"


def main(origin_dir):
    image_names = [files for root, dirs, files in os.walk(origin_dir)][0]
    print('find %s files in %s' % (len(image_names), origin_dir))

    tflib.init_tf()
    print('Loading networks from "%s"...' % network_pkl)
    _G, _D, Gs = pretrained_networks.load_networks(network_pkl)
    noise_vars = [var for name, var in Gs.components.synthesis.vars.items() if name.startswith('noise')]

    for index, image_name in enumerate(image_names):
        image_path = os.path.join(origin_dir, image_name)
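        img = np.asarray(PIL.Image.open(image_path).convert('RGB'))
        # PIL gives height x width x channels, while the network expects NCHW;
        # transpose moves channels first (a plain reshape would scramble the pixels)
        img = img.transpose(2, 0, 1)[np.newaxis]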
        score = _D.run(img, None)
        os.rename(image_path, os.path.join(origin_dir, '%s_%s.png' % (score[0][0], index)))
        print(image_name, score[0][0])

    print('Done!')


if __name__ == "__main__":
    main('/content/drive/My Drive/twist_moe/hand_tuned/')



Finally, I go through the images to look for potential outliers. I spent a bit of time on this step, as there were many low quality images of Hayasaka that I did not want the model to be learning from. This step could potentially be made redundant if one appropriately filters Pixiv images by art type. But the art categories are not immediately apparent to non-Pixiv users.
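
Since the ranking cell renames each file to `<score>_<index>.png`, a file browser that sorts names as text can misorder negative or multi-digit scores. A minimal sketch of sorting by the numeric score instead is below; the helper names and the sample file names are hypothetical.

```python
def score_from_name(filename):
    """Extract the numeric score from a name shaped like '<score>_<index>.png'."""
    score_text, _index_part = filename.rsplit("_", 1)
    return float(score_text)


def rank_by_score(filenames):
    """Sort filenames by score, lowest first, using numeric comparison."""
    return sorted(filenames, key=score_from_name)


# Hypothetical file names in the format the ranking cell produces.
names = ["0.52_3.png", "-1.75_0.png", "12.1_7.png", "-0.08_5.png"]
for name in rank_by_score(names):
    print(name)
```

Reviewing the two ends of this ordering first is a quick way to find candidates for removal before going through the rest by eye.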

# Training

## Colab Hacks

```
i = []
while True:
  i.append(i)
```

The above is used to induce Google to offer you more RAM. Do note that this does not work on newly initialized notebooks, after a patch by Google; instead, you have to use an older notebook as your base (e.g. make a copy of this NB).

To keep Colab from disconnecting after 1.5 hours, paste the following into your browser's developer console:

```

function KeepClicking() {
  console.log("Clicking");
  document.querySelector("colab-toolbar-button#connect").click();
}
setInterval(KeepClicking, 60000);
```







In [1]:
from google.colab import drive
drive.mount('/content/drive')

%tensorflow_version 1.x
import tensorflow as tf

# Download the code
!git clone https://github.com/ZKTKZ/stylegan2.git
%cd stylegan2
!nvcc test_nvcc.cu -o test_nvcc -run

print('Tensorflow version: {}'.format(tf.__version__) )
!nvidia-smi -L
print('GPU Identified at: {}'.format(tf.test.gpu_device_name()))

!pip install tensorboard

url = 'https://drive.google.com/open?id=1WNQELgHnaqMTq3TlrnDaVkyrAH8Zrjez'
#'https://drive.google.com/open?id=1BHeqOZ58WZ-vACR2MJkh1ZVbJK2B-Kle'
model_id = url.replace('https://drive.google.com/open?id=', '')

!pip install googledrivedownloader
from google_drive_downloader import GoogleDriveDownloader as gdd

network_pkl = './models/model_%s.pkl' % model_id#(hashlib.md5(model_id.encode()).hexdigest())
gdd.download_file_from_google_drive(file_id=model_id,
                                    dest_path=network_pkl)

!python dataset_tool.py create_from_images ./dataset/hayasaka /content/drive/'My Drive'/twist_moe/hand_tuned_larger_2x/

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive
TensorFlow 1.x selected.
Cloning into 'stylegan2'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 719 (delta 0), reused 0 (delta 0), pack-reused 716
Receiving objects: 100% (719/719), 16.77 MiB | 33.61 MiB/s, done.
Resolving deltas: 100% (462/462), done.
/content/stylegan2
CPU says hello.
GPU says hello.
Tensorflow version: 1.15.2
GPU 0: Tesla P100-PCI

In [None]:
!python run_training.py --spatial-augmentations=true --lr=0.0005 --num-gpus=1 --data-dir=./dataset --config=config-f --dataset=hayasaka --mirror-augment=true --metric=none --total-kimg=10000 --min-h=4 --min-w=4 --res-log2=7 --result-dir="/content/drive/My Drive/twist_moe/results/" --resume-pkl='./models/model_1WNQELgHnaqMTq3TlrnDaVkyrAH8Zrjez.pkl'


Local submit - run_dir: /content/drive/My Drive/twist_moe/results/00000-stylegan2-hayasaka-1gpu-config-f
dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset...
Dataset shape = [3, 512, 512]
Dynamic range = [0, 255]
Label size    = 0
Loading networks from "./models/model_1WNQELgHnaqMTq3TlrnDaVkyrAH8Zrjez.pkl"...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Compiling... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Compiling... Loading... Done.

G                             Params    OutputShape         WeightShape     
---                           ---       ---                 ---             
latents_in                    -         (?, 512)            -               
labels_in                     -         (?, 0)              -               
lod                           -         ()                  -               
dlatent_avg                   -       

# Generation

In [None]:

!python run_generator.py generate-images --seeds=0-50 --truncation-psi=1.0 --network=/content/drive/'My Drive'/twist_moe/results/00007-stylegan2-hayasaka-1gpu-config-f/network-snapshot-000086.pkl
%cp -av /content/stylegan2/results/00000-generate-images /content/drive/'My Drive'/twist_moe/seeds-1.0


In [None]:
import os
import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib
import scipy
import math
import moviepy.editor
from numpy import linalg


def main():
    tflib.init_tf()
    # Open the snapshot in a with-block so the file handle is closed after loading
    with open("/content/drive/My Drive/twist_moe/results/00001-stylegan2-hayasaka-1gpu-config-f/network-snapshot-000108.pkl", "rb") as f:
        _G, _D, Gs = pickle.load(f)

    rnd = np.random
    latents_a = rnd.randn(1, Gs.input_shape[1])
    latents_b = rnd.randn(1, Gs.input_shape[1])
    latents_c = rnd.randn(1, Gs.input_shape[1])

    def circ_generator(latents_interpolate):
        radius = 40.0

        latents_axis_x = (latents_a - latents_b).flatten() / linalg.norm(latents_a - latents_b)
        latents_axis_y = (latents_a - latents_c).flatten() / linalg.norm(latents_a - latents_c)

        latents_x = math.sin(math.pi * 2.0 * latents_interpolate) * radius
        latents_y = math.cos(math.pi * 2.0 * latents_interpolate) * radius

        latents = latents_a + latents_x * latents_axis_x + latents_y * latents_axis_y
        return latents

    def mse(x, y):
        return (np.square(x - y)).mean()

    def generate_from_generator_adaptive(gen_func):
        max_step = 1.0
        current_pos = 0.0

        change_min = 10.0
        change_max = 11.0

        fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)

        current_latent = gen_func(current_pos)
        current_image = Gs.run(current_latent, None, truncation_psi=0.7, randomize_noise=False, output_transform=fmt)[0]
        array_list = []

        video_length = 1.0
        while(current_pos < video_length):
            array_list.append(current_image)

            lower = current_pos
            upper = current_pos + max_step
            current_pos = (upper + lower) / 2.0

            current_latent = gen_func(current_pos)
            current_image = images = Gs.run(current_latent, None, truncation_psi=0.7, randomize_noise=False, output_transform=fmt)[0]
            current_mse = mse(array_list[-1], current_image)

            while current_mse < change_min or current_mse > change_max:
                if current_mse < change_min:
                    lower = current_pos
                    current_pos = (upper + lower) / 2.0

                if current_mse > change_max:
                    upper = current_pos
                    current_pos = (upper + lower) / 2.0


                current_latent = gen_func(current_pos)
                current_image = images = Gs.run(current_latent, None, truncation_psi=0.7, randomize_noise=False, output_transform=fmt)[0]
                current_mse = mse(array_list[-1], current_image)
            print(current_pos, current_mse)
        return array_list

    frames = generate_from_generator_adaptive(circ_generator)
    frames = moviepy.editor.ImageSequenceClip(frames, fps=30)

    # Generate video.
    mp4_file = 'circular.mp4'
    mp4_codec = 'libx264'
    mp4_bitrate = '3M'
    mp4_fps = 20

    frames.write_videofile(mp4_file, fps=mp4_fps, codec=mp4_codec, bitrate=mp4_bitrate)

if __name__ == "__main__":
    main()
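
The looping path that `circ_generator` traces can be tried out without a GPU or a trained network. The sketch below reproduces only the sin/cos construction on a toy 8-dimensional latent, using the standard library; the helper names and the toy dimension are hypothetical, and no images are generated.

```python
import math
import random


def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def circular_path(center, axis_x, axis_y, radius, t):
    """Point at fraction t (0..1) along a loop around `center`.

    Mirrors the sin/cos construction of circ_generator; the loop is an
    ellipse rather than a circle unless axis_x and axis_y are orthogonal.
    """
    x = math.sin(2.0 * math.pi * t) * radius
    y = math.cos(2.0 * math.pi * t) * radius
    return [c + x * ax + y * ay for c, ax, ay in zip(center, axis_x, axis_y)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
dim = 8  # toy size; the notebook reads the real size from Gs.input_shape[1]
a, b, c = ([rng.gauss(0, 1) for _ in range(dim)] for _ in range(3))
axis_x = unit([p - q for p, q in zip(a, b)])
axis_y = unit([p - q for p, q in zip(a, c)])

start = circular_path(a, axis_x, axis_y, radius=40.0, t=0.0)
end = circular_path(a, axis_x, axis_y, radius=40.0, t=1.0)
# The gap between the first and last point is close to zero: the loop closes.
print(max(abs(s - e) for s, e in zip(start, end)))
```

Because the path returns to its starting point, a video built from it loops seamlessly, which is the appeal of this kind of interpolation.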

## Video

In [None]:
#https://stackoverflow.com/a/57378660/8773953
!pip install -U kora
from kora.drive import upload_public
url = upload_public('/content/drive/My Drive/twist_moe/videos/circular.mp4')

from IPython.display import HTML
HTML(f"""<video src={url} width=500 controls/>""")

In [None]:
from IPython.display import HTML
HTML("""
    <video alt="test" controls>
        <source src='/content/drive/My Drive/twist_moe/videos/circular.mp4' type="video/mp4">
    </video>
""")

## Comments

The above is after training for about 9 hours on Colab, and the interpolation is already pretty good!