### Thien Win
BrainStation Data Science Capstone <br>
April 2022 <br>

<br>

##### Notebook Table of Contents: <br>
<b>[1] Data Scraping and Wrangling </b><br>
[2] CycleGAN Training <br>
[3] Model Evaluation <br>
[4] FID Score <br>
<hr>

### [1] Data Scraping and Wrangling

##### Recommended Computing: Local Machine

<hr>

#### Introduction

My research into potential artists led me to discover that Studio Ghibli (SG), an animation studio from Japan, has recently made still images from several of its movies and works public. They can be found at the following link:

https://www.ghibli.jp/works/

While practicing with cycleGANs, I came across a photograph data set hosted at the following link:

https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/

The zip file to download is 'monet2photo.zip', which was used in a Kaggle competition on cycleGANs. After extracting the zip file, I located the folders containing the photographs, Train B and Test B, and placed them in the '/data/photos' directory. The images are RGB files of 256x256 pixels.
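
The extraction step can also be scripted. The following is a minimal sketch, not the method used in this project: it assumes the two photo folders inside the archive are named `trainB` and `testB`, and `extract_photo_folders` is a hypothetical helper name.

```python
import os
import zipfile


def extract_photo_folders(zip_path, dest="data/photos", folders=("trainB", "testB")):
    """Extract only the files that sit under one of the given folder names."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            parts = member.split("/")  # zip member names always use '/'
            folder = next((f for f in folders if f in parts[:-1]), None)
            if folder is None:
                continue  # not a photo folder (assumed names, see above)
            target_dir = os.path.join(dest, folder)
            os.makedirs(target_dir, exist_ok=True)
            target = os.path.join(target_dir, parts[-1])
            with open(target, "wb") as handle:
                handle.write(archive.read(member))
            extracted.append(target)
    return extracted
```

Calling `extract_photo_folders("monet2photo.zip")` would then fill `data/photos/trainB` and `data/photos/testB`, provided the archive is laid out as assumed.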

The remainder of this notebook covers the method used to scrape the Studio Ghibli website and to increase the quantity of SG images for training.

Note: the associated Google Drive (linked in the README.txt file) contains the image data scraped in this notebook. Readers who want to go through the entire workflow can run this notebook themselves instead.

In [1]:
#start by importing modules and libraries for the task

#used for scraping
import wget

#used for increasing quantity of SG images
import PIL
from PIL import Image
import os
import shutil
import random

In [2]:
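#this code can be omitted should the reader decide to work with prepared data
#exist_ok=True lets the cell be re-run without raising FileExistsError
os.makedirs('data/photos', exist_ok=True)
os.makedirs('data_scrape/SG', exist_ok=True)
os.makedirs('data/SG/testA', exist_ok=True)
os.makedirs('data/SG/trainA', exist_ok=True)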

<hr>

#### Data Scraping

Having decided to use these images for my project, I began scraping the SG website by first studying how it serves the images. Inspecting a sample image showed that the image URLs share the following structure:

"https://www.ghibli.jp/gallery/(title).jpg"

where (title) is the movie title as written on the website, followed by a 3-digit number.

Working with the structure above, I decided to create two lists:
   - images from 001 to 009
   - images from 010 to 050
   
This lets me write out the URL prefix for each of the 20 titles once, for the 010 to 050 list, and then iterate through that list, appending a '0' to each prefix, to build the 001 to 009 list. Unfortunately, I still needed to look at at least one image from each of the 20 titles to learn how its files were named.
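
The two lists are effectively a manual way of zero-padding the image number to three digits. As an alternative sketch (not the approach taken in the cells that follow), Python's format specification `:03d` performs the padding directly; `BASE_URL` and `build_image_url` are hypothetical names introduced here for illustration.

```python
BASE_URL = "https://www.ghibli.jp/gallery/"


def build_image_url(title, number):
    """Return the gallery URL for a title and an image number padded to 3 digits."""
    return f"{BASE_URL}{title}{number:03d}.jpg"


# preview the first and last URL for one title
print(build_image_url("totoro", 1))   # prints https://www.ghibli.jp/gallery/totoro001.jpg
print(build_image_url("totoro", 50))  # prints https://www.ghibli.jp/gallery/totoro050.jpg
```

With a helper like this, a single list of title strings would be enough to generate every URL.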

In [3]:
#create a list of url structure to be used in scraping images 10 to 50
url10to50 = ["https://www.ghibli.jp/gallery/nausicaa0",
            "https://www.ghibli.jp/gallery/laputa0",
            "https://www.ghibli.jp/gallery/totoro0",
            "https://www.ghibli.jp/gallery/majo0",
            "https://www.ghibli.jp/gallery/omoide0",
            "https://www.ghibli.jp/gallery/porco0",
            "https://www.ghibli.jp/gallery/umi0",
            "https://www.ghibli.jp/gallery/tanuki0",
            "https://www.ghibli.jp/gallery/onyourmark0",
            "https://www.ghibli.jp/gallery/mimi0",
            "https://www.ghibli.jp/gallery/mononoke0",
            "https://www.ghibli.jp/gallery/chihiro0",
            "https://www.ghibli.jp/gallery/baron0",
            "https://www.ghibli.jp/gallery/howl0",
            "https://www.ghibli.jp/gallery/ged0",
            "https://www.ghibli.jp/gallery/ponyo0",
            "https://www.ghibli.jp/gallery/karigurashi0",
            "https://www.ghibli.jp/gallery/kokurikozaka0",
            "https://www.ghibli.jp/gallery/kazetachinu0",
            "https://www.ghibli.jp/gallery/marnie0"]

In [4]:
#create empty list and use loop to iterate through 10to50 list to concat '0' 
url1to9 = []
for i in url10to50:
    url1to9.append(i+"0")

Now that I have the two lists and the folders set up, I define a helper function for the task. The data is saved in the "data_scrape/SG" directory rather than directly in the data folder.

In [6]:
#define scraping function with path
def SG_scrape(url1to9_item, url10to50_item, path="data_scrape/SG"):
    
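    file_1to9 = range(1,10)
    #note: range(11,51) yields 11 to 50, so image 010 is never requested
    file_10to51 = range(11,51)
    
    #use a try block in case SG did not release 50 images for a title;
    #the first failed download ends the scrape for that title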
    try:
        for i in file_1to9:
            # Use wget download method to download specified image url.
            image_filename = wget.download(url1to9_item+f"{i}.jpg", out = path)
            print('Image Successfully Downloaded: ', image_filename)
        
        for i in file_10to51:
            # Use wget download method to download specified image url.
            image_filename = wget.download(url10to50_item+f"{i}.jpg", out = path)
            print('Image Successfully Downloaded: ', image_filename)
    
    except Exception:
        pass

Having defined the scrape function above, I now apply it to the URL lists.

It may take 10 to 15 minutes to download all 958 images into the 'data_scrape/SG' directory.

In [7]:
#loop through each list by zipping and download using scrape function
for m, n in zip(url1to9, url10to50):
    SG_scrape(m, n)

100% [............................................................................] 229593 / 229593Image Successfully Downloaded:  data_scrape/SG/nausicaa001.jpg
100% [............................................................................] 232333 / 232333Image Successfully Downloaded:  data_scrape/SG/nausicaa002.jpg
100% [............................................................................] 251419 / 251419Image Successfully Downloaded:  data_scrape/SG/nausicaa003.jpg
100% [............................................................................] 261582 / 261582Image Successfully Downloaded:  data_scrape/SG/nausicaa004.jpg
100% [............................................................................] 372449 / 372449Image Successfully Downloaded:  data_scrape/SG/nausicaa005.jpg
100% [............................................................................] 315831 / 315831Image Successfully Downloaded:  data_scrape/SG/nausicaa006.jpg
100% [......................

100% [............................................................................] 256825 / 256825Image Successfully Downloaded:  data_scrape/SG/laputa003.jpg
100% [............................................................................] 327274 / 327274Image Successfully Downloaded:  data_scrape/SG/laputa004.jpg
100% [............................................................................] 291112 / 291112Image Successfully Downloaded:  data_scrape/SG/laputa005.jpg
100% [............................................................................] 272045 / 272045Image Successfully Downloaded:  data_scrape/SG/laputa006.jpg
100% [............................................................................] 288536 / 288536Image Successfully Downloaded:  data_scrape/SG/laputa007.jpg
100% [............................................................................] 256052 / 256052Image Successfully Downloaded:  data_scrape/SG/laputa008.jpg
100% [..................................

100% [............................................................................] 194047 / 194047Image Successfully Downloaded:  data_scrape/SG/totoro006.jpg
100% [............................................................................] 216723 / 216723Image Successfully Downloaded:  data_scrape/SG/totoro007.jpg
100% [............................................................................] 259011 / 259011Image Successfully Downloaded:  data_scrape/SG/totoro008.jpg
100% [............................................................................] 273447 / 273447Image Successfully Downloaded:  data_scrape/SG/totoro009.jpg
100% [............................................................................] 319123 / 319123Image Successfully Downloaded:  data_scrape/SG/totoro011.jpg
100% [............................................................................] 298389 / 298389Image Successfully Downloaded:  data_scrape/SG/totoro012.jpg
100% [..................................

100% [............................................................................] 268492 / 268492Image Successfully Downloaded:  data_scrape/SG/majo009.jpg
100% [............................................................................] 146827 / 146827Image Successfully Downloaded:  data_scrape/SG/majo011.jpg
100% [............................................................................] 132159 / 132159Image Successfully Downloaded:  data_scrape/SG/majo012.jpg
100% [............................................................................] 253599 / 253599Image Successfully Downloaded:  data_scrape/SG/majo013.jpg
100% [............................................................................] 382504 / 382504Image Successfully Downloaded:  data_scrape/SG/majo014.jpg
100% [............................................................................] 282676 / 282676Image Successfully Downloaded:  data_scrape/SG/majo015.jpg
100% [..............................................

100% [............................................................................] 313175 / 313175Image Successfully Downloaded:  data_scrape/SG/omoide013.jpg
100% [............................................................................] 288062 / 288062Image Successfully Downloaded:  data_scrape/SG/omoide014.jpg
100% [............................................................................] 314393 / 314393Image Successfully Downloaded:  data_scrape/SG/omoide015.jpg
100% [............................................................................] 308584 / 308584Image Successfully Downloaded:  data_scrape/SG/omoide016.jpg
100% [............................................................................] 279325 / 279325Image Successfully Downloaded:  data_scrape/SG/omoide017.jpg
100% [............................................................................] 226946 / 226946Image Successfully Downloaded:  data_scrape/SG/omoide018.jpg
100% [..................................

100% [............................................................................] 212988 / 212988Image Successfully Downloaded:  data_scrape/SG/porco016.jpg
100% [............................................................................] 279903 / 279903Image Successfully Downloaded:  data_scrape/SG/porco017.jpg
100% [............................................................................] 319160 / 319160Image Successfully Downloaded:  data_scrape/SG/porco018.jpg
100% [............................................................................] 321823 / 321823Image Successfully Downloaded:  data_scrape/SG/porco019.jpg
100% [............................................................................] 274124 / 274124Image Successfully Downloaded:  data_scrape/SG/porco020.jpg
100% [............................................................................] 273395 / 273395Image Successfully Downloaded:  data_scrape/SG/porco021.jpg
100% [........................................

100% [............................................................................] 293098 / 293098Image Successfully Downloaded:  data_scrape/SG/umi019.jpg
100% [............................................................................] 273336 / 273336Image Successfully Downloaded:  data_scrape/SG/umi020.jpg
100% [............................................................................] 296593 / 296593Image Successfully Downloaded:  data_scrape/SG/umi021.jpg
100% [............................................................................] 253360 / 253360Image Successfully Downloaded:  data_scrape/SG/umi022.jpg
100% [............................................................................] 327240 / 327240Image Successfully Downloaded:  data_scrape/SG/umi023.jpg
100% [............................................................................] 205052 / 205052Image Successfully Downloaded:  data_scrape/SG/umi024.jpg
100% [....................................................

100% [............................................................................] 257709 / 257709Image Successfully Downloaded:  data_scrape/SG/tanuki022.jpg
100% [............................................................................] 217565 / 217565Image Successfully Downloaded:  data_scrape/SG/tanuki023.jpg
100% [............................................................................] 504555 / 504555Image Successfully Downloaded:  data_scrape/SG/tanuki024.jpg
100% [............................................................................] 350215 / 350215Image Successfully Downloaded:  data_scrape/SG/tanuki025.jpg
100% [............................................................................] 390379 / 390379Image Successfully Downloaded:  data_scrape/SG/tanuki026.jpg
100% [............................................................................] 246781 / 246781Image Successfully Downloaded:  data_scrape/SG/tanuki027.jpg
100% [..................................

100% [............................................................................] 263047 / 263047Image Successfully Downloaded:  data_scrape/SG/onyourmark024.jpg
100% [............................................................................] 250984 / 250984Image Successfully Downloaded:  data_scrape/SG/onyourmark025.jpg
100% [............................................................................] 291471 / 291471Image Successfully Downloaded:  data_scrape/SG/onyourmark026.jpg
100% [............................................................................] 311549 / 311549Image Successfully Downloaded:  data_scrape/SG/onyourmark027.jpg
100% [............................................................................] 316881 / 316881Image Successfully Downloaded:  data_scrape/SG/onyourmark028.jpg
100% [............................................................................] 275516 / 275516Image Successfully Downloaded:  data_scrape/SG/mimi001.jpg
100% [................

100% [............................................................................] 327191 / 327191Image Successfully Downloaded:  data_scrape/SG/mimi049.jpg
100% [............................................................................] 349763 / 349763Image Successfully Downloaded:  data_scrape/SG/mimi050.jpg
100% [............................................................................] 291888 / 291888Image Successfully Downloaded:  data_scrape/SG/mononoke001.jpg
100% [............................................................................] 275844 / 275844Image Successfully Downloaded:  data_scrape/SG/mononoke002.jpg
100% [............................................................................] 208193 / 208193Image Successfully Downloaded:  data_scrape/SG/mononoke003.jpg
100% [............................................................................] 228069 / 228069Image Successfully Downloaded:  data_scrape/SG/mononoke004.jpg
100% [..............................

100% [............................................................................] 272475 / 272475Image Successfully Downloaded:  data_scrape/SG/chihiro001.jpg
100% [............................................................................] 376802 / 376802Image Successfully Downloaded:  data_scrape/SG/chihiro002.jpg
100% [............................................................................] 441756 / 441756Image Successfully Downloaded:  data_scrape/SG/chihiro003.jpg
100% [............................................................................] 297234 / 297234Image Successfully Downloaded:  data_scrape/SG/chihiro004.jpg
100% [............................................................................] 160027 / 160027Image Successfully Downloaded:  data_scrape/SG/chihiro005.jpg
100% [............................................................................] 224296 / 224296Image Successfully Downloaded:  data_scrape/SG/chihiro006.jpg
100% [............................

100% [............................................................................] 328522 / 328522Image Successfully Downloaded:  data_scrape/SG/baron003.jpg
100% [............................................................................] 259024 / 259024Image Successfully Downloaded:  data_scrape/SG/baron004.jpg
100% [............................................................................] 283802 / 283802Image Successfully Downloaded:  data_scrape/SG/baron005.jpg
100% [............................................................................] 348317 / 348317Image Successfully Downloaded:  data_scrape/SG/baron006.jpg
100% [............................................................................] 228497 / 228497Image Successfully Downloaded:  data_scrape/SG/baron007.jpg
100% [............................................................................] 321405 / 321405Image Successfully Downloaded:  data_scrape/SG/baron008.jpg
100% [........................................

100% [............................................................................] 453842 / 453842Image Successfully Downloaded:  data_scrape/SG/howl006.jpg
100% [............................................................................] 180473 / 180473Image Successfully Downloaded:  data_scrape/SG/howl007.jpg
100% [............................................................................] 277406 / 277406Image Successfully Downloaded:  data_scrape/SG/howl008.jpg
100% [............................................................................] 401433 / 401433Image Successfully Downloaded:  data_scrape/SG/howl009.jpg
100% [............................................................................] 296796 / 296796Image Successfully Downloaded:  data_scrape/SG/howl011.jpg
100% [............................................................................] 245405 / 245405Image Successfully Downloaded:  data_scrape/SG/howl012.jpg
100% [..............................................

100% [............................................................................] 407247 / 407247Image Successfully Downloaded:  data_scrape/SG/ged009.jpg
100% [............................................................................] 207172 / 207172Image Successfully Downloaded:  data_scrape/SG/ged011.jpg
100% [............................................................................] 192901 / 192901Image Successfully Downloaded:  data_scrape/SG/ged012.jpg
100% [............................................................................] 272664 / 272664Image Successfully Downloaded:  data_scrape/SG/ged013.jpg
100% [............................................................................] 245726 / 245726Image Successfully Downloaded:  data_scrape/SG/ged014.jpg
100% [............................................................................] 144093 / 144093Image Successfully Downloaded:  data_scrape/SG/ged015.jpg
100% [....................................................

100% [............................................................................] 179104 / 179104Image Successfully Downloaded:  data_scrape/SG/karigurashi015.jpg
100% [............................................................................] 232848 / 232848Image Successfully Downloaded:  data_scrape/SG/karigurashi016.jpg
100% [............................................................................] 234917 / 234917Image Successfully Downloaded:  data_scrape/SG/karigurashi017.jpg
100% [............................................................................] 200921 / 200921Image Successfully Downloaded:  data_scrape/SG/karigurashi018.jpg
100% [............................................................................] 314656 / 314656Image Successfully Downloaded:  data_scrape/SG/karigurashi019.jpg
100% [............................................................................] 274987 / 274987Image Successfully Downloaded:  data_scrape/SG/karigurashi020.jpg
100% [....

100% [............................................................................] 348561 / 348561Image Successfully Downloaded:  data_scrape/SG/kazetachinu016.jpg
100% [............................................................................] 359433 / 359433Image Successfully Downloaded:  data_scrape/SG/kazetachinu017.jpg
100% [............................................................................] 223129 / 223129Image Successfully Downloaded:  data_scrape/SG/kazetachinu018.jpg
100% [............................................................................] 357189 / 357189Image Successfully Downloaded:  data_scrape/SG/kazetachinu019.jpg
100% [............................................................................] 167451 / 167451Image Successfully Downloaded:  data_scrape/SG/kazetachinu020.jpg
100% [............................................................................] 292229 / 292229Image Successfully Downloaded:  data_scrape/SG/kazetachinu021.jpg
100% [....
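
Because `SG_scrape` passes silently on the first failed download, it can be useful to check how many images each title actually produced. The sketch below is one way to do that, assuming every file follows the `(title)(3 digits).jpg` naming seen in the output above; `count_images_per_title` is a hypothetical helper, not part of the original workflow.

```python
import os
import re
from collections import Counter


def count_images_per_title(folder="data_scrape/SG"):
    """Count .jpg files per title prefix, e.g. 'totoro012.jpg' counts toward 'totoro'."""
    pattern = re.compile(r"^([a-z]+)(\d{3})\.jpg$")
    counts = Counter()
    for name in os.listdir(folder):
        match = pattern.match(name)
        if match:
            counts[match.group(1)] += 1
    return counts


# only report when the scrape folder exists on this machine
if os.path.isdir("data_scrape/SG"):
    counts = count_images_per_title()
    print(sum(counts.values()), "images across", len(counts), "titles")
```

A title with a low count is one where the site offers fewer images or where a download failed part-way through.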

<hr>

#### Image Mirroring

Recognizing that I did not have as many still images as I would like for training, I considered different ways to increase the number of SG images. Ultimately, I decided to mirror each image left to right, which doubles the quantity while still maintaining the provenance of the image. We will use the PIL and os modules to accomplish this.
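
Conceptually, a left-to-right mirror reverses the order of the pixels within each row and leaves the order of the rows unchanged. The toy sketch below illustrates this on a nested list rather than a real image; it does not use PIL, and `mirror_rows` is a hypothetical name used only for this illustration.

```python
def mirror_rows(pixels):
    """Return a left-to-right mirrored copy of a 2D grid of pixel values."""
    return [row[::-1] for row in pixels]


# a tiny 2x3 "image"; each number stands in for one pixel
image = [[1, 2, 3],
         [4, 5, 6]]
mirrored = mirror_rows(image)
print(mirrored)  # prints [[3, 2, 1], [6, 5, 4]]
```

Mirroring twice returns the original grid, which is why the flipped copy carries the same content as its source image.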

In [11]:
#get image path using os.walk
#use PIL to flip image and save

for root, dirs, files in os.walk("data_scrape/SG", topdown=False):
    for i, name in enumerate(files):
        location = os.path.join(root, name)
        print(name, i)
        img = Image.open(location)
        out = img.transpose(PIL.Image.FLIP_LEFT_RIGHT)
        out.save(f'data_scrape/SG/{i}.jpg', format='JPEG')

baron001.jpg 0
baron002.jpg 1
baron003.jpg 2
baron004.jpg 3
baron005.jpg 4
baron006.jpg 5
baron007.jpg 6
baron008.jpg 7
baron009.jpg 8
baron011.jpg 9
baron012.jpg 10
baron013.jpg 11
baron014.jpg 12
baron015.jpg 13
baron016.jpg 14
baron017.jpg 15
baron018.jpg 16
baron019.jpg 17
baron020.jpg 18
baron021.jpg 19
baron022.jpg 20
baron023.jpg 21
baron024.jpg 22
baron025.jpg 23
baron026.jpg 24
baron027.jpg 25
baron028.jpg 26
baron029.jpg 27
baron030.jpg 28
baron031.jpg 29
baron032.jpg 30
baron033.jpg 31
baron034.jpg 32
baron035.jpg 33
baron036.jpg 34
baron037.jpg 35
baron038.jpg 36
baron039.jpg 37
baron040.jpg 38
baron041.jpg 39
baron042.jpg 40
baron043.jpg 41
baron044.jpg 42
baron045.jpg 43
baron046.jpg 44
baron047.jpg 45
baron048.jpg 46
baron049.jpg 47
baron050.jpg 48
chihiro001.jpg 49
chihiro002.jpg 50
chihiro003.jpg 51
chihiro004.jpg 52
chihiro005.jpg 53
chihiro006.jpg 54
chihiro007.jpg 55
chihiro008.jpg 56
chihiro009.jpg 57
chihiro011.jpg 58
chihiro012.jpg 59
chihiro013.jpg 60
chihiro014

marnie004.jpg 444
marnie005.jpg 445
marnie006.jpg 446
marnie007.jpg 447
marnie008.jpg 448
marnie009.jpg 449
marnie011.jpg 450
marnie012.jpg 451
marnie013.jpg 452
marnie014.jpg 453
marnie015.jpg 454
marnie016.jpg 455
marnie017.jpg 456
marnie018.jpg 457
marnie019.jpg 458
marnie020.jpg 459
marnie021.jpg 460
marnie022.jpg 461
marnie023.jpg 462
marnie024.jpg 463
marnie025.jpg 464
marnie026.jpg 465
marnie027.jpg 466
marnie028.jpg 467
marnie029.jpg 468
marnie030.jpg 469
marnie031.jpg 470
marnie032.jpg 471
marnie033.jpg 472
marnie034.jpg 473
marnie035.jpg 474
marnie036.jpg 475
marnie037.jpg 476
marnie038.jpg 477
marnie039.jpg 478
marnie040.jpg 479
marnie041.jpg 480
marnie042.jpg 481
marnie043.jpg 482
marnie044.jpg 483
marnie045.jpg 484
marnie046.jpg 485
marnie047.jpg 486
marnie048.jpg 487
marnie049.jpg 488
marnie050.jpg 489
mimi001.jpg 490
mimi002.jpg 491
mimi003.jpg 492
mimi004.jpg 493
mimi005.jpg 494
mimi006.jpg 495
mimi007.jpg 496
mimi008.jpg 497
mimi009.jpg 498
mimi011.jpg 499
mimi012.jpg 

totoro045.jpg 903
totoro046.jpg 904
totoro047.jpg 905
totoro048.jpg 906
totoro049.jpg 907
totoro050.jpg 908
umi001.jpg 909
umi002.jpg 910
umi003.jpg 911
umi004.jpg 912
umi005.jpg 913
umi006.jpg 914
umi007.jpg 915
umi008.jpg 916
umi009.jpg 917
umi011.jpg 918
umi012.jpg 919
umi013.jpg 920
umi014.jpg 921
umi015.jpg 922
umi016.jpg 923
umi017.jpg 924
umi018.jpg 925
umi019.jpg 926
umi020.jpg 927
umi021.jpg 928
umi022.jpg 929
umi023.jpg 930
umi024.jpg 931
umi025.jpg 932
umi026.jpg 933
umi027.jpg 934
umi028.jpg 935
umi029.jpg 936
umi030.jpg 937
umi031.jpg 938
umi032.jpg 939
umi033.jpg 940
umi034.jpg 941
umi035.jpg 942
umi036.jpg 943
umi037.jpg 944
umi038.jpg 945
umi039.jpg 946
umi040.jpg 947
umi041.jpg 948
umi042.jpg 949
umi043.jpg 950
umi044.jpg 951
umi045.jpg 952
umi046.jpg 953
umi047.jpg 954
umi048.jpg 955
umi049.jpg 956
umi050.jpg 957


The number of images has successfully been doubled to 1,916.
<hr>

#### SG Test Images

I will be reserving 380 SG images (about 20%) as a test set for model evaluation. Images will be randomly selected and moved to the working directory 'data/SG/testA'. The remainder of the images will be moved to 'data/SG/trainA'.
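
If the split needs to be repeatable, one option is to seed the random selection. The following is a sketch under that assumption and not the code used in this notebook; `split_test_train` and its `seed` parameter are hypothetical.

```python
import os
import random
import shutil


def split_test_train(src, test_dir, train_dir, n_test, seed=0):
    """Move n_test randomly chosen files to test_dir and the rest to train_dir."""
    os.makedirs(test_dir, exist_ok=True)
    os.makedirs(train_dir, exist_ok=True)
    # sort first so the same seed selects the same files on any operating system
    names = sorted(os.listdir(src))
    test_names = set(random.Random(seed).sample(names, n_test))
    for name in names:
        target = test_dir if name in test_names else train_dir
        shutil.move(os.path.join(src, name), os.path.join(target, name))
    return len(test_names), len(names) - len(test_names)
```

With the numbers in this notebook, the call would be `split_test_train("data_scrape/SG", "data/SG/testA", "data/SG/trainA", n_test=380)`.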

In [18]:
#moving 380 random files
for i in range(380):
    random_img = random.choice(os.listdir("data_scrape/SG"))
    shutil.move(f"data_scrape/SG/{random_img}", f"data/SG/testA/{random_img}")

In [19]:
#moving the remaining 1916-380=1536 files
for remaining_img in os.listdir("data_scrape/SG"):
    shutil.move(f"data_scrape/SG/{remaining_img}", f"data/SG/trainA/{remaining_img}")

The scraped folder should now be empty of any images.
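
A quick way to confirm this is to count the files in each folder. The sketch below assumes the directory names used earlier in this notebook; `count_files` is a hypothetical helper.

```python
import os


def count_files(folder):
    """Return the number of regular files directly inside folder (0 if it is missing)."""
    if not os.path.isdir(folder):
        return 0
    return sum(
        1 for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


for folder in ("data_scrape/SG", "data/SG/testA", "data/SG/trainA"):
    print(folder, count_files(folder))
```

Given the figures stated in this notebook, the three counts would be expected to come out as 0, 380 and 1536.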

#### Conclusion 

In this notebook, I scraped the Studio Ghibli website for the images to be used with the cycleGAN and artificially increased the number of potential training images by mirroring each scraped image. In addition, 380 random SG images were reserved as a test set so that later notebooks can measure the performance of the model.

In the next notebook, `[2] CycleGAN Training`, we will build the actual cycleGAN and use this data to train our model.