<a href="https://colab.research.google.com/github/trachtok/dspracticum2020_data/blob/main/assignment05/assignmen05.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transfer learning for classification
*Kája Trachtová, Michaela Kecskésová, Martin Špilar, Dagmar Al Tukmachi*

+ the goal of this assignment is to take an already trained classification model (MobileNetV2) and use it to classify our own images
+ there will be 2 classes: Superman vs Batman images

### Workflow of this notebook
1. Load libraries
2. Prepare input data (split images into training and validation set)
3. Download MobileNetV2 model
4. Transfer learning - train a new classification head on top of the frozen MobileNetV2
5. Export model to TensorFlow.js

## Load libraries

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img
from tensorflow.keras.utils import to_categorical
from google.colab import drive

from tensorflow.keras.preprocessing import image_dataset_from_directory

In [2]:
!pip install split-folders
import splitfolders

Collecting split-folders
  Downloading https://files.pythonhosted.org/packages/b8/5f/3c2b2f7ea5e047c8cdc3bb00ae582c5438fcdbbedcc23b3cc1c2c7aae642/split_folders-0.4.3-py3-none-any.whl
Installing collected packages: split-folders
Successfully installed split-folders-0.4.3


## Prepare input data

+ Mount Google Drive

In [3]:
drive.mount('/content/drive')

Mounted at /content/drive


+ check what data we have: how many images are there in each class?
+ why is there not the same number of images in each class? **because the accuracy during model training was not high enough, we went through the misclassified images and manually deleted the most obvious problem cases (e.g. images where both Batman and Superman were present); this helped to increase accuracy**
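The manual clean-up described above (going through misclassified images) can be sketched with a small helper. This is a minimal sketch, not the code we actually used: `find_misclassified` is a hypothetical helper that takes predicted class probabilities, the true integer labels and the file names, and returns every file whose predicted class differs from its true class.

```python
import numpy as np

def find_misclassified(probs, true_labels, filenames):
    """Return (filename, true_label, predicted_label) for every wrong prediction.

    probs       -- array of shape (n_images, n_classes) with predicted probabilities
    true_labels -- array of shape (n_images,) with integer class indices
    filenames   -- list of n_images file names, in the same order as probs
    """
    probs = np.asarray(probs)
    true_labels = np.asarray(true_labels)
    # the predicted class is the one with the highest probability
    predicted = probs.argmax(axis=1)
    wrong = np.flatnonzero(predicted != true_labels)
    return [(filenames[i], int(true_labels[i]), int(predicted[i])) for i in wrong]

# toy example with made-up probabilities (class 0 = batman, 1 = superman)
probs = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]]
labels = [0, 0, 1]
files = ["batman/a.jpg", "batman/b.jpg", "superman/c.jpg"]
print(find_misclassified(probs, labels, files))
# [('batman/b.jpg', 0, 1), ('superman/c.jpg', 1, 0)]
```

In the notebook, the inputs could come from `model.predict(gen)`, `gen.classes` and `gen.filenames`, where `gen` is a validation generator created with `shuffle=False`. With `shuffle=True` the predictions would not line up with the file order.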

In [28]:
!ls '/content/drive/My Drive/assignment05_data/batman' | wc -l
!ls '/content/drive/My Drive/assignment05_data/superman' | wc -l

256
289


+ to use `ImageDataGenerator().flow_from_directory()` effectively, it is better to have 2 folders, training and validation, each containing one subfolder per class (batman & superman) with its own subset of images
+ to obtain this layout, we can use the `splitfolders` package (here with an 80/20 train/validation split per class)

In [29]:
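# caution: the output folder lies inside the input folder, so a re-run may treat prepared_data/ as an extra class
splitfolders.ratio("/content/drive/My Drive/assignment05_data", output="/content/drive/My Drive/assignment05_data/prepared_data", seed=1337, ratio=(.8, .2))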

Copying files: 549 files [00:08, 63.46 files/s]
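The 80/20 split is applied to each class separately. As a rough check, assume that `splitfolders` puts `int(0.8 * n)` images of each class into `train` and gives the remainder to `val`. This is our reading of the package, not verified here. Under that assumption, the class counts above give exactly the totals that `flow_from_directory()` reports further below (435 training and 110 validation images):

```python
def split_counts(n_images, train_ratio=0.8):
    """(train, val) image counts for one class, truncating the train share to an int."""
    n_train = int(train_ratio * n_images)
    return n_train, n_images - n_train

class_counts = {"batman": 256, "superman": 289}
splits = {name: split_counts(n) for name, n in class_counts.items()}
print(splits)  # {'batman': (204, 52), 'superman': (231, 58)}

total_train = sum(train for train, _ in splits.values())
total_val = sum(val for _, val in splits.values())
print(total_train, total_val)  # 435 110
```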


In [31]:
!ls "/content/drive/My Drive/assignment05_data/prepared_data/train"
!ls "/content/drive/My Drive/assignment05_data/prepared_data/val"

batman	superman
batman	superman


In [32]:
train_dir = "/content/drive/My Drive/assignment05_data/prepared_data/train"
validation_dir = "/content/drive/My Drive/assignment05_data/prepared_data/val"

## Download MobileNetV2
+ download the MobileNetV2 model with ImageNet weights, without the top (classification) layer

In [33]:
BATCH_SIZE = 32
EPOCHS = 5
IMG_SIZE = (160, 160)
IMG_SHAPE = IMG_SIZE + (3,)

In [34]:
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
# base_model.summary()

## Transfer learning
+ since we use an already trained model, we first freeze all of its layers so that the pretrained ImageNet weights are not updated during training

In [35]:
for layer in base_model.layers:
    layer.trainable = False

+ then add a global average pooling layer and dropout, and finally put a dense layer with `softmax` activation on top for the 2-class classification

In [36]:
x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(units=2, activation="softmax")(x)

model = tf.keras.Model(inputs=base_model.input, outputs=outputs)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
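To make the new head more concrete, here is a NumPy sketch of what it computes for a single image at inference time. It assumes the frozen MobileNetV2 base turns a 160×160×3 image into a 5×5×1280 feature map (the network downsamples by a factor of 32, and its last convolution has 1280 channels). The feature map and the dense weights below are random placeholders, not values from our model. Keras applies dropout only during training, so at inference it passes the features through unchanged and is omitted here. The last lines compute the `categorical_crossentropy` loss for a one-hot label.

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder for base_model.output for one image: (height, width, channels)
feature_map = rng.random((5, 5, 1280))

# GlobalAveragePooling2D: average over the two spatial dimensions -> 1280 features
pooled = feature_map.mean(axis=(0, 1))

# Dense(units=2) with placeholder (untrained) weights and zero biases
W = rng.normal(scale=0.01, size=(1280, 2))
b = np.zeros(2)
logits = pooled @ W + b

def softmax(z):
    # subtracting the max avoids overflow in exp and does not change the result
    e = np.exp(z - z.max())
    return e / e.sum()

probs = softmax(logits)  # two class probabilities that sum to 1

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is one-hot, so only the log-probability of the true class contributes
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([1.0, 0.0])  # one-hot label for class 0
loss = categorical_crossentropy(y_true, probs)
print(pooled.shape, probs, loss)
```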

+ create an `ImageDataGenerator()` instance separately for the training and validation sets
+ for the training dataset, we can set several data augmentation parameters such as rotation, horizontal flip etc.; doing the same for the validation dataset is generally not advised, so for validation only the necessary rescaling is set
+ we tried several data augmentation parameters, but they mostly worsened prediction accuracy (the final setup below keeps only `rotation_range=15`); this might be because we do not have enough images to begin with, so transforming them only confuses the model

In [37]:
train_datagen = ImageDataGenerator(
    rotation_range=15,
    rescale=1./255,
)

validation_datagen = ImageDataGenerator(
    rescale=1./255
)
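One thing worth checking here is the input scaling. `rescale=1./255` maps pixel values to the range [0, 1]. The Keras documentation for MobileNetV2, however, asks for inputs passed through `tf.keras.applications.mobilenet_v2.preprocess_input`, which scales pixels to [-1, 1]. The NumPy sketch below only compares the two mappings. Switching to the MobileNetV2 scaling is a suggestion we have not tested: for example, you could pass `preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input` to `ImageDataGenerator()` instead of `rescale`. The same scaling would then also be needed wherever the exported TFJS model is used.

```python
import numpy as np

pixels = np.array([0.0, 127.5, 255.0])  # darkest, middle and brightest pixel values

# same mapping as ImageDataGenerator(rescale=1./255): [0, 255] -> [0, 1]
rescaled = pixels / 255.0

# MobileNetV2 preprocess_input scaling: [0, 255] -> [-1, 1]
mobilenet_scaled = pixels / 127.5 - 1.0

print(rescaled.tolist())          # [0.0, 0.5, 1.0]
print(mobilenet_scaled.tolist())  # [-1.0, 0.0, 1.0]
```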

+ create iterators over the training and validation datasets with `.flow_from_directory()`

In [38]:
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    shuffle = True)

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical', 
    shuffle=True)


Found 435 images belonging to 2 classes.
Found 110 images belonging to 2 classes.
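When no explicit `classes` list is given, `flow_from_directory()` infers the classes from the subfolder names and assigns label indices in alphanumeric order. With `class_mode='categorical'`, each label is returned as a one-hot vector. The pure-Python sketch below reproduces that mapping for our two folders. It also shows how many batches make up one epoch with `BATCH_SIZE = 32`, since the last batch of each pass is smaller. The real mapping can be checked in the notebook with `train_generator.class_indices`.

```python
import math

BATCH_SIZE = 32
class_dirs = ["superman", "batman"]  # the order in which folders are listed does not matter

# label indices follow the alphanumeric order of the subfolder names
class_indices = {name: i for i, name in enumerate(sorted(class_dirs))}
print(class_indices)  # {'batman': 0, 'superman': 1}

def one_hot(label, n_classes):
    """One-hot label vector, the label format used with class_mode='categorical'."""
    return [1.0 if i == label else 0.0 for i in range(n_classes)]

print(one_hot(class_indices["superman"], len(class_indices)))  # [0.0, 1.0]

def batches_per_epoch(n_images, batch_size):
    """Number of batches in one pass over the data and the size of the last batch."""
    steps = math.ceil(n_images / batch_size)
    return steps, n_images - (steps - 1) * batch_size

print(batches_per_epoch(435, BATCH_SIZE))  # (14, 19)
print(batches_per_epoch(110, BATCH_SIZE))  # (4, 14)
```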


In [39]:
history = model.fit(train_generator,
                    epochs=EPOCHS,
                    validation_data=validation_generator)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


## Export to TensorFlow.js

In [40]:
!pip install tensorflowjs

Collecting tensorflowjs
  Downloading https://files.pythonhosted.org/packages/e8/c8/c52e21c49b3baf0845e395241046a993e244dd4b94c9827a8cd2d9b18927/tensorflowjs-2.7.0-py3-none-any.whl (62kB)
Collecting tensorflow-hub<0.10,>=0.7.0
  Downloading https://files.pythonhosted.org/packages/ac/83/a7df82744a794107641dad1decaad017d82e25f0e1f761ac9204829eef96/tensorflow_hub-0.9.0-py2.py3-none-any.whl (103kB)

In [127]:
import tensorflowjs as tfjs
# convert the Keras model to the TFJS layers format (model.json + binary weight files)
tfjs.converters.save_keras_model(model, "/content/drive/My Drive/assignment05_data/export_model")
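`tfjs.converters.save_keras_model()` writes a `model.json` file (model topology plus a weights manifest) and one or more binary weight files into the target folder. A quick sanity check before zipping could look like the sketch below. `list_tfjs_weight_files` is a hypothetical helper, and it assumes the `weightsManifest` / `paths` layout of the TFJS layers-model `model.json`. On the JavaScript side, the model would then be loaded with `tf.loadLayersModel()` pointing at the URL of `model.json`.

```python
import json
import os

def list_tfjs_weight_files(export_dir):
    """Return the weight file names referenced by model.json, checking that they exist."""
    with open(os.path.join(export_dir, "model.json")) as f:
        model_json = json.load(f)
    # every weight group in the manifest lists the binary files holding its weights
    shards = [path
              for group in model_json["weightsManifest"]
              for path in group["paths"]]
    missing = [p for p in shards if not os.path.exists(os.path.join(export_dir, p))]
    if missing:
        raise FileNotFoundError(f"weight files missing from {export_dir}: {missing}")
    return shards

# usage in the notebook, after the export cell:
# print(list_tfjs_weight_files("/content/drive/My Drive/assignment05_data/export_model"))
```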



In [113]:
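import shutil
from google.colab import files

# zip the exported TFJS model folder into export_model.zip in the current working directory
shutil.make_archive("export_model", "zip", "/content/drive/My Drive/assignment05_data/export_model/")
# download the archive to the local machine through the browser
files.download("export_model.zip")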
