I decided to use a dataset of different gemstones (87 categories) I found, partly because the images just look pretty. Unfortunately, this dataset only had a train/test split instead of train/test/dev, so I had to use the test set for validation and could not get an unbiased score of how well my model generalized. The dataset was also a bit unbalanced, and its images were of non-uniform size.

[Link: https://www.kaggle.com/lsind18/gemstones-images]
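One way around the missing dev split would be to hold out a fraction of each class's training files. The sketch below is only an illustration and is not what the notebook does: the helper name `stratified_holdout`, the 15% fraction, and the file names are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_holdout(paths_by_class, dev_fraction=0.15, seed=0):
    """Split each class's file list into train and dev parts.

    paths_by_class maps a class name to a list of file paths (hypothetical
    input format). A class with two or more files contributes at least one
    dev file and keeps at least one training file; a class with a single
    file stays entirely in train.
    """
    rng = random.Random(seed)
    train, dev = defaultdict(list), defaultdict(list)
    for cls, paths in paths_by_class.items():
        shuffled = sorted(paths)      # sort first so the result is reproducible
        rng.shuffle(shuffled)
        n_dev = int(round(len(shuffled) * dev_fraction))
        if len(shuffled) >= 2:
            n_dev = min(max(n_dev, 1), len(shuffled) - 1)
        else:
            n_dev = 0
        dev[cls] = shuffled[:n_dev]
        train[cls] = shuffled[n_dev:]
    return dict(train), dict(dev)

# Made-up file names, only to show the call.
files = {
    "Ruby": ["ruby_%d.jpg" % i for i in range(20)],
    "Opal": ["opal_%d.jpg" % i for i in range(6)],
}
train_files, dev_files = stratified_holdout(files)
print({cls: (len(train_files[cls]), len(dev_files[cls])) for cls in files})
```

Splitting per class keeps the rare gemstones represented in both parts, which matters for an unbalanced dataset. Keras's ImageDataGenerator also accepts a validation_split argument, used together with subset='training' or subset='validation' in flow_from_directory, which may be simpler in practice.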

I got some coding help from the following sites, but the majority of the code is original (except the GoogLeNet):

[Link: https://www.tensorflow.org/tutorials/images/cnn]

[GoogLeNet: https://www.analyticsvidhya.com/blog/2018/10/understanding-inception-network-from-scratch/]

For the Part 1 model, I alternated convolution and max pooling layers, an approach that has worked relatively well for me when building CNNs in the past. I used 3x3 filters for convolution and 2x2 windows for max pooling, and increased the number of filters from 32 to 64 in the deeper layers, in order to retain as much local information about the image as possible between layers. I repeated the convolution/pooling pair three times in total, to get as much information out of the image as possible.
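To make the effect of this stack concrete, here is a small pure-Python sketch that traces the feature-map size and parameter count through three convolution/pooling pairs. It assumes unpadded ('valid') layers, a 255x255 RGB input, and 32, 32, and 64 filters, as in the code cell; the helper names are made up for the example.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of an unpadded ('valid') convolution or pooling."""
    return (size - kernel) // stride + 1

def conv_params(in_channels, out_channels, kernel):
    """Weights of a square kernel plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

def trace(side, channels, conv_filters, kernel=3, pool=2):
    """Return (layer, side, channels, params) rows for alternating conv/pool pairs."""
    rows = []
    for filters in conv_filters:
        side = conv_out(side, kernel)
        rows.append(("conv", side, filters, conv_params(channels, filters, kernel)))
        channels = filters
        side = conv_out(side, pool, stride=pool)
        rows.append(("pool", side, channels, 0))
    return rows

for row in trace(255, 3, [32, 32, 64]):
    print(row)
```

The convolution rows give 253, 124, and 60 pixels per side with 896, 9248, and 18496 parameters, which agrees with the model summary printed by the notebook. After the last pooling layer the feature map is 30x30x64, so the Flatten layer hands 57,600 values to the dense layers.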

In Part 2, I implemented GoogLeNet (with a lot of help from the tutorial linked above), but changed certain settings, such as the 87-way output layers, to match my dataset. Unfortunately, I had to drastically decrease the number of epochs because the runtime was simply abhorrent, and I wanted to actually be able to turn this assignment in.
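For readers unfamiliar with GoogLeNet's building block, the following sketch works through the arithmetic of one inception module: the four branches are concatenated along the channel axis, and the 1x1 "reduce" convolutions keep the 5x5 branch cheap. The filter counts are those of the inception_3a module in the code cell (with 192 input channels from the preceding layer); the function names are invented for the example.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights of a square kernel plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

def inception_out_channels(filters_1x1, filters_3x3, filters_5x5, filters_pool_proj):
    """Channels after concatenating the four branches of an inception module."""
    return filters_1x1 + filters_3x3 + filters_5x5 + filters_pool_proj

def branch_5x5_params(in_channels, reduce_filters, out_filters):
    """Parameters of the 5x5 branch: a 1x1 reduction followed by the 5x5 convolution."""
    return (conv_params(in_channels, reduce_filters, 1)
            + conv_params(reduce_filters, out_filters, 5))

in_channels = 192  # output depth of the layer feeding inception_3a

print(inception_out_channels(64, 128, 32, 32))   # channels leaving inception_3a
print(conv_params(in_channels, 32, 5))           # 5x5 branch without a reduction
print(branch_5x5_params(in_channels, 16, 32))    # 5x5 branch with the 1x1 reduction
```

For inception_3a this gives 256 output channels, and the reduced 5x5 branch needs 15,920 parameters instead of 153,632, roughly a tenfold saving. The same arithmetic applies to the 3x3 branch and to the other modules.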

In Part 3, I was able to rotate the images and flip them both horizontally and vertically (while loading them in the ImageDataGenerator), because gemstones in particular don't exactly need a sense of orientation to be a type of gemstone.
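The idea that orientation carries no label information can be illustrated on a toy image. This sketch is not the ImageDataGenerator code used in the notebook: it only covers flips and exact 90-degree turns on a nested-list "image", whereas rotation_range in Keras rotates by arbitrary angles with interpolation. All function names are made up for the example.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def orientation_variants(img):
    """Return the 8 flip/rotation variants of an image (4 rotations, each mirrored)."""
    variants = []
    current = [row[:] for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate_90(current)
    return variants

toy = [[1, 2],
       [3, 4]]
for variant in orientation_variants(toy):
    print(variant)
```

Every variant contains exactly the same pixel values, only rearranged, so each one can keep the label of the original gemstone image.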

In [1]:
import math  # needed by the decay() learning-rate schedule below

import tensorflow as tf
from tensorflow import keras
# Import everything from tensorflow.keras so the standalone keras package is not mixed in.
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPool2D, GlobalAveragePooling2D, AveragePooling2D
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense, Input, concatenate
from tensorflow.keras.callbacks import LearningRateScheduler
import matplotlib.pyplot as plt

!unzip archive\ \(2\).zip

train_data_dir = "train"
test_data_dir = "test"
img_width = 255
img_height= 255
batch_size = 1



train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    rotation_range=360,
    vertical_flip=True,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    color_mode="rgb",
    batch_size=100,
    class_mode='categorical')

test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_width, img_height),
    color_mode="rgb",
    batch_size=33,
    class_mode='categorical')
  



class_names = ['Alexandrite', 'Almandine', 'Amazonite', 'Amber', 'Amethyst',
               'Ametrine', 'Andalusite', 'Andradite', 'Aquamarine', 'Aventurine Green',
               'Aventurine Yellow', 'Benitoite', 'Beryl Golden', 'Bixbite', 'Bloodstone',
               'Blue Lace Agate', 'Carnelian', 'Cats Eye', 'Chalcedony', 'Chalcedony Blue',
               'Chrome Diopside', 'Chrysoberyl', 'Chrysocolla', 'Chrysoprase', 'Citrine',
               'Coral', 'Danburite', 'Diamond', 'Diaspore', 'Dumortierite',
               'Emerald', 'Fluorite', 'Garnet Red', 'Goshenite', 'Grossular',
               'Hessonite', 'Hiddenite', 'Iolite', 'Jade', 'Jasper',
               'Kunzite', 'Kyanite', 'Labradorite', 'Lapis Lazuli', 'Larimar', 
               'Malachite', 'Moonstone', 'Morganite', 'Onyx Black', 'Onyx Green', 
               'Onyx Red', 'Opal', 'Pearl', 'Peridot', 'Prehnite', 
               'Pyrite', 'Pyrope', 'Quartz Beer', 'Quartz Lemon', 'Quartz Rose', 
               'Quartz Rutilated', 'Quartz Smoky', 'Rhodochrosite', 'Rhodolite', 'Rhodonite', 
               'Ruby', 'Sapphire Blue', 'Sapphire Pink', 'Sapphire Purple', 'Sapphire Yellow', 
               'Scapolite', 'Serpentine', 'Sodalite', 'Spessartite', 'Sphene', 
               'Spinel', 'Spodumene', 'Sunstone', 'Tanzanite', 'Tigers Eye', 
               'Topaz', 'Tourmaline', 'Tsavorite','Turquoise', 'Variscite', 
               'Zircon', 'Zoisite']

'''
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays, 
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
'''

#Base Model-----------------------------------------------------------------------------


model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(255, 255, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(32, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
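model.add(Dense(87))
# Softmax rather than sigmoid: the 87 classes are mutually exclusive and the
# loss is categorical cross-entropy, which expects a probability distribution.
model.add(Activation('softmax'))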

model.compile(loss ='categorical_crossentropy',
                     optimizer ='rmsprop',
                   metrics =['categorical_accuracy'])

model.summary()

'''
model.fit(train_generator,
      steps_per_epoch = 28,
      epochs = 10, validation_data = test_generator,
      validation_steps = 11)
'''

#GoogLeNet-----------------------------------------------------------------------------------
def inception_module(x,
                     filters_1x1,
                     filters_3x3_reduce,
                     filters_3x3,
                     filters_5x5_reduce,
                     filters_5x5,
                     filters_pool_proj,
                     name=None):
    
    conv_1x1 = Conv2D(filters_1x1, (1, 1), padding='same', activation='relu', kernel_initializer=kernel_init, bias_initializer=bias_init)(x)
    
    conv_3x3 = Conv2D(filters_3x3_reduce, (1, 1), padding='same', activation='relu', kernel_initializer=kernel_init, bias_initializer=bias_init)(x)
    conv_3x3 = Conv2D(filters_3x3, (3, 3), padding='same', activation='relu', kernel_initializer=kernel_init, bias_initializer=bias_init)(conv_3x3)

    conv_5x5 = Conv2D(filters_5x5_reduce, (1, 1), padding='same', activation='relu', kernel_initializer=kernel_init, bias_initializer=bias_init)(x)
    conv_5x5 = Conv2D(filters_5x5, (5, 5), padding='same', activation='relu', kernel_initializer=kernel_init, bias_initializer=bias_init)(conv_5x5)

    pool_proj = MaxPool2D((3, 3), strides=(1, 1), padding='same')(x)
    pool_proj = Conv2D(filters_pool_proj, (1, 1), padding='same', activation='relu', kernel_initializer=kernel_init, bias_initializer=bias_init)(pool_proj)

    output = concatenate([conv_1x1, conv_3x3, conv_5x5, pool_proj], axis=3, name=name)
    
    return output

kernel_init = keras.initializers.glorot_uniform()
bias_init = keras.initializers.Constant(value=0.2)

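# Match the generators' target_size (255x255) instead of the 224x224 used by the original GoogLeNet.
input_layer = Input(shape=(img_height, img_width, 3))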

x = Conv2D(64, (7, 7), padding='same', strides=(2, 2), activation='relu', name='conv_1_7x7/2', kernel_initializer=kernel_init, bias_initializer=bias_init)(input_layer)
x = MaxPool2D((3, 3), padding='same', strides=(2, 2), name='max_pool_1_3x3/2')(x)
x = Conv2D(64, (1, 1), padding='same', strides=(1, 1), activation='relu', name='conv_2a_3x3/1')(x)
x = Conv2D(192, (3, 3), padding='same', strides=(1, 1), activation='relu', name='conv_2b_3x3/1')(x)
x = MaxPool2D((3, 3), padding='same', strides=(2, 2), name='max_pool_2_3x3/2')(x)

x = inception_module(x,
                     filters_1x1=64,
                     filters_3x3_reduce=96,
                     filters_3x3=128,
                     filters_5x5_reduce=16,
                     filters_5x5=32,
                     filters_pool_proj=32,
                     name='inception_3a')

x = inception_module(x,
                     filters_1x1=128,
                     filters_3x3_reduce=128,
                     filters_3x3=192,
                     filters_5x5_reduce=32,
                     filters_5x5=96,
                     filters_pool_proj=64,
                     name='inception_3b')

x = MaxPool2D((3, 3), padding='same', strides=(2, 2), name='max_pool_3_3x3/2')(x)

x = inception_module(x,
                     filters_1x1=192,
                     filters_3x3_reduce=96,
                     filters_3x3=208,
                     filters_5x5_reduce=16,
                     filters_5x5=48,
                     filters_pool_proj=64,
                     name='inception_4a')


x1 = AveragePooling2D((5, 5), strides=3)(x)
x1 = Conv2D(128, (1, 1), padding='same', activation='relu')(x1)
x1 = Flatten()(x1)
x1 = Dense(1024, activation='relu')(x1)
x1 = Dropout(0.7)(x1)
x1 = Dense(87, activation='softmax', name='auxilliary_output_1')(x1)

x = inception_module(x,
                     filters_1x1=160,
                     filters_3x3_reduce=112,
                     filters_3x3=224,
                     filters_5x5_reduce=24,
                     filters_5x5=64,
                     filters_pool_proj=64,
                     name='inception_4b')

x = inception_module(x,
                     filters_1x1=128,
                     filters_3x3_reduce=128,
                     filters_3x3=256,
                     filters_5x5_reduce=24,
                     filters_5x5=64,
                     filters_pool_proj=64,
                     name='inception_4c')

x = inception_module(x,
                     filters_1x1=112,
                     filters_3x3_reduce=144,
                     filters_3x3=288,
                     filters_5x5_reduce=32,
                     filters_5x5=64,
                     filters_pool_proj=64,
                     name='inception_4d')


x2 = AveragePooling2D((5, 5), strides=3)(x)
x2 = Conv2D(128, (1, 1), padding='same', activation='relu')(x2)
x2 = Flatten()(x2)
x2 = Dense(1024, activation='relu')(x2)
x2 = Dropout(0.7)(x2)
x2 = Dense(87, activation='softmax', name='auxilliary_output_2')(x2)

x = inception_module(x,
                     filters_1x1=256,
                     filters_3x3_reduce=160,
                     filters_3x3=320,
                     filters_5x5_reduce=32,
                     filters_5x5=128,
                     filters_pool_proj=128,
                     name='inception_4e')

x = MaxPool2D((3, 3), padding='same', strides=(2, 2), name='max_pool_4_3x3/2')(x)

x = inception_module(x,
                     filters_1x1=256,
                     filters_3x3_reduce=160,
                     filters_3x3=320,
                     filters_5x5_reduce=32,
                     filters_5x5=128,
                     filters_pool_proj=128,
                     name='inception_5a')

x = inception_module(x,
                     filters_1x1=384,
                     filters_3x3_reduce=192,
                     filters_3x3=384,
                     filters_5x5_reduce=48,
                     filters_5x5=128,
                     filters_pool_proj=128,
                     name='inception_5b')

x = GlobalAveragePooling2D(name='avg_pool_5_3x3/1')(x)

x = Dropout(0.4)(x)

x = Dense(87, activation='softmax', name='output')(x)

model = Model(input_layer, [x, x1, x2], name='inception_v1')

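epochs = 4
initial_lrate = 0.01

def decay(epoch, steps=100):
    # Step decay: multiply the rate by `drop` once every `epochs_drop` epochs.
    # With only 4 epochs and epochs_drop = 8, floor((1 + epoch) / 8) is 0 for
    # every epoch, so the rate stays at 0.01 for the whole run.
    initial_lrate = 0.01
    drop = 0.96
    epochs_drop = 8
    lrate = initial_lrate * math.pow(drop, math.floor((1+epoch)/epochs_drop))
    return lrate

#sgd = SGD(lr=initial_lrate, momentum=0.9, nesterov=False)

# Note: lr_sc only takes effect if it is passed to model.fit via callbacks=[lr_sc];
# the fit call in this cell does not pass it.
lr_sc = LearningRateScheduler(decay, verbose=1)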

# One loss per output: the main classifier plus the two auxiliary classifiers.
model.compile(
    loss=['categorical_crossentropy', 'categorical_crossentropy', 'categorical_crossentropy'],
    loss_weights=[1, 0.3, 0.3],
    optimizer='rmsprop',
    metrics=['categorical_accuracy'])

model.fit(train_generator,
          steps_per_epoch=28,
          epochs=4,
          validation_data=test_generator,
          validation_steps=11)

Archive:  archive (2).zip
replace test/Alexandrite/alexandrite_18.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename: N
Found 2856 images belonging to 87 classes.
Found 363 images belonging to 87 classes.
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 253, 253, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 126, 126, 32)      0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 124, 124, 32)      9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 62, 62, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 60, 60, 64)        18496     
__________________________________________________________

<keras.callbacks.History at 0x7f5b4ff54b50>

Task 1 Results:

The first network performed reasonably well, earning a 42% categorical accuracy score in around 35 minutes.

GoogLeNet (Inception v1) actually performed surprisingly poorly. It took a full hour to run just four epochs, and by the time it had completed, none of its output accuracies came even close to the first CNN, including when the first CNN was at its fourth epoch. Usually when a traditionally 'better' model performs worse, I am able to find some small area that could explain the difference. However, the difference here is so significant that I am positively baffled.

Making the images rotate and flip earned a 43% categorical accuracy score in around 37 minutes. This is not too far from the first network, which makes sense as they used the same model. The data augmentation step cost a little extra runtime for a marginal accuracy gain, but in the real world I would imagine it would generalize much better, since the model sees the gemstones from multiple 'angles'.

For Task 2, I found a dataset of Bob Ross paintings, and thought it would be cool to see if a VAE or GAN could replicate them. Unfortunately, at this point I am pretty much out of time and will likely not finish this part of the task. I read through and understood how to implement both VAEs and GANs (I had ample time while Task 1 was running for hours), and if I had more time I would likely only need to copy-paste large portions of code, then tune hyperparameters. When I originally found the dataset, I got really excited for this portion of the homework (I might even do it just for fun after submitting this), but I simply do not have enough time to get my code running, much less wait for the training. At least I learned to never underestimate the time it takes to train a model!

[Link: https://www.kaggle.com/residentmario/segmented-bob-ross-images]

In [6]:
from IPython import display

import glob
import imageio
import matplotlib.pyplot as plt
import numpy as np
import PIL
import tensorflow as tf
import tensorflow_probability as tfp
import time
from PIL import Image


!unzip archive\ \(3\).zip

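def preprocess_images(images):
  # This reshape appears to be carried over from an MNIST example: it assumes
  # 28x28 single-channel images, which the Bob Ross images are not, so it is a
  # likely source of the ValueError shown in the output.
  images = images.reshape((images.shape[0], 28, 28, 1)) / 255.
  # Binarize the pixels: values above 0.5 become 1.0, everything else 0.0.
  return np.where(images > .5, 1.0, 0.0).astype('float32')

# Load every training image into memory as a NumPy array.
images = []
for f in glob.iglob("train/images/*"):
    images.append(np.asarray(Image.open(f)))

images = np.array(images)


train_images = preprocess_images(images)
# NOTE: test_images is never defined in this notebook, so this line cannot run as written.
test_images = preprocess_images(test_images)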

Archive:  archive (3).zip
replace labels.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: N


ValueError: ignored