# Project - deep learning modeling and optimization

In this project you will implement a network architecture, train it on a dataset while comparing different optimizers, and finally optimize it for inference using TensorRT.

## Implement and train the model
The table below shows the architecture of VGG-19, a well-known image classifier:

|Layer Type|Feature Maps|Output Size|Kernel Size|Stride|Activation|
| :-: | :-: | :-: | :-: | :-: | :-: |
|Image|	3 (RGB)	|224×224|	–|	–|	–|
|Convolution|	64|	224×224|	3×3|	1|	ReLU|
|Convolution|	64|	224×224|	3×3|	1|	ReLU|
|Max Pooling|	64|	112×112|	2×2|	2|	–|
|Convolution|	128|	112×112|	3×3|	1|	ReLU|
|Convolution|	128|	112×112|	3×3|	1|	ReLU|
|Max Pooling|	128|	56×56|	2×2|	2|	–|
|Convolution|	256|	56×56|	3×3|	1|	ReLU|
|Convolution|	256|	56×56|	3×3|	1|	ReLU|
|Convolution|	256|	56×56|	3×3|	1|	ReLU|
|Convolution|	256|	56×56|	3×3|	1|	ReLU|
|Max Pooling|	256|	28×28|	2×2|	2|	–|
|Convolution|	512|	28×28|	3×3|	1|	ReLU|
|Convolution|	512|	28×28|	3×3|	1|	ReLU|
|Convolution|	512|	28×28|	3×3|	1|	ReLU|
|Convolution|	512|	28×28|	3×3|	1|	ReLU|
|Max Pooling|	512|	14×14|	2×2|	2|	–|
|Convolution|	512|	14×14|	3×3|	1|	ReLU|
|Convolution|	512|	14×14|	3×3|	1|	ReLU|
|Convolution|	512|	14×14|	3×3|	1|	ReLU|
|Convolution|	512|	14×14|	3×3|	1|	ReLU|
|Max Pooling|	512|	7×7|	2×2|	2|	–|
|Fully Connected|	–|	4096|	–|	–|	ReLU|
|Fully Connected|	–|	4096|	–|	–|	ReLU|
|Fully Connected|	–|	1000|	–|	–|	Softmax|
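
The spatial sizes in the table can be checked with a few lines of arithmetic. The sketch below assumes 3×3 convolutions with stride 1 and `'same'` padding (so they keep the spatial size) and 2×2 max pooling with stride 2 (so it halves the size). The names `VGG19_BLOCKS` and `vgg19_feature_shapes` are illustrative only.

```python
# (number of conv layers, number of feature maps) for each of the five blocks in the table
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]


def vgg19_feature_shapes(input_size=224):
    """Return (layer type, spatial size, feature maps) for every conv/pool layer."""
    size = input_size
    shapes = []
    for n_convs, channels in VGG19_BLOCKS:
        # 3x3 convolutions with stride 1 and 'same' padding keep the spatial size
        shapes.extend([("conv", size, channels)] * n_convs)
        # 2x2 max pooling with stride 2 halves the spatial size
        size //= 2
        shapes.append(("pool", size, channels))
    return shapes


shapes = vgg19_feature_shapes()
_, last_size, last_channels = shapes[-1]
# Number of inputs seen by the first fully connected layer after flattening
flatten_units = last_size * last_size * last_channels
print(shapes[-1], flatten_units)
```

Under these assumptions the last pooling layer produces a 7×7×512 volume, so the first fully connected layer receives 7 · 7 · 512 = 25088 inputs.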

Implement this network architecture in TensorFlow and load pretrained weights into it.

Choose appropriate metrics for evaluating the model's performance, then evaluate the model.
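
Top-1 and top-5 accuracy are common choices for a 1000-class ImageNet classifier. The following is a minimal NumPy sketch of top-k accuracy, not a prescribed solution. The helper name `top_k_accuracy` and the toy inputs are made up for illustration.

```python
import numpy as np


def top_k_accuracy(probs, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row (order within the top k is irrelevant)
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy example: 2 samples, 3 classes, both with true label 1
toy_probs = [[0.1, 0.7, 0.2],
             [0.5, 0.3, 0.2]]
toy_labels = [1, 1]
print(top_k_accuracy(toy_probs, toy_labels, k=1))
print(top_k_accuracy(toy_probs, toy_labels, k=2))
```

In Keras, `tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5)` can be passed to `model.compile` alongside `'accuracy'` to track the same quantity during evaluation.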


### Import the necessary libraries

In [3]:
#!pip3 install tensorflow-datasets==4.1.0


In [4]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

### Load the data

In [5]:
ds, info = tfds.load('imagenet_v2', split='test', with_info=True)

tfds.as_dataframe(ds.take(4), info)

Downloading and preparing dataset imagenet_v2/matched-frequency/1.0.0 (download: 1.17 GiB, generated: 1.16 GiB, total: 2.33 GiB) to /root/tensorflow_datasets/imagenet_v2/matched-frequency/1.0.0...


NonMatchingChecksumError: Artifact https://s3-us-west-2.amazonaws.com/imagenetv2public/imagenetv2-matched-frequency.tar.gz, downloaded to /root/tensorflow_datasets/downloads/s3-us-west-2_image_image-match-frequc56VsOLVttUFrJ7Ka21jcV9uodSP_TSQV-yxfB4t3_U.tar.gz.tmp.3c6b74284b5b475f8c0d0fb4113ab11f/imagenetv2-matched-frequency.tar.gz, has wrong checksum. This might indicate:
 * The website may be down (e.g. returned a 503 status code). Please check the url.
 * For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See https://github.com/tensorflow/datasets/issues/1482
 * The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
 * If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally


### Data preprocessing

In [None]:
def resize_with_crop(image, label):
    i = image
    i = tf.cast(i, tf.float32)
    i = tf.image.resize_with_crop_or_pad(i, 224, 224)
    i = tf.keras.applications.vgg19.preprocess_input(i)
    return (i, label)

In [None]:
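# Preprocess the images.
# tfds.load without as_supervised=True yields feature dicts,
# so unpack the image and label explicitly.
ds = ds.map(lambda example: resize_with_crop(example['image'], example['label']))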
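
For reference, `tf.keras.applications.vgg19.preprocess_input` applies "caffe"-style preprocessing: it reorders the channels from RGB to BGR and subtracts the per-channel ImageNet mean, without rescaling. The NumPy sketch below mirrors that behaviour for illustration only. The function name `vgg_preprocess` is hypothetical and the code is not a replacement for the Keras function.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by caffe-style preprocessing
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg_preprocess(rgb_image):
    """Convert an RGB image with values in [0, 255] to zero-centred BGR."""
    # Reverse the last axis: RGB -> BGR
    bgr = np.asarray(rgb_image, dtype=np.float32)[..., ::-1]
    # Subtract the channel means (broadcast over height and width)
    return bgr - IMAGENET_BGR_MEAN


# A 1x1 image whose colour equals the ImageNet mean maps to (approximately) zero
mean_pixel_rgb = np.array([[[123.68, 116.779, 103.939]]], dtype=np.float32)
print(vgg_preprocess(mean_pixel_rgb))
```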

### Implement and build model

In [None]:
model = tf.keras.Sequential([
    # Block 1
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same', input_shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

    # Block 2
    tf.keras.layers.Conv2D(128, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.Conv2D(128, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

    # Block 3
    tf.keras.layers.Conv2D(256, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.Conv2D(256, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.Conv2D(256, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.Conv2D(256, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

    # Block 4
    tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

    # Block 5
    tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.Conv2D(512, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', padding='same'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),

    # Classifier head
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dense(4096, activation='relu'),
    # The output layer uses softmax (not ReLU), as in the architecture table
    tf.keras.layers.Dense(1000, activation='softmax'),
])

# Compile the model
lr = 1e-3  # assumed learning rate; the original left `lr` undefined
opt = tf.keras.optimizers.Adam(learning_rate=lr)
model.compile(optimizer=opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

### Load weights to model

In [None]:
# Loads the weights
model.load_weights("model.h5")

### Evaluate the model

In [None]:
# Evaluate the model on the preprocessed dataset
# (the batch size of 32 is an arbitrary choice)
loss, acc = model.evaluate(ds.batch(32), verbose=2)
print("Accuracy: {:5.2f}%".format(100 * acc))

## Optimize the model using TensorRT

After training and evaluating the model, your goal is to optimize it for inference on the target machine using TensorRT (use TF-TRT in this project).
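
A TF-TRT conversion could be sketched roughly as follows. This assumes the Keras model has first been exported as a SavedModel and that TensorFlow was built with TensorRT support. The function name `convert_with_trt` and the directory arguments are hypothetical, and the converter's arguments differ between TensorFlow versions, so treat it as a starting point and check the documentation for the installed version.

```python
PRECISION_MODES = ("FP32", "FP16", "INT8")


def convert_with_trt(saved_model_dir, output_dir, precision="FP16",
                     calibration_input_fn=None):
    """Convert a SavedModel with TF-TRT at the requested precision mode."""
    if precision not in PRECISION_MODES:
        raise ValueError(f"precision must be one of {PRECISION_MODES}")
    if precision == "INT8" and calibration_input_fn is None:
        # INT8 needs representative inputs to calibrate activation ranges
        raise ValueError("INT8 conversion requires a calibration_input_fn")

    # Imported lazily: only available in a TensorRT-enabled TensorFlow build
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.TrtConversionParams(precision_mode=precision)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        conversion_params=params,
    )
    if precision == "INT8":
        # calibration_input_fn is a generator function yielding tuples of input batches
        converter.convert(calibration_input_fn=calibration_input_fn)
    else:
        converter.convert()
    converter.save(output_dir)
    return output_dir
```

A typical call, with hypothetical paths, would be `model.save("vgg19_saved_model")` followed by `convert_with_trt("vgg19_saved_model", "vgg19_trt_fp16", precision="FP16")`.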

Try quantizing the model to different precisions using TensorRT's quantization features, compare the precision modes, and recommend the one you would choose.
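
One simple way to compare precision modes is to time inference for each converted model on the same batch and set the latency against the accuracy measured earlier. The helper below is a minimal, framework-agnostic sketch. The name `benchmark` and its parameters are illustrative, and `predict_fn` stands for whatever callable runs a forward pass.

```python
import time


def benchmark(predict_fn, batch, warmup=5, runs=20):
    """Time predict_fn(batch) and return mean and median latency in milliseconds."""
    # Warm-up calls absorb one-off costs such as engine building and memory allocation
    for _ in range(warmup):
        predict_fn(batch)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(batch)
        timings.append(time.perf_counter() - start)

    timings.sort()
    return {
        "mean_ms": 1000.0 * sum(timings) / runs,
        "median_ms": 1000.0 * timings[runs // 2],
    }
```

GPU work may be launched asynchronously, so `predict_fn` should force the result back to the host (for example by converting the output to a NumPy array) for the timings to cover the full computation.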

> Bonus: if you were working on a Tesla T4 GPU, which precision mode would you choose?

## Create a Box Blur CUDA kernel with Numba

https://en.wikipedia.org/wiki/Box_blur

Follow the algorithm given for box blur (3×3 kernel size) and implement it in two ways:
1. Using a plain loop over the image
2. Using a Numba CUDA kernel
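
The two variants could be sketched as below. This is one possible shape of a solution under stated assumptions, not a reference implementation: the image is a 2-D grayscale array, and border pixels (where the 3×3 window does not fit) are copied unchanged. All function names are illustrative. The Numba import is guarded so that the CPU version still runs on a machine without Numba; the CUDA version also needs a CUDA-capable GPU.

```python
import math

import numpy as np

try:
    from numba import cuda  # the kernel itself needs a CUDA-capable GPU at run time
except ImportError:  # lets the CPU reference run without Numba installed
    cuda = None


def box_blur_loop(image):
    """3x3 box blur with plain Python loops; border pixels are left unchanged."""
    image = np.asarray(image, dtype=np.float32)
    out = image.copy()
    h, w = image.shape
    for row in range(1, h - 1):
        for col in range(1, w - 1):
            acc = 0.0
            # Sum the 3x3 neighbourhood centred on (row, col)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += image[row + dr, col + dc]
            out[row, col] = acc / 9.0
    return out


if cuda is not None:

    @cuda.jit
    def box_blur_kernel(src, dst):
        # One thread per output pixel
        row, col = cuda.grid(2)
        h, w = src.shape
        if row >= h or col >= w:
            return  # thread falls outside the image
        if row < 1 or col < 1 or row + 1 >= h or col + 1 >= w:
            dst[row, col] = src[row, col]  # window does not fit: copy the pixel
            return
        acc = 0.0
        for dr in range(-1, 2):
            for dc in range(-1, 2):
                acc += src[row + dr, col + dc]
        dst[row, col] = acc / 9.0

    def box_blur_cuda(image, threads=(16, 16)):
        """Run the kernel on a 2-D image and return the blurred copy on the host."""
        src = cuda.to_device(np.ascontiguousarray(image, dtype=np.float32))
        dst = cuda.device_array_like(src)
        # Enough blocks to cover every pixel
        blocks = (math.ceil(image.shape[0] / threads[0]),
                  math.ceil(image.shape[1] / threads[1]))
        box_blur_kernel[blocks, threads](src, dst)
        return dst.copy_to_host()
```

On a machine with a GPU, comparing `box_blur_cuda(img)` against `box_blur_loop(img)` with `np.allclose` is a quick correctness check before timing the two versions.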