# CS 437 - Deep Learning - Assignment 4

*__Submission Instructions:__*
- Rename this notebook to `hw4_rollnumber.ipynb` before submission on LMS.
- Code for all the tasks must be written in this notebook (you do not need to submit any other files).
- The output of all cells must be present in the version of the notebook you submit.
- The university honor code should be maintained. Any violation, if found, will result in disciplinary action. 

In [6]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn')
import seaborn as sns

from keras.models import load_model
from keras.applications import vgg16

## Overview

In this assignment you will explore a few important concepts used in deep learning projects:
- Working with satellite imagery data
- Dataset annotation
- Fine-tuning / Transfer Learning
- Unsupervised feature representation with an autoencoder
- Comparison of an end-to-end trained model with a fine-tuned model

We will be using two datasets and three pretrained models, all listed below. You are **highly** encouraged to explore the datasets and model architectures in order to get the most out of this assignment.

**_Datasets:_**
- Brick Kiln (Nepal) - available [here]()
- UC Merced Land Use - available [here](http://weegee.vision.ucmerced.edu/datasets/landuse.html)

**_Pretrained Models:_**
- ResNet18 pretrained on Brick Kiln (Lahore) - available [here]()
- Autoencoder pretrained on GT Cross View and fine tuned on UC Merced - available [here]()
- VGG16 pretrained on ImageNet - available in `keras.applications` (consult relevant documentation)

## Task 1

Let's start with a binary classification problem. 

The Brick Kiln (Nepal) dataset you have been given consists of 100 tiles at zoom level 17. A script to break up these tiles into 64 sub-tiles of zoom 20 has also been given to you. Your job is to:
- Split 100 images into 6400 images using the script
- Manually annotate the dataset by moving the kiln pictures into one folder and the non-kiln pictures into another folder.
- Code up a generator to properly load the images and corresponding binary labels into a model. You have to resize the images to 224x224x3.

*Scale images between 0 and 1 and apply mean subtraction in the generator*

*Each of you has been given a unique set of 100 tiles, so for the love of God do not get annotated data from someone else.*
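The scaling and mean-subtraction steps can be sketched as follows. This is a minimal illustration, not the expected solution: it assumes the images are already loaded and resized into an in-memory `uint8` array, and the names `preprocess_batch`, `batch_generator`, and the toy data are hypothetical. A real generator would read and resize files from the two annotated folders.

```python
import numpy as np

def preprocess_batch(images, channel_mean):
    """Scale uint8 images to [0, 1], then subtract a per-channel mean."""
    scaled = images.astype(np.float32) / 255.0
    return scaled - channel_mean

def batch_generator(images, labels, batch_size, channel_mean):
    """Yield (preprocessed_images, labels) batches forever, reshuffling each pass."""
    n = len(images)
    while True:
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield preprocess_batch(images[idx], channel_mean), labels[idx]

# Toy stand-in data (assumption): 10 random 224x224x3 "images" with binary labels.
rng = np.random.default_rng(0)
toy_images = rng.integers(0, 256, size=(10, 224, 224, 3), dtype=np.uint8)
toy_labels = rng.integers(0, 2, size=10)

# Per-channel mean of the scaled data (assumption: computed from the data itself).
toy_mean = (toy_images.astype(np.float32) / 255.0).mean(axis=(0, 1, 2))

x_batch, y_batch = next(batch_generator(toy_images, toy_labels, 4, toy_mean))
```

Because the mean is computed after scaling, every output value lies in [-1, 1].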

In [None]:
def brick_kiln_generator(...):
    pass

## Task 2

Now you will evaluate the performance of a ResNet18 model pretrained on the Brick Kiln (Lahore) dataset, using the generator made in Task 1. You will:
- Obtain predictions for the entire dataset
- Construct a binary confusion matrix and visualize it as a heatmap

*You can use scikit-learn's `metrics.confusion_matrix` function. Consult the relevant documentation.*
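To make the layout of the matrix concrete, here is a small NumPy sketch of the counts that a binary confusion matrix holds, with rows as true labels and columns as predicted labels (the same convention scikit-learn uses). The probabilities, the 0.5 threshold, and the function name are assumptions for illustration; the actual output format of the provided model may differ.

```python
import numpy as np

def binary_confusion_matrix(y_true, y_pred):
    """2x2 count matrix: rows are true labels, columns are predicted labels."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical predicted probabilities for the kiln class, thresholded at 0.5.
probs = np.array([0.9, 0.2, 0.6, 0.4, 0.1, 0.8])
y_true = np.array([1, 0, 0, 1, 0, 1])
y_pred = (probs >= 0.5).astype(int)

cm = binary_confusion_matrix(y_true, y_pred)
```

The resulting array can be passed to `sns.heatmap(cm, annot=True, fmt="d")` to draw the heatmap.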

## Task 3

Next you will employ Transfer Learning and fine-tune the pretrained ResNet18 model you used in Task 2 to better fit the Brick Kiln (Nepal) dataset. You will:
- Freeze everything except the FC layers and train the model using the generator from Task 1 (using appropriate hyperparameters)
- Construct a binary confusion matrix and visualize it as a heatmap
- Compare this confusion matrix with the one made in Task 2
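Freezing in Keras is controlled by each layer's `trainable` attribute. The sketch below shows the idea on a tiny stand-in network, since the real ResNet18 comes from `load_model(...)`. The architecture, the optimizer, and the rule "a layer is FC if it is a `Dense` instance" are all assumptions; check `summary()` of the provided model to see which layers actually form its FC head.

```python
from keras import layers, models

# Hypothetical stand-in network, used only to demonstrate freezing.
toy_model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Keep only the fully connected (Dense) layers trainable; freeze the rest.
for layer in toy_model.layers:
    layer.trainable = isinstance(layer, layers.Dense)

# Compile after changing the trainable flags so the change takes effect.
toy_model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
```

After this, `toy_model.trainable_weights` contains only the kernels and biases of the two `Dense` layers.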

In [None]:
resnet18_pretrained = load_model(...)
resnet18_pretrained.summary()
resnet18_pretrained.compile(...)

## Task 4

Now we will look at a multiclass classification problem.

The UC Merced Land Use dataset consists of 21 classes, ranging from airplanes to forests to tennis courts. Let's add kilns to it since you worked so hard to annotate the dataset in Task 1. You will:
- Download the dataset and add a new folder (following the already existing folder structure) corresponding to brick kilns
- Code up a generator to properly load the images and corresponding 22-class labels into a model. You have to resize the images to 224x224x3 for VGG16

*Scale images between 0 and 1 and apply mean subtraction in the generator*
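One way to derive the 22-class labels from the folder structure is to sort the folder names, map each name to an index, and one-hot encode. The sketch below uses a shortened, hypothetical list of folder names; the real list would be read from the dataset directory.

```python
import numpy as np

# Hypothetical folder names; the real dataset has 21 classes plus the kiln folder.
class_names = sorted(["forest", "airplane", "brickkiln", "tenniscourt"])
class_to_index = {name: i for i, name in enumerate(class_names)}

def one_hot(folder_names, class_to_index):
    """Turn a list of class folder names into one-hot label rows."""
    indices = np.array([class_to_index[name] for name in folder_names])
    return np.eye(len(class_to_index), dtype=np.float32)[indices]

labels = one_hot(["brickkiln", "forest", "brickkiln"], class_to_index)
```

Sorting the names keeps the index assignment stable across runs, which matters when the confusion matrix axes are labelled later.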

In [None]:
def land_use_generator(...):
    pass

## Task 5

Next you will again employ Transfer Learning and fine-tune VGG16, pretrained on ImageNet, to better fit the modified Land Use dataset. You will:
- Change the number of nodes in the last FC layer to match the number of classes, i.e. 22
- Freeze everything except the FC layers and train the model using the generator from Task 4 (using appropriate hyperparameters)
- Construct a multi-class confusion matrix and visualize it as a heatmap

In [8]:
vgg_imagenet = vgg16.VGG16(include_top=False, weights='imagenet')
# Freeze the convolutional base: Keras reads the `trainable` attribute, not `freeze`
for l in vgg_imagenet.layers:
    l.trainable = False

# add new FC layers here

# print summary and compile


## Task 6

Now you will make use of Unsupervised Representation Learning as studied in class. You have been provided with a pretrained autoencoder (just the encoder part) and you will use it to obtain deep features for the modified UC Merced Land Use dataset. You will have to:
- Obtain predictions for the entire dataset
- Save them in an appropriate fashion

*Keep in mind that this model takes input of shape 256x256x3, so you need to resize the images before feeding them into it*

*Try to think about how you could use the generator from Task 4 to create another generator that yields encoded features along with labels instead of raw images*
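The wrapping idea in the hint can be sketched as a generator that consumes `(images, labels)` batches and yields `(features, labels)` batches. Everything here is a stand-in: `toy_image_generator` plays the role of the Task 4 generator, and `toy_encoder` is a placeholder for the provided encoder (with a Keras model, its `predict` method could be passed instead). Resizing to 256x256x3 is left out for brevity.

```python
import numpy as np

def feature_generator(image_generator, encode_fn):
    """Wrap an (images, labels) generator so it yields (features, labels)."""
    for images, labels in image_generator:
        yield encode_fn(images), labels

def toy_image_generator(num_batches, batch_size=2):
    """Hypothetical finite generator of random 256x256x3 batches."""
    rng = np.random.default_rng(0)
    for _ in range(num_batches):
        images = rng.random((batch_size, 256, 256, 3))
        labels = rng.integers(0, 22, size=batch_size)
        yield images, labels

def toy_encoder(images):
    """Placeholder encoder: average over the spatial dimensions."""
    return images.mean(axis=(1, 2))

feature_batches, label_batches = [], []
for feats, labs in feature_generator(toy_image_generator(3), toy_encoder):
    feature_batches.append(feats)
    label_batches.append(labs)

features = np.concatenate(feature_batches)
labels = np.concatenate(label_batches)
```

The two arrays can then be stored together, for example with `np.savez`, so that later tasks can load them without re-running the encoder.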

## Task 7

Now you will train a classifier from scratch to discriminate the 22 classes based on the deep features you extracted in Task 6. You will:
- Train a classifier with the following architecture
> 1D conv 3x1 -> 1D conv 3x1 -> FC 256 -> FC 22
- Construct a multiclass confusion matrix and visualize it as a heatmap
- Compare this confusion matrix with the one made in Task 5

*The input to this model will be the deep feature tensor obtained in Task 6, so use an appropriate input shape*

## Task 8

Now you will explore another use of the deep features extracted in Task 6. Content Based Image Retrieval (CBIR) is the task of searching a dataset for visually similar images. *Think Google image search.* The same concept can be applied to other forms of data such as text, audio or video. In this task you will:
- Implement a function that takes three inputs and returns a list of visually similar images. The inputs are
> An image from the dataset `im` <br />
The number of search results to return `n` <br />
A string representing the distance metric used for comparisons `dist`
- Use some images to compare the effects of these distance metrics on the output
> Euclidean <br />
Cosine <br />
Mahalanobis

*Look up the documentation for SciPy's `spatial.distance` module. It is your best friend in this task.*

*If you made a generator in Task 6, you can very easily use it in this task as well*
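For intuition about what the three metrics compute, here is a NumPy-only sketch that ranks feature vectors by distance to a query vector. The function names and the random toy features are assumptions, and the covariance for the Mahalanobis distance is estimated from the feature set itself. SciPy's `spatial.distance.cdist` accepts the metric names `"euclidean"`, `"cosine"` and `"mahalanobis"` and can replace the hand-written formulas.

```python
import numpy as np

def distances_to_query(query, features, dist):
    """Distance from one query vector to every row of `features`."""
    diff = features - query
    if dist == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=1))
    if dist == "cosine":
        norms = np.linalg.norm(features, axis=1) * np.linalg.norm(query)
        return 1.0 - (features @ query) / norms
    if dist == "mahalanobis":
        # Inverse covariance estimated from the features (assumption).
        inv_cov = np.linalg.pinv(np.cov(features, rowvar=False))
        squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        return np.sqrt(np.maximum(squared, 0.0))
    raise ValueError(f"unknown distance metric: {dist}")

def top_n_indices(query, features, n, dist):
    """Indices of the n rows of `features` closest to `query`."""
    return np.argsort(distances_to_query(query, features, dist))[:n]

# Toy stand-in for the Task 6 features: 50 vectors of length 8.
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(50, 8))
query = toy_features[7]

nearest = {d: top_n_indices(query, toy_features, 3, d)
           for d in ("euclidean", "cosine", "mahalanobis")}
```

Since the query is taken from the feature set, it is its own nearest neighbour under all three metrics; the remaining results are where the metrics can disagree.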

In [4]:
def cbir(im, n, dist):
    pass