# Assignment 6: Neural Networks


## 1. Backpropagation - Essentials

As discussed in class, a single layer of a feedforward neural network can be expressed as follows:

$$h = Wx + b$$

$$t = \sigma(h)$$

$$\mathcal{L} = \frac{1}{2}(y - t)^2$$

where $x$ is the input, $W$ is the weight matrix of this layer, $b$ is the bias vector of this layer, $\sigma(\cdot)$ is the activation function, $y$ is the label, and $\mathcal{L}$ is the loss.

The activation function and the loss function (here, the squared loss) are design choices made when building a neural network.
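A minimal NumPy sketch of this forward pass, assuming ReLU as $\sigma$ and a toy layer with two inputs and one output. The weights, input, and label are made-up values for illustration only. For a vector output the sketch sums the squared error over the output units, which reduces to the formula above in the scalar case.

```python
import numpy as np

def relu(z):
    """ReLU activation, one common choice for sigma."""
    return np.maximum(0.0, z)

def layer_forward(W, x, b, y, sigma=relu):
    """Forward pass of one layer followed by the squared loss."""
    h = W @ x + b                       # pre-activation: h = Wx + b
    t = sigma(h)                        # activation: t = sigma(h)
    loss = 0.5 * np.sum((y - t) ** 2)   # L = 1/2 (y - t)^2, summed over output units
    return h, t, loss

# Toy values (hypothetical): 2 inputs, 1 output
W = np.array([[0.5, -1.0]])
x = np.array([2.0, 0.5])
b = np.array([0.25])
y = np.array([1.0])

h, t, loss = layer_forward(W, x, b, y)
print(h, t, loss)  # h = [0.75], t = [0.75], loss = 0.03125
```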


### a. What are the unknowns in the problem?

TODO:


### b. What do we want to minimize?

TODO:

### c. What method could we use to find the unknowns?

TODO:

### d. Find the partial derivatives of $\mathcal{L}$ with respect to the unknowns.

Assume we use ReLU for the activation function.

TODO:

## 2. Backpropagation

A neural network is regarded as compositional, in that the output of one layer serves as the input to the next layer. Using the same notation as above but ignoring the bias $b$ for simplicity:

$$t = \sigma_L(W_L \sigma_{L-1}(...\sigma_2(W_2 \sigma_1(W_1x))...))$$

Here $x$ is the original input data, and $t$ is the output of the neural network.
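The nested expression above amounts to a loop that feeds each layer's output into the next. A small NumPy sketch, assuming ReLU at every layer, no biases, and hypothetical layer widths of 4, 5, 3, and 1 with random weights:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def network_forward(weights, x, activations):
    """Compute t = sigma_L(W_L sigma_{L-1}(... sigma_1(W_1 x) ...)) with biases omitted."""
    n = x
    for W, sigma in zip(weights, activations):
        n = sigma(W @ n)  # the output of this layer becomes the input of the next
    return n

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 1]  # hypothetical widths: input dimension 4, output dimension 1
weights = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
activations = [relu] * len(weights)

x = rng.standard_normal(sizes[0])
t = network_forward(weights, x, activations)
print(t.shape)  # (1,)
```

Replacing every activation with the identity collapses the network to the single matrix product $W_L \cdots W_2 W_1 x$, which makes a convenient sanity check.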

Even more simply, we can view the network as a chain of layers, where $N_l$ denotes the output of layer $l$:

$$N_1 \rightarrow N_2 \rightarrow N_3 \rightarrow \cdots \rightarrow N_{L-1} \rightarrow N_L$$

The idea is the same as before: we need the partial derivatives of the loss with respect to each layer's unknowns. Because each layer feeds into the next, we can directly compute the Jacobian (matrix of partial derivatives) of a layer only with respect to the layer immediately below it. For example, we can first solve for

$$J_{N_L}(N_{L-1}),$$

the Jacobian of $N_L$ with respect to $N_{L-1}$.
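For a single layer $N_L = \sigma(W_L N_{L-1})$, the chain rule gives $J_{N_L}(N_{L-1}) = \operatorname{diag}\big(\sigma'(W_L N_{L-1})\big)\, W_L$, whose entry $(i, j)$ is $\partial (N_L)_i / \partial (N_{L-1})_j$. The sketch below checks this formula against central finite differences. It uses $\tanh$ as $\sigma$ (an illustrative assumption, chosen because it is smooth everywhere, unlike ReLU at 0) and random weights with hypothetical shapes.

```python
import numpy as np

def layer_jacobian(W, n_prev):
    """Analytic Jacobian of tanh(W @ n_prev) w.r.t. n_prev: diag(1 - tanh(h)^2) @ W."""
    h = W @ n_prev
    return (1.0 - np.tanh(h) ** 2)[:, None] * W  # scale row i of W by sigma'(h_i)

def numerical_jacobian(f, v, eps=1e-6):
    """Central-difference Jacobian of f at v, built one column per input coordinate."""
    cols = []
    for j in range(v.size):
        step = np.zeros_like(v)
        step[j] = eps
        cols.append((f(v + step) - f(v - step)) / (2 * eps))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(1)
W_L = rng.standard_normal((3, 4))  # hypothetical: N_{L-1} has 4 units, N_L has 3
n_prev = rng.standard_normal(4)

J = layer_jacobian(W_L, n_prev)
J_num = numerical_jacobian(lambda v: np.tanh(W_L @ v), n_prev)
print(J.shape, np.allclose(J, J_num, atol=1e-6))
```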

### a. For the above simple representation, write out the Jacobian of the final layer with respect to the first layer.

TODO:

### b. Based on the expression you wrote above, explain, in terms of time or space complexity, why working backwards is the best way to compute the gradient in 2a.

TODO:

## 3. Simple Neural Network

Here you will write a neural network for a simple classification problem. For full credit, the final test accuracy must be above 0.6.

The dataset consists of segmented cell images from thin blood smear slides, with labels indicating the presence of malaria.

Source: https://lhncbc.nlm.nih.gov/publication/pub9932

Paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544011/

Some setup to start with:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds

In [2]:
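if not tf.executing_eagerly():
    # TensorFlow 1.x: eager execution must be enabled explicitly.
    # In TensorFlow 2.x it is on by default and tf.enable_eager_execution() no longer exists.
    tf.compat.v1.enable_eager_execution()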

In [3]:
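# Requires a tensorflow_datasets release that includes "malaria"; older releases
# raise DatasetNotFoundError, in which case upgrade the package.
malaria, info = tfds.load(name="malaria", split="train", with_info=True)
malaria = malaria.shuffle(30000).prefetch(tf.data.experimental.AUTOTUNE)
info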

In [0]:
# Visualize a few sample images with their labels
plt.figure(figsize=(12, 12))

for i, feature in enumerate(malaria.take(4)):
    image = feature["image"].numpy()
    label = feature["label"].numpy()

    plt.subplot(2, 2, i + 1)
    plt.title("Label: " + str(label))
    plt.imshow(image)
plt.show()

### a. Extract some samples from the malaria dataset

Hints:

* Keep the total number of samples small (< 10000); the right size depends largely on your available memory. If your notebook starts to crash, reduce the number of samples and try again
* Each image has shape height × width × 3, where 3 is the number of color channels
* The images do not all have the same height and width, so resize all of them to 133 by 133 (see [tf.image.resize](https://www.tensorflow.org/api_docs/python/tf/image/resize))
* The possible labels are 0 and 1 (scalars)
* Split into a training and testing set (a split like 80:20 train to test is reasonable)
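A minimal NumPy sketch of a shuffled 80:20 split, offered as an alternative to filling preallocated arrays inside the loop. It assumes the images have already been resized (for example with `tf.image.resize`) and stacked into one array; the helper `split_train_test` is hypothetical, and the data are placeholder zeros rather than real cell images.

```python
import numpy as np

def split_train_test(images, labels, train_frac=0.8, seed=0):
    """Hypothetical helper: shuffle paired arrays, then split train_frac : (1 - train_frac)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(images))      # same permutation for images and labels
    n_train = int(train_frac * len(images))
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    return images[train_idx], labels[train_idx], images[test_idx], labels[test_idx]

# Placeholder data standing in for images already resized to 133 x 133 x 3
num_samples = 100
images = np.zeros((num_samples, 133, 133, 3), dtype=np.float32)
labels = np.arange(num_samples) % 2  # dummy 0/1 labels

train_images, train_labels, test_images, test_labels = split_train_test(images, labels)
print(train_images.shape, test_images.shape)  # (80, 133, 133, 3) (20, 133, 133, 3)
```

Shuffling before splitting matters if the samples happen to be ordered, for example by label; here the dataset is already shuffled when loaded, so this is extra insurance.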


In [0]:
# TODO: Initialize to the correct shapes with zeros
train_images = ...
train_labels = ...
test_images = ...
test_labels = ...

# TODO: Fill in the splits above
total_sample_size = ...
for i, feature in enumerate(malaria.take(total_sample_size)):

    ...

### b. Add some layers to the model

Hints:

*   See examples of layers in the Keras documentation: https://keras.io/layers/core/
*   For the first layer, provide an `input_shape` equal to the shape of one image from your dataset (e.g. `(133, 133, 3)` after resizing)

See examples at https://www.tensorflow.org/tutorials
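One possible small convolutional model, sketched under the assumption that images were resized to 133 × 133 × 3 and that each label is 0 or 1, predicted by a single sigmoid output unit. The layer types and sizes are arbitrary starting points, not a tuned or required architecture.

```python
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential()
# Convolution + pooling blocks extract local image features (sizes are assumptions)
model.add(keras.layers.Conv2D(16, 3, activation="relu", input_shape=(133, 133, 3)))
model.add(keras.layers.MaxPooling2D())
model.add(keras.layers.Conv2D(32, 3, activation="relu"))
model.add(keras.layers.MaxPooling2D())
# Flatten the feature maps and classify
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(64, activation="relu"))
model.add(keras.layers.Dense(1, activation="sigmoid"))  # probability that the label is 1

model.summary()
```

A single sigmoid output unit pairs naturally with the `binary_crossentropy` loss in Keras.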



In [0]:
model = keras.Sequential()
# TODO:
model.add(...)

### c. Choose how to train the above model

Pick an optimizer, loss function, and metric. If you choose something not covered in class, give a brief explanation and an advantage of your choice.

*   Optimizers: https://keras.io/optimizers/
*   Losses: https://keras.io/losses/
*   Metrics: https://keras.io/metrics/
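For the 0/1 labels in this dataset, binary cross-entropy is a common alternative to the squared loss from Section 1, and accuracy is the metric that the grading threshold refers to. In Keras these are available by name as `loss="binary_crossentropy"` and `metrics=["accuracy"]`. The NumPy sketch below only illustrates what those quantities compute, assuming the model outputs one probability per image; the helper names are hypothetical.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of predictions matching the label after thresholding the probability."""
    return np.mean((p_pred >= threshold).astype(int) == y_true)

y_true = np.array([1, 0, 1, 0])
p_pred = np.array([0.9, 0.2, 0.4, 0.1])  # made-up predicted probabilities
print(binary_cross_entropy(y_true, p_pred), accuracy(y_true, p_pred))
```

Unlike accuracy, the cross-entropy still penalizes a correct but unconfident prediction (0.4 for a true label of 1 costs more than 0.9), which gives the optimizer a useful signal.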

### Reasoning:

TODO:


In [0]:
# TODO:
opt = ...
loss_func = ...
metric = ["..."] 

In [0]:
model.compile(optimizer=opt,
              loss=loss_func,
              metrics=metric)

### d. Train the model

Choose an appropriate number of epochs (hint: try several different values).

In [0]:
# TODO:
num_epochs = ...

model.fit(train_images, train_labels, epochs=num_epochs)

### e. Evaluate based on the testing set

The test accuracy must be greater than 0.6 for full credit.

In [0]:
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

print('\nTest accuracy:', test_acc)

### f. Based on the training and testing accuracies above, did your model overfit during training?

TODO:


### g. (Extra Credit) Improve your model to achieve an accuracy of greater than 0.75 on the testing set.