# Implementing your own CNN

## Part I. Preparation

First, we load the CIFAR-10 dataset. This might take a couple of minutes the first time you do it, but the files should stay cached after that. PyTorch provides convenient tools to download, preprocess, and iterate through our dataset.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torch.utils.data import DataLoader
from torch.utils.data import sampler

from torchvision import datasets, transforms

import numpy as np

In [None]:
NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = datasets.CIFAR10('cifar10', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=16, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = datasets.CIFAR10('cifar10', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=16, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = datasets.CIFAR10('cifar10', train=False, download=True, 
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=16)
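
To make the `Normalize` step concrete, here is a minimal pure-Python sketch of the per-channel arithmetic, `(value - mean) / std`, using the hardcoded CIFAR-10 statistics from the cell above. The helper name `normalize_pixel` is hypothetical, and the sketch assumes the pixel values have already been scaled to [0, 1], as `ToTensor()` does.

```python
# Per-channel statistics copied from the transform above.
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)

def normalize_pixel(rgb):
    """Hypothetical helper: normalize one RGB pixel whose values lie in [0, 1]."""
    return tuple((value - m) / s for value, m, s in zip(rgb, MEAN, STD))

# A pixel exactly at the dataset mean maps to zero in every channel.
print(normalize_pixel(MEAN))  # (0.0, 0.0, 0.0)
```

After this step each channel has roughly zero mean and unit variance over the dataset, which tends to make optimization better behaved.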

You can **use a GPU by setting the flag below to True**. If CUDA is not available, `torch.cuda.is_available()` returns False and this notebook falls back to CPU mode.

The global variables `dtype` and `device` control the data type and the device used throughout this assignment. **Make sure you are using Google Colab correctly (GPU) by checking the printed device.**

In [None]:
USE_GPU = True

dtype = torch.float32 # we will be using float throughout this assignment

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss
print_every = 100

print('using device:', device)

## Part II. PyTorch Module API

PyTorch provides the `nn.Module` API for you to define arbitrary network architectures, while tracking every learnable parameter for you. PyTorch also provides the `torch.optim` package, which implements all the common optimizers, such as RMSProp, Adagrad, and Adam. It even supports approximate second-order methods like L-BFGS! You can refer to the [doc](http://pytorch.org/docs/master/optim.html) for the exact specifications of each optimizer.

To use the Module API, follow the steps below:

1. Subclass `nn.Module`. Give your network class an intuitive name like `TwoLayerFC`. 

2. In the constructor `__init__()`, define all the layers you need as class attributes. Layer objects like `nn.Linear` and `nn.Conv2d` are themselves `nn.Module` subclasses and contain learnable parameters, so you don't have to instantiate the raw tensors yourself. `nn.Module` will track these internal parameters for you. Refer to the [doc](http://pytorch.org/docs/master/nn.html) to learn more about the dozens of built-in layers. **Warning**: don't forget to call `super().__init__()` first!

3. In the `forward()` method, define the *connectivity* of your network. You should use the attributes defined in `__init__` as function calls that take a tensor as input and output the "transformed" tensor. Do *not* create any new layers with learnable parameters in `forward()`! All of them must be declared upfront in `__init__`.

After you define your Module subclass, you can instantiate it as an object and call it.
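
The three steps above can be sketched with a small fully connected network. This is only an illustration under assumptions: the class name `TwoLayerFC` comes from step 1, while the constructor arguments, the hidden size of 42, and the layer names `fc1` and `fc2` are choices made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerFC(nn.Module):                      # step 1: subclass nn.Module
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()                        # must run before layers are assigned
        # step 2: declare every layer with learnable parameters here
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # step 3: only wire the layers together; create no new layers here
        x = x.flatten(start_dim=1)                # (N, C, H, W) -> (N, C*H*W)
        return self.fc2(F.relu(self.fc1(x)))

model = TwoLayerFC(input_size=3 * 32 * 32, hidden_size=42, num_classes=10)
scores = model(torch.zeros(64, 3, 32, 32))        # calling the object runs forward()
print(scores.shape)  # torch.Size([64, 10])
```

Because both layers are attributes assigned in `__init__`, `model.parameters()` yields their weights and biases without any manual bookkeeping.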

### Architecture

Implement your own CNN with the architecture described in the table below. The available layers are documented at https://pytorch.org/docs/stable/nn.html.

| Name     | Kernel | Padding | Channels In/Out |
|:---------|:-------|:--------|:----------------|
| conv1    | 5x5    | 2       | 3/32            |
| relu     | -      | -       | 32/32           |
| maxpool1 | 4x4    | 0       | 32/32           |
| conv2    | 3x3    | 1       | 32/64           |
| relu     | -      | -       | 64/64           |
| maxpool2 | 4x4    | 0       | 64/64           |
| avgpool  | 2x2    | 0       | 64/64           |
| linear   | -      | -       | 64/10           |



In [None]:
class ConvNet(nn.Module):
    def __init__(self, in_channel, num_classes):
        super().__init__()
        raise NotImplementedError()
        

    def forward(self, x):
        raise NotImplementedError()


def test_ConvNet():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]
    model = ConvNet(in_channel=3, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
    
test_ConvNet()
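
The expected `[64, 10]` output follows from the spatial sizes implied by the architecture table. The sketch below traces them in plain Python. It rests on two assumptions the table does not state: convolutions use stride 1, and each pooling layer uses a stride equal to its kernel size (the PyTorch default for `nn.MaxPool2d` and `nn.AvgPool2d`). The helper name `out_size` is hypothetical.

```python
def out_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                   # CIFAR-10 images are 32x32
size = out_size(size, kernel=5, padding=2)  # conv1    -> 32
size = out_size(size, kernel=4, stride=4)   # maxpool1 -> 8
size = out_size(size, kernel=3, padding=1)  # conv2    -> 8
size = out_size(size, kernel=4, stride=4)   # maxpool2 -> 2
size = out_size(size, kernel=2, stride=2)   # avgpool  -> 1
flat_features = 64 * size * size            # channels * height * width
print(size, flat_features)                  # 1 64
```

Under these assumptions the feature map entering the linear layer is 64 x 1 x 1, which matches the 64 input features listed for `linear` once the tensor is flattened.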

### Training Loop


In [None]:
def train(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.
    
    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    
    raise NotImplementedError()

### Check Accuracy
Given the validation or test set, we can check the classification accuracy of a neural network. 

In [None]:
def check_accuracy(loader, model):
    raise NotImplementedError()
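
As a rough illustration of the core computation, the sketch below measures accuracy on a single hand-made batch. The scores and labels are made-up values, not CIFAR-10 data. A full `check_accuracy` would typically also switch the model to evaluation mode with `model.eval()`, move each batch to `device`, and accumulate the counts over the whole loader.

```python
import torch

# Hypothetical scores for 4 examples and 3 classes, plus their true labels.
scores = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.8, 0.7],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 1, 2, 2])

with torch.no_grad():                       # no gradients are needed for evaluation
    preds = scores.argmax(dim=1)            # predicted class = index of the highest score
    num_correct = (preds == labels).sum().item()

accuracy = num_correct / labels.size(0)
print(num_correct, accuracy)  # 3 0.75
```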

## Part III. CIFAR-10 Challenge

In this section, you can experiment with whatever ConvNet architecture you'd like on CIFAR-10.

Now it's your job to experiment with architectures, hyperparameters, loss functions, and optimizers to train a model that achieves at least 70% accuracy on the CIFAR-10 validation set within 10 epochs. You can use the `check_accuracy` and `train` functions from above.

* Layers in torch.nn package: http://pytorch.org/docs/stable/nn.html
* Activations: http://pytorch.org/docs/stable/nn.html#non-linear-activations
* Loss functions: http://pytorch.org/docs/stable/nn.html#loss-functions
* Optimizers: http://pytorch.org/docs/stable/optim.html
* Data augmentation: https://pytorch.org/docs/stable/torchvision/transforms.html

In [None]:
# Experiment with any architectures, optimizers, and hyperparameters.          
# Achieve AT LEAST 70% accuracy on the *validation set* within 10 epochs.      
#                                                                              
# Note that you can use the check_accuracy function to evaluate on either
# the test set or the validation set, by passing either loader_test or
# loader_val as the first argument to check_accuracy. You should not touch
# the test set until you have finished your architecture and hyperparameter
# tuning, and only run the test set once at the end to report a final value.


model = None
optimizer = None

train(model, optimizer, epochs=10)

# Transfer Learning

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.

* **Finetuning the convnet:** Instead of random initialization, we initialize the network with pretrained weights, such as those of a network trained on the 1000-class ImageNet dataset. The rest of the training proceeds as usual.
* **ConvNet as fixed feature extractor:** Here, we will freeze the weights for all of the network except that of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights and only this layer is trained.

We're going to train a model to classify ants and bees. We have about 120 training images each for ants and bees, and 75 validation images for each class. A dataset this small is usually not enough to generalize well when training from scratch. Since we are using transfer learning, we should be able to generalize reasonably well.

This dataset is a very small subset of ImageNet.

In [None]:
from __future__ import print_function, division

from torch.optim import lr_scheduler

import torchvision
from torchvision import models
import matplotlib.pyplot as plt
import time
import os
import copy

In [None]:
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'ants_bees'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

## Visualize Transformations

We will visualize a few images to understand the data augmentations.

In [None]:
def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])

## Finetuning the ConvNet

To train the model, we will keep track of the model's accuracy throughout training. Every epoch consists of a training phase and a validation phase. Before the first epoch, we set the model's best weights to those of the pretrained model. After each validation phase, if the accuracy has improved, we save the current weights as the best model weights. Remember that the weights should only be updated during the training phase.

In [None]:
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    raise NotImplementedError()

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
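
The best-weights bookkeeping described above can be sketched without any real training. Everything here is a stand-in: the dict plays the role of `model.state_dict()`, the in-place increment plays the role of a training phase, and the validation accuracies are made-up numbers.

```python
import copy

# Hypothetical stand-ins for a model's state dict and per-epoch validation accuracy.
weights = {"fc.weight": [0.0]}
val_accuracies = [0.80, 0.91, 0.88, 0.90]

best_wts = copy.deepcopy(weights)    # start from the initial (pretrained) weights
best_acc = 0.0
for epoch, val_acc in enumerate(val_accuracies):
    weights["fc.weight"][0] += 1.0   # pretend a training phase updated the weights in place
    if val_acc > best_acc:           # validation phase: snapshot only on improvement
        best_acc = val_acc
        best_wts = copy.deepcopy(weights)

print(best_acc, best_wts)  # 0.91 {'fc.weight': [2.0]}
print(weights)             # {'fc.weight': [4.0]}
```

The deep copy matters: a plain reference would keep changing as training continues, so the saved "best" weights would silently become the final weights.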

Load a pretrained model and reset the final fully connected layer.

In [None]:
model_ft = models.resnet18(pretrained=True)
# Replace the final linear layer with a new one that keeps the same number of
# input features and outputs one score per class (ants, bees)

### YOUR CODE HERE

model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
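
To see what this schedule does to the learning rate, here is a small pure-Python sketch of the step decay rule. It assumes the scheduler is stepped once per epoch, and the helper name `step_lr` is hypothetical rather than part of PyTorch.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-indexed) under step decay."""
    return base_lr * gamma ** (epoch // step_size)

# With lr=0.001, step_size=7, gamma=0.1, the rate drops tenfold at epochs 7, 14, and 21.
for epoch in (0, 6, 7, 13, 14, 24):
    print(epoch, step_lr(0.001, epoch))
```

Over 25 epochs the rate therefore goes from 0.001 down to roughly 0.000001 in the final epochs.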

In [None]:
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25)

## ConvNet as fixed feature extractor

Because training the whole network can take a long time, people sometimes choose to train only the last layer and keep the rest of the pretrained model as a fixed feature extractor. To do this, we freeze all of the network except the final layer by setting `requires_grad = False` on its parameters, so that their gradients are not computed in `backward()`. Parameters of newly constructed modules have `requires_grad=True` by default, so a freshly created final layer remains trainable.
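
The freezing pattern can be sketched on a tiny stand-in network, which avoids downloading pretrained weights. `TinyNet` and its `backbone` layer are hypothetical. Only the attribute name `fc` mirrors the final layer that the optimizer below refers to as `model_conv.fc`.

```python
import torch.nn as nn

# Hypothetical stand-in for a pretrained network: a "backbone" plus a final `fc` layer.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 4)
        self.fc = nn.Linear(4, 1000)

    def forward(self, x):
        return self.fc(self.backbone(x))

model = TinyNet()
for param in model.parameters():
    param.requires_grad = False           # freeze every existing parameter

# A freshly constructed layer has requires_grad=True, so only it will be trained.
model.fc = nn.Linear(model.fc.in_features, 2)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']
```

Passing only `model.fc.parameters()` to the optimizer, as the cell below does for `model_conv`, then updates just the new layer.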


In [None]:
model_conv = torchvision.models.resnet18(pretrained=True)

## YOUR CODE HERE


model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

In [None]:
model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25)