For the last part of this assignment, we're going to migrate to a popular deep learning framework called PyTorch. We will train and analyze a convolutional neural network on iWildCam, a dataset that contains camera trap images (https://sites.google.com/view/fgvc6/competitions/iwildcam-2019). We've subsampled this dataset to use only 3 classes, rather than the full dataset. 

This section of the assignment is much more open-ended than the previous parts (in fact, it can serve as the starting point for your final project, if you're interested).


**Why use a deep learning framework?**
* Our code can now run on GPUs. You can use a service like Google Colab to train your models much faster on a GPU. A framework like PyTorch interfaces directly with the GPU, so we don't have to write CUDA code ourselves (which is beyond the scope of this class).
* In this class, we want you to be ready to use one of these frameworks for your project, so you can experiment more efficiently than if you were writing every feature you want to use by hand.
* We want you to stand on the shoulders of giants! PyTorch is an excellent framework that will make your lives a lot easier, and now that you understand the guts of convolutional networks, you should feel free to use such frameworks. 
* Finally, we want you to be exposed to the sort of deep learning code you might run into in academia or industry.

**Note: We're going to work at the highest level of abstraction within PyTorch. This provides enough flexibility to train a model for this assignment, but you can do a lot more with PyTorch if you're interested.** You can go through this [tutorial](https://github.com/jcjohnson/pytorch-examples?tab=readme-ov-file) to learn more about the library itself.


*Thanks to instructors from Stanford's CS231n, including Prof. Fei-Fei Li, Prof. Ranjay Krishna, and Prof. Justin Johnson, for ideas and some starter code.*

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import Dataset, DataLoader
from PIL import Image

In [None]:
USE_GPU = True
dtype = torch.float32 # We will be using float throughout.

if USE_GPU and torch.cuda.is_available(): 
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

## Working with Data

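PyTorch offers `Dataset` and `DataLoader` classes to handle datasets easily. Here is a `Dataset` class written for the iWildCam dataset. Note the two required methods: `__len__()`, which returns the number of images in the dataset, and `__getitem__()`, which retrieves a specific image and its annotations. A `DataLoader` then wraps the `Dataset` and groups the items it returns into (optionally shuffled) batches.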

We're interested in two aspects of each iWildCam image: the kind of animal (stored in `labels`), which we use as the target variable, and the camera trap location (stored in `locations`), which we use for our analysis. `__getitem__()` returns both of these annotations as well as the image itself. Also note the transform: it converts the image into a form we want to work with. Typically, we want the image to be a **`tensor`**, which is similar to a NumPy array but can also be moved to the GPU.
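To see how a `DataLoader` batches the three values that `__getitem__()` returns, here is a minimal sketch using a hypothetical in-memory `ToyDataset` with random tensors in place of real images (all names and sizes are illustrative, not part of the assignment). With PyTorch's default collation, the image tensors are stacked into one batch tensor, the integer labels become a 1-D tensor, and string locations (assumed to be strings here) come back as a sequence of strings; integer location IDs would instead be collated into a tensor.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical stand-in for WildCamDataset: random 'images', integer labels, string locations."""
    def __init__(self, n=10):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        X = torch.rand(3, 8, 8)             # fake image tensor of shape (C, H, W)
        y = index % 3                       # fake class label in {0, 1, 2}
        loc = 'loc_{}'.format(index % 2)    # fake camera-trap location ID
        return X, y, loc

toy_loader = DataLoader(ToyDataset(), batch_size=4, shuffle=False)
X, y, loc = next(iter(toy_loader))
print(X.shape)    # torch.Size([4, 3, 8, 8])
print(y)          # tensor([0, 1, 2, 0])
print(list(loc))  # ['loc_0', 'loc_1', 'loc_0', 'loc_1']
```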


In [None]:

import os

class WildCamDataset(Dataset):
    def __init__(self, img_paths, annotations, transform=T.ToTensor(), directory='WildCam_3classes/train'):
        self.img_paths = img_paths        # list of image filenames
        self.annotations = annotations    # dict with 'labels' and 'locations', keyed by filename
        self.transform = transform        # applied to every PIL image
        self.dir = directory              # folder containing the images

    def __len__(self):
        # Number of images in the dataset
        return len(self.img_paths)

    def __getitem__(self, index):
        # Load one image, transform it, and look up its annotations
        fname = self.img_paths[index]
        img = Image.open(os.path.join(self.dir, fname)).convert('RGB')
        X = self.transform(img)
        y = self.annotations['labels'][fname]       # class label (target)
        loc = self.annotations['locations'][fname]  # camera trap location (for analysis)
        return X, y, loc
    

In [None]:
import os
import json

# We often want to preprocess the images. Here, we resize all images to 112x112, convert them to tensors,
# and normalize each channel using the ImageNet mean and standard deviation. You can add additional
# transforms / data augmentations here; check out torchvision transforms:
# https://pytorch.org/vision/stable/transforms.html
# Remember: if you apply any randomized transformation during training, inference should use a
# deterministic transform that approximates its expected effect (e.g., a center crop instead of a random crop).

normalize = T.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225])
transform = T.Compose([
            T.Resize((112,112)),
            T.ToTensor(),
            normalize
])

# We also specify the batch size and whether to shuffle the images.
# Typically, we shuffle the images when training, but we don't need to shuffle them when testing the model.

param_train = {
    'batch_size': 256,       
    'shuffle': True
    }

param_valtest = {
    'batch_size': 256,
    'shuffle': False
    }


In [None]:
# Download and unzip the subset of the dataset before running the following. 
# You may need to change the folder names / locations. 

with open('WildCam_3classes/annotations.json') as f:
    annotations = json.load(f)

train_images = sorted(os.listdir('WildCam_3classes/train'))
train_dset = WildCamDataset(train_images, annotations, transform, directory='WildCam_3classes/train/')
train_loader = DataLoader(train_dset, **param_train)

val_images = sorted(os.listdir('WildCam_3classes/val'))
val_dset = WildCamDataset(val_images, annotations, transform, directory="WildCam_3classes/val/")
val_loader = DataLoader(val_dset, **param_valtest)

## PyTorch Sequential API

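PyTorch provides a container Module called `nn.Sequential`, which lets us construct simple feedforward networks by stacking layers that are applied in order. It is less flexible than defining your own `nn.Module` subclass (for example, it cannot directly express branches or skip connections), but it is sufficient for our purposes.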
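The snippet below is a small sanity check of what `nn.Sequential` does: calling the container gives the same result as passing the input through each child module in order. It uses PyTorch's built-in `nn.Flatten()`, which behaves like the `Flatten` wrapper defined in the next cell; the layer sizes are arbitrary toy values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layers = [nn.Flatten(), nn.Linear(3 * 4 * 4, 5), nn.ReLU(), nn.Linear(5, 3)]
toy_model = nn.Sequential(*layers)

x = torch.randn(2, 3, 4, 4)      # batch of 2 tiny 3x4x4 "images"

out_seq = toy_model(x)           # let nn.Sequential run the layers

h = x                            # ... versus running them by hand, in order
for layer in layers:
    h = layer(h)

print(out_seq.shape)                                 # torch.Size([2, 3])
print(torch.allclose(out_seq, h))                    # True
print(torch.equal(nn.Flatten()(x), x.view(2, -1)))   # True: same as flatten() below
```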


### Sequential API: Two-Layer Network
Let's see how to write a simple two-layer fully connected network example with `nn.Sequential`, and train it.
You don't need to tune any hyperparameters here, but you should achieve above 60% accuracy after one epoch of training.

In [None]:
# This function reshapes a batch of N images of shape (C, H, W) into an (N, C*H*W) matrix: one vector per image
def flatten(x):
    N = x.shape[0] 
    return x.view(N, -1)  

# We need to wrap the `flatten` function in a module in order to stack it in nn.Sequential
# (PyTorch also provides a built-in equivalent, nn.Flatten()).
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)

hidden_layer_size = 100
learning_rate = 5e-4


# This creates a model that has 2 linear layers with a hidden layer size of 100. 
# Notice that we first flatten our images
model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 112 * 112, hidden_layer_size), 
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 3),
)


# optim has different optimizers you can try! This is set to SGD + momentum, but you can use Adam or 
# RMSprop as examples. See https://pytorch.org/docs/stable/optim.html

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
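It can be useful to know how big this model is. The first linear layer maps a flattened 3 × 112 × 112 = 37,632-dimensional input to 100 hidden units, so it alone holds 37,632 × 100 + 100 = 3,763,300 parameters; the second layer adds 100 × 3 + 3 = 303, for 3,763,603 in total. A quick way to check this, rebuilding the same architecture under the hypothetical name `fc_model` with `nn.Flatten()` standing in for the custom `Flatten`:

```python
import torch.nn as nn

hidden_layer_size = 100
fc_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 112 * 112, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 3),
)

# Count every learnable weight and bias.
num_params = sum(p.numel() for p in fc_model.parameters())
print(num_params)  # 3763603
```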

In [None]:
def train(model, optimizer, loader_train, loader_val, epochs=1, print_every=1):
    """
    Train a model using the PyTorch Module API.
    
    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - loader_train: A dataloader containing the train dataset
    - loader_val: A dataloader containing the validation dataset
    - epochs: (Optional) An integer giving the number of epochs to train for
    - print_every: (Optional) An integer specifying how often to print the loss. 
    
    Returns: Nothing, but prints model losses and accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y, loc) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = torch.nn.functional.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backward pass: compute the gradient of the loss with
            # respect to each parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Epoch {}, iteration {}, loss = {}'.format(e, t, loss.item()))
                
        
        print('Epoch {} done'.format(e))
        check_accuracy(loader_val, model)
        

In [None]:

def check_accuracy(loader, model):
    """
    Finds the accuracy of a model
    
    Inputs:
    - loader: A dataloader containing the validation / testing dataset
    - model: A PyTorch Module giving the model to evaluate.
    
    Returns: Nothing, but prints the accuracy.
    """
    
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad(): # no need to store computation graph or local gradients
        for x, y, loc in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum().item()  # .item() converts the 0-d tensor to a Python int
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

In [None]:
train(model, optimizer, train_loader, val_loader, print_every=1)

### Sequential API: 3-layer ConvNet

Here you should use `nn.Sequential` to define and train a three-layer ConvNet with the following architecture:
Hint: Look up https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html and https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d

1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. MaxPool (filter size 2, stride 2)
6. Fully-connected layer (with bias) to compute scores for 3 classes

You can use the default PyTorch weight initialization.

You should optimize your model using stochastic gradient descent with momentum 0.9.

Again, you don't need to tune any hyperparameters, but you should see accuracy above 60% after one epoch of training.
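Before writing the fully-connected layer, you need the number of features it receives. A sketch of the standard output-size formula for convolution and pooling layers, `out = floor((in + 2*padding - kernel) / stride) + 1`, applied to the 112x112 inputs used here (`conv_out` is a hypothetical helper name):

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a Conv2d / MaxPool2d layer (square input, dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 112
size = conv_out(size, kernel=5, padding=2)  # conv 1: 32 filters, 5x5, pad 2 -> 112
size = conv_out(size, kernel=3, padding=1)  # conv 2: 16 filters, 3x3, pad 1 -> 112
size = conv_out(size, kernel=2, stride=2)   # max pool 2x2, stride 2         -> 56
fc_in = 16 * size * size                    # channels x height x width after pooling
print(size, fc_in)  # 56 50176
```

You can confirm the number by passing a dummy tensor such as `torch.zeros(1, 3, 112, 112)` through the convolutional part of your model and printing the resulting shape.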

In [None]:
channel_1 = 32
channel_2 = 16
learning_rate = 7.5e-4

model = None
optimizer = None


# TODO: Write a 3-layer ConvNet using the Sequential API.


################################################################################

train(model, optimizer, train_loader, val_loader, print_every=1)

## Open ended challenge

Now it's your turn. Design and train a network that gets over **80% accuracy on the validation set** within **5 epochs**. You can play around with different architectures (e.g., increasing the depth or changing the number / size of filters), changing the optimizer (e.g., using Adam or RMSprop), adding data augmentation, etc.


**Deliverable**: In your report, describe what you did. Make sure to include your model architecture and optimizer, and list all hyperparameters. Additionally, plot the training loss across iterations (you can modify the `train()` function to return the loss values).
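One way to collect the losses is sketched below. `train_with_loss_history` and `plot_losses` are hypothetical helpers, not part of the starter code: the first mirrors the inner loop of `train()` but appends `loss.item()` at every iteration and skips the validation check, and the second plots the result with matplotlib. The demo at the bottom runs on a tiny synthetic `TensorDataset`, with a dummy tensor in place of the location field, just to show what comes back.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def train_with_loss_history(model, optimizer, loader, epochs=1, device=torch.device('cpu')):
    """Hypothetical variant of train() that returns the loss recorded at every iteration."""
    model = model.to(device=device)
    losses = []
    for e in range(epochs):
        model.train()
        for x, y, loc in loader:
            x = x.to(device=device, dtype=torch.float32)
            y = y.to(device=device, dtype=torch.long)
            loss = F.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            losses.append(loss.item())  # store a Python float, not a tensor holding the graph
    return losses

def plot_losses(losses, path='train_loss.png'):
    """Plot loss vs. iteration and save it to `path`."""
    import matplotlib.pyplot as plt
    plt.figure()
    plt.plot(losses)
    plt.xlabel('Iteration')
    plt.ylabel('Training loss')
    plt.savefig(path)

# Tiny synthetic check: 20 random "images", labels in {0, 1, 2}, dummy location IDs.
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(20, 3, 4, 4), torch.randint(0, 3, (20,)), torch.zeros(20))
toy_loader = DataLoader(toy_data, batch_size=5, shuffle=True)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(48, 3))
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=1e-2, momentum=0.9)
losses = train_with_loss_history(toy_model, toy_optimizer, toy_loader, epochs=2)
print(len(losses))  # 8 (2 epochs x 4 batches)
```

In the notebook you would call something like `losses = train_with_loss_history(model, optimizer, train_loader, epochs=5, device=device)`, then `check_accuracy(val_loader, model)` for validation accuracy and `plot_losses(losses)` for the figure.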

You can save your model as follows: 

```
torch.save(model.state_dict(), PATH)
```

In [None]:
torch.save(model.state_dict(), 'model.pth')  ## to save a model to model.pth


# Suppose your model is the 2-layer fully connected network we defined initially. To load the saved
# parameters, first rebuild a model with exactly the same architecture, then load the state dict into it.
model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 112 * 112, hidden_layer_size), 
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 3),
)

model.load_state_dict(torch.load('model.pth', map_location=device))  # map_location lets a GPU-saved model load on CPU

## Analysis

We're now going to walk through an analysis of this model. First, create a dataloader that contains images from the `test` folder. 

In [1]:
#### Create a test data loader, similar to what we did for the validation set. 


#### Accuracy

Next, find the overall model accuracy on the test dataset, as well as the per-class accuracies. You may need to modify or rewrite `check_accuracy()`.
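A sketch of one way to do this. `collect_predictions` (a hypothetical helper) runs the model once over a loader and keeps every prediction, label, and location, so the same outputs can be reused for the later analysis steps. `per_class_accuracy` (also hypothetical) then uses `torch.bincount` to count, for each class, how many images there are and how many were classified correctly. The demo uses made-up tensors rather than real model outputs.

```python
import torch

def collect_predictions(loader, model, device=torch.device('cpu')):
    """Return (preds, labels, locs) for every example in `loader`."""
    model = model.to(device)
    model.eval()
    all_preds, all_labels, all_locs = [], [], []
    with torch.no_grad():
        for x, y, loc in loader:
            scores = model(x.to(device=device, dtype=torch.float32))
            all_preds.append(scores.argmax(dim=1).cpu())  # same as scores.max(1)[1]
            all_labels.append(y.to(torch.long).cpu())
            # Locations may arrive as a tensor (numeric IDs) or as a sequence of strings.
            all_locs.extend(loc.tolist() if torch.is_tensor(loc) else list(loc))
    return torch.cat(all_preds), torch.cat(all_labels), all_locs

def per_class_accuracy(preds, labels, num_classes):
    """Fraction of correctly classified examples within each true class."""
    totals = torch.bincount(labels, minlength=num_classes)
    hits = torch.bincount(labels[preds == labels], minlength=num_classes)
    return hits.float() / totals.clamp(min=1).float()  # clamp avoids 0/0 for absent classes

# Made-up predictions and labels for 6 test images.
preds = torch.tensor([0, 1, 2, 2, 1, 0])
labels = torch.tensor([0, 1, 1, 2, 1, 2])
print((preds == labels).float().mean().item())           # overall accuracy: 4/6
print(per_class_accuracy(preds, labels, num_classes=3))  # class 0: 1/1, class 1: 2/3, class 2: 1/2
```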

In [None]:
# find the overall accuracy and the per-class accuracy. 

#### Confusion Matrix

Let's dig deeper into the analysis. Construct the confusion matrix for your model (you can use the scikit-learn implementation if you prefer). Are any of the classes harder to identify than the others? Do you have any hypotheses as to why?
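If you prefer not to depend on scikit-learn, a confusion matrix can be built directly from 1-D tensors of predicted and true labels for the test set. The sketch below (with the hypothetical helper name `confusion_matrix` and made-up values) encodes each (true, predicted) pair as the single index `true * num_classes + pred` and counts them with `torch.bincount`. Rows are true classes and columns are predicted classes, the same convention as `sklearn.metrics.confusion_matrix(y_true, y_pred)`.

```python
import torch

def confusion_matrix(preds, labels, num_classes):
    """cm[i, j] = number of images with true class i that were predicted as class j."""
    idx = labels * num_classes + preds  # unique index per (true, pred) pair
    counts = torch.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

preds = torch.tensor([0, 1, 2, 2, 1, 0])   # made-up predictions
labels = torch.tensor([0, 1, 1, 2, 1, 2])  # made-up true labels
cm = confusion_matrix(preds, labels, num_classes=3)
print(cm)
# tensor([[1, 0, 0],
#         [0, 2, 1],
#         [1, 0, 1]])
```

Dividing each row by its sum turns the diagonal into the per-class accuracies from the previous step.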

In [None]:
# Find the confusion matrix

#### Analyzing a single class by camera trap location

Pick one of the three classes. Let's analyze how model performance differs across camera traps (this is the `location` information within the annotations). For the class you picked, compute the accuracy separately for each camera trap location. Then plot this accuracy as a function of the fraction of training-set images that come from that location. What do you notice?
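A minimal sketch of the bookkeeping in plain Python, assuming you already have test-set lists of true labels, predictions, and locations in the same order, plus the list of training locations (e.g. `[annotations['locations'][f] for f in train_images]`). The helper names and toy values are made up for illustration; the training fraction is computed over all training images, as the question describes.

```python
from collections import Counter, defaultdict

def per_location_accuracy(labels, preds, locs, target_class):
    """Accuracy on `target_class` images, computed separately for each camera trap location."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for y, p, loc in zip(labels, preds, locs):
        if y == target_class:
            total[loc] += 1
            correct[loc] += int(p == y)
    return {loc: correct[loc] / total[loc] for loc in total}

def train_location_fraction(train_locs):
    """Fraction of all training images that come from each location."""
    counts = Counter(train_locs)
    n = len(train_locs)
    return {loc: c / n for loc, c in counts.items()}

# Toy example with made-up values.
test_labels = [0, 0, 0, 1, 0, 0]
test_preds = [0, 1, 0, 1, 0, 2]
test_locs = ['A', 'A', 'B', 'B', 'C', 'C']
train_locs = ['A', 'A', 'A', 'B', 'C', 'A', 'B', 'A']

acc = per_location_accuracy(test_labels, test_preds, test_locs, target_class=0)
frac = train_location_fraction(train_locs)
for loc in sorted(acc):
    # A location that never appears in training gets a fraction of 0.
    print(loc, frac.get(loc, 0.0), acc[loc])
```

To make the plot, something like `plt.scatter([frac.get(l, 0.0) for l in acc], [acc[l] for l in acc])` is enough; a log-scale x-axis (`plt.xscale('log')`) may help if a few locations dominate the training set.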

Visualize images that are from camera trap locations with good performance versus those from locations with poor performance. What do you notice? 

#### Improvements

Finally, describe how you might improve the model performance on rare camera trap locations. You don't have to actually implement your proposed improvement, but describe exactly how you could go about implementing it and what pitfalls you might anticipate.