# Train neural network

- Define functions for training a neural network
- Train a fully connected neural network

### Set up Drive

Run the following cell to mount your Drive in Colab. Open the URL that appears, log in, then copy the authorization code and paste it into the prompt. Once the mount succeeds, you should see "drive" appear in the Files tab on the left.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


Click the little triangle next to "drive" and navigate to the "AI4All Chest X-Ray Project" folder. Hover over the folder and click the 3 dots that appear on the right. Select "copy path" and replace `PASTE PATH HERE` with the path to your folder.

In [None]:
# cd "PASTE PATH HERE"

### Import necessary libraries

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd

import torch
from torch import nn, optim, Tensor
from torch.utils.data import DataLoader, random_split

import torchvision
from torchvision import datasets, transforms

from utils.datahelper import get_random_image
from utils.modelhelper import simple_train
from utils.plotting import imshow_dataset, imshow_tensors

### Set up paths

In [None]:
path_to_dataset = os.path.join('data')

path_to_images = os.path.join(path_to_dataset, 'images')

metadata = pd.read_csv(os.path.join(path_to_dataset, 'metadata_train.csv'))

In [None]:
# setup device for gpu/cpu
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

### Preprocess data

Apply the same preprocessing steps as in the previous notebook.


In [None]:
DATA_MEAN = 0.544
DATA_STD = 0.237

# RESIZE_SIZE = 235
# CROP_SIZE = 224 

# smaller image size for quicker training
RESIZE_SIZE = 50
CROP_SIZE = 50 

IM_SIZE = CROP_SIZE

data_transforms = transforms.Compose([
        transforms.Grayscale(),
        transforms.Resize(RESIZE_SIZE),
        transforms.CenterCrop(IM_SIZE),
        transforms.ToTensor(),
        transforms.Normalize(mean=DATA_MEAN, std=DATA_STD)])

dataset = datasets.ImageFolder(path_to_images, transform=data_transforms)

Run this block a few times to see each transform step applied to different random images.

In [None]:
dataset_notransform = datasets.ImageFolder(path_to_images, transform=None)
im = get_random_image(dataset_notransform)

# images for each step of the transformation
im_greyscale = transforms.Grayscale()(im)
im_resize = transforms.Resize(RESIZE_SIZE)(im_greyscale)
im_crop = transforms.CenterCrop(IM_SIZE)(im_resize)
im_tensor = transforms.ToTensor()(im_crop)
im_normed = transforms.Normalize(mean=DATA_MEAN, std=DATA_STD)(im_tensor)

# show images
fig, ax = plt.subplots(1,5, figsize=(15,5))
ax = ax.ravel()

plt.gray()
ax[0].imshow(im_greyscale)
ax[1].imshow(im_resize)
ax[2].imshow(im_crop)
ax[3].imshow(np.squeeze(im_tensor))
ax[4].imshow(np.squeeze(im_normed))

In [None]:
# tensor vs PIL image

pil_size = im_crop.size
tensor_size = im_tensor.shape

print(pil_size)
print(tensor_size)

### Loading data with DataLoader
Use DataLoader to define how we want to load in the data. [Read about the function here](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). 

**Loading a batch of images**

In this example, we define the DataLoader to load 5 images at a time from the dataset (`batch_size=5`).

In [None]:
loader = DataLoader(dataset=dataset, batch_size=5)

# get the next batch of images
batch_dataset = next(iter(loader))

# display this batch of images
batch_images = batch_dataset[0]
imshow_tensors(batch_images, n=5)
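
To see what a batch looks like without the image files, here is a minimal sketch that uses random tensors as a stand-in for the X-ray dataset. The names `toy_dataset` and `toy_loader` and the dataset size of 12 are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the image dataset: 12 grayscale 50x50 "images"
# with binary labels (values are random, for illustration only)
images = torch.randn(12, 1, 50, 50)
labels = torch.tensor([0, 1] * 6)
toy_dataset = TensorDataset(images, labels)

toy_loader = DataLoader(dataset=toy_dataset, batch_size=5)

# Each batch is a pair: (images, labels)
batch_images, batch_labels = next(iter(toy_loader))
print(batch_images.shape)   # (batch, channels, height, width)
print(batch_labels.shape)

# Number of batches per pass over the data, and the size of each batch.
# The last batch is smaller when the dataset size is not a multiple of 5.
batch_sizes = [len(batch[1]) for batch in toy_loader]
print(len(toy_loader), batch_sizes)
```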

**Randomly shuffling when loading a batch of images**

If we set `shuffle=True`, the images are drawn in random order when we get the batches. This is useful for mixing up the training data so that we don't train the model on all the Covid images first (and possibly bias the model) before training on the No Finding images.

In [None]:
loader = DataLoader(dataset=dataset, shuffle=True, batch_size=5)

# get the next batch of images
batch_dataset = next(iter(loader))

# display this batch of images
batch_images = batch_dataset[0]
imshow_tensors(batch_images, n=5)
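
The effect of shuffling is easier to see on labels than on images. The sketch below assumes a toy dataset sorted by class (all 0s, then all 1s), which mimics a dataset stored one class folder at a time. All names and sizes are illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset sorted by class: five samples of class 0, then five of class 1
features = torch.zeros(10, 1)
labels = torch.tensor([0] * 5 + [1] * 5)
toy_dataset = TensorDataset(features, labels)

ordered = DataLoader(toy_dataset, batch_size=5, shuffle=False)
shuffled = DataLoader(toy_dataset, batch_size=5, shuffle=True)

_, first_ordered = next(iter(ordered))
_, first_shuffled = next(iter(shuffled))

print(first_ordered)    # without shuffling, the first batch holds only class 0
print(first_shuffled)   # with shuffling, usually a mix of both classes

# Shuffling changes the order only; every sample still appears once per pass
all_shuffled = torch.cat([batch_labels for _, batch_labels in shuffled])
```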

**Transforms are applied "just in time"**

The transforms we defined are applied to each image at the moment the DataLoader fetches it for a batch. This means that if the transforms include random steps, the same images may look different every time they are loaded (for example, in different epochs).

In [None]:
# define transformations with added random affine step
data_transforms_affine = transforms.Compose([
        transforms.Grayscale(),
        transforms.Resize(RESIZE_SIZE),
        transforms.CenterCrop(IM_SIZE),
        transforms.RandomAffine(20, translate=(0.2, 0.2)),
        transforms.ToTensor(),
        transforms.Normalize(mean=DATA_MEAN, std=DATA_STD)])

dataset_affine = datasets.ImageFolder(path_to_images, transform=data_transforms_affine)

loader = DataLoader(dataset=dataset_affine, shuffle=False, batch_size=5)

# get the next batch of images
batch_dataset = next(iter(loader))

# display this batch of images
batch_images = batch_dataset[0]
imshow_tensors(batch_images, n=5)
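
The "just in time" behavior comes from where the transform is called: inside the dataset's item lookup. Below is a minimal sketch of that idea. `JitterDataset` is a hypothetical class, and the added noise is a simple stand-in for `RandomAffine`, not the transform used above.

```python
import torch
from torch.utils.data import Dataset

class JitterDataset(Dataset):
    """Hypothetical toy dataset that applies a random transform on every read."""

    def __init__(self, images):
        self.images = images

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # The random step runs here, at access time,
        # not when the dataset object is created
        noise = 0.1 * torch.randn_like(self.images[idx])
        return self.images[idx] + noise

toy_images = torch.zeros(4, 1, 8, 8)
toy_dataset = JitterDataset(toy_images)

# Read the same index twice
first_read = toy_dataset[0]
second_read = toy_dataset[0]

# Prints whether the two reads are identical
print(torch.equal(first_read, second_read))
```

Because the stored images are never modified, each epoch sees a fresh random variation of the same underlying data.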

### Fully connected neural network example

The nn.Sequential container is a simple method for defining a neural network in PyTorch. This method is approachable for getting started but is difficult to customize. 

For more custom and advanced networks, [check out the nn.Module class](https://pytorch.org/docs/master/generated/torch.nn.Module.html), which is the more standard approach for creating neural networks using PyTorch. Once you are comfortable with the code here, I would recommend creating a network using `nn.Module`. You can try defining the `predict` and `evaluate` methods directly in the class or try customizing the weight initialization function.
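
As a starting point, here is one possible sketch of a fully connected network written as an `nn.Module` subclass. The class name `SimpleNet`, the `hidden_sizes` argument, and the 50x50 input are assumptions for illustration; the layer sizes mirror the `nn.Sequential` model used in this notebook.

```python
import torch
from torch import nn

class SimpleNet(nn.Module):
    """Hypothetical nn.Module version of a small fully connected network."""

    def __init__(self, input_size, hidden_sizes=(32, 8), num_classes=2):
        super().__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(input_size, hidden_sizes[0])
        self.fc2 = nn.Linear(hidden_sizes[0], hidden_sizes[1])
        self.fc3 = nn.Linear(hidden_sizes[1], num_classes)

    def forward(self, x):
        # flatten each image into a vector, then apply the layers in order
        x = self.flatten(x)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        # log probabilities over the classes
        return torch.log_softmax(self.fc3(x), dim=1)

net = SimpleNet(input_size=50 * 50)

# run a dummy batch of 4 grayscale 50x50 images through the network
dummy_batch = torch.randn(4, 1, 50, 50)
log_probs = net(dummy_batch)
print(log_probs.shape)
```

Writing `forward` by hand is what makes this form easier to customize: extra methods or branching logic can be added to the class directly.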

**Print the NN model below**

 What's the layout of the network? How many parameters (weights) are we training between each layer?

In [None]:
input_size = IM_SIZE * IM_SIZE ## INPUT SIZE ##

model = nn.Sequential(nn.Flatten(),
                      nn.Linear(input_size, 32),
                      nn.ReLU(),
                      nn.Linear(32, 8),
                      nn.ReLU(),
                      nn.Linear(8, 2),
                      nn.LogSoftmax(dim=1))
model.to(device)

print(model)
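
After working out the parameter counts by hand, one way to check them is to ask PyTorch directly. The sketch below rebuilds the same model so it runs on its own and assumes `IM_SIZE = 50` as set earlier. Each `nn.Linear` layer contributes a weight matrix plus a bias vector.

```python
from torch import nn

IM_SIZE = 50
input_size = IM_SIZE * IM_SIZE

model = nn.Sequential(nn.Flatten(),
                      nn.Linear(input_size, 32),
                      nn.ReLU(),
                      nn.Linear(32, 8),
                      nn.ReLU(),
                      nn.Linear(8, 2),
                      nn.LogSoftmax(dim=1))

# list every trainable tensor with its shape and number of elements
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.numel())

# total number of trainable parameters
total_params = sum(p.numel() for p in model.parameters())
print(total_params)
```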

**Train the network**

We defined a function `simple_train` to do forward and backward propagation on the model. You will get a chance to expand on this function later in the notebook.

In [None]:
train_loader = DataLoader(dataset=dataset, batch_size=64)
simple_train(model, train_loader)

**How did this model do?**

### Training and validation sets



Remember that we don't want to train on all of the data. To evaluate the generalizability of the model (how well the model performs on unknown data), we need to split the data into training and validation datasets.

In [None]:
# EXERCISE: Write a function that calculates the training and validation set 
# sizes based on the ratio of data you want to have in the training set. 
# For example, if the dataset has 100 images and you want 60% (0.6) in the training
# set, the function should return (60, 40)

def get_train_val_sizes(dataset, train_ratio):
  ## WRITE CODE HERE ##
  
  return train_size, val_size

**Train and validation loaders**

Note: Think about how you might want to set the parameters for DataLoader differently for the training and validation sets.

In [None]:
# EXERCISE: Use the random_split and DataLoader functions to get two DataLoaders,
# one for training and one for validation

train_ratio = # CODE #
train_size, val_size = get_train_val_sizes(dataset, train_ratio)

## WRITE CODE HERE ##

### Assess training progress

Modify the training function to record the loss and accuracy so we can assess the training progress over iterations.

**Calculate accuracy**

Given a set of images, we can use the model to predict a list of labels. If we have 8 images, the model might output something like `predicted` in the code two blocks below. `predicted[0]` would be the predicted label for the first image, `predicted[1]` for the second, and so on. We can compare the predicted labels to the actual labels. If the labels are the same for an image, then the model has made a correct prediction. If the labels are different then the model has made an incorrect prediction. 

In [None]:
# EXERCISE: Write a function that calculates the accuracy (# correct / total) 
# given tensors (arrays) of predicted and actual labels. 
# Tip: check out the torch.mean() function

def calc_accuracy(predicted, actual):
    ## WRITE CODE HERE ##
    
    return accuracy

Check your function by running this block. The output should be 0.625

In [None]:
predicted = Tensor([0,1,1,0,0,0,1,0])
actual = Tensor([0,1,0,1,1,0,1,0])
calc_accuracy(predicted, actual)

**Make predictions**

As you may remember, the output of the neural network is a set of probabilities for the class labels. If the model outputs `[0.6, 0.4]` for an image, this means that the model predicts the image to be a Covid case with 0.6 probability and a No Finding case with a 0.4 probability. To convert the probabilities into a class label prediction, we would take the label with the greater probability.  

Note: In our case, the final layer actually outputs log probabilities (for reasons related to the loss function). Taking the max will still work since the log function is monotonically increasing. If you want to play around with this, you can convert the log probabilities to probabilities using `torch.exp()`.
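
A small sketch of this point, using made-up probabilities rather than real model outputs: converting back with `torch.exp()` recovers the probabilities, and the position of the largest value is the same in both forms.

```python
import torch

# Made-up outputs for two images, stored as log probabilities
log_probs = torch.log(torch.tensor([[0.6, 0.4],
                                    [0.2, 0.8]]))

# convert log probabilities back to probabilities
probs = torch.exp(log_probs)
print(probs)

# the index of the largest value is the same either way
index_from_log_probs = torch.argmax(log_probs, dim=1)
index_from_probs = torch.argmax(probs, dim=1)
print(index_from_log_probs, index_from_probs)
```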

Note 2: We are only taking the max here. However, you can imagine that a prediction with 0.9 probability is much stronger than a prediction with 0.55 probability. This is why we train the model using a loss function rather than the accuracy!
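
To make this concrete, the sketch below compares two made-up predictions for images whose true label is 0: one confident (0.9) and one barely correct (0.55). Both pick the right label, but the negative log-likelihood loss treats them very differently.

```python
import torch
from torch import nn

# Made-up probabilities for two images; the true label of both is 0
probs = torch.tensor([[0.90, 0.10],
                      [0.55, 0.45]])
targets = torch.tensor([0, 0])
log_probs = torch.log(probs)

# both predictions pick the correct label...
predicted = torch.argmax(log_probs, dim=1)
print(torch.equal(predicted, targets))

# ...but the loss is much larger for the less confident one
# reduction='none' keeps one loss value per image instead of averaging
per_image_loss = nn.NLLLoss(reduction='none')(log_probs, targets)
print(per_image_loss)   # approximately [0.105, 0.598], i.e. -log(0.9) and -log(0.55)
```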

You can check out what the model outputs look like using this code.

In [None]:
# get a batch of images (inputs)
inputs, _ = next(iter(train_loader))

# get the output from the model
# (move the inputs to the same device as the model first)
outputs = model(inputs.to(device))

Write the `predict` function which will return the predicted labels for a model and a set of input images. 

In [None]:
# EXERCISE: Write a function that uses a trained neural network model to predict
# labels for a list of input images
# Tip: Check out the torch.max() function

def predict(model, inputs):
    ## WRITE CODE HERE ##
    
    return predicted

Run the code below to test your prediction function. You should get 0.6 as the output

In [None]:
# sets a manual random seed 
torch.manual_seed(8)

# loader to test predict function
testing_loader = DataLoader(dataset=dataset, shuffle=True, batch_size=10)

# get first batch of images and class labels
images, labels_actual = next(iter(testing_loader))

# get predicted labels (from the prediction function above)
labels_predicted = predict(model, images)

# reset random seed
_ = torch.seed()

# calculate accuracy
calc_accuracy(labels_predicted, labels_actual)

**Update the training function**

The framework for training the model is provided in the `train` function below. 

In this function, we first define the loss function (`nn.NLLLoss`) and the optimization function (`optim.Adam`). We also load in the validation dataset so that we can compute the validation accuracy. 

The function then loops over each epoch and each batch. Remember that the epoch number is the number of times we loop over the entire dataset, and each batch is a subset of the data used for one step of training. If we are running the model for 3 epochs and 10 batches / epoch, then the model will train for a total of `3 * 10 = 30` steps.  
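
The step count can be checked with a toy loader. This sketch assumes a made-up dataset of 640 samples with a batch size of 64, which gives the 10 batches per epoch used in the example above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up dataset: 640 samples, so batch_size=64 gives 10 batches per epoch
toy_dataset = TensorDataset(torch.randn(640, 4),
                            torch.zeros(640, dtype=torch.long))
toy_loader = DataLoader(toy_dataset, batch_size=64)

epoch_num = 3
steps = 0
for epoch in range(epoch_num):
    for batch_num, (inputs, targets) in enumerate(toy_loader):
        steps += 1   # one training step would happen here

print(len(toy_loader))   # batches per epoch
print(steps)             # total training steps
```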

The code inside the loops defines what happens during each training step: 
1. Get the next batch of data for training
2. Forward propagation: Use the model to make predictions
3. Calculate loss (error): How badly did the model do?
4. Back propagation: Propagate the error and update the weights in a direction that would minimize the error.

To track the training progress, we need to calculate and record the training loss, training accuracy, and validation accuracy during model training. Add code in the indicated parts of the function to do this.

In [None]:
# EXERCISE: Add code to the train function to print and return the training 
# accuracy, validation accuracy, and loss

def train(model, train_loader, val_loader, epoch_num=1, lr=0.001):
    '''Trains a neural network

    Args:
        model: torch model
        train_loader (DataLoader): training data loader
        val_loader (DataLoader): validation data loader
        epoch_num (int): number of epochs
        lr (float): learning rate

    Return:
        train_accuracy_log, val_accuracy_log, loss_log: training accuracy,
            validation accuracy, and loss values recorded during training
    '''
    
    # define loss and optimization functions
    criterion = nn.NLLLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

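    # use whichever device the model's parameters are on (CPU or GPU)
    model_device = next(model.parameters()).device

    # load one batch from the validation set
    inputs_val, targets_val = next(iter(val_loader))
    inputs_val, targets_val = inputs_val.to(model_device), targets_val.to(model_device)
    
    # create variables for tracking loss and accuracy values
    
    ### WRITE CODE HERE ###


    for epoch in range(epoch_num):

        for batch_num, data in enumerate(train_loader):

            inputs, targets = data
            # move the batch to the same device as the model
            inputs, targets = inputs.to(model_device), targets.to(model_device)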

            # set parameter gradients to zero
            optimizer.zero_grad()

            # forward
            outputs = model(inputs)

            # calculate loss and update weights
            loss = criterion(outputs, targets)

            loss.backward()
            optimizer.step()

            ### WRITE CODE HERE



            

            ###

            ### UPDATE LINE BELOW TO PRINT LOSS AND ACCURACY ###
            print('[epoch {} batch {}]'.format(epoch, batch_num))


    print('Finished Training')

    # Returns training, validation accuracy, and loss
    return train_accuracy_log, val_accuracy_log, loss_log

Now try training a neural network model with the updated `train` function

In [None]:
train_accuracy_log, val_accuracy_log, loss_log = train(model, train_loader, val_loader)

### Modify neural network model





**Try a really small network**

How did the performance change compared to the previous model?

In [None]:
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(input_size, 4),
                      nn.ReLU(),
                      nn.Linear(4, 2),
                      nn.LogSoftmax(dim=1))

print(model)

In [None]:
train_accuracy_log, val_accuracy_log, loss_log = train(model, train_loader, val_loader)

**Modify network architecture and change hyperparameters**

Try modifying the hyperparameters: number of layers, number of nodes, number of epochs, batch size. You can also modify the input image size. A larger input image will train slower but may give better results!

If you are interested in other types of neural network layers, you can [read more here](https://pytorch.org/docs/master/nn.html)

**How did different architectures and hyperparameters change the learning performance?**