# Training a Neural Network with PyTorch

Many of the best-performing methods in computer vision use neural networks.

In this notebook, we will train a neural network to recognize handwritten digits.

## Imports

In [None]:
import torch
from torch import nn, optim
import torchvision
from torchvision import transforms, datasets
import matplotlib.pyplot as plt
from time import time

## Prepare the dataset

We will use a dataset provided by the torchvision library: `torchvision.datasets.MNIST`.

The library downloads the dataset automatically and saves it to disk, so we need to mount Google Drive to give it a place to save the data.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Additionally, we want to apply transforms to our data to make it compatible with our neural network library (PyTorch). `ToTensor` converts each image to a tensor, and `Normalize` rescales the pixel values.

In [None]:
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])
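
To see what the normalization step does to pixel values, here is a minimal sketch that applies the same arithmetic by hand. It assumes the input values lie in [0, 1], which is the range `ToTensor` produces for 8-bit images; the `pixels` tensor is made up for illustration.

```python
import torch

# Hypothetical pixel values in [0, 1] (not taken from MNIST).
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# Normalize((0.5,), (0.5,)) computes (x - mean) / std for the single channel.
mean, std = 0.5, 0.5
normalized = (pixels - mean) / std

# The values now span [-1, 1]: -1.0, -0.5, 0.0, 1.0
print(normalized)
```

With a mean and standard deviation of 0.5, black pixels map to -1 and white pixels map to 1.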

Now, load the dataset.

In [None]:
trainset = datasets.MNIST('/content/drive/My Drive/mnist/train', download=True, train=True, transform=transform)
valset = datasets.MNIST('/content/drive/My Drive/mnist/valid/', download=True, train=False, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
valloader = torch.utils.data.DataLoader(valset, batch_size=64, shuffle=True)
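
A `DataLoader` splits a dataset into batches. The sketch below shows this on a small stand-in dataset so it runs without downloading anything; the names `fake_images`, `fake_labels`, and `fake_loader` are illustrative and not part of the notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 200 blank 28x28 grayscale "images" with labels 0-9.
fake_images = torch.zeros(200, 1, 28, 28)
fake_labels = torch.arange(200) % 10
fake_set = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(fake_set, batch_size=64, shuffle=True)

# Collect the size of every batch in one pass over the data.
batch_sizes = [images.shape[0] for images, labels in fake_loader]
first_images, first_labels = next(iter(fake_loader))

print(len(fake_loader))    # number of batches per epoch
print(batch_sizes)         # the last batch is smaller if 64 does not divide the dataset size
print(first_images.shape)  # (batch, channels, height, width)
```

The same pattern applies to `trainloader` and `valloader`: each iteration yields one batch of images and the matching batch of labels.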

Let's look at some of the data.

In [None]:
dataiter = iter(trainloader)
images, labels = next(dataiter)

figure = plt.figure()
num_of_images = 60
for index in range(1, num_of_images + 1):
    # Subplot positions start at 1, but tensor indices start at 0
    plt.subplot(6, 10, index)
    plt.axis('off')
    plt.imshow(images[index - 1].numpy().squeeze(), cmap='gray_r')

## Make the Neural Network
We want to build a network like this one:
![image.png](https://miro.medium.com/max/2100/1*HWhBextdDSkxYvz0kEMTVg.png)

In [None]:
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.LogSoftmax(dim=1))
print(model)

Compare the model we created with the image above and try to find which functions correspond to each layer.
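
One way to support that comparison is to count the learnable parameters in each layer. This is a rough sketch that rebuilds the same architecture under the hypothetical name `sketch_model` so that it runs on its own.

```python
import torch
from torch import nn

# Same layer sizes as the model above, rebuilt so this snippet is self-contained.
sketch_model = nn.Sequential(nn.Linear(784, 128),
                             nn.ReLU(),
                             nn.Linear(128, 64),
                             nn.ReLU(),
                             nn.Linear(64, 10),
                             nn.LogSoftmax(dim=1))

layer_params = {}
for name, layer in sketch_model.named_children():
    # A Linear layer has in*out weights plus out biases; ReLU and LogSoftmax have none.
    count = sum(p.numel() for p in layer.parameters())
    layer_params[name] = count
    print(name, layer.__class__.__name__, count)

total_params = sum(layer_params.values())
print("Total parameters:", total_params)
```

Only the `Linear` layers hold weights; the activation functions between them contribute no parameters.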

## Train the network

In [None]:
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
time0 = time()
epochs = 15
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten each 28x28 MNIST image into a 784-element vector
        images = images.view(images.shape[0], -1)

        # Training pass: clear the gradients left over from the previous batch
        optimizer.zero_grad()

        output = model(images)
        loss = criterion(output, labels)

        # This is where the model learns by backpropagating
        loss.backward()

        # And optimizes its weights here
        optimizer.step()

        running_loss += loss.item()

    # Report the average loss per batch for this epoch
    print("Epoch {} - Training loss: {}".format(e, running_loss/len(trainloader)))
print("\nTraining Time (in minutes) =",(time()-time0)/60)
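
The training code pairs an `nn.LogSoftmax` output layer with `nn.NLLLoss`. The sketch below checks on random, made-up scores that this combination gives the same loss value as `nn.CrossEntropyLoss` applied to the raw scores; the `logits` and `labels` tensors here are illustrative only.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical raw scores for 4 examples and 10 classes, with made-up labels.
logits = torch.randn(4, 10)
labels = torch.tensor([3, 0, 7, 9])

# Route 1: log-softmax followed by negative log-likelihood (as in this notebook).
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, labels)

# Route 2: cross-entropy applied directly to the raw scores.
ce = nn.CrossEntropyLoss()(logits, labels)

print(nll.item(), ce.item())
```

This matters when trying other loss functions: switching to `nn.CrossEntropyLoss` would also mean removing the `LogSoftmax` layer from the model.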

## Test the network

Let's test the network on the validation images.

In [None]:
correct_count, all_count = 0, 0
for images, labels in valloader:
    for i in range(len(labels)):
        # Flatten one image into a single row of 784 values
        img = images[i].view(1, 784)
        # No gradients are needed when we are only making predictions
        with torch.no_grad():
            logps = model(img)

        # The model outputs log-probabilities; exponentiate to get probabilities
        ps = torch.exp(logps)
        probab = list(ps.numpy()[0])
        pred_label = probab.index(max(probab))
        true_label = labels.numpy()[i]
        if true_label == pred_label:
            correct_count += 1
        all_count += 1

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))
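
The loop above scores one image at a time. As an alternative sketch, the same accuracy can be computed a whole batch at a time with `argmax`. The helper `batch_accuracy` and the `demo_*` names are hypothetical, and the demo uses a tiny untrained model on random data so that it runs without MNIST.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def batch_accuracy(model, loader):
    """Return the fraction of correct predictions, evaluated batch by batch."""
    correct, total = 0, 0
    model.eval()  # switch layers such as dropout to evaluation behavior
    with torch.no_grad():
        for images, labels in loader:
            flat = images.view(images.shape[0], -1)
            # The index of the largest output is the predicted class
            preds = model(flat).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.shape[0]
    return correct / total

# Stand-in model and data, for illustration only.
demo_model = nn.Sequential(nn.Linear(784, 10), nn.LogSoftmax(dim=1))
demo_images = torch.randn(100, 1, 28, 28)
demo_labels = torch.randint(0, 10, (100,))
demo_loader = DataLoader(TensorDataset(demo_images, demo_labels), batch_size=64)

demo_acc = batch_accuracy(demo_model, demo_loader)
print("Demo accuracy =", demo_acc)
```

Taking `argmax` of the log-probabilities picks the same class as taking it of the probabilities, so the `torch.exp` step is not needed just to find the predicted label.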

## Next steps

Try to read the code we ran. It seems complicated at first, but try to understand what each line is doing. 

Try changing the learning rate, momentum, loss function, nonlinearities, model size, etc., and see how high you can get the accuracy on MNIST.

Think about what we will need to do to train a neural net like this one with our data. What will be different?

## Questions

Use your coding knowledge and search the web to answer these questions. Work with the rest of the team to find the answers.

1. How many images did we train the network on?


2. We used something called an optimizer to train the network. What is an optimizer? Which optimizer did we use? Name two other optimizers that are in the PyTorch library.


3. In the hidden layers, we applied the ReLU function to the output of the Linear layers. Draw a picture of what the ReLU function looks like. Why do we use this function in neural networks?

Code Source: https://towardsdatascience.com/handwritten-digit-mnist-pytorch-977b5338e627