<a href="https://colab.research.google.com/github/yala/introML_chem/blob/master/lab2/pytorch_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Deep Learning Packages
In this tutorial, we'll take you through developing models to classify images in PyTorch from start to finish. We'll go through preprocessing, building neural networks, and experimentation.

Let's get started!

In [0]:
# http://pytorch.org/
# Colab ships with PyTorch preinstalled; on other machines, install it with pip.
# (The old wheel-URL install relied on wheel.pep425tags, which newer versions of wheel no longer provide.)
!pip install -q torch torchvision
import torch
print(torch.__version__)
print(torch.cuda.is_available())

In [0]:
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from tqdm import tqdm
import matplotlib.pyplot as plt
import numpy as np


In [0]:
#@title Helper Function to display Images { display-mode: "form" }
def plot_images(images, cls_true):
    assert len(images) == len(cls_true) == 9
    
    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3, figsize=(4,4))
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(np.array(images[i], dtype='float').reshape((28,28))*255, cmap='binary')

        # Show the true class.
        xlabel = "True: {0}".format(cls_true[i])

        # Show the classes as the label on the x-axis.
        ax.set_xlabel(xlabel)
        
        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])
    
    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

## What is PyTorch
It's a Python-based deep learning library that is very popular among researchers because of its speed and flexibility.

At the core of PyTorch is the `Tensor`.
A `Tensor` is an n-dimensional array, much like a NumPy `ndarray`.

For example, let's make a random `3x3` tensor.
We can inspect a tensor by printing it, and get its shape with `.size()`.

In [0]:
a = torch.rand(3,3)
print(a)
print(a.size())

We can also convert a nested Python list into a tensor.


In [0]:
b = torch.Tensor([[1,2,3],[4,5,6]])
print(b)
print(b.size())
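
Since tensors are so close to NumPy arrays, it is often handy to move data between the two. The sketch below (the array values are made up) uses `torch.from_numpy`, which shares memory with the source array, `.numpy()`, which converts a CPU tensor back, and `torch.tensor`, which always copies.

```python
import numpy as np
import torch

# A made-up NumPy array of 32-bit floats.
arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)

# torch.from_numpy shares memory with the array (no copy).
t = torch.from_numpy(arr)
print(t.size())  # torch.Size([2, 3])

# Because memory is shared, editing the array is visible through the tensor.
arr[0, 0] = 100.0
print(t[0, 0].item())  # 100.0

# .numpy() turns a CPU tensor back into an ndarray (again without copying).
back = t.numpy()
print(back.shape)  # (2, 3)

# torch.tensor(...) always copies, so later edits to arr do not reach it.
copied = torch.tensor(arr)
arr[0, 0] = -1.0
print(copied[0, 0].item())  # 100.0
```

The sharing behaviour is convenient for speed, but it also means an in-place change on one side silently changes the other.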

### Operations on Tensors
Arithmetic operations like `+`, `-` and `*` between tensors produce new tensors.
You can use regular Python syntax to add, subtract and multiply them. PyTorch also provides functions for matrix multiplication and for reshaping tensors.


In [0]:
a = a + 4
print(a)
d = a * 2
print(d)
e = a - d
print(e)

print(a.size(), b.size())
# Won't work: the number of columns in a (3)
# doesn't match the number of rows in b (2)
# c = torch.matmul(a, b)
# This works: (2x3) @ (3x3) -> (2x3)
c = torch.matmul(b, a)
print(c.size())


If you're running into a bug, it's often helpful to step through and check your dimensions.
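Here is a small sketch of that kind of dimension check, using random tensors with the same shapes as `a` and `b` above. It shows the error a shape mismatch raises, one way to fix it with a transpose, and how `.view` reshapes a tensor.

```python
import torch

a = torch.rand(3, 3)
b = torch.rand(2, 3)

# matmul needs the inner dimensions to agree: (2x3) @ (3x3) -> (2x3)
ok = torch.matmul(b, a)
print(ok.size())  # torch.Size([2, 3])

# (3x3) @ (2x3): inner dimensions 3 and 2 differ, so PyTorch raises an error
try:
    torch.matmul(a, b)
    raised = False
except RuntimeError as err:
    raised = True
    print("shape error:", err)

# Transposing b fixes it: (3x3) @ (3x2) -> (3x2)
fixed = torch.matmul(a, b.t())
print(fixed.size())  # torch.Size([3, 2])

# .view reshapes without changing the data; -1 asks PyTorch to infer that dimension
flat = b.view(-1)
print(flat.size())           # torch.Size([6])
print(b.view(3, -1).size())  # torch.Size([3, 2])
```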

### The magic: Autograd
The power behind PyTorch comes from its automatic differentiation engine, Autograd. To turn it on for a tensor, construct it with `requires_grad=True`.

Every computation you make, e.g. `c = a + b`, builds a computation graph in which node `c` is linked to `a` and `b` via a `+` operator.

<img src="https://raw.githubusercontent.com/yala/MLCodeLab/master/lab2/abc.png">

If you call `.backward()` on the final node, autograd works out all the gradients for you and stores them in `a.grad` and `b.grad` (for tensors created with `requires_grad=True`).
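A minimal sketch of the `c = a + b` graph from the figure, with made-up values for `a` and `b`. It also shows that gradients accumulate across `backward()` calls, which is why training loops reset them before each update.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

c = a + b        # autograd records c as the output of a "+" node linking a and b
c.backward()     # dc/da = 1 and dc/db = 1
first_a_grad = a.grad.item()
print(a.grad, b.grad)  # tensor(1.) tensor(1.)

# Gradients accumulate: a second backward pass adds to the stored values.
(a + b).backward()
print(a.grad, b.grad)  # tensor(2.) tensor(2.)

# So training loops reset them before each step (optimizer.zero_grad() does this).
a.grad.zero_()
b.grad.zero_()
```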

Let's look at an example.

Consider the function
`y = a*(x^2) + b`, where `a = b = 1`. This is a simple parabola. 
<img src="https://raw.githubusercontent.com/yala/MLCodeLab/master/lab2/parabola.png">

The compute graph for this would be:
<img src="https://raw.githubusercontent.com/yala/MLCodeLab/master/lab2/parab_graph.png">

From basic calculus, we know that `dy/dx = 2ax`, so the derivative at `x = 1` is `2`.
This wasn't very hard, but let's see how autograd can do this automatically.

In [0]:
a = torch.ones(1, requires_grad=False)
b = torch.ones(1, requires_grad=False)
x = torch.ones(1, requires_grad=True)
y = a*(x*x)  + b
print(y)
y.backward()
print("x.grad={}".format(x.grad))
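
To check autograd against the formula `dy/dx = 2ax` at more than one point, a sketch like the one below evaluates the same parabola at several (arbitrarily chosen) `x` values at once. Because `backward()` needs a scalar, it sums the outputs first; each `y_i` depends only on `x_i`, so the summed gradient equals the per-point derivative.

```python
import torch

a, b = 1.0, 1.0
xs = torch.tensor([-2.0, 0.0, 1.0, 3.0], requires_grad=True)
y = a * xs ** 2 + b        # one parabola value per x

# d(sum of y)/dx_i = dy_i/dx_i, because y_i depends only on x_i
y.sum().backward()
print(xs.grad)              # tensor([-4.,  0.,  2.,  6.])
print(2 * a * xs.detach())  # analytic dy/dx = 2ax gives the same values
```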


### Why autograd is exciting
Now, this may have seemed trivial and contrived, but flexible automatic differentiation really shines when the computation graph is large and complex, e.g. when it's a neural network.

If we put our whole model, along with the loss calculation, into the computation graph, a single call to `backward` computes all the gradients, which makes training neural networks very easy.
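As a sketch of that idea, here a single `nn.Linear` layer stands in for a full model, and the inputs and labels are random, made-up data. One `backward()` on the loss fills in `.grad` for every parameter, and a hand-written update then moves each parameter against its gradient (plain gradient descent, i.e. what `optim.SGD` does without momentum).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(4, 2)              # tiny stand-in for a real network
x = torch.randn(8, 4)                # batch of 8 made-up inputs
target = torch.randint(0, 2, (8,))   # made-up class labels

grad_before = model.weight.grad      # None: no backward pass has run yet

loss = F.cross_entropy(model(x), target)
loss.backward()                      # one call fills .grad for every parameter

for name, p in model.named_parameters():
    print(name, tuple(p.size()), tuple(p.grad.size()))

# One hand-written gradient-descent step; no_grad keeps the update out of the graph
lr = 0.1
with torch.no_grad():
    for p in model.parameters():
        p -= lr * p.grad
```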




# The Task: MNIST, Digit Classification
<img src="https://raw.githubusercontent.com/yala/MLCodeLab/master/lab2/mnist.png">

In this lab, we'll build a neural network to classify hand-written digits.



## Step 1: Loading Data and Preprocessing
Let's start by loading the data.
We're going to normalize our images to have zero mean and unit variance, using some [torchvision](https://pytorch.org/docs/stable/torchvision/index.html) transforms. This generally helps stabilize learning, and is common practice. We'll also hold out the last 20% of the official training set as a dev set.
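For a single-channel 8-bit image, `ToTensor` scales pixel values from `[0, 255]` to `[0, 1]` (and puts the channel dimension first), and `Normalize` then computes `(x - mean) / std` per channel. The sketch below reproduces that arithmetic by hand on a made-up 2x2 "image" so it runs without torchvision; the mean and std are the MNIST values used in the next cell.

```python
import torch

MEAN, STD = 0.1307, 0.3081   # MNIST statistics used in the cell below

# Made-up 2x2 image with raw 8-bit pixel values
raw = torch.tensor([[0.0, 255.0], [51.0, 128.0]])

scaled = raw / 255.0                # ToTensor: [0, 255] -> [0, 1]
normalized = (scaled - MEAN) / STD  # Normalize: (x - mean) / std

print(normalized)
# A black pixel (0) maps to -mean/std, roughly -0.42;
# a white pixel (255) maps to (1 - mean)/std, roughly 2.82.
```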

In [0]:
# Img mean value of .13, and stdv of .31 were computed across entire train set
# in prior work
normalize_image = transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                ])

# Dataset is loaded from torchvision
all_train = datasets.MNIST('data', train=True, download=True, transform=normalize_image)

# First 80% of the official training set for training, last 20% as a dev set
num_train = int(len(all_train)*.8)
train = [all_train[i] for i in range(num_train)]
dev = [all_train[i] for i in range(num_train,len(all_train))]
test = datasets.MNIST('data', train=False, download=True, 
                      transform=normalize_image)
                           


In [0]:
# Reload without the transform to get raw PIL images for plotting
raw_train = datasets.MNIST('data', train=True, download=True)
num_examples = 9
images, labels = [], []
for i in range(num_examples):
    image, label = raw_train[i]
    images.append(image)
    labels.append(int(label))  # int() works whether label is a tensor or a Python int

plot_images(images, labels)

In [0]:
train[0][0].size()

## Step 2: Building a model

All PyTorch models should be implemented as subclasses of `nn.Module`.

To build a model you need to:

a) define the parameters it needs in its `__init__` function, and

b) define the model's computation, using those parameters, in its `forward` function.


To keep things simple, let's define a linear classifier, similar to logistic regression. We'll experiment with more complex models soon.
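The two steps can be sketched on a hypothetical `TinyModel` (not used later in the notebook): assigning a layer as an attribute in `__init__` registers its parameters, and calling the module runs `forward`.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):   # hypothetical example, not part of the notebook
    def __init__(self):
        super().__init__()
        # a) Parameters: assigning an nn.Linear registers its weight and bias
        self.fc = nn.Linear(3, 2)

    def forward(self, x):
        # b) Computation: apply the registered parameters to the input
        return self.fc(x)

model = TinyModel()
names = [name for name, _ in model.named_parameters()]
for name, p in model.named_parameters():
    print(name, tuple(p.size()))   # fc.weight (2, 3), then fc.bias (2,)

out = model(torch.randn(5, 3))     # calling the module runs forward()
print(out.size())                  # torch.Size([5, 2])
```

Registration is what lets `model.parameters()` hand every weight to an optimizer later on.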

In [0]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # Learn one weight per pixel for each of the 10 classes (a linear classifier)
        self.fc = nn.Linear(28*28, 10)

    def forward(self, x):
        batch_size, num_channels, height, width = x.size()
        # Flatten image
        x = x.view(batch_size, -1)
        # Put it through linear classifier
        return self.fc(x)


## Step 3. Defining our training procedure

To train our model, let's introduce a couple of new PyTorch ideas.

A [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) is an iterable that goes over our entire dataset in batches, optionally shuffling it first.
We'll be using this to iterate through our train/dev/test sets.
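To make the batching concrete, here is a minimal sketch on 10 random, made-up "images". The `TensorDataset` is an assumption for illustration; the notebook itself passes plain Python lists of `(image, label)` pairs, which `DataLoader` also accepts because they support indexing and `len`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 made-up examples: 1x28x28 "images" with integer labels
images = torch.randn(10, 1, 28, 28)
labels = torch.arange(10)
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_sizes = []
for x, y in loader:
    # Each batch stacks examples along a new first dimension
    print(x.size(), y.size())
    batch_sizes.append(x.size(0))

print(batch_sizes)  # [4, 4, 2]: the last batch holds the leftovers
print(len(loader))  # 3 batches per pass over the data
```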

Let's initialize these now.

An [Optimizer](https://pytorch.org/docs/stable/optim.html) defines an update rule. In class, we've discussed vanilla SGD, which is one method to compute the next weight given the current weight and gradient; below we use SGD with momentum. There are plenty of other optimizers you can try from the PyTorch library.
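As a sketch of what that update rule looks like, the code below minimizes a toy function `f(w) = (w - 3)^2` (an assumption for illustration) twice: once with `optim.SGD` using momentum, and once by hand with PyTorch's default momentum rule (no dampening, no Nesterov): `v = momentum * v + grad`, then `w = w - lr * v`.

```python
import torch
import torch.optim as optim

lr, momentum = 0.1, 0.5

# Two steps of optim.SGD on f(w) = (w - 3)^2, starting from w = 0
w = torch.tensor([0.0], requires_grad=True)
opt = optim.SGD([w], lr=lr, momentum=momentum)
for _ in range(2):
    opt.zero_grad()
    loss = ((w - 3) ** 2).sum()
    loss.backward()
    opt.step()

# The same two steps by hand: v = momentum * v + grad; w = w - lr * v
w_hand, v = 0.0, 0.0
for _ in range(2):
    grad = 2 * (w_hand - 3)   # derivative of (w - 3)^2
    v = momentum * v + grad
    w_hand = w_hand - lr * v

print(w.item(), w_hand)  # both about 1.38
```

With `momentum=0` the velocity is just the current gradient, and the rule reduces to vanilla SGD.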


In [0]:
# Training settings
batch_size = 64
epochs = 10
lr = .01
momentum = 0.5


train_loader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True)
# No need to shuffle the evaluation sets
dev_loader = torch.utils.data.DataLoader(dev, batch_size=batch_size, shuffle=False)
test_loader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=False)


model = Model()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)


To train our model:

1) we'll randomly sample batches from our train loader

2) compute our loss (using standard `cross_entropy`)

3) compute our gradients (by calling `backward()` on our loss)

4) update our neural network with an `optimizer.step()`, and go back to 1)

I've also added some bookkeeping to log our accuracy and average loss for the epoch.
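`F.cross_entropy` expects raw scores (logits), not probabilities, so the model should not apply a softmax itself. The sketch below, on made-up logits for 2 examples and 3 classes, checks that it matches the step-by-step definition: log-softmax, pick the log-probability of the true class, negate, and average over the batch.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])  # made-up outputs: 2 examples, 3 classes
target = torch.tensor([0, 1])            # true class index for each example

loss = F.cross_entropy(logits, target)   # averaged over the batch by default

# Step by step: log-softmax, true-class log-probability, negate, average
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), target].mean()

print(loss.item(), manual.item())  # the two values agree
```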


In [0]:
def train_epoch( model, train_loader, optimizer, epoch):
    model.train() # Set the nn.Module to train mode. 
    total_loss = 0
    correct = 0
    num_samples = len(train_loader.dataset)
    for batch_idx, (x, target) in enumerate(train_loader): #1) get batch
        # Reset gradient data to 0
        optimizer.zero_grad()
        # Get prediction for batch
        output = model(x)
        # 2) Compute loss
        loss = F.cross_entropy(output, target)
        #3) Do backprop
        loss.backward()
        #4) Update model
        optimizer.step()
        
        ## Do book-keeping to track accuracy and avg loss
        pred = output.max(1, keepdim=True)[1] # get the index of the max logit
        correct += pred.eq(target.view_as(pred)).sum().item()
        # loss is the batch mean, so weight it by the batch size; .item() also
        # returns a plain float, so we don't keep the computation graph around
        total_loss += loss.item() * x.size(0)

    print('Train Epoch: {} \tLoss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
            epoch, total_loss / num_samples, 
            correct, 
            num_samples,
            100. * correct / num_samples))


## Step 3.5 Define our evaluation loop
Similar to above, we'll also loop through our dev or test set, and compute our loss and accuracy. 
This lets us see how well our model is generalizing. 
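Both loops count correct predictions with `output.max(1, keepdim=True)[1]`, which takes the index of the largest score in each row. Here is that bookkeeping on three made-up examples, small enough to check by eye.

```python
import torch

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 1.5],
                       [1.0, 3.0, 0.0]])  # 3 made-up examples, 3 classes
target = torch.tensor([0, 1, 1])

pred = logits.max(1, keepdim=True)[1]     # index of the largest logit per row, shape (3, 1)
correct = pred.eq(target.view_as(pred)).sum().item()

print(pred.view(-1).tolist())  # [0, 2, 1]
print(correct)                 # 2 (the second example is wrong)
```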

In [0]:
def eval_epoch(model, test_loader, name):
    model.eval()  # Set the nn.Module to evaluation mode
    test_loss = 0
    correct = 0
    with torch.no_grad():  # No gradients are needed for evaluation
        for data, target in test_loader:
            output = model(data)
            test_loss += F.cross_entropy(output, target, reduction='sum').item() # sum up batch loss
            pred = output.max(1, keepdim=True)[1] # get the index of the max logit
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\n{} set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        name,
        test_loss, 
        correct, 
        len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


## Step 4: Training the model

In [0]:

for epoch in range(1, epochs + 1):
    train_epoch(model, train_loader, optimizer, epoch)
    eval_epoch(model,  dev_loader, "Dev")
    print("---")

## Step 5. Experiment with an MLP
This model gets a dev accuracy of 93%, which isn't too bad. However, the power of neural networks comes from composing layers with nonlinearities.

Let's try a more complex model.
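To see how much bigger the model below is, this sketch counts trainable parameters. It uses `nn.Sequential` as a shorthand with the same layer sizes as the notebook's classes, so the counts match.

```python
import torch.nn as nn

def count_params(model):
    # Total number of scalar weights and biases
    return sum(p.numel() for p in model.parameters())

linear = nn.Linear(28 * 28, 10)   # same layer as the first model
mlp = nn.Sequential(              # same layer sizes as the MLP below
    nn.Linear(28 * 28, 200), nn.ReLU(),
    nn.Linear(200, 200), nn.ReLU(),
    nn.Linear(200, 10),
)
print(count_params(linear))  # 7850   = 784*10 + 10
print(count_params(mlp))     # 199210 = (784*200 + 200) + (200*200 + 200) + (200*10 + 10)
```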

In [0]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(28*28, 200)
        self.fc2 = nn.Linear(200, 200)
        self.fc3 = nn.Linear(200, 10)
        

    def forward(self, x):
        batch_size, num_channels, height, width = x.size()
        x = x.view(batch_size, -1)
        hidden = F.relu(self.fc1(x))
        hidden = F.relu(self.fc2(hidden))
        logit = self.fc3(hidden)
        return logit
    
model = Model()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

for epoch in range(1, epochs + 1):
    train_epoch(model, train_loader, optimizer, epoch)
    eval_epoch(model,  dev_loader, "Dev")
    print("---")

## Step 7. Explore further
You can try different model architectures, optimizers, learning rates and regularization strategies. Neural networks are incredibly flexible, so the space to explore is enormous. Once you're done exploring, take your best model (i.e. the one that achieves the best results on the dev set) and run it on the test set!
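As one possible starting point, here is a sketch of a hypothetical variant (not from the original notebook) that combines a different regularizer, dropout, with a different optimizer, Adam with a small `weight_decay`. The hidden size, dropout rate, learning rate and weight decay are arbitrary assumptions, not tuned values. Dropout is only active in train mode, which is one reason `train_epoch` and `eval_epoch` switch modes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class DropoutMLP(nn.Module):  # hypothetical variant to experiment with
    def __init__(self, hidden=200, p_drop=0.2):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, hidden)
        self.fc2 = nn.Linear(hidden, 10)
        self.drop = nn.Dropout(p_drop)  # randomly zeroes activations in train mode

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten each image
        hidden = self.drop(F.relu(self.fc1(x)))
        return self.fc2(hidden)

model = DropoutMLP()
# Adam instead of SGD; weight_decay adds an L2 penalty on the weights
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

# In eval mode dropout is switched off, so repeated forward passes agree
x = torch.randn(4, 1, 28, 28)
model.eval()
out1, out2 = model(x), model(x)
print(out1.size(), torch.equal(out1, out2))  # torch.Size([4, 10]) True
```

This model and optimizer can be passed to the same `train_epoch` / `eval_epoch` loop as before.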

In [0]:
eval_epoch(model,  test_loader, "Test")

## Step 8. Now try it on your own on Beer Review and Property prediction!
