## Tutorial 3: Deep Learning with PyTorch

In [1]:
import matplotlib.pyplot as plt
%matplotlib inline
import sys
import math
import numpy as np
import pandas as pd
import os
import torch
import time
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


### Check Package Versions

In [None]:
print('__Python VERSION:', sys.version)
print('__PyTorch VERSION:', torch.__version__)
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())

### PyTorch 


What is PyTorch?

It’s a Python-based scientific computing package targeted at two sets of
audiences:

-  A replacement for NumPy that can use the power of GPUs
-  A deep learning research platform that provides maximum flexibility
   and speed



### Tensors

Tensors are similar to NumPy’s ndarrays, with the addition that
Tensors can also be used on a GPU to accelerate computation.

Construct a 5x3 matrix, uninitialized

In [None]:
x = torch.empty(5, 3)  # uninitialized: values are whatever was already in memory
print(x)

Get its size, then add a second (random) 5x3 matrix to it

In [None]:
print(x.size())   # torch.Size([5, 3])
y = torch.rand(5, 3)
print(x + y)

In [None]:
print(torch.add(x, y))

In [None]:
# Addition: providing an output tensor as an argument
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

In [None]:
# Addition: in-place (any method ending in ``_`` mutates its tensor)
y.add_(x)
print(y)
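The difference between the out-of-place and in-place forms can be seen by checking which tensor changes. A minimal sketch; the tensors `a`, `b` and `c` are illustrative only:

```python
import torch

a = torch.ones(2, 2)
b = torch.full((2, 2), 2.0)

c = a.add(b)       # out-of-place: returns a new tensor, `a` keeps its values
print(a[0, 0].item(), c[0, 0].item())    # 1.0 3.0

ret = a.add_(b)    # in-place (trailing underscore): overwrites `a` and returns it
print(a[0, 0].item(), ret[0, 0].item())  # 3.0 3.0
```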

### Numpy Bridge


Converting a torch Tensor to a NumPy array and vice versa is straightforward.

For CPU tensors, the torch Tensor and the NumPy array share their underlying memory
locations, so changing one will change the other.

Converting a torch Tensor to a NumPy array

In [None]:
a = torch.ones(5)
print(a)

In [None]:
b = a.numpy()
print(b)

In [None]:
# See how the numpy array changed in value.
a.add_(1)
print(a)
print(b)

Converting a NumPy array to a torch Tensor

See how changing the NumPy array changed the torch Tensor automatically:

In [None]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

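CPU Tensors (except ``CharTensor`` in early PyTorch releases) support converting to
NumPy and back. A CUDA Tensor must first be copied to the CPU with ``.cpu()`` before calling ``.numpy()``.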
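To see which conversions share memory and which copy, compare `torch.from_numpy` with `torch.tensor` (the latter is not used elsewhere in this notebook; it always copies its input). A small sketch:

```python
import numpy as np
import torch

arr = np.zeros(3)

shared = torch.from_numpy(arr)   # shares memory with `arr`
copied = torch.tensor(arr)       # always copies the data

arr[0] = 5.0                     # mutate the NumPy array in place

print(shared)   # tensor([5., 0., 0.], dtype=torch.float64)
print(copied)   # tensor([0., 0., 0.], dtype=torch.float64)
```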

### CUDA Tensors


Tensors can be moved onto the GPU using the ``.cuda()`` method (or the more general ``.to(device)``).

In [None]:
# run this cell only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    print(x + y)   # inside an if-block the result is not displayed unless printed

In [None]:
torch.cuda.is_available()
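A common pattern, sketched here as one way you might organise your own code rather than something this notebook requires, is to choose the device once and move tensors with `.to(device)`, so the same cell runs with or without a GPU:

```python
import torch

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(5, 3).to(device)      # .to() moves (or leaves) the tensor on `device`
y = torch.rand(5, 3, device=device)  # or allocate directly on the target device
z = x + y                            # both operands must live on the same device

print(z.device)
print(z.cpu().numpy().shape)  # .cpu() is required before .numpy() for CUDA tensors
```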

Autograd: automatic differentiation
===================================

Central to all neural networks in PyTorch is the ``autograd`` package.
Let’s first briefly visit this, and we will then go to training our
first neural network.


The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.

Let us see this in simpler terms with some examples.

Variable
--------

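``autograd.Variable`` is the central class of the package. It wraps a
Tensor, and supports nearly all of the operations defined on it. Once you
finish your computation you can call ``.backward()`` and have all the
gradients computed automatically. (Since PyTorch 0.4, ``Variable`` has been
merged into ``Tensor``: ``Variable(t)`` simply returns a Tensor, and
``torch.ones(2, 2, requires_grad=True)`` is the modern equivalent of the code below.)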

You can access the raw tensor through the ``.data`` attribute, while the
gradient w.r.t. this variable is accumulated into ``.grad``.

There’s one more class that is very important for the autograd
implementation: ``Function``.

``Variable`` and ``Function`` are interconnected and build up an acyclic
graph that encodes a complete history of computation. Each variable has
a ``.grad_fn`` attribute that references the ``Function`` that created
the ``Variable`` (except for Variables created by the user: their
``grad_fn`` is ``None``).
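The leaf/non-leaf distinction can be checked directly. This sketch uses the post-0.4 style `torch.ones(..., requires_grad=True)` instead of wrapping in `Variable`; `is_leaf` is a standard tensor attribute:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # created by the user: a leaf
y = x * 3                                 # created by an operation

print(x.grad_fn, x.is_leaf)              # None True
print(y.grad_fn is not None, y.is_leaf)  # True False
print(type(y.grad_fn).__name__)          # MulBackward0
```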

If you want to compute the derivatives, you can call ``.backward()`` on
a ``Variable``. If the ``Variable`` is a scalar (i.e. it holds a single
element), you don’t need to specify any arguments to ``backward()``;
if it has more elements, you need to pass a ``gradient``
argument that is a tensor of matching shape.
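The argument passed to `backward()` for a non-scalar output is the vector $v$ in a vector-Jacobian product: autograd accumulates $v^T J$ into `.grad` rather than building the full Jacobian $J$. A minimal sketch with an elementwise square, where $J = \mathrm{diag}(2x)$, so the expected gradient is $2x$ multiplied elementwise by $v$:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                              # elementwise, so J = diag(2 * x)

v = torch.tensor([1.0, 0.1, 0.01])      # must have the same shape as y
y.backward(v)                           # accumulates v^T J into x.grad

print(x.grad)                           # expected: 2 * x * v = [2.0, 0.4, 0.06]
```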

In [None]:
###############################################################
# Create a variable:
x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)

In [None]:
###############################################################
# Do an operation on the variable:
y = x + 2
print(y)


In [None]:
###############################################################
# ``y`` was created as a result of an operation, so it has a ``grad_fn``.
print(y.grad_fn)

In [None]:

###############################################################
# Do more operations on y
z = y * y * 3
out = z.mean()

print(z, out)


In [None]:
###############################################################
# Gradients
# ---------
# let's backprop now
# ``out.backward()`` is equivalent to ``out.backward(torch.tensor(1.0))``, because ``out`` holds a single value

out.backward()

In [None]:
###############################################################
# print gradients d(out)/dx
#

print(x.grad)


You should get a 2x2 matrix filled with ``4.5``. Let’s call the ``out`` *Variable* $o$.
We have that: $o = \frac{1}{4}\sum_i z_i$, 
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$

Therefore,
$$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$$ hence
$$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$$
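The derivation can also be checked numerically without autograd. A pure-Python sketch using central differences on the same function $o$ (the helper names are illustrative):

```python
def o(xs):
    # o = mean over the entries of z_i = 3 * (x_i + 2)^2, as in the cells above
    return sum(3 * (xi + 2) ** 2 for xi in xs) / len(xs)

def numerical_grad(f, xs, eps=1e-6):
    """Central-difference estimate of df/dx_i for every i."""
    grads = []
    for i in range(len(xs)):
        plus = list(xs)
        plus[i] += eps
        minus = list(xs)
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2 * eps))
    return grads

x = [1.0, 1.0, 1.0, 1.0]       # the 2x2 matrix of ones, flattened
g = numerical_grad(o, x)
print(g)                       # each entry is approximately 4.5
```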

In [None]:
# Autograd works through ordinary Python control flow: how many times the loop runs depends on the random data
x = torch.randn(3)
x = Variable(x, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

In [None]:
gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)

print(x.grad)

## Exercise 3.1: Calculate the gradient of the following function with respect to $w$ and $b$:
$$ f(w, b) = \frac{1}{1+e^{-(w^Tx+b)}}$$
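To check whatever autograd returns, the chain rule gives a closed form. With $u = w^Tx + b$ and $f = \sigma(u)$, where $\sigma(u) = 1/(1+e^{-u})$ satisfies $\sigma'(u) = \sigma(u)(1-\sigma(u))$:

$$\frac{\partial f}{\partial w} = \sigma'(u)\,\frac{\partial u}{\partial w} = f\,(1-f)\,x, \qquad \frac{\partial f}{\partial b} = \sigma'(u)\,\frac{\partial u}{\partial b} = f\,(1-f)$$

With the 100 rows of `X` there is one output $f_n$ per row. If you reduce them by summing before calling `.backward()` (one possible choice, assumed here since the exercise does not fix a reduction), the gradients are sums over the rows:

$$\frac{\partial}{\partial w}\sum_{n} f_n = \sum_{n} f_n(1-f_n)\,x_n, \qquad \frac{\partial}{\partial b}\sum_{n} f_n = \sum_{n} f_n(1-f_n), \qquad f_n = \frac{1}{1+e^{-(w^Tx_n+b)}}$$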

In [None]:
w = Variable(torch.randn(10), requires_grad=True)
b = Variable(torch.randn(1), requires_grad=True)
X = torch.randn(100,10)

## Exercise 3.2: Implement logistic regression using PyTorch
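For reference while completing the class: with $p = \sigma(Xw)$ (the bias folded into $w$ via a column of ones, as `fit` does), the mean cross-entropy is $L(w) = -\frac{1}{n}\sum_n \left[y_n\log p_n + (1-y_n)\log(1-p_n)\right]$ and its gradient is $\nabla_w L = \frac{1}{n}X^T(p-y)$. The NumPy sketch below checks that formula against central differences on made-up random data, so you can compare it with the gradient autograd produces; all names here are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, prob):
    # mean binary cross-entropy between labels y in {0, 1} and probabilities prob
    return -np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob))

def grad_w(X, y, w):
    # analytic gradient of the mean cross-entropy: X^T (sigmoid(Xw) - y) / n
    return X.T @ (sigmoid(X @ w) - y) / len(y)

rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(20, 3)), np.ones((20, 1))])  # last column = bias
y = (rng.random(20) > 0.5).astype(float)
w = 0.1 * rng.normal(size=4)

# central-difference estimate of each partial derivative
eps = 1e-6
num = np.array([
    (cross_entropy(y, sigmoid(X @ (w + eps * e))) -
     cross_entropy(y, sigmoid(X @ (w - eps * e)))) / (2 * eps)
    for e in np.eye(4)
])
print(np.max(np.abs(num - grad_w(X, y, w))))   # should be very close to 0

w_new = w - 0.01 * grad_w(X, y, w)   # one plain gradient-descent step
```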



In [None]:
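import sklearn
import sklearn.metrics          # used later for accuracy_score, roc_curve, auc, log_loss
import sklearn.preprocessing
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)
y_train = y_train.astype('float')
y_test = y_test.astype('float')
# standardize both splits with statistics computed on the training set only
scaler = sklearn.preprocessing.StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)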

In [None]:
print(X_train.shape)
print(X_test.shape)

In [1]:
# Logistic regression implemented in PyTorch
# Complete the implementation of the following class
Tensor = torch.Tensor  # shorthand used in fit() below
class MyLogisticRegression:
        
    def __init__(self, num_iter=100, lr=0.001) :
        self.num_iter = num_iter
        self.lr = lr
    
    def fit(self, X, y) :
        X = np.concatenate((X, np.ones((X.shape[0],1))), axis=1)
        X = Variable(Tensor(X))
        y = Variable(Tensor(y))
              
        # gradient descent
        for iter in range(self.num_iter):
            pass


            
    # calculate cross entropy
    @staticmethod
    def cross_entropy(y, prob):
        # calculate cross entropy between label y and predicted prob 
        pass
    
    @staticmethod
    def sigmoid(x):  
        # given x as np array, return sigmoid transformation of x
        pass

    # return prediction accuracy given X and label y
    def score(self, X, y) :
        pass
    
    # return predicted labels (1/0) given X
    def predict(self, X):
        pass

In [None]:
# After implementing the class, create an instance
lr = MyLogisticRegression(1000,0.01)

In [None]:
# fit the model with training data
lr.fit(X_train, y_train)

In [None]:
# check the score of training data
lr.score(X_train, y_train)

In [None]:
# check the score of test data
lr.score(X_test, y_test)

In [None]:
# Tensor = torch.Tensor
# class MyLogisticRegression:
    
#     def __init__(self, num_iter=100, lr=0.001) :
#         self.num_iter = num_iter
#         self.lr = lr
    
#     def fit(self, X, y) :
#         X = np.concatenate((X, np.ones((X.shape[0],1))), axis=1)
#         X = Variable(Tensor(X))
#         y = Variable(Tensor(y))
        
#         w = Variable(Tensor(0.1*np.random.randn(X.size()[1])), requires_grad=True)
        
#         # gradient descent
#         for iter in range(self.num_iter):
#             if w.grad is not None:
#                 w.grad.data.zero_()  # zero the accumulated gradient
#             prob = torch.sigmoid(X.matmul(w))
#             loss = self.cross_entropy(y, prob)
#             if iter%100==0:
#                 print('{}: {}'.format(iter, loss.item()))
#             loss.backward()
#             w.data.sub_(self.lr*w.grad.data)
            
#         self.w = w.data.cpu().numpy()    
            
#     # calculate cross entropy
#     @staticmethod
#     def cross_entropy(y, prob):
#         return -(y*torch.log(prob+0.00001) + (1-y)*torch.log(1-prob+0.00001)).mean()
    
#     @staticmethod
#     def sigmoid(x):
#         return 1 / (1 + np.exp(-x))

#     def score(self, X, y) :
#         pred_y = self.predict(X)
#         return 1.0*np.sum(pred_y==y)/len(y)
    
#     def predict(self, X):
#         return 1*(self.sigmoid(np.concatenate((X, np.ones((X.shape[0],1))), axis=1).dot(self.w)) >= 0.5)

In [None]:
lr.score(X_test, y_test)

### Implement logistic regression using torch.nn

In [None]:
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(30, 1)
    
    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        return x

In [None]:
net = Net()

In [None]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # fc1's .weight
print(params[1].size())  # bias

In [None]:
# check the parameters
print(net.fc1.weight)
print(net.fc1.bias)

In [None]:
input = Variable(torch.Tensor(X_train))
out = net(input)

In [None]:
criterion = nn.BCELoss()
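A swapped-in alternative worth knowing: `nn.BCEWithLogitsLoss` fuses the sigmoid into the loss and is more numerically stable than `torch.sigmoid` followed by `nn.BCELoss`. If you switch, `forward` should return `self.fc1(x)` without the sigmoid, and predictions then threshold the raw output at 0 instead of the probability at 0.5. The sketch, on made-up values, also shows that the target must have the same `(N, 1)` shape as the output:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[-2.0], [0.5], [3.0]])   # raw scores, shape (N, 1)
target = torch.tensor([[0.0], [1.0], [1.0]])    # same shape as the input

bce = nn.BCELoss()(torch.sigmoid(logits), target)    # sigmoid applied separately
bce_logits = nn.BCEWithLogitsLoss()(logits, target)  # sigmoid fused into the loss

print(bce.item(), bce_logits.item())   # the two values agree up to float rounding
```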

In [None]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

input = Variable(torch.Tensor(X_train))
# BCELoss needs the target to have the same (N, 1) shape as the output
target = Variable(torch.Tensor(y_train).view(-1, 1))

# in your training loop:
for i in range(1000):
    optimizer.zero_grad()   # zero the gradient buffers
    output = net(input)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()    # Does the update
    if i%100 == 0:
        print('iter: {} loss: {}'.format(i, loss.item()))

In [None]:
# training accuracy
sklearn.metrics.accuracy_score(y_train, 1*(net(Variable(torch.Tensor(X_train))).data.cpu().numpy().ravel() > 0.5))

In [None]:
# test accuracy
sklearn.metrics.accuracy_score(y_test, 1*(net(Variable(torch.Tensor(X_test))).data.cpu().numpy().ravel() > 0.5))

In [None]:
pred_y = net(Variable(torch.Tensor(X_test))).data.cpu().numpy().ravel()

In [None]:
def plot_auc(target_y, pred_y) :
    FPR, TPR, thresholds = sklearn.metrics.roc_curve(target_y,pred_y)
    roc_auc = sklearn.metrics.auc(FPR, TPR)

    plt.title('LOG_LOSS=' + str(sklearn.metrics.log_loss(target_y, pred_y)))
    plt.plot(FPR, TPR, 'b', label='AUC = %0.6f' % roc_auc)
    plt.plot([0,1],[0,1],'--k')
    plt.legend()
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.grid()

In [None]:
plot_auc(y_test, pred_y)
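The AUC shown in the plot has a direct interpretation: it is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one (ties count as one half). A quadratic-time pure-Python sketch of that definition, for illustration only; in practice `sklearn.metrics.roc_auc_score` computes the same quantity:

```python
def auc_by_ranking(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auc_by_ranking(y, s))   # 0.75
```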

### CIFAR10 Image Classification
CIFAR-10 has ten classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’,
‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. 

The images in CIFAR-10 are of
size 3x32x32, i.e. 3-channel color images of 32x32 pixels.

Training an image classifier


We will do the following steps in order:

1. Load and normalize the CIFAR10 training and test datasets using
   ``torchvision``
2. Define a Convolutional Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data

1. Loading and normalizing CIFAR10

Using ``torchvision``, it’s extremely easy to load CIFAR10.

In [None]:
import torchvision
import torchvision.transforms as transforms

### Data Preprocessing
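torchvision datasets return PIL images; ``transforms.ToTensor()`` converts them to Tensors with values in the range [0, 1].

``transforms.Normalize`` with mean 0.5 and standard deviation 0.5 for each channel then maps them to the normalized range [-1, 1].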
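Per channel, `transforms.Normalize(mean, std)` computes `(v - mean) / std`, so mean 0.5 and std 0.5 send 0 to -1 and 1 to 1; the `imshow` helper further down inverts this with `v / 2 + 0.5`. A pure-Python check of that arithmetic on a few pixel values (function names are illustrative):

```python
def normalize(v, mean=0.5, std=0.5):
    # what transforms.Normalize does to one channel value: (v - mean) / std
    return (v - mean) / std

def unnormalize(v):
    # inverse used by imshow: v / 2 + 0.5
    return v / 2 + 0.5

for v in (0.0, 0.5, 1.0):
    print(v, normalize(v), unnormalize(normalize(v)))
# 0.0 -1.0 0.0
# 0.5 0.0 0.5
# 1.0 1.0 1.0
```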

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

### Let us show some of the training images, for fun.

In [None]:
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

import builtins

# get some random training images
dataiter = builtins.iter(trainloader)
images, labels = next(dataiter)   # use built-in next(); newer DataLoader iterators have no .next() method

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

In [None]:
########################################################################
# 2. Define a Convolutional Neural Network
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# This is a small LeNet-style network, adapted to take 3-channel (RGB)
# images instead of 1-channel images.



class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
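The `16 * 5 * 5` input size of `fc1` follows from the spatial size after each layer. For `nn.Conv2d` and `nn.MaxPool2d` with no dilation, the output side length is $\lfloor (H + 2p - k)/s \rfloor + 1$. A pure-Python sketch that traces a 32x32 image through the network above:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # output side length of nn.Conv2d / nn.MaxPool2d along one spatial dimension
    return (size + 2 * padding - kernel) // stride + 1

s = 32                      # CIFAR-10 images are 32x32
s = conv_out(s, 5)          # conv1, 5x5 kernel      -> 28
s = conv_out(s, 2, 2)       # pool, 2x2, stride 2    -> 14
s = conv_out(s, 5)          # conv2, 5x5 kernel      -> 10
s = conv_out(s, 2, 2)       # pool, 2x2, stride 2    -> 5
print(s, 16 * s * s)        # 5 400
```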

In [None]:
net = Net()

In [None]:
########################################################################
# 3. Define a Loss function and optimizer
# Let's use a Classification Cross-Entropy loss and SGD with momentum
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
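`nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax followed by the negative log-likelihood of the correct class, which is why the network's last layer has no softmax. A pure-Python sketch of that computation for a single example (the batch version averages over examples by default):

```python
import math

def cross_entropy(logits, target):
    # -log(exp(z_target) / sum_j exp(z_j)), computed via a stable log-sum-exp
    m = max(logits)                                   # subtract the max for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

print(cross_entropy([2.0, 1.0, 0.1], 0))   # about 0.417
```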

In [None]:
########################################################################
# 4. Train the network
# ^^^^^^^^^^^^^^^^^^^^
#
# This is when things start to get interesting.
# We simply have to loop over our data iterator, and feed the inputs to the
# network and optimize

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()   # .item() turns the 0-dim loss tensor into a Python float
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')


In [None]:
#######################################################################
# 5. Test the network on the test data
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# We have trained the network for 2 passes over the training dataset.
# But we need to check if the network has learnt anything at all.
#
# We will check this by predicting the class label that the neural network
# outputs, and checking it against the ground-truth. If the prediction is
# correct, we add the sample to the list of correct predictions.
#
# Okay, first step. Let us display an image from the test set to get familiar.

dataiter = builtins.iter(testloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

In [None]:
########################################################################
# Okay, now let us see what the neural network thinks these examples above are:

outputs = net(Variable(images))

########################################################################
# The outputs are unnormalized scores ("energies", or logits) for the 10 classes.
# The higher the score for a class, the more the network
# thinks that the image belongs to that class.
# So, let's get the index of the highest score:
_, predicted = torch.max(outputs.data, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
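If you want probabilities rather than raw scores, apply a softmax (for example `torch.softmax(outputs, dim=1)`). Softmax preserves the ordering of the scores, so it never changes which class wins, and `torch.max` on the raw outputs gives the same prediction. A pure-Python sketch of that property on made-up scores:

```python
import math

def softmax(scores):
    m = max(scores)                          # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.2, -0.3, 3.1, 0.0]
probs = softmax(scores)

best_raw = max(range(len(scores)), key=lambda i: scores[i])
best_prob = max(range(len(probs)), key=lambda i: probs[i])
print(best_raw, best_prob)    # 2 2
print(round(sum(probs), 6))   # 1.0
```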


In [None]:
# Let us look at how the network performs on the whole dataset.

correct = 0
total = 0
with torch.no_grad():   # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        outputs = net(Variable(images))
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()   # Python int, not a tensor

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))


## Exercise 3.3: Change the above model so that it can be run on GPU