## Trying out different optimizers

We begin by building a CNN architecture for an image classification task on the CIFAR10 dataset. In this part of the tutorial, we will see how to use different optimizers to train a CNN.

To keep data loading simple, we use the torchvision package, which accompanies PyTorch and has data loaders for standard datasets such as ImageNet, CIFAR10, and MNIST.

### CIFAR10 dataset
![CIFAR10](images/cifar10.png)

### Required Packages

In [1]:
#a Tensor library with GPU support
import torch

#Datasets, Transforms and Models specific to Computer Vision
import torchvision
import torchvision.transforms as transforms

#automatic differentiation library that supports all differentiable Tensor operations in torch
#(Variable is a legacy wrapper; since PyTorch 0.4, Tensors track gradients directly)
from torch.autograd import Variable

#a neural networks library integrated with autograd functionality
import torch.nn as nn
import torch.nn.functional as F

#an optimization package with standard optimization methods such as SGD, RMSProp, LBFGS, Adam etc.
import torch.optim as optim

#Weight Initialization
import torch.nn.init as weight_init

#scientific computing library for Python
import numpy as np

#plotting and visualization library
import matplotlib.pyplot as plt
#Display on the notebook
%matplotlib inline 
plt.ion() #Turn interactive mode on.

### Dataloader and Transforms

In [2]:
#Train data
#Compose transforms (applies data transformation and augmentation) prior to feeding to training
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

#built-in dataset class for reading the CIFAR10 dataset
#(download=False assumes the data is already under root; set download=True to fetch it)
trainset = torchvision.datasets.CIFAR10(root='../../data/', train=True,
                                        download=False, transform=transform)

#dataloader for Batching, shuffling and loading data in parallel
train_loader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

#test data
testset = torchvision.datasets.CIFAR10(root='../../data/', train=False,
                                       download=False, transform=transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
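
To make the effect of the Normalize step above concrete: ToTensor produces values in [0, 1], and Normalize with mean 0.5 and std 0.5 computes (x - mean) / std per channel, which maps them to [-1, 1]. The sketch below reproduces that arithmetic with plain torch operations rather than calling torchvision; the tensor fake_image is a made-up stand-in for a real CIFAR10 image.

```python
import torch

# A made-up 3x2x2 "image" with values in [0, 1], as ToTensor would produce.
fake_image = torch.tensor([[[0.0, 0.5], [1.0, 0.25]]]).repeat(3, 1, 1)

# Per-channel mean and std, shaped so they broadcast over height and width.
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

# Same arithmetic as the Normalize transform: (x - mean) / std per channel.
normalized = (fake_image - mean) / std

print(normalized.min().item(), normalized.max().item())  # -1.0 1.0
```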



### Defining the model

To create a network, we first subclass the base class nn.Module. We only have to define the forward function; the backward function (where gradients are computed) is defined automatically by autograd. Any Tensor operation can be used in the forward function.
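
As a small illustration of this point, the sketch below defines only a forward function and lets autograd fill in the gradients. TinyNet and its layer sizes are hypothetical and chosen only to keep the example short; they are not part of the tutorial's model.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical one-layer network; only forward is written by hand."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 1)

    def forward(self, x):
        # Any Tensor operation can appear here; autograd records it.
        return torch.tanh(self.fc(x))

net = TinyNet()
loss = net(torch.ones(2, 3)).sum()

# No backward function was defined; autograd derives it from forward.
loss.backward()

# Every parameter now carries a gradient of its own shape.
grad_shapes = {name: tuple(p.grad.shape) for name, p in net.named_parameters()}
print(grad_shapes)
```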

In [3]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        
        #calling conv2d module for convolution
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, stride=1, padding=0, bias=True)
        
        #calling MaxPool2d module for max pooling with downsampling of 2
        self.pool_1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.pool_2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        #fully connected layers
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)   
     
    #defining the structure of the network
    def forward(self, x):
        
        #Applying relu activation after each conv layer
        x = self.pool_1(F.relu(self.conv1(x)))
        x = self.pool_2(F.relu(self.conv2(x)))
        
        #reshaping to 1d for giving input to fully connected units
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Model()
model = model.cuda()

#Printing the network architecture
print(model)

Model (
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool_1): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (pool_2): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)
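
The input size of fc1 (16 * 5 * 5 = 400) follows from the spatial sizes of the layers before it. The sketch below works through that arithmetic with the standard output-size formula for convolution and pooling without dilation; conv_out is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                  # CIFAR10 images are 32x32
size = conv_out(size, kernel=5)            # conv1  -> 28
size = conv_out(size, kernel=2, stride=2)  # pool_1 -> 14
size = conv_out(size, kernel=5)            # conv2  -> 10
size = conv_out(size, kernel=2, stride=2)  # pool_2 -> 5

flat_features = 16 * size * size           # 16 channels come out of conv2
print(size, flat_features)                 # 5 400
```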


### Using different kinds of optimizers

For more details on each of the optimizers, please refer to the following: http://ruder.io/optimizing-gradient-descent/
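
To give a feel for what one of these update rules does, the sketch below applies the Adagrad rule by hand to a toy parameter and checks it against optim.Adagrad. It assumes the default settings of optim.Adagrad (no learning-rate decay, no weight decay, eps of 1e-10); the parameter values and the sum-of-squares loss are made up for illustration.

```python
import torch
import torch.optim as optim

lr, eps = 0.001, 1e-10   # eps is assumed to match the default of optim.Adagrad

# Two copies of the same toy parameter: one updated by hand, one by PyTorch.
w_manual = torch.tensor([1.0, -2.0, 3.0])
w_torch = w_manual.clone().requires_grad_(True)
optimizer = optim.Adagrad([w_torch], lr=lr)

grad_sq_sum = torch.zeros_like(w_manual)   # running sum of squared gradients

for _ in range(3):
    # PyTorch update on the toy loss sum(w^2).
    optimizer.zero_grad()
    (w_torch ** 2).sum().backward()
    optimizer.step()

    # Manual update: the gradient of sum(w^2) is 2 * w.
    grad = 2 * w_manual
    grad_sq_sum += grad ** 2
    # Each coordinate gets its own step size, shrinking as gradients accumulate.
    w_manual = w_manual - lr * grad / (grad_sq_sum.sqrt() + eps)

print(torch.allclose(w_manual, w_torch.detach(), atol=1e-6))
```

Because the step is divided by the root of the accumulated squared gradients, the effective learning rate only ever decreases during training.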

In [4]:
# optimization scheme can be 'sgd', 'RMSProp', 'Adam', 'Adadelta', 'Adagrad'
optimization_scheme = 'Adagrad'

criterion = nn.CrossEntropyLoss()

if optimization_scheme == 'sgd':
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
elif optimization_scheme == 'RMSProp':
    optimizer = optim.RMSprop(model.parameters(), lr=0.001, weight_decay=0)
elif optimization_scheme == 'Adadelta':
    optimizer = optim.Adadelta(model.parameters(), lr=0.001, weight_decay=0)
elif optimization_scheme == 'Adam':
    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=0)
elif optimization_scheme == 'Adagrad':
    optimizer = optim.Adagrad(model.parameters(), lr=0.001, weight_decay=0)
else:
    #fail early on a misspelled name instead of leaving optimizer undefined
    raise ValueError("Unknown optimization scheme: {0}".format(optimization_scheme))

### Training with the defined optimizer

In [5]:
for epoch in range(5):  # loop over the dataset multiple times

    total_loss = 0.0
    correct = 0
    for i, data in enumerate(train_loader):
        # get the inputs
        inputs, labels = data

        # move them to the GPU (Tensors track gradients directly since PyTorch 0.4,
        # so wrapping them in Variable is no longer needed)
        inputs, labels = inputs.cuda(), labels.cuda()

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # accumulate the batch loss as a Python float
        total_loss += loss.item()
        # count the number of correct classifications in this batch
        _, predicted_class = outputs.max(1)
        correct += predicted_class.eq(labels).sum().item()

    # mean batch loss and training accuracy for this epoch
    print("Epoch: {0} | loss: {1} | accuracy: {2}".format(
        epoch, total_loss / len(train_loader), correct / float(len(train_loader.dataset))))

Epoch: 0 | loss: 1.9349696386814117 | accuracy: 0.30368
Epoch: 1 | loss: 1.8291597516107558 | accuracy: 0.34478
Epoch: 2 | loss: 1.7830828502893448 | accuracy: 0.36162
Epoch: 3 | loss: 1.7487359103488922 | accuracy: 0.37204
Epoch: 4 | loss: 1.7215245046949386 | accuracy: 0.3821
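
The accuracies above are measured on the training data. A natural next step is to measure accuracy on held-out data such as the test_loader defined earlier. The sketch below shows one way this could be done; evaluate is a hypothetical helper, and the toy data and linear model are stand-ins so that the example runs on its own without CIFAR10 or a GPU.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device="cpu"):
    """Return the fraction of correctly classified samples in loader."""
    model.eval()                      # put layers such as dropout in eval mode
    correct, total = 0, 0
    with torch.no_grad():             # gradients are not needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predicted = model(inputs).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Stand-in data and model (not CIFAR10), only to make the sketch runnable.
torch.manual_seed(0)
toy_inputs = torch.randn(8, 4)
toy_labels = torch.randint(0, 3, (8,))
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_labels), batch_size=4)
toy_model = nn.Linear(4, 3)

accuracy = evaluate(toy_model, toy_loader)
print("accuracy: {0}".format(accuracy))
```

With the objects from this tutorial, the call would presumably look like evaluate(model, test_loader, device="cuda"), since the model was moved to the GPU.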
