# Convolutional Neural Network using PyTorch

This post walks through my implementation of the All-CNN-C model, introduced in the paper [Striving For Simplicity: The All Convolutional Net](https://arxiv.org/abs/1412.6806), using the PyTorch library.

A typical CNN uses three types of layers:
- Convolution Layer
- Pooling Layer
- Fully Connected Layer

The paper presents the All-CNN-C convolutional network, which uses
- **Convolution with a stride of 2** instead of MaxPool
- **Global averaging and softmax** instead of a Fully Connected Layer
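
To make the two substitutions concrete, here is a minimal sketch on dummy tensors. The tensor sizes, the `padding=1` on the strided convolution and the use of `nn.AdaptiveAvgPool2d` are illustrative assumptions, not settings taken from the paper.

```python
import torch
import torch.nn as nn

# Dummy batch: 1 image, 96 channels, 32 x 32 (sizes chosen for illustration)
x = torch.randn(1, 96, 32, 32)

# Usual downsampling: 2 x 2 max pooling halves height and width
pool = nn.MaxPool2d(kernel_size=2, stride=2)
# All-CNN style: a 3 x 3 convolution with stride 2 downsamples instead
# (padding=1 is an assumption so that both outputs have the same size)
strided_conv = nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1)

pooled_shape = tuple(pool(x).shape)
conv_shape = tuple(strided_conv(x).shape)
print(pooled_shape, conv_shape)

# Global averaging: one feature map per class, averaged down to one score each
class_maps = torch.randn(1, 10, 6, 6)
scores = nn.AdaptiveAvgPool2d(1)(class_maps).flatten(1)  # shape (1, 10)
probs = torch.softmax(scores, dim=1)                     # each row sums to 1
print(tuple(probs.shape))
```

Unlike max pooling, the strided convolution has learnable weights, so the network learns how to downsample.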

## Architecture 

- First Conv Layers
    * 3 x 3 Conv, ReLU 96, Stride 1
    * 3 x 3 Conv, ReLU 96, Stride 1
- First Conv Pooling Layer
    * 3 x 3 Conv, ReLU 96, stride 2

| Layer | Kernel | Stride | Image Size |
|-------|--------|--------|------------|
| Conv, ReLU 96 | 3 x 3 | 1 x 1 | 32 x 32 |
| Conv, ReLU 96 | 3 x 3 | 1 x 1 | |
| **Conv, ReLU 96** | **3 x 3** | **2 x 2** | |
| Conv, ReLU 192 | 3 x 3 | 1 x 1 | |
| Conv, ReLU 192 | 3 x 3 | 1 x 1 | |
| **Conv, ReLU 192** | **3 x 3** | **2 x 2** | |
| Conv, ReLU 192 | 3 x 3 | 1 x 1 | |
| Conv, ReLU 192 | 1 x 1 | 1 x 1 | |
| Conv, ReLU 10 | 1 x 1 | 1 x 1 | 6 x 6 |
| **Global Average** | **6 x 6** | **1 x 1** | |
| 10 Way Softmax | | | |


- Batch normalization is applied to every layer except the first Conv layer
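
The table only gives the image size at the start (32 x 32) and at the end (6 x 6). As a rough check, the standard convolution output-size formula `floor((n + 2p - k) / s) + 1` can be traced through the layers. The helper `conv_out` is a hypothetical name, and the padding values are an assumption copied from the model code later in this post, not from the table.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution over a square input."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding); paddings are assumed from the model code below
layers = [
    ("conv1", 3, 1, 1),
    ("conv2", 3, 1, 1),
    ("convPool1", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("convPool2", 3, 2, 1),
    ("conv5", 3, 1, 0),
    ("conv6", 1, 1, 0),
    ("conv7", 1, 1, 0),
]

size = 32  # CIFAR10 images are 32 x 32
sizes = []
for name, kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{name}: {size} x {size}")
```

Under these assumptions the last convolution outputs 6 x 6 maps, which is why the global average uses a 6 x 6 window.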

## Creating the Model using the PyTorch Library

### Importing necessary modules

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

### Modularizing the model
- Here, we break the model down into smaller 'modules', each of which contains
    * 2D convolution
    * Batch Normalization
    * LeakyReLU activation

We'll do this by creating a PyTorch class called CUnit, which has
- \__init\__: initializes the necessary layers
- forward: performs the computation using the layers defined in \__init\__

In [4]:
class CUnit(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, batch_norm=True):
        super(CUnit, self).__init__()
        # remember whether this unit should apply batch normalization
        self.batch_norm = batch_norm

        self.conv = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.lrelu = nn.LeakyReLU(negative_slope=0.2)

    def forward(self, inp, batch_norm=None):
        # use the setting given to __init__ unless it is overridden here
        if batch_norm is None:
            batch_norm = self.batch_norm
        out = self.conv(inp)
        if batch_norm:
            out = self.bn(out)
        out = self.lrelu(out)
        return out

### Arguments of the CUnit class
* \__init\__()
    - in_channels: depth/channels of the input images to the unit
    - out_channels: depth/channels of the output images from the unit
    - kernel_size: kernel/filter size of the convolution
    - stride: how many pixels the filter moves in one step of the convolution
    - padding: padding added to the input images
    - batch_norm: whether or not to apply batch normalization in the unit

* forward()
    - inp: input (batch of image tensors)
    - batch_norm: Boolean, whether to apply batch normalization in this forward pass
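
As a quick illustration of what one such unit does to a batch, the same three layers can be chained with `nn.Sequential`. This is only a sketch: the batch size of 4 and the 3-to-96 channel setting are assumptions chosen to mirror the first layer of the table.

```python
import torch
import torch.nn as nn

# Conv -> BatchNorm -> LeakyReLU, the same three steps a CUnit performs
unit = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=96, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(num_features=96),
    nn.LeakyReLU(negative_slope=0.2),
)

x = torch.randn(4, 3, 32, 32)   # dummy batch of 4 RGB images, 32 x 32
out_shape = tuple(unit(x).shape)
print(out_shape)                # channels change, height and width are kept

# LeakyReLU keeps positive values and scales negative ones by the slope
leaky = nn.LeakyReLU(negative_slope=0.2)(torch.tensor([-1.0, 0.0, 2.0]))
print(leaky)
```

With a 3 x 3 kernel, stride 1 and padding 1, only the channel count changes; the image size stays the same.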

### Building the whole model

**Now that we have the unit class, let's use it to construct the model**

In [6]:
class all_CNN(nn.Module):
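    def __init__(self, image_depth, num_classes):
        # nn.Module must be initialized before any layer is assigned to self
        super(all_CNN, self).__init__()
        # first, set up parameters and configs
        self.image_depth = image_depth
        self.num_classes = num_classes
        self.num_out1 = 96
        self.num_out2 = 192
        # Defining dropouts with defined probability
        # (drop1 is defined here but not used in forward below)
        self.drop1 = nn.Dropout(p=0.2)
        self.drop2 = nn.Dropout(p=0.5)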
        
        # now we create units using the CUnit class, based on the
        # model table above...
        self.conv1 = CUnit(in_channels=self.image_depth, out_channels=96, stride=1, batch_norm=False)
        self.conv2 = CUnit(in_channels=96, out_channels=96)
        # here, we'll use 2 stride convolution layer instead of pooling layer
        self.convPool1 = CUnit(in_channels=96, out_channels=96, stride=2, padding=0) 
        self.conv3 = CUnit(in_channels=96, out_channels=192)
        self.conv4 = CUnit(in_channels=192, out_channels=192)
        # Second ConvPool Layer
        self.convPool2 = CUnit(in_channels=192, out_channels=192, stride=2)
        self.conv5 = CUnit(in_channels=192, out_channels=192, padding=0)
        self.conv6 = CUnit(in_channels=192, out_channels=192, kernel_size=1, padding=0)
        self.conv7 = CUnit(in_channels=192, out_channels=self.num_classes, kernel_size=1, padding=0)
        
        # Average Pooling and softmax layers 
        self.avp = nn.AvgPool2d(6)
        self.softmax = nn.Softmax(dim=1)
        
    def forward(self, x):
        # Convolution and convPool computations
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.convPool1(x)
        x = self.drop2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.convPool2(x)
        x = self.drop2(x)
        x = self.conv5(x)
        x = self.conv6(x)
        x = self.conv7(x)
        # average pooling
        avg = self.avp(x)
        # changing shape
        avg = avg.view(-1, self.num_classes)
        # applying softmax
        out = self.softmax(avg)
        
        return out

### Data Preprocessing and Model Training

**Dataset**

Here we use CIFAR10, an image dataset for classification.
- [CIFAR 10 & 100 dataset](https://www.cs.toronto.edu/~kriz/cifar.html)
- 32 x 32 pixel images of 10 classes
- 60000 images total, 10000 for testing, 50000 for training.
- 6000 images per class

**Preprocessing**

The following are applied to the data:
- Horizontal Flip
- Normalization

**Note**: the code is designed so that it will take advantage of GPU if it is available.

Let's get started!

In [8]:
# Importing stuff...
import os
import torch
from torch.autograd import Variable
import torchvision
import torch.nn as nn
from torch.optim.lr_scheduler import MultiStepLR
from torchvision import transforms
from torchvision.utils import save_image
from all_CNN_model import all_CNN
from logger import Logger


### Imported Modules
- `os`: used here to check for and create the sample directory
- `torch`: that's the ML library... duh
- `Variable`: autograd wrapper for tensors (since PyTorch 0.4, plain tensors no longer need it)
- `torchvision`: provides CIFAR10. Other major datasets are also available through this.
- `torch.nn`: for layers, activations
- `MultiStepLR`: learning-rate schedule; see the paper
- `transforms`: preprocessing CIFAR10
- `save_image`: saving images to disk
- `all_CNN`: our model
- `Logger`: custom logger class that logs training data using TensorFlow. Thanks to someone from GitHub

In [9]:
# Setting up the device, CPU or GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper Parameters
lr = 0.04 # 0.25, 0.01, 0.05
image_size = 32 # image height and width
num_epochs = 50 # how many times to go through the training set
num_classes = 10
batch_size = 64 
image_depth = 3 # or channels
sample_dir = 'CIFAR10_sample'

# Create a directory
if not os.path.exists(sample_dir):
    os.makedirs(sample_dir)

# Initializing the logger
logPath = 'logs_CNN/'
record_name = 'CIFAR10_' + str(lr)
logger = Logger(logPath + record_name)


A little bit about `torch.device`...

`torch.cuda.is_available()` returns whether CUDA is available.
`torch.device` creates a device object that can be used later to place tensors and models.

This way we don't have to rewrite the whole program for the CUDA and CPU cases.
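
A minimal sketch of that pattern, using a throwaway `nn.Linear` layer and a dummy tensor (both are assumptions for illustration only):

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# The same two lines work unchanged on either device
layer = nn.Linear(3, 1).to(device)   # move the parameters
x = torch.ones(2, 3).to(device)      # move the input

y = layer(x)
print(y.device.type, tuple(y.shape))
```

The model and its inputs must live on the same device, so both are moved with `.to(device)`.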

### Data Preprocessing
- `transforms.Compose()` accepts a list of transformations and applies them in order. Here we use horizontal flip, `ToTensor` and `Normalize`
- `transforms.RandomHorizontalFlip(p=n)` flips the image horizontally with probability n
- `transforms.ToTensor()` converts the image to a tensor with values in [0, 1]
- `transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))` normalizes the image with the given mean and standard deviation values. One value for each channel, here 3 channels
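
To see what these settings do to pixel values, the sketch below reproduces the arithmetic with plain tensor operations rather than torchvision. The tiny 1 x 1 x 3 "image" is a made-up example.

```python
import torch

# A made-up single-channel image with one row of three pixels, already in [0, 1]
img = torch.tensor([[[0.0, 0.5, 1.0]]])

# Normalize with mean 0.5 and std 0.5: (x - mean) / std maps [0, 1] to [-1, 1]
mean, std = 0.5, 0.5
normalized = (img - mean) / std

# A horizontal flip reverses the last (width) dimension
flipped = torch.flip(normalized, dims=[-1])

# Inverting the normalization, (x + 1) / 2, brings values back to [0, 1]
restored = (normalized + 1) / 2

print(normalized, flipped, restored)
```

The last step is the same arithmetic that the `denorm` helper in the utility functions uses.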


In [10]:
transform = transforms.Compose([ # transforms.Compose, list of transforms to perform
                transforms.RandomHorizontalFlip(p=0.5),
                transforms.ToTensor(),
                transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))])

### Loading the dataset
train_dataset = torchvision.datasets.CIFAR10(root='CIFAR10_data/', # directory where the data is stored
                                   train=True,
                                   transform=transform, # pass the transform  we made
                                   download=True)

test_dataset = torchvision.datasets.CIFAR10(root='CIFAR10_data/', # directory where the data is stored
                                   train=False,
                                   transform=transform, # pass the transform  we made
                                   download=True)

### Data Loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True, drop_last=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False, drop_last=True)

Files already downloaded and verified
Files already downloaded and verified


### Dataset and Loaders
We first load the train and test datasets, then create a data loader for each of them.

**Dataset**
- `torchvision.datasets.DATASET_NAME()` loads the dataset with that name.
    * **root**: relative file path where the data is stored
    * **train**: bool, whether to load the training set (otherwise the test set)
    * **transform**: the transformation to apply. Here we pass the one we created
    * **download**: whether to download the dataset
 
**Data Loaders**
- `torch.utils.data.DataLoader()` creates a data loader, which provides batches of the data to the model during training
    * **dataset**: dataset to create the loader from
    * **batch_size**: number of samples per batch
    * **shuffle**: whether to shuffle the data
    * **drop_last**: if True, drop the last batch when it is smaller than the batch size
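
The effect of `batch_size` and `drop_last` is easy to see on a toy dataset. The sketch below uses `TensorDataset` with 10 made-up samples instead of CIFAR10, purely for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 10 samples with one feature each, and alternating labels
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(10) % 2
dataset = TensorDataset(features, labels)

# 10 samples with batch_size=4 leaves a final batch of only 2 samples
keep_last = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=False)
drop_last = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=True)

keep_sizes = [len(x) for x, _ in keep_last]
drop_sizes = [len(x) for x, _ in drop_last]
print(keep_sizes, drop_sizes)
```

With `drop_last=True` every batch has exactly `batch_size` samples.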
    

### Model and Training Setups


In [11]:
# initialize the model with parameters
D = all_CNN(image_depth, num_classes)

# Device setting
# D.to() moves and/or casts the parameters and buffers to a device (cuda) or dtype;
# here we move the model to whatever device was set earlier
D = D.to(device)

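# Loss function and optimizer
# Note: nn.CrossEntropyLoss applies log-softmax internally and expects raw scores,
# while all_CNN.forward already returns softmax probabilities
criterion = nn.CrossEntropyLoss()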
optimizer = torch.optim.SGD(D.parameters(), lr=lr, momentum=0.9, weight_decay=0.001)

## Adaptive Learning Rate ##
scheduler = MultiStepLR(optimizer, milestones=[200, 250, 300], gamma=0.1)

- `model/variable/tensor.to(device)` moves or casts the parameters and buffers to a device (cpu/cuda) or dtype
- `MultiStepLR`: learning-rate scheduler that decays the rate at fixed milestones
    * `milestones`: the epoch numbers at which to change the lr
    * `gamma`: multiplicative factor, $lr_{new} = \text{gamma} \times lr_{old}$
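
The decay rule can be written out in plain Python to see which rate applies at a given epoch. `lr_at` is a hypothetical helper, not part of PyTorch, and it assumes the scheduler is stepped once per epoch.

```python
from bisect import bisect_right

def lr_at(epoch, base_lr=0.04, milestones=(200, 250, 300), gamma=0.1):
    """Learning rate in effect at `epoch`, assuming one scheduler step per epoch."""
    # Each milestone that has been reached multiplies the rate by gamma once
    passed = bisect_right(milestones, epoch)
    return base_lr * gamma ** passed

for epoch in (0, 49, 199, 200, 250, 300):
    print(epoch, lr_at(epoch))
```

Under that assumption, a run with `num_epochs = 50` never reaches the first milestone, so the rate stays at 0.04 throughout.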
    
### Utility Functions

In [14]:
# denormalize the image
def denorm(x):
    out = (x + 1) / 2
    return out.clamp(0, 1)

# evaluate the model
def evaluate(mode, num):
    '''
    Evaluate using only the first num batches from the loader
    '''

    test_loss = 0
    correct = 0

    # define the mode, to use training set or testing set
    if mode == 'train':
        loader = train_loader
    elif mode == 'test':
        loader = test_loader
    else:
        raise ValueError("mode must be 'train' or 'test'")

    # switch to eval mode (affects dropout and batch norm) and restore it afterwards
    was_training = D.training
    D.eval()
    with torch.no_grad():
        for i, (data, target) in enumerate(loader):
            if i == num: # stop once num batches have been processed
                break
            # move the images and targets to the same device as the model
            data, target = data.to(device), target.to(device)
            # forward pass
            output = D(data)
            # add the mean loss of the batch to the total loss
            test_loss += criterion(output, target).item()
            # make prediction, and get the index numbers as class label
            pred = output.data.max(1, keepdim=True)[1]
            # compare prediction with the target
            correct += pred.eq(target.data.view_as(pred)).sum().item()
            if i % 10 == 0:
                print(i)
    D.train(was_training)
    sample_size = batch_size * num # how many datapoints
    test_loss /= num # average of the per-batch mean losses
    accuracy = 100. * correct / sample_size
    print('\n{} set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        mode, test_loss, correct, sample_size, accuracy))
    return accuracy




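- `.item()` returns the value as a Python number when the Variable/Tensor holds a single element
- `Variable.data` returns the underlying Tensor
- `Tensor.max(dim, keepdim)` returns the max values along dim together with their indices; setting keepdim=True keeps the reduced dimension, so the result has the same number of dimensions as the original tensor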
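
Here is a small worked example of how these calls turn model outputs into an accuracy figure. The scores and targets are made up for illustration.

```python
import torch

# Made-up scores for 3 samples over 3 classes, and their true labels
output = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.3, 0.5]])
target = torch.tensor([1, 0, 1])

# max along dim 1 returns (values, indices); the indices are the predicted classes
values, pred = output.max(1, keepdim=True)   # pred has shape (3, 1)

# Reshape the targets like pred, compare elementwise, and count the matches
correct = pred.eq(target.view_as(pred)).sum().item()
accuracy = 100.0 * correct / target.size(0)
print(pred.view(-1).tolist(), correct, accuracy)
```

The third sample is predicted as class 2 while its label is 1, so two of the three predictions count as correct.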


### Train the Model!