#### ResNet
[Deep Residual Learning for Image Recognition](resnet.pdf)

The ResNet architecture uses shortcut connections to mitigate the vanishing gradient problem. Its basic building block is the residual block, which is repeated throughout the network.
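
A short derivation can make the role of the shortcut concrete. It is a minimal sketch, assuming a block with an identity shortcut whose output is $y = F(x) + x$, where $F$ denotes the stacked layers; the symbols $y$ and $\mathcal{L}$ (the loss) are introduced here for illustration, and the activation applied after the addition is ignored:

$$
\frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\left(\frac{\partial F}{\partial x} + I\right)
$$

The identity term $I$ gives the gradient a direct path from $y$ back to $x$, so the signal reaching earlier layers does not depend only on $\partial F / \partial x$, which may be small.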

Instead of learning the mapping x → F(x), each block learns the mapping x → F(x) + G(x), where F(x) is the output of the stacked layers and G(x) is the shortcut. When the input x and the output F(x) have the same dimensions, G(x) = x is the identity function and the shortcut is called an identity connection. If the best mapping for a block is the identity, the network can learn it by driving the weights of the intermediate layers toward zero so that F(x) ≈ 0, which is easier than fitting an identity mapping with a stack of nonlinear layers.
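
The block described above can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's exact implementation: the class name `IdentityResidualBlock`, the two 3x3 conv layers with batch norm, and the tensor sizes are assumptions chosen for the example.

```python
import torch
import torch.nn as nn


class IdentityResidualBlock(nn.Module):
    """Hypothetical basic block: two 3x3 convs plus an identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        # F(x): the stacked layers; padding=1 keeps the spatial size unchanged
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))  # F(x)
        return self.relu(out + x)        # F(x) + G(x), with G(x) = x


block = IdentityResidualBlock(channels=64)
x = torch.randn(2, 64, 56, 56)
y = block(x)
print(y.shape)  # same shape as x, so the addition F(x) + x is well defined
```

Because the shortcut adds x unchanged, setting the weights of the last conv layer to zero makes F(x) vanish, and the block then passes any non-negative input through untouched.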

When the dimensions of F(x) differ from those of x (because the intermediate conv layers use a stride greater than 1), a projection connection is used instead of the identity connection: the function G(x) maps the input x to the dimensions of the output F(x). The original paper considered two kinds of mapping.

- Non-trainable Mapping (Padding): The input x is simply padded with zeros (extra zero-valued channels) so that its dimensions match those of F(x).

- Trainable Mapping (Conv Layer): A 1x1 conv layer maps x to G(x). The architecture table shows that, across the network, the spatial dimensions are either kept the same or halved and the depth is either kept the same or doubled, so the product of width and depth after each residual stage stays constant at 3584 (for example, 56 × 64 = 28 × 128 = 3584). The 1x1 conv layer halves the spatial dimensions by using a stride of 2 and doubles the depth by using twice as many filters as there are input channels. The number of 1x1 filters equals the depth of F(x).
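
The two mappings listed above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the helper name `padding_shortcut`, the stride-2 subsampling by slicing, and the 64 → 128 channel sizes are chosen for the example.

```python
import torch
import torch.nn as nn
from torch.nn.functional import pad


def padding_shortcut(x, out_channels):
    """Non-trainable option: subsample spatially, then zero-pad the channels."""
    x = x[:, :, ::2, ::2]  # keep every second row and column (halves H and W)
    extra = out_channels - x.size(1)
    # pad is given from the last dimension backwards: (W, W, H, H, C_front, C_back)
    return pad(x, (0, 0, 0, 0, 0, extra))


# Trainable option: a 1x1 conv with stride 2 and as many filters as F(x) has channels
projection_shortcut = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)

x = torch.randn(2, 64, 56, 56)
g_pad = padding_shortcut(x, out_channels=128)
g_proj = projection_shortcut(x)
print(g_pad.shape)   # torch.Size([2, 128, 28, 28])
print(g_proj.shape)  # torch.Size([2, 128, 28, 28])

# Width times depth is unchanged by the transition: 56 * 64 == 28 * 128 == 3584
print(x.size(3) * x.size(1), g_proj.size(3) * g_proj.size(1))
```

Both options produce the same output shape; the padding variant adds no parameters, while the 1x1 conv learns how to combine the input channels.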

![image](resnet34.png)


In [2]:
import torch
import torch.nn as nn
import random
import torchvision
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets
from torch.utils.data import DataLoader, random_split
from torchvision import transforms

def get_dataloaders(datasetname, batch_size, validation_fraction=0.2, train_transforms=transforms.ToTensor(), test_transforms=transforms.ToTensor()):
    # Load train and test datasets
    if datasetname == 'CIFAR10':
        train_dataset = datasets.CIFAR10(root='data', train=True, transform=train_transforms, download=True)
        test_dataset = datasets.CIFAR10(root='data', train=False, transform=test_transforms)
    else: # MNIST
        train_dataset = datasets.MNIST(root='data', train=True, transform=train_transforms, download=True)
        test_dataset = datasets.MNIST(root='data', train=False, transform=test_transforms)

    # Split train dataset into train and validation subsets
    train_size = int((1 - validation_fraction) * len(train_dataset))
    valid_size = len(train_dataset) - train_size
    train_subset, val_subset = random_split(train_dataset, [train_size, valid_size])

    # Create data loaders for each subset
    train_loader = DataLoader(train_subset, batch_size, shuffle=True)
    # Evaluation loaders do not need shuffling
    valid_loader = DataLoader(val_subset, batch_size, shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size, shuffle=False)

    return train_loader, valid_loader, test_loader

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed) 

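def compute_accuracy(model, data_loader, device):
    # Returns the accuracy (in percent) of `model` on `data_loader`
    was_training = model.training
    model.eval()  # use inference behavior for layers such as batch norm
    with torch.no_grad():
        correct_pred, num_examples = 0, 0
        for features, targets in data_loader:

            features = features.to(device)
            targets = targets.to(device)

            logits = model(features)
            # Index of the largest logit is the predicted class
            _, predicted_labels = torch.max(logits, 1)

            num_examples += targets.size(0)
            correct_pred += (predicted_labels == targets).sum()

    model.train(was_training)  # restore the mode the model was in
    return correct_pred.float()/num_examples * 100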

In [None]:
### SETTINGS
BATCH_SIZE = 256
NUM_EPOCHS = 50
DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
set_seed(123)


### CIFAR10 DATASET
train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize((120, 120)),
    torchvision.transforms.RandomCrop((110, 110)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Use a deterministic center crop at test time so evaluation is reproducible
test_transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize((120, 120)),
    torchvision.transforms.CenterCrop((110, 110)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

train_loader, valid_loader, test_loader = get_dataloaders('CIFAR10', batch_size=BATCH_SIZE, validation_fraction=0.1, train_transforms=train_transforms, test_transforms=test_transforms)

# Checking the dataset
for images, labels in train_loader:  
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    print('Class labels of 10 examples:', labels[:10])
    break