# Assignment 1
## [Section 2]
This assignment will make you familier with 
1. loading and preprocessing data using built-in function
2. how to construct a simple CNN model
3. the training and testing pipeline


In this assignment, you might find some tutorials useful, such as https://pytorch.org/tutorials/beginner/basics/intro.html.

In [9]:
# Import dependencies.
import random
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from torchvision import datasets, transforms

In [10]:
# Set up your device 
cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if cuda else "cpu")
print(device)

cuda:0


In [11]:
# Set up random seed to 1008. Do not change the random seed.
# Yes, these are all necessary when you run experiments!
seed = 1008
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if cuda:
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

## 1. Data: MNIST [2 pt]
#### Load the MNIST training and test dataset using $\texttt{torch.utils.data.DataLoader}$ and $\texttt{torchvision.datasets}$. 

This dataset consists of images of handwritten digit, and thus the number of classes is 10. The shape of image in MNIST dataset is (28, 28, 1)

The normalization parameters we will use is (0.1307, 0.3081)

More details please refer to  http://yann.lecun.com/exdb/mnist/.

### 1.1. Load Training Set [1 pt]

In [12]:
# Load the MNIST training set with batch size 128, apply data shuffling and normalization 
# train_loader = TODO
training_data=datasets.MNIST(
    root="",
    train=True,
    download=True,
    transform=ToTensor()
)
train_dataloader = DataLoader(training_data, batch_size=128, shuffle=True)

### 1.2. Load Test Set [1 pt]

In [13]:
# Load the MNIST test set with batch size 128, apply normalization
# test_loader = TODO
test_data=datasets.MNIST(
    root="",
    train=False,
    download=True,
    transform=ToTensor()
) 
test_dataloader = DataLoader(test_data, batch_size=128, shuffle=True)

## 2. Models [3 pts]
#### You are going to define two convolutional neural networks which are trained to classify MNIST digits

### 2.1. CNN without Batch Norm [2 pts]

In [14]:
# Fill in the values below that make this network valid for MNIST data
# Hint: to make sure these, you may calculate the shape of x of every line in the forward.
conv1_in_ch = 1
conv2_in_ch = 20
fc1_in_features = 50*4*4
fc2_in_features = 500
n_classes = 10

In [15]:
class NetWithoutBatchNorm(nn.Module):
    def __init__(self):
        super(NetWithoutBatchNorm, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=conv1_in_ch, out_channels=20, kernel_size=5, stride=1)
        self.conv2 = nn.Conv2d(in_channels=conv2_in_ch, out_channels=50, kernel_size=5, stride=1)
        self.fc1 = nn.Linear(in_features=fc1_in_features, out_features=500)
        self.fc2 = nn.Linear(in_features=fc2_in_features, out_features=n_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        #输出128x20x24x24

        x = F.max_pool2d(x, kernel_size=2, stride=2)
        #输出128x20x12x12

        x = F.relu(self.conv2(x))
        #输出128x50x8x8

        x = F.max_pool2d(x, kernel_size=2, stride=2)
        #输出128x50x4x4

        x = x.view(-1, fc1_in_features) # reshaping

        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        log_softmax_fun=nn.LogSoftmax()
        # Return the log_softmax of x.
        return log_softmax_fun(x)

### 2.2. CNN with Batch Norm [1 pt]

In [16]:
# Fill in the values below that make this network valid for MNIST data
# Hint: to make sure these, you may calculate the shape of x of every line in the forward.
conv1_bn_size = 20
conv2_bn_size = 50
fc1_bn_size = 500

In [17]:
# Define the CNN with architecture explained in Part 2.2
class NetWithBatchNorm(nn.Module):
    def __init__(self):
        super(NetWithBatchNorm, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=conv1_in_ch, out_channels=20, kernel_size=5, stride=1)
        self.conv1_bn = nn.BatchNorm2d(conv1_bn_size)
        self.conv2 = nn.Conv2d(in_channels=conv2_in_ch, out_channels=50, kernel_size=5, stride=1)
        self.conv2_bn = nn.BatchNorm2d(conv2_bn_size)
        self.fc1 = nn.Linear(in_features=fc1_in_features, out_features=500)
        self.fc1_bn = nn.BatchNorm1d(fc1_bn_size)
        self.fc2 = nn.Linear(in_features=fc2_in_features, out_features=n_classes)

    def forward(self, x):
        x = F.relu(self.conv1_bn(self.conv1(x)))
        #输出128x20x24x24

        x = F.max_pool2d(x, kernel_size=2, stride=2)
        #输出:128x20x12x12

        x = F.relu(self.conv2_bn(self.conv2(x)))
        #输出 :128x50x8x8

        x = F.max_pool2d(x, kernel_size=2, stride=2)
        #输出128x50x4x4


        x = x.view(-1, fc1_in_features)
        x = F.relu(self.fc1_bn(self.fc1(x)))
        x = self.fc2(x)

        # Return the log_softmax of x.
        log_softmax_fun=nn.LogSoftmax()

        return log_softmax_fun(x)


## 3. Training & Evaluation [4 pts]

### 3.1. Define training method [1 pt]

In [18]:
def train(model, device, train_loader, optimizer, epoch, log_interval = 100):
    # Set model to training mode
    model.train()
    # Loop through data points
    for batch_idx, (data, target) in enumerate(train_loader):

        # Send data and target to device
        # TODO
        data.to(device)
        target.to(device)
        # Zero out the optimizer
        # TODO
        optimizer.zero_grad()

        # Pass data through model
        # TODO
        output=model.forward(data)
        # Compute the negative log likelihood loss
        # TODO
        lossFunction=nn.NLLLoss()
        loss=lossFunction(output,target)

        # Backpropagate loss
        # TODO
        loss.backward()
        # Make a step with the optimizer
        # TODO
        optimizer.step()

        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

### 3.2. Define test method [1 pt]

In [23]:
# Define test method
def test(model, device, test_loader):
    # Set model to evaluation mode
    model.eval()
    # Variable for the total loss 
    test_loss = 0
    # Counter for the correct predictions
    num_correct = 0
    
    # don't need autograd for eval
    with torch.no_grad():
        # Loop through data points

        for data, target in test_loader:

            # Send data to device
            # TODO
            data.to(device)
            target.to(device)
            # Pass data through model
            # TODO
            output=model.forward(data)
            # Compute the negative log likelihood loss with reduction='sum' and add to total test_loss
            # TODO
            lossFunction=nn.NLLLoss(reduction="sum")
            test_loss+=lossFunction(output,target)
            # Get predictions from the model for each data point
            # TODO
            pred = output.data.max(1, keepdim=True)[1]
            # Add number of correct predictions to total num_correct
            # TODO
            num_correct+= pred.eq(target.data.view_as(pred)).sum()
    # Compute the average test_loss
    avg_test_loss = test_loss/ len(test_loader.dataset)
    
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        avg_test_loss, num_correct, len(test_loader.dataset),
        100. * num_correct / len(test_loader.dataset)))

### 3.3 Train NetWithoutBatchNorm() [1 pt]

In [24]:
# Deifne model and sent to device
model = NetWithoutBatchNorm()

# Optimizer: SGD with learning rate of 1e-2 and momentum of 0.5
optimizer = optim.SGD(model.parameters(), lr=0.01,momentum=0.5)

# Training loop with 10 epochs
for epoch in range(1, 10 + 1):

    # Train model
    # TODO
    train(model,device,train_dataloader,optimizer,epoch)
    # Test model
    test(model,device,test_dataloader)

  return log_softmax_fun(x)



Test set: Average loss: 0.3135, Accuracy: 9039/10000 (90%)


Test set: Average loss: 0.1820, Accuracy: 9449/10000 (94%)


Test set: Average loss: 0.1228, Accuracy: 9633/10000 (96%)


Test set: Average loss: 0.0893, Accuracy: 9738/10000 (97%)


Test set: Average loss: 0.0782, Accuracy: 9769/10000 (98%)


Test set: Average loss: 0.0682, Accuracy: 9804/10000 (98%)


Test set: Average loss: 0.0675, Accuracy: 9802/10000 (98%)


Test set: Average loss: 0.0534, Accuracy: 9839/10000 (98%)


Test set: Average loss: 0.0525, Accuracy: 9838/10000 (98%)


Test set: Average loss: 0.0508, Accuracy: 9840/10000 (98%)



### 3.4 Train NetWithBatchNorm() [1 pt]

In [25]:
# Deifne model and sent to device
model = NetWithBatchNorm()

# Optimizer: SGD with learning rate of 1e-2 and momentum of 0.5
optimizer = optim.SGD(model.parameters(), lr=0.01,momentum=0.5)

# Training loop with 10 epochs
for epoch in range(1, 10 + 1):
    
    # Train model
    # TODO
    train(model,device,train_dataloader,optimizer,epoch)
    # Test model
    # TODO
    test(model,device,test_dataloader)



  return log_softmax_fun(x)



Test set: Average loss: 0.0821, Accuracy: 9799/10000 (98%)


Test set: Average loss: 0.0550, Accuracy: 9857/10000 (99%)


Test set: Average loss: 0.0444, Accuracy: 9872/10000 (99%)


Test set: Average loss: 0.0372, Accuracy: 9900/10000 (99%)


Test set: Average loss: 0.0346, Accuracy: 9898/10000 (99%)


Test set: Average loss: 0.0310, Accuracy: 9908/10000 (99%)


Test set: Average loss: 0.0309, Accuracy: 9901/10000 (99%)


Test set: Average loss: 0.0277, Accuracy: 9911/10000 (99%)


Test set: Average loss: 0.0276, Accuracy: 9914/10000 (99%)


Test set: Average loss: 0.0257, Accuracy: 9925/10000 (99%)



## 4. Empirically, which of the models achieves higher accuracy faster? [1pt]

Answer: NetWithBatchNorm() achieves higher accuracy faster.