### Name: Xinyan Yang 
### NetID: xinyany2
**HW3: Train a deep convolution network on a GPU with PyTorch for the CIFAR10 dataset. The convolution network should use (A) dropout, (B) trained with RMSprop or ADAM, and (C) data augmentation. For 10% extra credit, compare dropout test accuracy (i) using the heuristic prediction rule and (ii) Monte Carlo simulation.**
*For full credit, the model should achieve 80-90% Test Accuracy. Submit via Compass (1) the code and (2) a paragraph (in a PDF document) which reports the results and briefly describes the model architecture. Due September 28 at 5:00 PM.*

### The results and model architecture  

The following code trains a convolutional neural network on a GPU with PyTorch for the CIFAR10 dataset. The steps, in order, are:  
1. Load and normalize the CIFAR10 training and test datasets using ``torchvision``  
2. Define a convolutional neural network  
3. Define a loss function  
4. Train the network on the training data  
5. Test the network on the test data 
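
The normalization in step 1 can be illustrated with a small sketch. It assumes, as in the code further down, that pixel values have already been scaled to [0, 1] and that each channel uses a mean of 0.5 and a standard deviation of 0.5. The helper name `normalize` is only for illustration.

```python
def normalize(value, mean=0.5, std=0.5):
    """Shift by the mean and divide by the std, as a per-channel normalization does."""
    return (value - mean) / std

# The endpoints and midpoint of [0, 1] map onto [-1, 1].
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

With these settings the darkest pixel maps to -1, the brightest to 1, and mid-gray to 0.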
  
*Data augmentation* (random horizontal flip and random vertical flip) is performed while loading the training set. The deep CNN is composed of 8 convolution layers, 2 max-pooling layers and 2 fully-connected layers; *dropout* and *batch normalization* are applied after roughly every other convolution layer. The loss function is CrossEntropyLoss and the optimizer is *Adam*.  
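
The first fully-connected layer in the code below expects `64 * 4 * 4` inputs. The following sketch checks that number by tracing the feature-map width through the network. It assumes the usual output-size formula for a convolution, floor((n + 2p - k) / s) + 1, and a 2x2 max-pool with stride 2; the kernel sizes and paddings are copied from the model definition, while the helper names (`conv_out`, `pool_out`, `trace`) are hypothetical.

```python
# Each entry: (layer name, kernel size, padding); "pool" entries are 2x2 max-pools.
LAYERS = [
    ("conv1", 4, 2), ("conv2", 4, 2), ("pool1", None, None),
    ("conv3", 4, 2), ("conv4", 4, 2), ("pool2", None, None),
    ("conv5", 4, 2), ("conv6", 3, 0), ("conv7", 3, 0), ("conv8", 3, 0),
]

def conv_out(size, kernel, padding, stride=1):
    """Output width of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2):
    """Output width of a max-pool whose stride equals its kernel size."""
    return size // kernel

def trace(size=32):
    """Return [(layer name, output width), ...] for a square input image."""
    shapes = []
    for name, kernel, padding in LAYERS:
        if name.startswith("pool"):
            size = pool_out(size)
        else:
            size = conv_out(size, kernel, padding)
        shapes.append((name, size))
    return shapes

for name, size in trace():
    print(f"{name}: {size}x{size}")
```

Starting from a 32x32 CIFAR10 image, the width ends at 4 after the eighth convolution, so with 64 channels the flattened vector has 64 * 4 * 4 = 1024 entries.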
  
After 100 epochs, **the test accuracy is 84.29%**.

#### The output below comes from running the code on a GPU on Blue Waters (BW). 

  
  
--- Epoch :  1 ---  
lr : 0.001000  
Epoch [1/100], Step [100/500], Loss: 1.9017  
Epoch [1/100], Step [200/500], Loss: 1.6356  
Epoch [1/100], Step [300/500], Loss: 1.8095  
Epoch [1/100], Step [400/500], Loss: 1.7006  
Epoch [1/100], Step [500/500], Loss: 1.5524  
Val Acc : 0.4356   
  
  
--- Epoch : 21 ---  
lr : 0.000810  
Epoch [21/100], Step [100/500], Loss: 1.0175  
Epoch [21/100], Step [200/500], Loss: 0.8192  
Epoch [21/100], Step [300/500], Loss: 0.7113  
Epoch [21/100], Step [400/500], Loss: 0.8276  
Epoch [21/100], Step [500/500], Loss: 0.8906  
Val Acc : 0.7641  
  
  
--- Epoch : 41 ---  
lr : 0.000656  
Epoch [41/100], Step [100/500], Loss: 0.6814  
Epoch [41/100], Step [200/500], Loss: 0.7379  
Epoch [41/100], Step [300/500], Loss: 0.5427  
Epoch [41/100], Step [400/500], Loss: 0.7138  
Epoch [41/100], Step [500/500], Loss: 0.5250  
Val Acc : 0.8070  
  
  
--- Epoch : 61 ---  
lr : 0.000531  
Epoch [61/100], Step [100/500], Loss: 0.5049  
Epoch [61/100], Step [200/500], Loss: 0.5073  
Epoch [61/100], Step [300/500], Loss: 0.4406  
Epoch [61/100], Step [400/500], Loss: 0.6080  
Epoch [61/100], Step [500/500], Loss: 0.4265  
Val Acc : 0.8243  
  
  
--- Epoch : 81 ---  
lr : 0.000430  
Epoch [81/100], Step [100/500], Loss: 0.4927  
Epoch [81/100], Step [200/500], Loss: 0.4958  
Epoch [81/100], Step [300/500], Loss: 0.4971  
Epoch [81/100], Step [400/500], Loss: 0.4501  
Epoch [81/100], Step [500/500], Loss: 0.5944  
Val Acc : 0.8381  
  
  
--- Epoch : 100 ---  
lr : 0.000387  
Epoch [100/100], Step [100/500], Loss: 0.6904  
Epoch [100/100], Step [200/500], Loss: 0.6240  
Epoch [100/100], Step [300/500], Loss: 0.5897  
Epoch [100/100], Step [400/500], Loss: 0.4948  
Epoch [100/100], Step [500/500], Loss: 0.6030  
Val Acc : 0.8429  
  
  
Test Accuracy of the model on the 10000 test images: 84.29 %  
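
The `lr` values in the log above are consistent with a step decay that multiplies the initial rate of 0.001 by 0.9 once every 10 epochs. The sketch below reproduces them; the `(epoch - 1)` offset is inferred from the logged values rather than taken from the scheduler's documentation, and the function name `step_lr` is hypothetical.

```python
def step_lr(epoch, base_lr=0.001, step_size=10, gamma=0.9):
    """Learning rate at a 1-indexed epoch under step decay (offset inferred from the log)."""
    return base_lr * gamma ** ((epoch - 1) // step_size)

# Print the rate at the epochs that appear in the log.
for epoch in (1, 21, 41, 61, 81, 100):
    print("epoch %3d  lr : %f" % (epoch, step_lr(epoch)))
```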


In [None]:
#coding: utf-8
import torch
import torchvision
import torch.nn as nn
import numpy as np
import torchvision.transforms as transforms
import torch.nn.functional as F
import tensorboardX

In [None]:
# Device configuration
torch.cuda.is_available()
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

In [None]:
# Set the random seed for reproducibility
torch.manual_seed(1)
# Hyperparameters
num_epochs = 100  # the reported log and results cover 100 epochs
num_classes = 10
batch_size = 100
learning_rate = 0.001

In [None]:
# Prepare the data.
print('==> Preparing data..')

# torchvision datasets yield PIL images; ToTensor scales them to [0, 1],
# and Normalize with mean 0.5 and std 0.5 then maps each channel to [-1, 1].
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

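# Download and construct the CIFAR-10 dataset.
# DataLoader combines a dataset and a sampler and provides single- or multi-process iteration over the dataset.
# batch_size: number of samples per batch; shuffle: whether to shuffle the data.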

train_dataset = torchvision.datasets.CIFAR10(root='./data',
                                             train=True, 
                                             transform=transform_train,
                                             download=True)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)


test_dataset = torchvision.datasets.CIFAR10(root='./data',
                                          train=False, 
                                          transform=transform_test)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [None]:
class Cifar10Model(nn.Module):
    def __init__(self):
        super(Cifar10Model, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=4, stride=1, padding=2)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=4, stride=1, padding=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=4, stride=1, padding=2)
        self.conv4 = nn.Conv2d(64, 64, kernel_size=4, stride=1, padding=2)
        self.conv5 = nn.Conv2d(64, 64, kernel_size=4, stride=1, padding=2)
        self.conv6 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=0)
        self.conv7 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=0)
        self.conv8 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=0)
        self.fc1 = nn.Linear(64 * 4 * 4, 500)
        self.fc2 = nn.Linear(500, num_classes)
        self.batchnorm1 = nn.BatchNorm2d(64)
        self.batchnorm2 = nn.BatchNorm2d(64)
        self.batchnorm3 = nn.BatchNorm2d(64)
        self.batchnorm4 = nn.BatchNorm2d(64)
        self.batchnorm5 = nn.BatchNorm2d(64)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.25)
        self.dropout3 = nn.Dropout2d(0.25)
        self.dropout4 = nn.Dropout2d(0.5)
           
    
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.batchnorm1(x)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, (2, 2))
        x = self.dropout1(x)
        x = F.relu(self.conv3(x))
        x = self.batchnorm2(x)
        x = F.relu(self.conv4(x))
        x = F.max_pool2d(x, (2, 2))
        x = self.dropout2(x)
        x = F.relu(self.conv5(x))
        x = self.batchnorm3(x)
        x = F.relu(self.conv6(x))
        x = self.dropout3(x)
        x = F.relu(self.conv7(x))
        x = self.batchnorm4(x)
        x = F.relu(self.conv8(x))
        x = self.batchnorm5(x)
        x = self.dropout4(x)   
        x = x.view(-1, 64 * 4 * 4)
        x = F.relu(self.fc1(x))
        
        return self.fc2(x)

model = Cifar10Model().to(device)
print(model)

In [None]:
# Define the loss function, optimizer, and learning-rate schedule
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# Multiply the learning rate by 0.9 every 10 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.9)

In [None]:
global_step = 0

def train(epoch, writer):
    global global_step
    model.train()
    steps = 50000 // batch_size  # batches per epoch

    # Append so that the logs of earlier epochs are not overwritten.
    with open('out.txt', 'a') as f:
        print("\n--- Epoch : %2d ---" % epoch, file=f)
        print("lr : %f" % optimizer.param_groups[0]['lr'], file=f)

        # Workaround for an overflow error seen on Blue Waters:
        # cap Adam's per-parameter step counter.
        if epoch > 6:
            for group in optimizer.param_groups:
                for p in group['params']:
                    state = optimizer.state[p]
                    if 'step' in state and state['step'] >= 1024:
                        if torch.is_tensor(state['step']):
                            state['step'].fill_(1000)
                        else:
                            state['step'] = 1000

        for step, (images, labels) in enumerate(train_loader, 1):
            global_step += 1
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            if step % 100 == 0:
                print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                      % (epoch, num_epochs, step, steps, loss.item()), file=f)
                writer.add_scalar('train/train_loss', loss.item(), global_step)

    # Step the scheduler once per epoch, after the optimizer updates.
    scheduler.step()


In [None]:
def eval(epoch, writer):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for (images, labels) in test_loader:
            images, labels = images.to(device), labels.to(device)
            
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    
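    # Append the accuracy to the log file; `f` is not defined in this scope otherwise.
    with open('out.txt', 'a') as f:
        print("Val Acc : %.4f" % (correct/total), file=f)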
    writer.add_scalar('eval/val_acc', correct*100/total, epoch)

In [None]:
from tensorboardX import SummaryWriter
writer = SummaryWriter()
epochs = num_epochs  # total number of epochs

for epoch in range(1, num_epochs+1):
    train(epoch, writer)
    eval(epoch, writer)

writer.close()

In [None]:
torch.save(model.state_dict(), 'model_cifar10_nobatch.pkl')

In [None]:
# Test the model
model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

# Append the final test accuracy to the log file.
with open('out.txt', 'a') as f:
    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total), file=f)


In [None]:
# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')