# Digit Recognizer | Kaggle
  
https://www.kaggle.com/c/digit-recognizer
## Competition Description

MNIST ("Modified National Institute of Standards and Technology") is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

In this competition, your goal is to correctly identify digits from a dataset of tens of thousands of handwritten images. We’ve curated a set of tutorial-style kernels which cover everything from regression to neural networks. We encourage you to experiment with different algorithms to learn first-hand what works well and how techniques compare.

## Brief Introduction
This notebook uses PyTorch, a recently released deep learning framework, to build a simple CNN for the classic MNIST problem.  
Along the way I learned how to build a simple network with PyTorch and how to load my own dataset instead of the one downloaded by torchvision.  

Public Score:  
0.98714
  
## Environment
CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz  
Mem: 16G  
GPU: GeForce GTX 1060  
Python: Python 3.6.0 |Anaconda 4.3.1 (64-bit)  
PyTorch: torch-0.1.12 with cuda8.0

In [1]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.autograd import Variable

In [27]:
print(torch.version)  # prints the torch.version module object; torch.__version__ holds the version string

<module 'torch.version' from '/home/wrc/anaconda3/lib/python3.6/site-packages/torch/version.py'>


## Hyper Parameters

In [2]:
num_epochs = 5
batch_size = 100
learning_rate = 0.001

## Raw Data
The data folder should sit in the same directory as this notebook and contain the files "train.csv" and "test.csv", which are the data for this competition.  
The training dataset has 28x28 = 784 pixel feature columns plus a label column.  
The test dataset has only the 784 pixel columns.
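
The layout described above can be checked before loading a whole file. The following is a minimal sketch using only the standard library; `check_header` and the in-memory stand-in files are hypothetical names introduced for illustration, and the column names `label` and `pixel0` to `pixel783` are taken from the `head()` output in this notebook.

```python
import csv
import io

N_PIXELS = 28 * 28  # 784 pixel columns per image


def check_header(csv_file, has_label):
    """Return True if the first CSV row matches the expected column layout."""
    header = next(csv.reader(csv_file))
    expected = (["label"] if has_label else []) + [f"pixel{i}" for i in range(N_PIXELS)]
    return header == expected


# Tiny in-memory stand-ins for the real files (header row only, hypothetical).
pixel_cols = ",".join(f"pixel{i}" for i in range(N_PIXELS))
fake_train = io.StringIO("label," + pixel_cols + "\n")
fake_test = io.StringIO(pixel_cols + "\n")

print(check_header(fake_train, has_label=True))   # True
print(check_header(fake_test, has_label=False))   # True
```

With the real data, the same function could be called on `open('./data/train.csv', newline='')`.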

In [6]:
train_data = pd.read_csv('./data/train.csv')
train_data.head()

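| | label | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |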


In [7]:
test_data = pd.read_csv('./data/test.csv')
test_data.head()

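| | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |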


In [8]:
del train_data
del test_data

## Build CustomedDataSet
This class makes loading the data more convenient and lets a DataLoader serve it in mini-batches.

In [9]:
class CustomedDataSet(torch.utils.data.Dataset):
    """Serve the Kaggle CSV files as (1, 28, 28) image tensors."""
    def __init__(self, train=True):
        self.train = train
        if self.train:
            trainX = pd.read_csv('./data/train.csv')
            # .values replaces as_matrix(), which pandas 1.0 removed
            trainY = trainX.label.values.tolist()
            # reshape each 784-pixel row to (channel, height, width)
            trainX = trainX.drop('label', axis=1).values.reshape(trainX.shape[0], 1, 28, 28)
            self.datalist = trainX
            self.labellist = trainY
        else:
            testX = pd.read_csv('./data/test.csv')
            testX = testX.values.reshape(testX.shape[0], 1, 28, 28)
            self.datalist = testX

    def __getitem__(self, index):
        # cast the pixels to float before building the tensor
        if self.train:
            return torch.Tensor(self.datalist[index].astype(float)), self.labellist[index]
        else:
            return torch.Tensor(self.datalist[index].astype(float))

    def __len__(self):
        return self.datalist.shape[0]
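
The reshape to `(1, 28, 28)` in the class above relies on the pixel columns being stored row by row. The sketch below illustrates that mapping with NumPy on a synthetic row (the values 0 to 783 are stand-ins, not real pixel data), assuming the Kaggle convention that `pixel{k}` sits at row `k // 28`, column `k % 28`.

```python
import numpy as np

# Stand-in for one CSV row: the value k is stored in column pixel{k}.
row = np.arange(784)

# (channels, height, width): one grayscale channel of 28x28 pixels.
image = row.reshape(1, 28, 28)

k = 100
print(image.shape)                # (1, 28, 28)
print(image[0, k // 28, k % 28])  # 100
```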

In [10]:
train_dataset = CustomedDataSet()

In [11]:
test_dataset = CustomedDataSet(train=False)

In [12]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True,
                                           num_workers=2)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
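
Conceptually, a data loader splits the sample indices into mini-batches and optionally shuffles them first. The following pure-Python sketch shows only that idea; it is not how PyTorch implements `DataLoader`, and `iter_batches` is a hypothetical helper. The sample count of 42,000 is inferred from the 420 iterations per epoch at batch size 100 in the training log.

```python
import random


def iter_batches(n_samples, batch_size, shuffle, seed=0):
    """Yield lists of sample indices, one list per mini-batch."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


batches = list(iter_batches(n_samples=42000, batch_size=100, shuffle=True))
print(len(batches))     # 420
print(len(batches[0]))  # 100
```

Every index appears in exactly one batch per pass, which is why one epoch sees each training image once.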

## CNN Model
Two convolutional layers followed by one fully connected layer,  
with batch normalization and ReLU activations.

In [13]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1 ,16, kernel_size=5,padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5,padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc = nn.Linear(7*7*32, 10)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

In [14]:
cnn = CNN()
cnn.cuda()

CNN (
  (layer1): Sequential (
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU ()
    (3): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (layer2): Sequential (
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU ()
    (3): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (fc): Linear (1568 -> 10)
)
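
The input size of the fully connected layer, 7*7*32 = 1568, follows from the layer settings printed above. A short sketch of the arithmetic, using the standard output-size formula for convolution and pooling (the helper names `conv_out` and `pool_out` are introduced here for illustration):

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    """Spatial output size of max pooling; stride defaults to the kernel size."""
    stride = stride or kernel
    return (size - kernel) // stride + 1


size = 28
size = pool_out(conv_out(size, kernel=5, padding=2), kernel=2)  # layer1: 28 -> 28 -> 14
size = pool_out(conv_out(size, kernel=5, padding=2), kernel=2)  # layer2: 14 -> 14 -> 7
print(size, size * size * 32)  # 7 1568
```

Padding of 2 with a 5x5 kernel keeps the spatial size unchanged, so only the two pooling steps shrink the image.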

In [15]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn.parameters(),lr=learning_rate)
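
`nn.CrossEntropyLoss` takes raw scores (logits) and combines a log-softmax with the negative log-likelihood. The pure-Python sketch below computes the same quantity for a single sample as an illustration; `cross_entropy` is a hypothetical helper, not the PyTorch implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# An untrained 10-class model that scores every digit equally:
uniform = cross_entropy([0.0] * 10, target=3)
print(round(uniform, 4))  # 2.3026
```

The value ln(10) ≈ 2.3026 is the loss of a uniform guess over ten digits, a useful baseline when reading the training log.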

In [16]:
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images).cuda()
        labels = Variable(labels).cuda()
        
        #Forward + Backward + Optimize
        optimizer.zero_grad()
        outputs = cnn(images)
        loss = criterion(outputs,labels)
        loss.backward()
        optimizer.step()
        if (i+1) % 100 == 0:
            # loss.data[0] is the torch 0.1.12 idiom; PyTorch 0.4 and later use loss.item()
            print('Epoch [%d/%d], Iter [%d/%d] Loss: %.4f' % (epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.data[0]))

Epoch [1/5], Iter [100/420] Loss: 0.1188
Epoch [1/5], Iter [200/420] Loss: 0.0466
Epoch [1/5], Iter [300/420] Loss: 0.0906
Epoch [1/5], Iter [400/420] Loss: 0.0562
Epoch [2/5], Iter [100/420] Loss: 0.1072
Epoch [2/5], Iter [200/420] Loss: 0.0412
Epoch [2/5], Iter [300/420] Loss: 0.0658
Epoch [2/5], Iter [400/420] Loss: 0.0138
Epoch [3/5], Iter [100/420] Loss: 0.0649
Epoch [3/5], Iter [200/420] Loss: 0.0236
Epoch [3/5], Iter [300/420] Loss: 0.0169
Epoch [3/5], Iter [400/420] Loss: 0.0187
Epoch [4/5], Iter [100/420] Loss: 0.0122
Epoch [4/5], Iter [200/420] Loss: 0.0853
Epoch [4/5], Iter [300/420] Loss: 0.0281
Epoch [4/5], Iter [400/420] Loss: 0.0594
Epoch [5/5], Iter [100/420] Loss: 0.0063
Epoch [5/5], Iter [200/420] Loss: 0.0198
Epoch [5/5], Iter [300/420] Loss: 0.0580
Epoch [5/5], Iter [400/420] Loss: 0.0039


## Apply the Trained Model to the Test Dataset

In [17]:
cnn.eval()  # Put the module in evaluation mode.
            # This only affects modules such as Dropout or BatchNorm.

CNN (
  (layer1): Sequential (
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU ()
    (3): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (layer2): Sequential (
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
    (2): ReLU ()
    (3): MaxPool2d (size=(2, 2), stride=(2, 2), dilation=(1, 1))
  )
  (fc): Linear (1568 -> 10)
)

In [18]:
ans = torch.cuda.LongTensor()    # empty GPU tensor to accumulate the predictions

In [19]:
# The whole test set is too large for my GPU memory, so I feed it through the network in batches
for images in test_loader:
    images = Variable(images).cuda()
    outputs = cnn(images)
    _,predicted = torch.max(outputs.data, 1)
    ans = torch.cat((ans,predicted),0)
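
The loop above takes the index of the largest score in each row as the predicted digit and appends each batch's predictions to the running result. The same idea in NumPy, on made-up scores for three classes (the numbers are illustrative only):

```python
import numpy as np

# Two hypothetical batches of class scores, one row per image.
batch_scores = [
    np.array([[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]]),
    np.array([[0.0, 0.2, 0.9]]),
]

# argmax along axis 1 picks the best class per row; concatenate joins the batches.
predictions = np.concatenate([scores.argmax(axis=1) for scores in batch_scores])
print(predictions.tolist())  # [1, 0, 2]
```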

In [20]:
ans = ans.cpu().numpy()              # only a CPU tensor can be converted to a NumPy array

In [21]:
aa = pd.DataFrame(ans)
aa.columns = ['Label']
Id = range(1, len(aa) + 1)                # ImageId starts at 1
aa.insert(0, 'ImageId', Id)               # build the submission CSV

In [22]:
aa.head()

| | ImageId | Label |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 0 |
| 2 | 3 | 9 |
| 3 | 4 | 0 |
| 4 | 5 | 3 |


In [23]:
aa.to_csv('submit_pytorch.csv',index = False)
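
As a sanity check on the file format, the sketch below builds the same two-column layout in one step and inspects the CSV text in memory. It is an alternative illustration, not the code used for the submission; the five labels are the ones shown by `aa.head()` above.

```python
import io

import pandas as pd

preds = [2, 0, 9, 0, 3]  # first five predictions from aa.head()

submission = pd.DataFrame(
    {"ImageId": list(range(1, len(preds) + 1)), "Label": preds},
    columns=["ImageId", "Label"],
)

# Write to an in-memory buffer instead of a file to look at the raw text.
buf = io.StringIO()
submission.to_csv(buf, index=False)
lines = buf.getvalue().splitlines()
print(lines[:3])  # ['ImageId,Label', '1,2', '2,0']
```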