<a href="https://colab.research.google.com/github/zohyan/Understanding-PyTorch/blob/master/LSTM_MNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> ### An example of applying an LSTM to the MNIST dataset

Steps
* Step 1: Load Dataset
* Step 2: Make Dataset Iterable
* Step 3: Create Model Class
* Step 4: Instantiate Model Class
* Step 5: Instantiate Loss Class
* Step 6: Instantiate Optimizer Class
* Step 7: Train Model



## Step 1: Load Dataset



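We'll use the MNIST dataset: 28×28 grayscale images of the handwritten digits 0 to 9, split into 60,000 training images and 10,000 test images.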

In [0]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets

In [2]:
train_dataset = datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor())

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
Processing...
Done!


In [3]:
train_dataset.data.size()

torch.Size([60000, 28, 28])

In [4]:
train_dataset.targets.size()



torch.Size([60000])

In [5]:
test_dataset.data.size()



torch.Size([10000, 28, 28])

In [6]:
test_dataset.targets.size()



torch.Size([10000])

## Step 2: Make Dataset Iterable

In [0]:
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)
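
The epoch count above follows from simple arithmetic: one epoch is one pass over all batches, so the number of epochs is the target iteration count divided by the number of batches per epoch. The sketch below restates that calculation in plain Python. The helper name `epochs_for_iterations` is hypothetical (it is not part of the notebook), and it assumes the loader keeps a final partial batch, which is the `DataLoader` default.

```python
import math


def epochs_for_iterations(n_iters, dataset_size, batch_size):
    """Hypothetical helper: whole epochs needed to reach about n_iters updates."""
    # A DataLoader with drop_last=False yields ceil(dataset_size / batch_size) batches
    batches_per_epoch = math.ceil(dataset_size / batch_size)
    return n_iters // batches_per_epoch


# Values taken from the cell above: 60,000 training images, batches of 100
batches_per_epoch = math.ceil(60000 / 100)
num_epochs_check = epochs_for_iterations(3000, 60000, 100)
print(batches_per_epoch, num_epochs_check)
```

With these values there are 600 batches per epoch, so 3000 iterations correspond to 5 epochs.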

## Step 3: Create Model Class

In [0]:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_dim):
        
        super(LSTMModel, self).__init__()
        
        # Hidden size
        # hidden_size - The number of features in the hidden state h
        self.hidden_size = hidden_size
        
        # Number of hidden layers
        # num_layers – Number of recurrent layers
        self.num_layers = num_layers
        
        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        
        # Readout layer
        self.fc = nn.Linear(hidden_size, output_dim)
    
    def forward(self, x):

        # Initialize hidden state with zeros
        # h_0 of shape (num_layers * num_directions, batch, hidden_size)
        # here we have a unidirectional LSTM, so num_directions = 1
        # tensor containing the initial hidden state for each element in the batch.
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        
        # Initialize cell state with zeros
        # c_0 of shape (num_layers * num_directions, batch, hidden_size)
        # here we have a unidirectional LSTM, so num_directions = 1
        # tensor containing the initial cell state for each element in the batch.
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        
        # 28 time steps
        # h0 and c0 are fresh zero tensors created for every batch, so no state
        # (and no gradient) carries over from one batch to the next
        out, (hn, cn) = self.lstm(x, (h0, c0))
        
        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states! 
        out = self.fc(out[:, -1, :]) 
        # out.size() --> 100, 10
        return out
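
To make the shape comments in `forward` concrete, here is a minimal sketch that runs a standalone `nn.LSTM` on random data with the same sizes as this notebook (batch of 100, 28 time steps, 28 features, hidden size 100). The random input and the seed are assumptions for illustration only. It also checks that, for a single-direction LSTM, the last time step of `out` equals the final hidden state of the last layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Same configuration as the model above, but used on its own
lstm = nn.LSTM(input_size=28, hidden_size=100, num_layers=1, batch_first=True)

# Random stand-in for a batch of images: (batch, seq_len, features)
x = torch.randn(100, 28, 28)

# Omitting (h0, c0) makes nn.LSTM start from zero states
out, (hn, cn) = lstm(x)

out_shape = tuple(out.shape)   # hidden state at every time step
hn_shape = tuple(hn.shape)     # final hidden state per layer
last_step = out[:, -1, :]      # what the readout layer receives

# For a unidirectional LSTM the two views of the final state agree
last_step_matches = torch.allclose(last_step, hn[-1])
print(out_shape, hn_shape, last_step_matches)
```

This is why the model can feed `out[:, -1, :]` to the readout layer: it is the hidden state after the whole image has been read.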

## Step 4: Instantiate Model Class

* 28 time steps
    * Each time step: input dimension = 28
* 1 hidden layer
* MNIST digits 0-9 $\rightarrow$ output dimension = 10

In [0]:
input_dim = 28
hidden_dim = 100
layer_dim = 1
output_dim = 10

In [0]:
model = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)
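
The "28 time steps, input dimension 28" setup amounts to reading each image row by row: row `t` of the image becomes the input at time step `t`. The sketch below illustrates this with a synthetic batch (an assumption, standing in for real MNIST images) shaped like the tensors a loader with `ToTensor()` produces.

```python
import torch

# Synthetic stand-in for a batch of 2 images shaped (batch, channel, height, width)
batch = torch.arange(2 * 1 * 28 * 28, dtype=torch.float32).view(2, 1, 28, 28)

# Drop the channel dimension: (batch, seq_dim, input_dim)
sequences = batch.view(-1, 28, 28)
seq_shape = tuple(sequences.shape)

# Time step 5 of sample 0 is exactly row 5 of image 0
row_5 = batch[0, 0, 5]
step_5 = sequences[0, 5]
rows_are_steps = torch.equal(row_5, step_5)
print(seq_shape, rows_are_steps)
```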

## Step 5: Instantiate Loss Class

In [0]:
criterion = nn.CrossEntropyLoss()
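
`nn.CrossEntropyLoss` expects raw logits and integer class labels; it applies log-softmax internally and then averages the negative log-likelihood of the correct class. As a rough illustration, the sketch below compares the built-in loss with a manual computation on made-up logits and labels (both are assumptions, not notebook data).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Made-up logits for 4 samples and 10 classes, with made-up labels
logits = torch.randn(4, 10)
labels = torch.tensor([3, 0, 9, 1])

# Built-in loss: takes logits directly, no softmax layer needed in the model
loss_builtin = nn.CrossEntropyLoss()(logits, labels)

# Manual version: log-softmax, pick the log-probability of the true class, average
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(4), labels].mean()

losses_match = torch.allclose(loss_builtin, loss_manual)
print(loss_builtin.item(), loss_manual.item(), losses_match)
```

This is the reason the model's readout layer has no softmax of its own.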

## Step 6: Instantiate Optimizer Class

At every iteration, the optimizer updates the model's parameters using the gradients of the loss.

In [0]:
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
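
Plain SGD (no momentum or weight decay, as configured above) updates each parameter as `param = param - learning_rate * grad`. The toy example below, with a made-up two-element parameter and a made-up quadratic loss, shows one such step.

```python
import torch

# Toy parameter and loss, assumed purely for illustration
w = torch.tensor([1.0, 2.0], requires_grad=True)
toy_optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()      # d(loss)/dw = 2 * w

toy_optimizer.zero_grad()  # clear any stale gradients
loss.backward()            # populate w.grad
grad = w.grad.clone()      # gradient is [2.0, 4.0]
toy_optimizer.step()       # w <- w - 0.1 * grad

updated = w.detach().clone()
print(grad, updated)
```

Starting from `[1.0, 2.0]`, the step moves the parameter to `[0.8, 1.6]`.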

## Step 7: Train Model

Process:

1. Reshape each batch of images to (batch, seq_dim, input_dim)
2. Clear gradient buffers
3. Get output given inputs
4. Get loss
5. Get gradients w.r.t. parameters
6. Update parameters using gradients
7. REPEAT

In [0]:

# Number of steps to unroll
seq_dim = 28  

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Reshape each 1x28x28 image into a sequence of 28 rows with 28 features each
        images = images.view(-1, seq_dim, input_dim)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # outputs.size() --> 100, 10
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        # Every 500 iterations, we display the loss  
        # and accuracy for the current iteration
        if iter % 500 == 0:
            # Calculate accuracy on the test set
            correct = 0
            total = 0
            # No gradients are needed for evaluation
            with torch.no_grad():
                # Iterate through test dataset
                for images, labels in test_loader:
                    # Resize images
                    images = images.view(-1, seq_dim, input_dim)
                    
                    # Forward pass only to get logits/output
                    outputs = model(images)
                    
                    # Get predictions from the maximum value
                    _, predicted = torch.max(outputs, 1)
                    
                    # Total number of labels
                    total += labels.size(0)
                    
                    # Total correct predictions (as a Python int)
                    correct += (predicted == labels).sum().item()
            
            # Accuracy as a whole-number percentage
            accuracy = 100 * correct // total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.item(), accuracy))

Iteration: 500. Loss: 2.2769298553466797. Accuracy: 17
Iteration: 1000. Loss: 1.0712666511535645. Accuracy: 64
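
The accuracy figures come from taking the index of the largest logit as the predicted class and counting matches with the labels. A minimal sketch of that calculation on hand-made logits follows; the helper name `batch_accuracy_counts` and the example values are hypothetical and not part of the notebook.

```python
import torch


def batch_accuracy_counts(logits, labels):
    """Hypothetical helper: return (number correct, number of samples) for one batch."""
    # The predicted class is the index of the largest logit in each row
    predicted = logits.argmax(dim=1)
    correct = (predicted == labels).sum().item()
    return correct, labels.size(0)


# Hand-made logits for 3 samples and 3 classes
logits = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.2, 0.3],
                       [0.0, 0.1, 3.0]])
labels = torch.tensor([1, 2, 2])

correct, total = batch_accuracy_counts(logits, labels)
accuracy_percent = 100 * correct // total  # whole-number percentage
print(correct, total, accuracy_percent)
```

Here the predictions are classes 1, 0 and 2, so two of the three samples are correct.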
