# Feedforward Neural Network with PyTorch (on MNIST dataset)
By [Zahra Taheri](https://github.com/zata213), August 30, 2020

## Logistic regression problems
- Logistic regression can represent linear functions well, e.g., $y=2x+1$ and $y=x_1+x_2$.
- It cannot represent non-linear functions well, e.g., $y=2x^2+1$ and $y=x_1x_2$; a neural network with non-linear hidden units can.

![alt text](logistic-vs-nn.png)
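To make the limitation concrete, here is a minimal plain-Python sketch that is not part of the original notebook. It fits the best linear model to $y=x_1x_2$ on the four corner points $(\pm 1,\pm 1)$ and compares it with a hand-built network that has two ReLU hidden units. The helper names `best_linear_fit` and `tiny_relu_net` and the hand-picked weights are illustrative assumptions, and the network is exact only on these four points.

```python
# Target: y = x1 * x2 on the four corners (+-1, +-1).
points = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
targets = [x1 * x2 for x1, x2 in points]

def best_linear_fit(points, targets):
    """Least-squares fit of y ~ w1*x1 + w2*x2 + b.

    On these four points the columns x1, x2 and the constant 1 are mutually
    orthogonal, so each coefficient is simply <column, y> / <column, column>.
    """
    n = len(points)
    w1 = sum(p[0] * t for p, t in zip(points, targets)) / sum(p[0] ** 2 for p in points)
    w2 = sum(p[1] * t for p, t in zip(points, targets)) / sum(p[1] ** 2 for p in points)
    b = sum(targets) / n
    return w1, w2, b

def relu(z):
    return max(0.0, z)

def tiny_relu_net(x1, x2):
    """One hidden layer with two ReLU units: relu(x1+x2) + relu(-x1-x2) - 1."""
    h1 = relu(x1 + x2)
    h2 = relu(-x1 - x2)
    return h1 + h2 - 1.0

w1, w2, b = best_linear_fit(points, targets)
linear_preds = [w1 * x1 + w2 * x2 + b for x1, x2 in points]
net_preds = [tiny_relu_net(x1, x2) for x1, x2 in points]

print("targets: ", targets)
print("linear:  ", linear_preds)  # the best linear model predicts 0 everywhere
print("relu net:", net_preds)     # matches the targets on these four points
```

The best linear fit collapses to the constant 0, while a single hidden layer with a non-linearity reproduces the targets.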

## Non-linear functions in-depth

**Common types of non-linearity:**
- ReLUs (Rectified Linear Units)
- Sigmoid
- tanh

![alt text](non-linear-functions.png)
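As a quick reference, the three non-linearities can be written in a few lines of plain Python. This is only a scalar sketch for building intuition; the models below use the PyTorch modules `nn.ReLU`, `nn.Sigmoid`, and `nn.Tanh` instead.

```python
import math

def relu(z):
    """Rectified linear unit: max(0, z); output range [0, inf)."""
    return max(0.0, z)

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z)); output range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent; output range (-1, 1)."""
    return math.tanh(z)

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  relu={relu(z):.4f}  sigmoid={sigmoid(z):.4f}  tanh={tanh(z):.4f}")
```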

## Building Feedforward Neural Network Models with PyTorch (on CPU)

### Model A: 1 Hidden Layer Feedforward Neural Network (Sigmoid Activation)

In [1]:
# import libraries
import torch
import torch.nn as nn
from torch.autograd import Variable

import torchvision.transforms as transforms
import torchvision.datasets as datasets

In [2]:
# read MNIST dataset
train_dataset = datasets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = datasets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

In [5]:
train_dataset.data.size()

torch.Size([60000, 28, 28])

In [8]:
train_dataset.targets.size()

torch.Size([60000])

In [9]:
test_dataset.data.size()

torch.Size([10000, 28, 28])

In [10]:
test_dataset.targets.size()

torch.Size([10000])

#### Make dataset iterable

In [3]:
# These values give 5 epochs: 3000 iterations / (60000 images / 100 images per iteration) = 5.
# For 10 epochs, set n_iters = 6000.

batch_size = 100 # every iteration feeds 100 images to the model at once

n_iters = 3000

num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)
num_epochs

5
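The relationship between iterations, batch size, and epochs can be sketched as two small helpers. The function names are hypothetical, and the sketch assumes the dataset size is an exact multiple of the batch size, as it is for 60000 images and batches of 100.

```python
def epochs_from_iterations(n_iters, dataset_size, batch_size):
    """Number of full passes over the data covered by n_iters mini-batch updates."""
    iters_per_epoch = dataset_size // batch_size
    return n_iters // iters_per_epoch

def iterations_for_epochs(num_epochs, dataset_size, batch_size):
    """Number of mini-batch updates needed for num_epochs full passes."""
    return num_epochs * (dataset_size // batch_size)

print(epochs_from_iterations(3000, 60000, 100))   # 5
print(iterations_for_epochs(10, 60000, 100))      # 6000
```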

In [4]:
# create an iterable over the training dataset

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True) # reshuffle the training data at every epoch

# create an iterable over the test dataset

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
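What the loaders do conceptually can be sketched in plain Python. `iterate_minibatches` is a hypothetical helper that only mimics the shuffling and batching idea on sample indices; it is not how `DataLoader` is implemented. With 60000 samples and a batch size of 100, one epoch consists of 600 iterations.

```python
import random

def iterate_minibatches(num_samples, batch_size, shuffle, seed=0):
    """Yield lists of sample indices, one list per mini-batch.

    Hypothetical helper: it only illustrates batching and shuffling,
    not the real DataLoader implementation.
    """
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

batches = list(iterate_minibatches(num_samples=60000, batch_size=100, shuffle=True))
print(len(batches))      # number of iterations in one epoch
print(len(batches[0]))   # images per iteration
```

Every sample index appears in exactly one batch per epoch; shuffling only changes which batch it lands in.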

#### Create model class

In [5]:
class FeedforwardNeuralNetworkModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetworkModel, self).__init__() # run nn.Module's own initialization first
        # linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # non-linearity
        self.sigmoid = nn.Sigmoid()
        # linear function
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # linear function
        out = self.fc1(x)
        # non-linearity
        out = self.sigmoid(out)
        #linear function
        out = self.fc2(out)
        return out       
    

#### Instantiate model class

In [6]:
input_dim = 28*28 # number of pixels in one flattened image
output_dim = 10 # one output per digit: 0,1,2,3,...,9
hidden_dim = 100 # number of hidden neurons (non-linear activation units)

model = FeedforwardNeuralNetworkModel(input_dim, hidden_dim, output_dim)

#### Instantiate loss class
**Cross Entropy Loss:** `nn.CrossEntropyLoss()` applies a (log-)softmax to the model's raw outputs (logits) and then computes the cross entropy loss, so the model itself should not apply a softmax in `forward`.

In [7]:
criterion=nn.CrossEntropyLoss()
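For a single sample, the softmax-then-cross-entropy computation can be sketched in plain Python as follows. The logits are made-up numbers for a 3-class example, and the function names are illustrative. Note that `nn.CrossEntropyLoss()` additionally averages this quantity over the batch by default.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability that softmax assigns to the true class."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-class example
print(softmax(logits))
print(cross_entropy(logits, label=0))  # smaller loss: class 0 has the largest score
print(cross_entropy(logits, label=2))  # larger loss: class 2 has the smallest score
```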

#### Instantiate optimizer class

- The simplified parameter update is $\theta=\theta-\eta\cdot\nabla_\theta$, where $\theta$ denotes the parameters (our variables), $\eta$ is the learning rate (how fast we want the model to learn), and $\nabla_\theta$ is the gradient of the loss with respect to the parameters.

- At every iteration, we update our model's parameters.
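The update rule above can be tried on a one-parameter toy problem. The sketch below assumes a made-up loss $(\theta-3)^2$, whose gradient is $2(\theta-3)$, and reuses the learning rate of 0.1; `sgd_step` and `grad_of_loss` are hypothetical names, not PyTorch functions.

```python
def sgd_step(theta, grad, learning_rate):
    """One update: theta <- theta - learning_rate * grad."""
    return theta - learning_rate * grad

def grad_of_loss(theta):
    """Gradient of the toy loss (theta - 3)^2, whose minimum is at theta = 3."""
    return 2.0 * (theta - 3.0)

theta = 0.0
learning_rate = 0.1
for step in range(50):
    theta = sgd_step(theta, grad_of_loss(theta), learning_rate)

print(theta)  # close to 3.0 after 50 updates
```

`torch.optim.SGD` applies the same kind of step to every parameter tensor of the model when `optimizer.step()` is called.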

In [8]:
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [9]:
print(model.parameters())
print(len(list(model.parameters())))

# FC 1 weights: (hidden_dim, input_dim)
print(list(model.parameters())[0].size())

# FC 1 biases: (hidden_dim,)
print(list(model.parameters())[1].size())

# FC 2 weights: (output_dim, hidden_dim)
print(list(model.parameters())[2].size())

# FC 2 biases: (output_dim,)
print(list(model.parameters())[3].size())

<generator object Module.parameters at 0x000001FA066B1510>
4
torch.Size([100, 784])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])


![alt text](nn-parameters.png)
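The printed shapes also give the total number of trainable values in Model A. The following plain-Python sketch copies the four shapes from the output above and multiplies them out; the parameter names follow the `fc1`/`fc2` attribute names of the model class.

```python
import math

# Shapes printed above for Model A (784 -> 100 -> 10).
# nn.Linear stores its weight as (out_features, in_features).
shapes = {
    "fc1.weight": (100, 784),
    "fc1.bias": (100,),
    "fc2.weight": (10, 100),
    "fc2.bias": (10,),
}

counts = {name: math.prod(shape) for name, shape in shapes.items()}
for name, count in counts.items():
    print(f"{name:12s} {count:6d}")
print("total", sum(counts.values()))
```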

#### Train the model

In [10]:
iter = 0 # iteration counter; reaches n_iters = 3000 by the end of training
for epoch in range(num_epochs): # num_epochs is 5
    # one iteration = 
    #{
    for i, (images, labels) in enumerate(train_loader):
        # load images as variables
        images = Variable(images.view(-1, 28*28)) # number of images in each iteration is equal to batch_size=100
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # 100x10 
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
    #}
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                images = Variable(images.view(-1, 28*28))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Predicted class = index of the largest logit in each row
                # predicted has shape (100,): one class index per image
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
           
                # Total correct predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct // total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

Iteration: 500. Loss: 0.6608788967132568. Accuracy: 85
Iteration: 1000. Loss: 0.5462762713432312. Accuracy: 89
Iteration: 1500. Loss: 0.4568452537059784. Accuracy: 90
Iteration: 2000. Loss: 0.2747637927532196. Accuracy: 91
Iteration: 2500. Loss: 0.2744704484939575. Accuracy: 91
Iteration: 3000. Loss: 0.2502192258834839. Accuracy: 92
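The accuracy figures are whole numbers because the loop uses floor division (`100 * correct // total`). A small plain-Python sketch with made-up logits shows both steps: picking the index of the largest logit as the prediction, and rounding the percentage down. The helper names are illustrative.

```python
def predict(logits_batch):
    """Index of the largest logit in each row, i.e. the predicted class."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits_batch]

def accuracy_percent(logits_batch, labels):
    """Whole-number accuracy, using floor division like the training loop."""
    predicted = predict(logits_batch)
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return 100 * correct // len(labels)

# Made-up logits for 3 samples and 3 classes.
logits_batch = [
    [0.1, 2.0, -1.0],   # predicted class 1
    [1.5, 0.2, 0.3],    # predicted class 0
    [0.0, 0.1, 0.2],    # predicted class 2
]
labels = [1, 0, 1]

print(predict(logits_batch))                   # [1, 0, 2]
print(accuracy_percent(logits_batch, labels))  # 66 (2 of 3 correct, rounded down)
```

A reported accuracy of 92 therefore means at least 92% and less than 93% of the test images were classified correctly.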


In [11]:
save_model = True
if save_model is True:
    torch.save(model.state_dict(), 'feedforward_neural_network_pytorch_sigmoid_mnist.pkl')# only save parameters
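Because only the parameters are saved, loading them later requires building the same architecture first and then copying the parameters in. The sketch below assumes a temporary file path for the demonstration and redefines a model with the same layout as Model A so that it runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn

class FeedforwardNeuralNetworkModel(nn.Module):
    """Same layout as Model A: linear -> sigmoid -> linear."""
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.sigmoid = nn.Sigmoid()
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        return self.fc2(self.sigmoid(self.fc1(x)))

model = FeedforwardNeuralNetworkModel(28 * 28, 100, 10)

# Save only the parameters (temporary path, used just for this demo).
path = os.path.join(tempfile.mkdtemp(), "model_a_demo.pkl")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then copy the saved parameters into it.
restored = FeedforwardNeuralNetworkModel(28 * 28, 100, 10)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # switch to evaluation mode before inference

x = torch.rand(4, 28 * 28)  # random stand-in for four flattened images
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)
```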

### Model B: 1 Hidden Layer Feedforward Neural Network (Tanh Activation)
Only the model class differs from Model A; the rest of the code is the same.

In [12]:
# import libraries
import torch
import torch.nn as nn
from torch.autograd import Variable
import torchvision.transforms as transforms
import torchvision.datasets as datasets

# read MNIST dataset
train_dataset = datasets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = datasets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

# make dataset iterable
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

# create iterable objects
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

#### Create model class

In [13]:
class FeedforwardNeuralNetworkModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetworkModel, self).__init__()
        # linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # non-linearity
        self.tanh = nn.Tanh()
        # linear function
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # linear function
        out = self.fc1(x)
        # non-linearity
        out = self.tanh(out)
        #linear function
        out = self.fc2(out)
        return out       
    

In [14]:
#  Instantiate model class
input_dim = 28*28
output_dim = 10
hidden_dim = 100 

model = FeedforwardNeuralNetworkModel(input_dim, hidden_dim, output_dim)

#  Instantiate loss class
criterion=nn.CrossEntropyLoss()

#  Instantiate optimizer class
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [15]:
# Train the model
iter = 0
for epoch in range(num_epochs):
    # one iteration = 
    #{
    for i, (images, labels) in enumerate(train_loader):
        # load images as variables
        images = Variable(images.view(-1, 28*28)) # number of images in each iteration is equal to batch_size=100
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # 100x10 
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
    #}
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                images = Variable(images.view(-1, 28*28))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Predicted class = index of the largest logit in each row
                # predicted has shape (100,): one class index per image
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
           
                # Total correct predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct // total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

Iteration: 500. Loss: 0.3384842574596405. Accuracy: 91
Iteration: 1000. Loss: 0.214825838804245. Accuracy: 92
Iteration: 1500. Loss: 0.32922565937042236. Accuracy: 93
Iteration: 2000. Loss: 0.2514549195766449. Accuracy: 94
Iteration: 2500. Loss: 0.18627643585205078. Accuracy: 94
Iteration: 3000. Loss: 0.2388889193534851. Accuracy: 94


In [16]:
save_model = True
if save_model is True:
    torch.save(model.state_dict(), 'feedforward_neural_network_pytorch_tanh_mnist.pkl')# only save parameters

### Model C: 1 Hidden Layer Feedforward Neural Network (ReLU Activation)
Only the model class differs from Models A and B; the rest of the code is the same.

In [17]:
# import libraries
import torch
import torch.nn as nn
from torch.autograd import Variable
import torchvision.transforms as transforms
import torchvision.datasets as datasets

# read MNIST dataset
train_dataset = datasets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = datasets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

# make dataset iterable
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

# create iterable objects
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

#### Create model class

In [18]:
class FeedforwardNeuralNetworkModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetworkModel, self).__init__()
        # linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # non-linearity
        self.relu = nn.ReLU()
        # linear function
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # linear function
        out = self.fc1(x)
        # non-linearity
        out = self.relu(out)
        #linear function
        out = self.fc2(out)
        return out       
    

In [19]:
#  Instantiate model class
input_dim = 28*28
output_dim = 10
hidden_dim = 100 

model = FeedforwardNeuralNetworkModel(input_dim, hidden_dim, output_dim)

#  Instantiate loss class
criterion=nn.CrossEntropyLoss()

#  Instantiate optimizer class
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [20]:
# Train the model

iter = 0
for epoch in range(num_epochs):
    # one iteration = 
    #{
    for i, (images, labels) in enumerate(train_loader):
        # load images as variables
        images = Variable(images.view(-1, 28*28)) # number of images in each iteration is equal to batch_size=100
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # 100x10 
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
    #}
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                images = Variable(images.view(-1, 28*28))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Predicted class = index of the largest logit in each row
                # predicted has shape (100,): one class index per image
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
           
                # Total correct predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct // total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

Iteration: 500. Loss: 0.2624063789844513. Accuracy: 91
Iteration: 1000. Loss: 0.3202899992465973. Accuracy: 93
Iteration: 1500. Loss: 0.18649150431156158. Accuracy: 93
Iteration: 2000. Loss: 0.3143783211708069. Accuracy: 94
Iteration: 2500. Loss: 0.11973336338996887. Accuracy: 95
Iteration: 3000. Loss: 0.19717523455619812. Accuracy: 95


In [21]:
save_model = True
if save_model is True:
    torch.save(model.state_dict(), 'feedforward_neural_network_pytorch_relu_mnist.pkl')# only save parameters

### Model D: 2 Hidden Layers Feedforward Neural Network (ReLU Activation)

![alt text](2-hidden-relu.png)

In [22]:
# import libraries
import torch
import torch.nn as nn
from torch.autograd import Variable
import torchvision.transforms as transforms
import torchvision.datasets as datasets

# read MNIST dataset
train_dataset = datasets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = datasets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

# make dataset iterable
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

# create iterable objects
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

#### Create model class

In [23]:
class FeedforwardNeuralNetworkModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetworkModel, self).__init__()
        # linear function 1: 784->100
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # non-linearity 1
        self.relu1 = nn.ReLU()
        
        # linear function 2: 100->100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # non-linearity 2
        self.relu2 = nn.ReLU()
        
        # linear function (readout) 100->10
        self.fc3 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # linear function 1
        out = self.fc1(x)
        # non-linearity 1
        out = self.relu1(out)
        
        # linear function 2
        out = self.fc2(out)
        # non-linearity 2
        out = self.relu2(out)
        
        #linear function 3 (readout)
        out = self.fc3(out)
        return out       
    

In [24]:
#  Instantiate model class
input_dim = 28*28
output_dim = 10
hidden_dim = 100 

model = FeedforwardNeuralNetworkModel(input_dim, hidden_dim, output_dim)

#  Instantiate loss class
criterion=nn.CrossEntropyLoss()

#  Instantiate optimizer class
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [25]:
# Train the model

iter = 0
for epoch in range(num_epochs):
    # one iteration = 
    #{
    for i, (images, labels) in enumerate(train_loader):
        # load images as variables
        images = Variable(images.view(-1, 28*28)) # number of images in each iteration is equal to batch_size=100
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # 100x10 
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
    #}
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                images = Variable(images.view(-1, 28*28))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Predicted class = index of the largest logit in each row
                # predicted has shape (100,): one class index per image
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
           
                # Total correct predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct // total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

Iteration: 500. Loss: 0.24841484427452087. Accuracy: 90
Iteration: 1000. Loss: 0.3699096739292145. Accuracy: 93
Iteration: 1500. Loss: 0.08417633920907974. Accuracy: 94
Iteration: 2000. Loss: 0.2003040760755539. Accuracy: 95
Iteration: 2500. Loss: 0.10492083430290222. Accuracy: 95
Iteration: 3000. Loss: 0.13262341916561127. Accuracy: 96


In [26]:
save_model = True
if save_model is True:
    torch.save(model.state_dict(), 'feedforward_neural_network_pytorch_relu2_mnist.pkl')# only save parameters

### Model E: 3 Hidden Layers Feedforward Neural Network (ReLU Activation)

![alt text](3-hidden-relu.png)

In [27]:
# import libraries
import torch
import torch.nn as nn
from torch.autograd import Variable
import torchvision.transforms as transforms
import torchvision.datasets as datasets

# read MNIST dataset
train_dataset = datasets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = datasets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

# make dataset iterable
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

# create iterable objects
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

#### Create model class

In [28]:
class FeedforwardNeuralNetworkModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetworkModel, self).__init__()
        # linear function 1: 784->100
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # non-linearity 1
        self.relu1 = nn.ReLU()
        
        # linear function 2: 100->100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # non-linearity 2
        self.relu2 = nn.ReLU()
        
        # linear function 3: 100->100
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        # non-linearity 3
        self.relu3 = nn.ReLU()
        
        # linear function 4 (readout) 100->10
        self.fc4 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # linear function 1
        out = self.fc1(x)
        # non-linearity 1
        out = self.relu1(out)
        
        # linear function 2
        out = self.fc2(out)
        # non-linearity 2
        out = self.relu2(out)
        
        # linear function 3
        out = self.fc3(out)
        # non-linearity 3
        out = self.relu3(out)
        
        #linear function 4 (readout)
        out = self.fc4(out)
        return out       
    

In [29]:
#  Instantiate model class
input_dim = 28*28
output_dim = 10
hidden_dim = 100 

model = FeedforwardNeuralNetworkModel(input_dim, hidden_dim, output_dim)

#  Instantiate loss class
criterion=nn.CrossEntropyLoss()

#  Instantiate optimizer class
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [30]:
# Train the model

iter = 0
for epoch in range(num_epochs):
    # one iteration = 
    #{
    for i, (images, labels) in enumerate(train_loader):
        # load images as variables
        images = Variable(images.view(-1, 28*28)) # number of images in each iteration is equal to batch_size=100
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # 100x10 
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
    #}
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                images = Variable(images.view(-1, 28*28))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Predicted class = index of the largest logit in each row
                # predicted has shape (100,): one class index per image
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
           
                # Total correct predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct // total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

Iteration: 500. Loss: 0.20820769667625427. Accuracy: 91
Iteration: 1000. Loss: 0.27557432651519775. Accuracy: 93
Iteration: 1500. Loss: 0.15087616443634033. Accuracy: 95
Iteration: 2000. Loss: 0.16060371696949005. Accuracy: 95
Iteration: 2500. Loss: 0.03184164687991142. Accuracy: 96
Iteration: 3000. Loss: 0.04542054980993271. Accuracy: 96


In [31]:
save_model = True
if save_model is True:
    torch.save(model.state_dict(), 'feedforward_neural_network_pytorch_relu3_mnist.pkl')# only save parameters

## Deep learning
**2 ways to expand a neural network**
   - More non-linear activation units per hidden layer (a larger hidden dimension, i.e., more neurons)
   - More hidden layers

**Cons**
   - More parameters usually need a larger dataset to train well (curse of dimensionality)
   - A larger network does not necessarily give higher accuracy
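To see how the two ways of expanding a network affect its size, the parameter counts can be computed directly from the layer widths. This is a plain-Python sketch; `mlp_param_count` is a hypothetical helper, and the 200-unit configuration is an invented example that does not appear in the notebook.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network.

    layer_sizes lists the width of every layer, input and output included,
    e.g. [784, 100, 10] for one hidden layer of 100 units.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

configs = {
    "1 hidden layer, 100 units (Models A-C)": [784, 100, 10],
    "2 hidden layers, 100 units (Model D)": [784, 100, 100, 10],
    "3 hidden layers, 100 units (Model E)": [784, 100, 100, 100, 10],
    "1 hidden layer, 200 units (hypothetical wider model)": [784, 200, 10],
}
for name, sizes in configs.items():
    print(f"{name}: {mlp_param_count(sizes):,} parameters")
```

With a 784-dimensional input, most parameters sit in the first layer, so doubling the hidden width adds far more parameters than adding another 100-unit hidden layer.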

## Building a Feedforward Neural Network Model with PyTorch (On GPU)

### 3 Hidden Layers Feedforward Neural Network (ReLU Activation)

In [32]:
# import libraries
import torch
import torch.nn as nn
from torch.autograd import Variable
import torchvision.transforms as transforms
import torchvision.datasets as datasets

# read MNIST dataset
train_dataset = datasets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)

test_dataset = datasets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

# make dataset iterable
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

# create iterable objects
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

#### Create model class

In [33]:
class FeedforwardNeuralNetworkModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetworkModel, self).__init__()
        # linear function 1: 784->100
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # non-linearity 1
        self.relu1 = nn.ReLU()
        
        # linear function 2: 100->100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # non-linearity 2
        self.relu2 = nn.ReLU()
        
        # linear function 3: 100->100
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        # non-linearity 3
        self.relu3 = nn.ReLU()
        
        # linear function 4 (readout) 100->10
        self.fc4 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # linear function 1
        out = self.fc1(x)
        # non-linearity 1
        out = self.relu1(out)
        
        # linear function 2
        out = self.fc2(out)
        # non-linearity 2
        out = self.relu2(out)
        
        # linear function 3
        out = self.fc3(out)
        # non-linearity 3
        out = self.relu3(out)
        
        #linear function 4 (readout)
        out = self.fc4(out)
        return out       
    

In [34]:
#  Instantiate model class
input_dim = 28*28
output_dim = 10
hidden_dim = 100 

model = FeedforwardNeuralNetworkModel(input_dim, hidden_dim, output_dim)

#######################
#  USE GPU FOR MODEL  #
#######################
if torch.cuda.is_available():
    model.cuda()
    

In [35]:
#  Instantiate loss class
criterion=nn.CrossEntropyLoss()

#  Instantiate optimizer class
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [36]:
# Train the model

iter = 0
for epoch in range(num_epochs):
    # one iteration = 
    #{
    for i, (images, labels) in enumerate(train_loader):
        #######################
        #  USE GPU FOR MODEL  #
        #######################
        if torch.cuda.is_available():
            images = Variable(images.view(-1, 28*28).cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images.view(-1, 28*28))
            labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # 100x10 
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
    #}
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                if torch.cuda.is_available():
                    images = Variable(images.view(-1, 28*28).cuda())
                else:
                    images = Variable(images.view(-1, 28*28))    
                    
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Predicted class = index of the largest logit in each row
                # predicted has shape (100,): one class index per image
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
           
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                # Total correct predictions
                if torch.cuda.is_available():
                    # predicted is on the GPU but labels from test_loader are still on the CPU;
                    # bring both to the CPU so that they can be compared
                    correct += (predicted.cpu() == labels.cpu()).sum()
                else:                
                    correct += (predicted == labels).sum()
            
            accuracy = 100 * correct // total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

Iteration: 500. Loss: 0.44783785939216614. Accuracy: 87
Iteration: 1000. Loss: 0.2271917313337326. Accuracy: 94
Iteration: 1500. Loss: 0.0881962701678276. Accuracy: 95
Iteration: 2000. Loss: 0.130946084856987. Accuracy: 96
Iteration: 2500. Loss: 0.057556651532649994. Accuracy: 96
Iteration: 3000. Loss: 0.12545844912528992. Accuracy: 96
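The repeated `if torch.cuda.is_available()` branches can also be avoided by choosing a device once and moving both the model and each batch to it with `.to(device)`. The following is a minimal sketch of that pattern, not the notebook's own code: it uses an `nn.Sequential` stand-in with the same layer sizes as the model above and random tensors in place of an MNIST batch, so it runs with or without a GPU.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Small stand-in for the 3-hidden-layer model above (same layer sizes).
model = nn.Sequential(
    nn.Linear(28 * 28, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 10),
).to(device)

# Random tensors standing in for one MNIST batch of 100 images.
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))

# Move the batch to the same device as the model; no if/else is needed.
images = images.view(-1, 28 * 28).to(device)
labels = labels.to(device)

with torch.no_grad():  # gradients are not needed for evaluation
    outputs = model(images)
    predicted = outputs.argmax(dim=1)
    # Both tensors live on the same device, so they can be compared directly.
    correct = (predicted == labels).sum().item()

print(outputs.shape, device.type, correct)
```

Keeping the labels on the same device as the predictions removes the need to copy tensors back to the CPU before comparing them.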
