# Multilayer perceptrons
In the logistic regression example, we transformed the input with a single fully-connected layer, which applies a linear transform (a matrix multiply plus a bias). A neural network built from several successive fully-connected layers is commonly called a multilayer perceptron (MLP).
![Multilayer perceptron](https://miro.medium.com/max/563/1*4_BDTvgB6WoYVXyxO8lDGA.png)
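
To make "matrix multiply plus a bias" concrete, here is a minimal sketch that compares `nn.Linear` against the same computation written out by hand. The layer sizes (4 inputs, 3 outputs) and the batch size of 2 are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A fully-connected layer mapping 4 input features to 3 outputs
# (sizes are illustrative assumptions, not taken from the tutorial).
layer = nn.Linear(4, 3)
x = torch.rand(2, 4)  # a minibatch of 2 examples

# nn.Linear stores its weight with shape (out_features, in_features),
# so the equivalent manual computation is x @ W^T + b.
manual = x @ layer.weight.T + layer.bias
builtin = layer(x)

same_result = torch.allclose(manual, builtin, atol=1e-6)
print(builtin.shape)  # torch.Size([2, 3])
print(same_result)
```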

## Nonlinearities
We typically include nonlinearities between the layers of a neural network, for several reasons. For one, without a nonlinearity between them, successive linear transforms (fully-connected layers) collapse into a single linear transform, so the model is no more expressive than a single layer. Intermediate nonlinearities prevent this collapse, allowing neural networks to approximate more complex functions.
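
The collapse can be checked numerically. The sketch below stacks two linear layers with nothing between them and compares the result to a single linear transform with weight `W2 @ W1` and bias `W2 @ b1 + b2`. The layer sizes, the random input, and the use of `float64` (to keep the comparison tight) are assumptions made for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two fully-connected layers with no nonlinearity in between
# (sizes are illustrative assumptions).
fc1 = nn.Linear(6, 4).double()
fc2 = nn.Linear(4, 2).double()
x = torch.randn(64, 6, dtype=torch.float64)

with torch.no_grad():
    stacked = fc2(fc1(x))

    # fc2(fc1(x)) = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
    W = fc2.weight @ fc1.weight
    b = fc2.weight @ fc1.bias + fc2.bias
    single = x @ W.T + b

    # Inserting a ReLU between the layers breaks the equivalence.
    with_relu = fc2(torch.relu(fc1(x)))

collapses = torch.allclose(stacked, single)
relu_differs = not torch.allclose(with_relu, single)
print(collapses, relu_differs)
```

The first flag reports whether the two-layer stack matches the single merged layer; the second reports whether adding a ReLU produces outputs that the merged layer no longer reproduces.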

Many nonlinearities are used in neural networks, but one of the most popular is the rectified linear unit (ReLU), which passes positive values through unchanged and sets negative values to zero: `ReLU(x) = max(0, x)`.
![ReLU and GELU activation functions](https://upload.wikimedia.org/wikipedia/commons/thumb/4/42/ReLU_and_GELU.svg/1200px-ReLU_and_GELU.svg.png)

In [1]:
import torch

# Random values in [-1, 1)
x = torch.rand(5, 3) * 2 - 1
# ReLU as an elementwise max with zero
x_relu_max = torch.max(torch.zeros_like(x), x)

print("x: {}".format(x))
print("x after ReLU with max: {}".format(x_relu_max))

x: tensor([[-0.5843,  0.6873,  0.3989],
        [-0.5014,  0.9489,  0.1273],
        [ 0.9962,  0.6230,  0.3723],
        [ 0.9754,  0.1385,  0.9238],
        [ 0.3493, -0.4552, -0.7402]])
x after ReLU with max: tensor([[0.0000, 0.6873, 0.3989],
        [0.0000, 0.9489, 0.1273],
        [0.9962, 0.6230, 0.3723],
        [0.9754, 0.1385, 0.9238],
        [0.3493, 0.0000, 0.0000]])
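
In practice the max formulation in the cell above is rarely written by hand. As a small sketch, the following compares it with three built-in equivalents on a fixed input; the input values are made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hand-picked values (an assumption for this example) covering
# negative, zero, and positive inputs.
x = torch.tensor([[-0.5, 0.0, 2.0],
                  [1.5, -3.0, 0.25]])

relu_max = torch.max(torch.zeros_like(x), x)  # elementwise max with zero
relu_clamp = x.clamp(min=0)                   # clip everything below zero
relu_fn = F.relu(x)                           # functional form
relu_module = nn.ReLU()(x)                    # module form, convenient in nn.Sequential

all_equal = (
    torch.equal(relu_max, relu_clamp)
    and torch.equal(relu_max, relu_fn)
    and torch.equal(relu_max, relu_module)
)
print(all_equal)
```

The module form is the one that fits inside `nn.Sequential`, while the functional form is convenient inside a hand-written `forward` method.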


In [7]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from tqdm.notebook import tqdm

class MNIST_Multilayer_Perceptron(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
          nn.Linear(28 * 28 , 128),
          nn.ReLU(),
          nn.Linear(128, 40),
          nn.GELU(),
          nn.Linear(40, 10)
        )

    def forward(self, x):
        return self.layers(x)

# Load the data
mnist_train = datasets.MNIST(root="./datasets", train=True, transform=transforms.ToTensor(), download=True)
mnist_test = datasets.MNIST(root="./datasets", train=False, transform=transforms.ToTensor(), download=True)
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size=100, shuffle=False)

## Training
# Instantiate model
model = MNIST_Multilayer_Perceptron()

# Loss and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Iterate through train set minibatches
for images, labels in tqdm(train_loader):
    # Zero out the gradients
    optimizer.zero_grad()
    
    # Forward pass
    x = images.view(-1, 28*28)
    y = model(x)
    loss = criterion(y, labels)
    # Backward pass
    loss.backward()
    optimizer.step()

## Testing
correct = 0
total = len(mnist_test)

with torch.no_grad():
    # Iterate through test set minibatches
    for images, labels in tqdm(test_loader):
        # Forward pass
        x = images.view(-1, 28*28)
        y = model(x)
        
        predictions = torch.argmax(y, dim=1)
        correct += torch.sum((predictions == labels).float())
    
print('Test accuracy: {}'.format(correct/total))

Test accuracy: 0.9143000245094299
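
As a quick sanity check that does not require downloading MNIST, the sketch below rebuilds the same stack of layers as `MNIST_Multilayer_Perceptron`, feeds it a random stand-in batch, and counts the trainable parameters. The random batch and the names `mlp`, `fake_images`, and `n_params` are assumptions introduced for this example.

```python
import torch
import torch.nn as nn

# Same layer sizes as the MNIST_Multilayer_Perceptron defined above.
mlp = nn.Sequential(
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 40),
    nn.GELU(),
    nn.Linear(40, 10),
)

# A stand-in batch of 100 random "images" with MNIST's shape (N, 1, 28, 28).
fake_images = torch.rand(100, 1, 28, 28)
logits = mlp(fake_images.view(-1, 28 * 28))

# Each Linear layer has in*out weights plus out biases:
# (784*128 + 128) + (128*40 + 40) + (40*10 + 10) = 106050
n_params = sum(p.numel() for p in mlp.parameters())

print(logits.shape)  # torch.Size([100, 10])
print(n_params)      # 106050
```

The output has one row per image and one column per digit class, which is the shape `nn.CrossEntropyLoss` expects for the logits.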
