##### Neural Networks

Neural networks can be constructed using the *torch.nn* package.

Now that you have had a glimpse of *autograd*, note that *nn* depends on *autograd* to define models and differentiate them. An *nn.Module* contains layers and a method *forward(input)* that returns the *output*.

For example, look at this network that classifies digit images:

<img src = "files/mnist.png">

*convnet*

It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.

A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far the output is from being correct)
- Propagate gradients back into the network's parameters
- Update the weights of the network, typically using a simple update rule: *weight = weight - (learning rate * gradient)*
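
The update rule in the last step can be seen in isolation on a toy problem. The sketch below is plain Python, not the network from this section: it assumes a single scalar weight and a made-up loss *(w - 3)^2*, whose gradient is *2(w - 3)*, and applies *weight = weight - (learning rate * gradient)* repeatedly. All function names are hypothetical.

```python
# Toy illustration of the update rule: weight = weight - learning_rate * gradient.
# The loss (w - 3)^2 and every name here are made up for illustration.

def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

def train(w, learning_rate=0.1, steps=50):
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w

w_final = train(w=0.0)
print(w_final, loss(w_final))
```

Each step shrinks the distance to the minimum at *w = 3* by a constant factor, so the printed weight ends up close to 3 and the loss close to 0.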

##### Define the Network

Let's define this network:

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        
        super(Net, self).__init__()
        
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def forward(self, x):
        
        # max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        
        # if the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
    
net = Net()
print(net)        

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


You just have to define the *forward* function, and the *backward* function (where gradients are computed) is automatically defined for you using *autograd*. You can use any of the Tensor operations in the *forward* function.

The learnable parameters of a model are returned by *net.parameters()*:

In [2]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 5, 5])
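
The count of 10 corresponds to five layers, each contributing a weight tensor and a bias tensor. As a cross-check that needs no PyTorch, the sketch below rebuilds the expected shapes from the constructor arguments in the *Net* definition. The helper names are hypothetical, and the shape conventions (output dimension first) are those shown in the printed sizes above.

```python
from math import prod

def conv2d_shapes(in_channels, out_channels, kernel_size):
    # Weight is (out_channels, in_channels, k, k); bias has one entry per output channel
    return [(out_channels, in_channels, kernel_size, kernel_size), (out_channels,)]

def linear_shapes(in_features, out_features):
    # Weight is (out_features, in_features); bias has one entry per output feature
    return [(out_features, in_features), (out_features,)]

shapes = (
    conv2d_shapes(1, 6, 5)              # conv1
    + conv2d_shapes(6, 16, 5)           # conv2
    + linear_shapes(16 * 5 * 5, 120)    # fc1
    + linear_shapes(120, 84)            # fc2
    + linear_shapes(84, 10)             # fc3
)

total = sum(prod(shape) for shape in shapes)

print(len(shapes))   # number of parameter tensors
print(shapes[0])     # shape of conv1's weight
print(total)         # total number of scalar parameters
```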


Let's try a random 32 by 32 input. Note: the expected input size for this net (LeNet) is 32 by 32. To use this net on the MNIST dataset, resize the images from the dataset to 32 by 32.

In [3]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[ 0.0493,  0.0562, -0.0642,  0.0336,  0.1542, -0.0809,  0.0267,  0.0579,
          0.0639,  0.0382]], grad_fn=<ThAddmmBackward>)
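
The 32 by 32 requirement follows from the layer arithmetic: *fc1* expects 16 * 5 * 5 = 400 features. A minimal sketch in plain Python, assuming stride-1 convolutions without padding and non-overlapping 2 by 2 pooling as in the *Net* above (the function names are hypothetical):

```python
def conv_out(size, kernel_size):
    # Stride-1 convolution with no padding shrinks each side by kernel_size - 1
    return size - kernel_size + 1

def pool_out(size, window=2):
    # Non-overlapping pooling divides each side by the window size (rounding down)
    return size // window

def flat_features(side, channels=16):
    side = pool_out(conv_out(side, 5))   # conv1 followed by pooling
    side = pool_out(conv_out(side, 5))   # conv2 followed by pooling
    return channels * side * side

print(flat_features(32))  # 400, matches fc1's in_features
print(flat_features(28))  # 256, so an unresized 28 by 28 image would not fit fc1
```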


Zero the gradient buffers of all parameters and backprop with random gradients:

In [4]:
net.zero_grad()
out.backward(torch.randn(1, 10))
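
Because *out* is not a scalar, *backward* needs a tensor with the same shape as *out*; autograd then propagates that tensor back as the upstream gradient. Gradients also accumulate across calls, which is why the buffers are zeroed first. A minimal sketch, using a small bias-free *nn.Linear* layer as a stand-in for the network (the layer size and variable names are assumptions for illustration):

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2, bias=False)   # stand-in for net; computes out = x @ W.T
x = torch.randn(1, 3)
v = torch.randn(1, 2)                 # the "random gradient" passed to backward

layer.zero_grad()
layer(x).backward(v)
first = layer.weight.grad.clone()

# For out = x @ W.T, the gradient w.r.t. W given upstream gradient v is v.T @ x
expected = v.t() @ x

# Without zeroing, a second backward pass adds to the existing buffer
layer(x).backward(v)
accumulated = layer.weight.grad.clone()

# Zeroing resets the buffer (to None or to zeros, depending on the PyTorch version)
layer.zero_grad()
grad_after_zero = layer.weight.grad

print(torch.allclose(first, expected))
print(torch.allclose(accumulated, 2 * first))
```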

##### Note

- *torch.nn* only supports mini-batches: the entire package expects inputs that are a mini-batch of samples, not a single sample.
- For example, *nn.Conv2d* takes a 4D Tensor of shape *nSamples x nChannels x Height x Width*.
- If you have a single sample, use *input.unsqueeze(0)* to add a fake batch dimension.
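
As a short illustration of the last point, the sketch below adds a batch dimension to a single sample before passing it to a convolution. The layer uses the same arguments as *conv1* above; the variable names are assumptions for illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)            # same arguments as conv1 above

sample = torch.randn(1, 32, 32)      # a single sample: nChannels x Height x Width
batch = sample.unsqueeze(0)          # add a batch dimension of size 1 at position 0

print(batch.shape)                   # torch.Size([1, 1, 32, 32])
print(conv(batch).shape)             # torch.Size([1, 6, 28, 28])
```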

Before proceeding further, let's recap all the classes you've seen so far.

##### Recap:

- torch.Tensor - A *multi-dimensional array* with support for autograd operations like backward(). Also *holds the gradient* w.r.t. the tensor.
- nn.Module - Neural network module. *Convenient way of encapsulating parameters*, with helpers for moving them to GPU, exporting, loading, etc.
- nn.Parameter - A kind of Tensor that is *automatically registered as a parameter when assigned as an attribute to* a *Module*.
- autograd.Function - Implements *forward and backward definitions of an autograd operation*. Every Tensor operation creates at least one Function node that connects to the functions that created the Tensor and encodes its history.
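
Two of these points can be checked directly: parameter registration and the recorded history. The sketch below uses a hypothetical *Scale* module, not part of the tutorial's network, with one *nn.Parameter* attribute and one plain Tensor attribute.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    # Hypothetical module used only to illustrate parameter registration
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))   # registered automatically
        self.offset = torch.zeros(3)                # plain Tensor: not registered

    def forward(self, x):
        return x * self.weight + self.offset

module = Scale()
names = [name for name, _ in module.named_parameters()]
print(names)                      # ['weight']

out = module(torch.randn(3)).sum()
print(out.grad_fn is not None)    # True: the result records the Function that created it

out.backward()
print(module.weight.grad.shape)   # torch.Size([3])
```

Only *weight* shows up among the parameters, and after *backward()* it holds a gradient of the same shape as itself.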

##### At this point, we have covered:

- Defining a neural network
- Processing inputs and calling backward

##### Still Left:

- Computing the loss
- Updating the weights of the network