Neural net from scratch (no `torch.nn`)
=====================================
Modified from [the original post](https://pytorch.org/tutorials/beginner/nn_tutorial.html) by Jeremy Howard, [fast.ai](https://www.fast.ai). Thanks to Rachel Thomas and Francisco Ingham.

Let's first create a model using nothing but PyTorch tensor operations. We're assuming
you're already familiar with the basics of neural networks. (If you're not, you can
learn them at [course.fast.ai](https://course.fast.ai).)

PyTorch provides methods to create random or zero-filled tensors, which we will
use to create our weights and bias for a simple linear model. These are just regular
tensors, with one very special addition: we tell PyTorch that they require a
gradient. This causes PyTorch to record all of the operations done on the tensor,
so that it can calculate the gradient during back-propagation *automatically*!

For the weights, we set ``requires_grad`` **after** the initialization, since we
don't want that step included in the gradient. (Note that a trailing ``_`` in
PyTorch signifies that the operation is performed in-place.)

<div class="alert alert-info"><h4>Note</h4><p>We are initializing the weights here with a simplified form of
   <a href="http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf">Xavier initialisation</a>
   (dividing the random values by sqrt(n), where n = 784 is the number of inputs).</p></div>
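
To see why the order matters, here is a minimal sketch that relies only on standard tensor attributes (`is_leaf`, `grad_fn`). The name `scaled_after` is hypothetical and exists only for the comparison: it turns on `requires_grad` *before* scaling, so the division itself gets recorded.

```python
import math
import torch

# Scale first, then switch on gradient tracking: the scaling is not
# recorded, so `weights` stays a leaf tensor whose .grad attribute
# backward() will fill in.
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()  # trailing underscore: modifies `weights` in place
print(weights.is_leaf, weights.grad_fn)  # True None

# Counter-example (hypothetical name): track gradients *before* scaling.
# The division is now recorded, so the result is a non-leaf tensor with a
# grad_fn, and backward() would not populate its .grad attribute.
scaled_after = torch.randn(784, 10, requires_grad=True) / math.sqrt(784)
print(scaled_after.is_leaf, scaled_after.grad_fn is not None)  # False True
```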



In [7]:
import math
import torch
from functions import describe  # local helper (not shown): prints a tensor's type, shape and values

weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)


In [8]:
describe(weights)
describe(bias)

Type:torch.FloatTensor
Shape/size:torch.Size([784, 10])
Values: 
tensor([[ 0.0259,  0.0505, -0.0084,  ...,  0.0241,  0.0283, -0.0094],
        [ 0.0239,  0.0579, -0.0445,  ...,  0.0226, -0.0054,  0.0286],
        [-0.0380, -0.0031,  0.0112,  ..., -0.0462, -0.0563,  0.0191],
        ...,
        [-0.0662,  0.0090,  0.0195,  ...,  0.0300,  0.0177, -0.0577],
        [ 0.0218,  0.0199, -0.0001,  ..., -0.0380,  0.0245, -0.0410],
        [-0.0257,  0.0007,  0.0277,  ..., -0.0483,  0.0097,  0.0151]],
       requires_grad=True)
Type:torch.FloatTensor
Shape/size:torch.Size([10])
Values: 
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)


Thanks to PyTorch's ability to calculate gradients automatically, we can
use any standard Python function (or callable object) as a model! So
let's just write a plain matrix multiplication and broadcasted addition
to create a simple linear model. We also need an activation function, so
we'll write `log_softmax` and use it. Remember: although PyTorch
provides lots of pre-written loss functions, activation functions, and
so forth, you can easily write your own using plain Python. Because such
functions are built from PyTorch tensor operations, they run as vectorized
CPU code, or on the GPU when their inputs are GPU tensors, with no extra work.
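
As a small illustration of a model written as a plain Python function, the sketch below uses random numbers as a hypothetical stand-in for a batch of 64 flattened 28x28 images, and a hypothetical function name `linear`. It shows the shapes involved in the matrix multiplication and the broadcasted bias addition.

```python
import torch

# Hypothetical data: random numbers in place of 64 flattened 28x28 images,
# with the same parameter shapes as the model in this section.
xb = torch.rand(64, 784)
weights = torch.randn(784, 10) / 784 ** 0.5
bias = torch.randn(10)

def linear(xb):
    # (64, 784) @ (784, 10) -> (64, 10). The bias has shape (10,), so
    # broadcasting adds the same bias vector to each of the 64 rows.
    return xb @ weights + bias

out = linear(xb)
print(out.shape)  # torch.Size([64, 10])

# Broadcasting gives the same result as adding an explicitly repeated bias.
print(torch.equal(out, xb @ weights + bias.repeat(64, 1)))  # True
```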



## log_softmax

$$\mathrm{LogSoftmax}(x_i) = \log\left(\frac{e^{x_i}}{\sum_{j} e^{x_j}}\right) = \log e^{x_i} - \log \sum_{j} e^{x_j} = x_i - \log \sum_{j} e^{x_j}$$

<img src="../_images/negative_log_likelihood.jpeg" width="80%">

<br/>
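
The identity can be checked numerically. In this sketch, `log_softmax_naive` is a literal translation of the formula and is compared against `torch.logsumexp`. `log_softmax_stable` (a hypothetical name) applies the common max-subtraction trick, which is an addition and not part of the original tutorial. It avoids the overflow that evaluating e^{x_j} directly can cause for large inputs.

```python
import torch

def log_softmax_naive(x):
    # Literal translation of the formula: x_i - log(sum_j exp(x_j)), per row
    return x - x.exp().sum(-1, keepdim=True).log()

def log_softmax_stable(x):
    # Same quantity via log(sum_j exp(x_j)) = m + log(sum_j exp(x_j - m)),
    # with m = max_j x_j, so exp() only ever sees values <= 0.
    m = x.max(dim=-1, keepdim=True).values
    return x - (m + (x - m).exp().sum(-1, keepdim=True).log())

x = torch.tensor([[1.0, 2.0, 3.0]])
print(torch.allclose(log_softmax_naive(x),
                     x - torch.logsumexp(x, dim=-1, keepdim=True)))  # True

big = x + 1000.0
print(log_softmax_naive(big))   # exp() overflows float32 to inf, so every entry is -inf
print(log_softmax_stable(big))  # finite; adding a constant to every input leaves log-softmax unchanged
```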

In [9]:
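def softmax(x):
    # Exponentiate, then normalize each row (the last dimension) so it sums to 1
    return x.exp() / x.exp().sum(dim=-1, keepdim=True)

x = torch.tensor(([5, 4, 2],
                  [4, 2, 8],
                  [4, 4, 1]))
describe(x)
# torch.split returns a tuple of tensors; a tuple has no .exp() method,
# so softmax is applied to the whole tensor instead of the split pieces
print(torch.split(x, 1))
softmax(x.float())  # x holds integers, so convert to float before exponentiating

Type:torch.LongTensor
Shape/size:torch.Size([3, 3])
Values: 
tensor([[5, 4, 2],
        [4, 2, 8],
        [4, 4, 1]])
(tensor([[5, 4, 2]]), tensor([[4, 2, 8]]), tensor([[4, 4, 1]]))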

In [None]:
def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(xb):
    return log_softmax(xb @ weights + bias)

In the above, the ``@`` operator performs matrix multiplication. We will call
our function on one batch of data (in this case, 64 images).  This is
one *forward pass*.  Note that our predictions won't be any better than
random at this stage, since we start with random weights.
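
You can try a forward pass without the MNIST data. The sketch below assumes a random `torch.rand(64, 784)` tensor as a stand-in for one batch and otherwise repeats the model definition above. Each output row holds log-probabilities, so exponentiating a row should give values that sum to 1.

```python
import math
import torch

# Same parameters and model as above.
weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(xb):
    return log_softmax(xb @ weights + bias)

xb = torch.rand(64, 784)  # hypothetical stand-in for x_train[0:64]
preds = model(xb)
print(preds.shape)  # torch.Size([64, 10])
# Rows are log-probabilities, so the exponentiated rows sum to 1.
print(torch.allclose(preds.exp().sum(dim=1), torch.ones(64)))  # True
```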



In [None]:
from mnist_data_setup import mnist_dataloader  # local helper module (not shown) that loads the MNIST training data
x_train, y_train = mnist_dataloader()
print(x_train, y_train)
print(x_train.shape)
print(y_train.min(), y_train.max())

In [None]:
bs = 64  # batch size

xb = x_train[0:bs]  # a mini-batch from x
preds = model(xb)  # predictions
print(preds[0], preds.shape)

As you can see, the ``preds`` tensor contains not only the tensor values but also a
gradient function (``grad_fn``). We'll use this later to do backprop.
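
Here is a sketch of what that gradient function makes possible, again with random stand-in data rather than the MNIST batch. It backpropagates from an arbitrary scalar, `preds.mean()`, which is chosen purely for illustration; the real loss is defined next.

```python
import math
import torch

weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

def model(xb):
    z = xb @ weights + bias
    return z - z.exp().sum(-1).log().unsqueeze(-1)  # log_softmax as above

preds = model(torch.rand(64, 784))  # random stand-in batch
# preds was produced by recorded operations, so it carries a grad_fn
# (its exact class name may vary between PyTorch versions).
print(preds.grad_fn)

print(weights.grad)  # None: nothing has been backpropagated yet
# Backpropagating from a scalar built from preds fills in .grad on the leaves.
preds.mean().backward()
print(weights.grad.shape, bias.grad.shape)  # torch.Size([784, 10]) torch.Size([10])
```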

Let's implement negative log-likelihood to use as the loss function
(again, we can just use standard Python):



In [None]:
def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

loss_func = nll
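
Here is a small worked example of the indexing in `nll`, using made-up probabilities for 3 samples over 4 classes (hypothetical values, not model output). For each row, `input[range(n), target]` selects the log-probability assigned to that row's correct class, and the loss is the negated mean of those values.

```python
import math
import torch

def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

# Hypothetical probabilities, turned into log-probabilities like the model's output.
log_probs = torch.tensor([[0.70, 0.10, 0.10, 0.10],
                          [0.20, 0.50, 0.20, 0.10],
                          [0.25, 0.25, 0.25, 0.25]]).log()
target = torch.tensor([0, 1, 3])

# (row indices, target) picks log_probs[0, 0], log_probs[1, 1] and log_probs[2, 3].
picked = log_probs[range(3), target]
print(picked.exp())  # probabilities 0.70, 0.50, 0.25

loss = nll(log_probs, target)
expected = -(math.log(0.70) + math.log(0.50) + math.log(0.25)) / 3
print(loss.item(), expected)  # both about 0.812
```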

Let's check our loss with our random model, so we can see if we improve
after a backprop pass later.



In [None]:
yb = y_train[0:bs]
print(loss_func(preds, yb))
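
As a reference point for this number: a model that assigns probability 1/10 to every digit has a loss of exactly ln(10) ≈ 2.303, whatever the targets are. Because the weights start out small, we would expect the random model's loss to land in that neighbourhood (an expectation, not a measured result). The uniform predictions and random targets below are hypothetical.

```python
import math
import torch

# Hypothetical "know-nothing" model: every class gets probability 1/10,
# so every log-probability equals log(1/10), whatever the targets are.
uniform_log_probs = torch.full((64, 10), math.log(1 / 10))
targets = torch.randint(0, 10, (64,))
loss = -uniform_log_probs[range(64), targets].mean()  # same formula as nll
print(loss.item(), math.log(10))  # both about 2.3026
```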

Let's also implement a function to calculate the accuracy of our model.
For each prediction, if the index with the largest value matches the
target value, then the prediction was correct.



In [None]:
def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()
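
Here is a quick check of `accuracy` on hypothetical scores for 4 samples over 3 classes. The argmax of each row is compared with the target, and the fraction of matches is returned.

```python
import torch

def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

# Hypothetical scores for 4 samples over 3 classes.
out = torch.tensor([[2.0, 0.1, 0.3],   # argmax 0
                    [0.2, 1.5, 0.1],   # argmax 1
                    [0.9, 0.4, 0.8],   # argmax 0
                    [0.0, 0.3, 2.2]])  # argmax 2
yb = torch.tensor([0, 1, 2, 2])
print(accuracy(out, yb))  # 3 of 4 correct -> tensor(0.7500)
```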

Let's check the accuracy of our random model, so we can see if our
accuracy improves as our loss improves.



In [None]:
print(accuracy(preds, yb))

We can now run a training loop.  For each iteration, we will:

- select a mini-batch of data (of size ``bs``)
- use the model to make predictions
- calculate the loss
- call ``loss.backward()`` to compute the gradients of the loss with respect to the
  model's parameters, in this case ``weights`` and ``bias``.

We now use these gradients to update the weights and bias.  We do this
within the ``torch.no_grad()`` context manager, because we do not want these
actions to be recorded for our next calculation of the gradient.  You can read
more about how PyTorch's Autograd records operations
[here](https://pytorch.org/docs/stable/notes/autograd.html).
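
Here is a minimal sketch of why the update needs `torch.no_grad()`, using a hypothetical 3-element parameter `w`. While gradient recording is on, PyTorch refuses an in-place update of a leaf tensor that requires gradients. Inside `no_grad`, it accepts the same update and does not record it.

```python
import torch

w = torch.ones(3, requires_grad=True)

# Outside no_grad, an in-place update of a leaf that requires grad is rejected.
try:
    w -= 0.1
    rejected = False
except RuntimeError as err:
    rejected = True
    print("RuntimeError:", err)

# Inside no_grad the same update runs without being recorded: w keeps
# requires_grad=True, stays a leaf, and gets no grad_fn.
with torch.no_grad():
    w -= 0.1
print(w, w.is_leaf, w.grad_fn)
```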

We then set the
gradients to zero, so that we are ready for the next loop.
Otherwise, our gradients would record a running tally of all the operations
that had happened (i.e. ``loss.backward()`` *adds* the gradients to whatever is
already stored, rather than replacing them).
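
You can see this accumulation with a single scalar parameter (a hypothetical `w`, where d(3w)/dw = 3):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
grads = []

(w * 3).backward()   # d(3w)/dw = 3
grads.append(w.grad.item())
(w * 3).backward()   # the new gradient 3 is *added* to the stored one
grads.append(w.grad.item())

w.grad.zero_()       # reset, as the training loop does after each update
(w * 3).backward()
grads.append(w.grad.item())
print(grads)  # [3.0, 6.0, 3.0]
```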

<div class="alert alert-info"><h4>Tip</h4><p>You can use the standard Python debugger to step through PyTorch
   code and inspect variable values at each step.
   Uncomment <code>set_trace()</code> below to try it out.</p></div>




In [None]:
from IPython.core.debugger import set_trace

lr = 0.5  # learning rate
epochs = 2  # how many epochs to train for
n = x_train.shape[0] # the training data size

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        # set_trace()
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()

That's it: we've created and trained a minimal neural network (in this case, a
logistic regression, since we have no hidden layers) entirely from scratch!

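Let's check the loss and accuracy and compare them to what we got
earlier. We expect the loss to have decreased and the accuracy to have
increased, and they have. (Note that ``xb`` and ``yb`` now hold the last
mini-batch from the training loop, not the first one we checked earlier.)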



In [None]:
print(loss_func(model(xb), yb), accuracy(model(xb), yb))