# PyTorch Prerequisites

In [13]:
import torch
import numpy as np

## Tensors

Tensors are the basic unit for storing and manipulating data in PyTorch. They are n-dimensional arrays, similar to NumPy's `ndarray`.

Creating a tensor from a NumPy array:

In [31]:
x = torch.tensor(np.linspace(1, 10, 10))
print(x.shape)
print(x)

torch.Size([10])
tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.], dtype=torch.float64)
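Note that `np.linspace` produces `float64` values, while most PyTorch layers default to `float32`. The sketch below is an illustration, not part of the original notebook: it contrasts `torch.tensor` (which copies the data) with `torch.from_numpy` (which shares memory with the array) and shows one way to convert the dtype. The variable names are made up for this example.

```python
import numpy as np
import torch

arr = np.linspace(1, 10, 10)          # float64 NumPy array

copied = torch.tensor(arr)            # copies the data
shared = torch.from_numpy(arr)        # shares memory with arr

arr[0] = 100.0                        # modify the NumPy array in place
print(copied[0].item())               # the copy is unaffected
print(shared[0].item())               # the shared tensor sees the change

x32 = copied.to(torch.float32)        # convert to PyTorch's default float dtype
print(x32.dtype)
```

Converting to `float32` early avoids dtype mismatch errors when the tensor is later passed to a layer whose weights are `float32`.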


Random tensors can be created using the `randn` function, which samples from a standard normal distribution. \
The example below generates a tensor that contains 10 batches of data, each with 20 data points, each data point with 3 features (for example, the x, y, z values from an accelerometer).

In [36]:
x = torch.randn(10, 20, 3)
print(x.shape)

torch.Size([10, 20, 3])
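To make the three axes concrete, here is a small sketch of indexing into such a tensor. The seed and the variable names are assumptions made for this illustration.

```python
import torch

torch.manual_seed(0)                  # fix the seed so runs are repeatable
x = torch.randn(10, 20, 3)            # (batches, data points, features)

first_batch = x[0]                    # all 20 data points of batch 0
first_feature = x[:, :, 0]            # feature 0 of every data point
one_point = x[2, 5]                   # the 3 features of point 5 in batch 2

print(first_batch.shape)
print(first_feature.shape)
print(one_point.shape)
```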


We can fill a tensor with an initial value using `full`.

In [46]:
x = torch.full(size=(10, 20, 3), fill_value=1)
print(x.shape)
print(x[-1])

torch.Size([10, 20, 3])
tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]])
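The output above contains integers because `full` infers the dtype from `fill_value` in recent PyTorch versions. A minimal sketch of how to control this, along with the related `zeros` and `ones` constructors (variable names are chosen for this example only):

```python
import torch

ones_int = torch.full(size=(2, 3), fill_value=1)        # integer fill value
ones_float = torch.full(size=(2, 3), fill_value=1.0)    # float fill value
explicit = torch.full((2, 3), 1, dtype=torch.float32)   # dtype set explicitly

print(ones_int.dtype)
print(ones_float.dtype)
print(explicit.dtype)

# torch.zeros and torch.ones cover the two most common fill values
zeros = torch.zeros(2, 3)
ones = torch.ones(2, 3)
print(torch.equal(ones, explicit))
```

Passing `dtype` explicitly is usually the safer choice when the tensor will be combined with floating point weights later.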


## GPU and Devices

Tensors can be moved to the GPU. \
**Note** - This will fail if the machine you are running this on does not have a CUDA-enabled GPU.

In [49]:
x = torch.randn((10, 1), device='cuda')
print(x.device)

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

You can also move a tensor after it has been created.

In [51]:
x = torch.randn(10, 1)
x = x.to('cuda')  # .to() returns a new tensor; it does not move x in place

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

To write code that works on both CPU and GPU, it is good practice to define a `device` variable:

In [59]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

x = torch.randn(10,1)
x = x.to(device)

cpu
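The same `device` variable can be reused for tensors and for layers. The following is a sketch under the assumption that a single device is used for everything; the `layer` here is a placeholder module, not something defined in this notebook.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create the tensor directly on the chosen device instead of moving it later
x = torch.randn(10, 1, device=device)

# Modules are moved with .to(device) as well
layer = torch.nn.Linear(1, 1).to(device)

y = layer(x)           # input and parameters live on the same device
print(y.device.type)
```

If the input and the layer end up on different devices, PyTorch raises a `RuntimeError`, so it helps to route every allocation through the one `device` variable.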


## Gradients

At the core of training neural networks is SGD (stochastic gradient descent). SGD needs the partial derivatives of the loss with respect to each learnable parameter. These are computed using the chain rule (backpropagation). \
To facilitate this, operations on tensors are recorded in a DAG (directed acyclic graph) when gradient tracking is enabled for those tensors.

Learn more about automatic differentiation using `autograd` here - https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html
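Before moving to a full layer, a scalar example may help show what `autograd` does. This is a minimal sketch with a made-up function $y = x^2 + 2x$, whose derivative is $2x + 2$.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x       # y = x^2 + 2x
y.backward()             # computes dy/dx via the chain rule
print(x.grad)            # dy/dx = 2x + 2 = 8 at x = 3
```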

Let us take a look at this by simulating what a neural network layer might do. \
A simple hidden layer can be defined using \
x : input vector \
W : learnable weights \
B : learnable bias \
Y : the output (the code below uses lowercase `y` for the target values)

We can then define the operation performed in this simple layer as \
$Y = xW + B$

Because we want the tensors representing $W$ and $B$ to be trainable parameters, we ask PyTorch to track the operations performed on them in a DAG by setting the `requires_grad` parameter to `True`.

In [99]:
x = torch.ones(5)   # input vector
y = torch.zeros(3)  # target (true) values

"""
Remember matrix multiplication? x is treated as a (1x5) row vector and W is a matrix.
So if the dimensions of x are (1x5) and the output is (1x3), then the dimensions of the weight
matrix W need to be (5x3).
"""
W = torch.randn(5, 3, requires_grad=True)
B = torch.randn(3, requires_grad=True)

In [100]:
print(W)

tensor([[ 0.0416,  0.3248,  0.7654],
        [ 0.4105,  1.8604, -0.8947],
        [ 1.1354,  2.4903, -1.5443],
        [ 0.8963, -1.6701,  1.2917],
        [ 0.9105, -0.0292, -0.3431]], requires_grad=True)


We can see that W has gradient tracking enabled. Any tensor computed from it will also inherit gradient tracking. \
Let us now compute $xW$ using matrix multiplication.

In [89]:
Y = torch.matmul(x, W)
print(Y)

tensor([ 3.0089, -1.1907,  3.0813], grad_fn=<SqueezeBackward4>)


We can see that this operation has attached a `grad_fn` to the resulting tensor. Let us now perform our addition operation.

In [90]:
Y = Y + B
print(Y)

tensor([ 4.0948, -1.1965,  3.9432], grad_fn=<AddBackward0>)
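The tracking state of a tensor can be inspected directly. The sketch below rebuilds the same shapes with fresh random values and also shows `torch.no_grad()`, which turns tracking off for code that only needs predictions. `Y_inference` is a name made up for this example.

```python
import torch

x = torch.ones(5)
W = torch.randn(5, 3, requires_grad=True)
B = torch.randn(3, requires_grad=True)

Y = torch.matmul(x, W) + B

print(W.is_leaf, W.grad_fn)            # leaf tensor created by the user: no grad_fn
print(Y.is_leaf, Y.requires_grad)      # result of tracked operations

# Inside torch.no_grad(), operations are not recorded in the graph
with torch.no_grad():
    Y_inference = torch.matmul(x, W) + B
print(Y_inference.requires_grad, Y_inference.grad_fn)
```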


We can see the addition operation has been recorded as a function in the DAG as well.

We can define a simple loss function using MSE (mean squared error).

In [91]:
loss_fn = torch.nn.MSELoss()

In [92]:
loss = loss_fn(Y, y)  # pass the predicted values first, then the true values

In [93]:
print(loss)

tensor(11.2495, grad_fn=<MseLossBackward0>)


We can see that the loss function adds another function to the graph.

In [107]:
# The initial gradients are None. We need to do a backward pass to first calculate the gradients.
print(W.grad)
print(x.grad) # This does not error, even though x does not have requires_grad enabled. 

None
None


We can now calculate the derivatives with respect to each of the learnable parameters:

In [94]:
loss.backward()

We can only run the backward pass once on this graph by default. Trying to run it again gives an error, because the saved intermediate values are freed after the first call.

In [95]:
loss.backward()

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
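As the error message suggests, `retain_graph=True` keeps the saved values around. The sketch below uses a made-up scalar loss to show this, and also that gradients from repeated backward passes are added together in `.grad` rather than overwritten.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
loss = (3 * w) ** 2                # loss = 9 w^2, so d(loss)/dw = 18 w = 36

loss.backward(retain_graph=True)   # keep saved values so backward can run again
first = w.grad.clone()

loss.backward()                    # second pass succeeds; gradient is added to w.grad
second = w.grad.clone()

print(first, second)

w.grad.zero_()                     # reset before the next backward pass
```

Because of this accumulation, training loops normally zero the gradients before each backward pass.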

We can view the gradients now.

In [97]:
print(W.grad)

tensor([[ 2.7299, -0.7977,  2.6288],
        [ 2.7299, -0.7977,  2.6288],
        [ 2.7299, -0.7977,  2.6288],
        [ 2.7299, -0.7977,  2.6288],
        [ 2.7299, -0.7977,  2.6288]])


In [98]:
print(B.grad)

tensor([ 2.7299, -0.7977,  2.6288])
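Every row of `W.grad` is identical here because `x` is all ones: for mean-reduced MSE over $n$ outputs, $\partial L / \partial Y = 2(Y - y)/n$, $\partial L / \partial B = \partial L / \partial Y$, and $\partial L / \partial W_{ij} = x_i \, \partial L / \partial Y_j$. The sketch below checks this against `autograd` with fresh random values and then applies one manual SGD step. The seed and the learning rate of 0.1 are arbitrary choices for this illustration.

```python
import torch

torch.manual_seed(0)
x = torch.ones(5)
y = torch.zeros(3)
W = torch.randn(5, 3, requires_grad=True)
B = torch.randn(3, requires_grad=True)

Y = torch.matmul(x, W) + B
loss = torch.nn.MSELoss()(Y, y)
loss.backward()

# For mean-reduced MSE over n outputs: dL/dY = 2 * (Y - y) / n
dL_dY = 2 * (Y.detach() - y) / Y.numel()

# dL/dB = dL/dY, and dL/dW[i, j] = x[i] * dL/dY[j]
print(torch.allclose(B.grad, dL_dY))
print(torch.allclose(W.grad, torch.outer(x, dL_dY)))

# One manual SGD step; parameter updates must not be tracked by autograd
lr = 0.1
with torch.no_grad():
    W -= lr * W.grad
    B -= lr * B.grad
W.grad.zero_()
B.grad.zero_()

new_loss = torch.nn.MSELoss()(torch.matmul(x, W) + B, y)
print(loss.item(), new_loss.item())
```

In practice an optimizer such as `torch.optim.SGD` performs the update and the zeroing, but the arithmetic is the same.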


## Linear Layer

A linear layer is one of the most basic layers in PyTorch. \
Documentation - https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

In [118]:
x = torch.randn(1, 10)
linear = torch.nn.Linear(10, 5)  # 10 input features, 5 output features
y = linear(x)

In [115]:
print(y)

tensor([[-0.3906, -0.7349, -0.8838, -0.1558,  0.2673]],
       grad_fn=<AddmmBackward0>)
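Under the hood, `Linear` stores a weight of shape `(out_features, in_features)` and a bias, and computes $y = xW^T + b$. The sketch below verifies this by hand; the batch size of 4 and the layer sizes are assumptions picked for the example.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 10)                    # 4 samples, 10 features each
linear = torch.nn.Linear(10, 5)           # 10 inputs -> 5 outputs

print(linear.weight.shape)                # (out_features, in_features)
print(linear.bias.shape)

y = linear(x)
manual = x @ linear.weight.T + linear.bias
print(y.shape)
print(torch.allclose(y, manual, atol=1e-6))
```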


## LSTM Layer

Long short-term memory (LSTM) layer. \
Documentation - https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html

In [133]:
x = torch.randn(5, 30, 1)  # (seq_len, batch, input_size) with the default batch_first=False

If we were using this in a deep network, the LSTM output would typically feed into a linear layer:

In [128]:
LSTM = torch.nn.LSTM(1, 16, 1)   # input_size=1, hidden_size=16, num_layers=1
Linear = torch.nn.Linear(16, 1)  # maps each 16-dimensional hidden state to 1 output

In [132]:
out, (hn, cn) = LSTM(x)
Y = Linear(out)
print(out.shape)
print(Y.shape)

torch.Size([5, 30, 16])
torch.Size([5, 30, 1])
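A common variation is to put the batch dimension first and keep only the last time step, which gives one prediction per sequence. This is a sketch of that pattern, assuming a single-layer, unidirectional LSTM; the names `lstm`, `head`, and `prediction` are made up for this example.

```python
import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(input_size=1, hidden_size=16, num_layers=1, batch_first=True)
head = torch.nn.Linear(16, 1)

x = torch.randn(5, 30, 1)                 # (batch, seq_len, input_size) with batch_first=True
out, (hn, cn) = lstm(x)

last_step = out[:, -1, :]                 # hidden state at the final time step, per sequence
print(out.shape, hn.shape)
print(torch.allclose(last_step, hn[-1], atol=1e-6))  # same values for a unidirectional LSTM

prediction = head(last_step)              # one prediction per sequence
print(prediction.shape)
```

Note that `batch_first` only changes the layout of the input and `out`; `hn` and `cn` keep the shape `(num_layers, batch, hidden_size)` either way.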
