### PyTorch Basics:

In [2]:
import torch
import numpy as np

Creating Tensors:

PyTorch records the mathematical operations applied to tensors in a Dynamic Computational Graph (DCG). Unlike symbolic, define-then-run frameworks, PyTorch is define-by-run: each operation executes immediately when its line runs, and the graph is built on the fly for any tensor that requires gradients.

In [3]:
x = torch.ones(3, dtype=float)
y = 5 
w = torch.from_numpy(np.array([1,2,3])).type(torch.DoubleTensor)
w.requires_grad = True
y_hat = torch.dot(w, x) + 3 
Loss = (y-y_hat) ** 2
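
As a minimal sketch of how the graph is built while the code runs, the snippet below repeats the computation above with illustrative variable names. It shows that the value is available immediately and that the result carries a `grad_fn` node, while the leaf parameter `w` does not.

```python
import torch

# Same numbers as the cell above; names are for illustration only.
w = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64, requires_grad=True)
x = torch.ones(3, dtype=torch.float64)

# The operation runs immediately; no separate "compile" or "run" step is needed.
y_hat = torch.dot(w, x) + 3

value = y_hat.item()
print(value)          # 9.0
print(y_hat.grad_fn)  # backward node recorded for the final addition
print(w.grad_fn)      # None: w is a leaf created by the user
print(w.is_leaf)      # True
```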


PyTorch uses backpropagation to compute the derivatives of a function. Usually the cost/loss function for a model is a sequence of operations applied to the model parameters. These operations are recorded in the DCG mentioned above. To compute the gradient of a cost function (L):

- Set the "requires_grad" attribute of the parameter tensor to True. This tells PyTorch to track every operation applied to the tensor, so that the chain rule can later be applied through the DCG to compute the derivative at this point.
- Call the "backward" method on the loss tensor to compute the derivative of the cost with respect to these model params. The method returns nothing; it writes the result onto the parameters.
- The gradient of the loss is stored as the "grad" attribute of the **model parameters**, not of the loss tensor.

The code below demonstrates this. With y = 5 and y_hat = 9, the gradient of the loss with respect to w is 2(y_hat - y)x, so we expect (8, 8, 8).

In [4]:
Loss.backward()
w.grad

tensor([8., 8., 8.], dtype=torch.float64)
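
As a sanity check, the autograd result can be compared against the derivative worked out by hand. This is a sketch that rebuilds the same computation from scratch; `analytic` is a name introduced here for the hand-derived gradient 2(y_hat - y)x.

```python
import torch

x = torch.ones(3, dtype=torch.float64)
y = 5.0
w = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64, requires_grad=True)

y_hat = torch.dot(w, x) + 3
loss = (y - y_hat) ** 2
loss.backward()

# Chain rule by hand: dL/dw = 2 * (y_hat - y) * x
analytic = 2 * (y_hat.detach() - y) * x

print(w.grad)    # gradient computed by autograd
print(analytic)  # gradient computed by hand
```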

## Linear Regression Example:

Below is a simple linear regression model fit to the California Housing dataset from scikit-learn. I demonstrate:

- Tensors Basics
- Implementing Gradient Descent using some of the modules within PyTorch. 

Import Modules:

In [93]:
import pandas as pd
from sklearn import datasets
import torch.nn as nn 
import numpy as np

Note that the parameters of PyTorch layers such as nn.Linear are float32 by default, so the input data must be cast to float32 to match.

In [94]:
X_train, y_train = datasets.fetch_california_housing(as_frame=True, return_X_y=True)
# Cast to float32 to match the default dtype of the nn.Linear parameters.
X_train = X_train.astype(np.float32)
y_train = y_train.astype(np.float32)
print(X_train.shape)
print(y_train.shape)

(20640, 8)
(20640,)
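
To illustrate why the cast matters, here is a small sketch on a toy layer (not the housing data). It assumes PyTorch's default dtype has not been changed, so the layer's weights are float32 and a float64 input is rejected.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 1)
print(layer.weight.dtype)  # float32 under the default settings

x64 = torch.ones(4, 2, dtype=torch.float64)
try:
    layer(x64)
    mismatch_raised = False
except RuntimeError as err:
    # The layer refuses to mix float64 inputs with float32 weights.
    mismatch_raised = True
    print(f"dtype mismatch: {err}")

# Casting the input to float32 resolves the mismatch.
out = layer(x64.float())
print(out.dtype, out.shape)
```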


Normalising the data can speed up gradient descent. Note that torch.nn.functional.normalize, used below, rescales each row (datapoint) to unit L2 norm by default (dim=1); it does not standardise each feature column. We should also make sure that the dimensions of our training data are consistent: y_train is a 1D numpy array, whereas the output of the model is 2D with a single column, so y_tensor is reshaped to match.

In [49]:
X_tensor = torch.nn.functional.normalize(torch.from_numpy(X_train.values))
y_tensor = torch.from_numpy(y_train.values)
y_tensor = y_tensor.view(-1, 1)
num_features = X_tensor.shape[1]
num_datapoints = X_tensor.shape[0]
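
The sketch below contrasts the row-wise L2 normalisation used above with per-feature standardisation on a tiny made-up matrix. Standardisation is shown only as a common alternative; it is not what this notebook does, and the matrix `X` is invented for the example.

```python
import torch
import torch.nn.functional as F

# Toy data: 3 datapoints, 2 features (values invented for illustration).
X = torch.tensor([[3.0, 4.0],
                  [6.0, 8.0],
                  [1.0, 0.0]])

# F.normalize defaults to dim=1: every row is scaled to unit L2 norm.
row_normalized = F.normalize(X)
row_norms = row_normalized.norm(dim=1)

# Alternative: standardise each feature column to zero mean and unit variance.
standardized = (X - X.mean(dim=0)) / X.std(dim=0)

print(row_normalized)  # rows 0 and 1 become identical: their scale is lost
print(row_norms)
print(standardized.mean(dim=0), standardized.std(dim=0))
```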

Model parameters are initialised randomly.

In [96]:
model = nn.Linear(num_features, 1)
model_params = [t.detach() for t in model.parameters()]
print(model_params)

[tensor([[ 0.0280,  0.1968,  0.1816, -0.3387, -0.3232, -0.0663,  0.2792, -0.3475]]), tensor([-0.1839])]


In [97]:
print(model(X_tensor).shape)

torch.Size([20640, 1])
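
The model output has shape (N, 1), which is why the targets were reshaped with view(-1, 1). As a small sketch on toy tensors, subtracting a 1D target from a 2D prediction silently broadcasts to an (N, N) matrix instead of an element-wise difference:

```python
import torch

predictions = torch.zeros(4, 1)                # shape (4, 1), like the model output
y_flat = torch.arange(4, dtype=torch.float32)  # shape (4,), like the raw targets

# (4, 1) - (4,) broadcasts to (4, 4): every prediction against every target.
broadcast_diff = predictions - y_flat

# Reshaping the targets to (4, 1) gives the intended element-wise difference.
aligned_diff = predictions - y_flat.view(-1, 1)

print(broadcast_diff.shape)
print(aligned_diff.shape)
```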


Here I set some model hyperparameters. Again, most of the useful functions are built in:

In [99]:
num_iters = 50000
learning_rate = 0.5
loss_func = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
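
For intuition about what the optimizer does, the sketch below checks that one step of plain SGD (no momentum or weight decay) matches the manual update param - lr * grad. The toy model, random data and `toy_` names are assumptions made for this example only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lr = 0.5
toy_model = nn.Linear(3, 1)
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=lr)

# Random toy data, unrelated to the housing dataset.
X = torch.randn(8, 3)
y = torch.randn(8, 1)

loss = nn.MSELoss()(toy_model(X), y)
loss.backward()

# Manual gradient descent update, computed before the optimizer changes anything.
expected = [p.detach() - lr * p.grad for p in toy_model.parameters()]

toy_optimizer.step()

matches = all(
    torch.allclose(p.detach(), e)
    for p, e in zip(toy_model.parameters(), expected)
)
print(matches)
```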

Run Gradient Descent Algorithm:

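When running gradient descent, it's important to know that backward() adds to the existing "grad" values rather than overwriting them, so gradients accumulate across iterations unless we reset them. Calling optimizer.zero_grad() resets the gradients of all parameters managed by the optimizer.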

In [100]:
for epoch in range(1, num_iters+1):
    # Each iteration of gradient descent (full batch here, since every row is used):
    # feed forward, compute loss, compute gradient of loss, update model params.

    # Feed Forward:
    predictions = model(X_tensor)

    # Computing Loss (nn.MSELoss takes the predictions first, then the targets)
    loss = loss_func(predictions, y_tensor)

    # Call the backward method to compute the gradient. The model params are not passed
    # explicitly: they are reached through the graph recorded during the forward pass.
    loss.backward()

    # Apply a step of gradient descent
    optimizer.step()

    # Reset the gradients of the model params so they do not accumulate into the next iteration.
    optimizer.zero_grad()

    # Printing loss every 10000 iterations.
    if epoch % 10000 == 0:
        print(f'The loss for Epoch {epoch} is: {loss.item():.2f}')

The loss for Epoch 10000 is: 1.25
The loss for Epoch 20000 is: 1.20
The loss for Epoch 30000 is: 1.16
The loss for Epoch 40000 is: 1.13
The loss for Epoch 50000 is: 1.10
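
To see the accumulation behaviour in isolation, here is a small sketch that reuses the toy loss from the first section. `loss_fn` is a helper name introduced for this example. Calling backward twice without resetting doubles the stored gradient; zeroing it first restores the single-pass value.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x = torch.ones(3)

def loss_fn(w):
    # Same toy loss as in the first section: (y - y_hat)^2 with y = 5.
    return (5 - (torch.dot(w, x) + 3)) ** 2

loss_fn(w).backward()
first = w.grad.clone()        # gradient from a single backward pass

loss_fn(w).backward()
accumulated = w.grad.clone()  # second pass was added on top of the first

w.grad.zero_()                # reset by hand, as optimizer.zero_grad() does for a model
loss_fn(w).backward()
fresh = w.grad.clone()

print(first)        # tensor([8., 8., 8.])
print(accumulated)  # tensor([16., 16., 16.])
print(fresh)        # tensor([8., 8., 8.])
```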
