### PyTorch Basics:

In [1]:
import torch
import numpy as np

Creating Tensors:

PyTorch records the mathematical operations applied to tensors in a Dynamic Computational Graph (DCG). Unlike symbolic frameworks, which define the whole graph first and execute it afterwards, PyTorch is "define-by-run": each operation executes immediately, and the graph is built up as the script runs. The graph is later traversed in reverse to compute gradients.
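A minimal sketch of this behaviour, using illustrative tensors `a`, `b`, `c` and `out` that are not part of the notebook: values are available as soon as each line runs, tracked results carry a `grad_fn` node, and ordinary Python control flow decides which operations end up in the graph.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)  # leaf tensor, tracked by autograd
b = a * 3                                   # runs immediately: b is 6.0
c = b + 1                                   # runs immediately: c is 7.0

print(c.item())    # the value is already available, no separate "run" step
print(c.grad_fn)   # graph node that recorded the addition

# The graph depends on which branch actually executes.
if c.item() > 5:
    out = c ** 2
else:
    out = c

out.backward()     # walk the recorded graph backwards
print(a.grad)      # d(out)/da = 2 * c * 3 = 42
```

Because `c` is 7, the first branch is taken, so only the squaring operation is recorded for `out`.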

In [2]:
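x = torch.ones(3, dtype=torch.float64)    # input vector [1., 1., 1.]
y = 5                                      # target value
# Parameters: cast the integer NumPy array to float64 so gradients can be tracked
w = torch.from_numpy(np.array([1, 2, 3])).double()
w.requires_grad = True
y_hat = torch.dot(w, x) + 3                # prediction: w . x + 3 = 9
Loss = (y - y_hat) ** 2                    # squared error: (5 - 9)^2 = 16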


PyTorch uses backpropagation to compute the derivatives of a function. The cost/loss function of a model is usually a sequence of operations applied to the model parameters, and these operations are recorded in the DCG as described above. To compute the gradient of a cost function (L):

- Set the "requires_grad" attribute of the parameter tensor to True. This tells PyTorch that we will evaluate a derivative with respect to this tensor, so it must track the operations applied to it. The recorded DCG is then used, via the chain rule, to compute the derivative.
- Call the "backward" method on the loss tensor to compute the derivative of the cost with respect to these model parameters.
- The gradient of the loss is stored in the "grad" attribute of the **model parameters**, not of the loss tensor.
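The three steps above can be sketched on a separate toy problem. The names `theta`, `features` and `target` are hypothetical and chosen only for illustration.

```python
import torch

# Step 1: mark the parameter tensor so operations on it are tracked.
theta = torch.tensor([0.5, -1.0], dtype=torch.float64, requires_grad=True)
features = torch.tensor([2.0, 4.0], dtype=torch.float64)  # not tracked
target = 1.0

# Forward pass: these operations are recorded in the graph.
prediction = torch.dot(theta, features)   # 0.5*2 + (-1)*4 = -3
loss = (target - prediction) ** 2         # (1 - (-3))^2 = 16

# Step 2: backpropagate from the loss.
loss.backward()

# Step 3: the gradient is stored on the parameter, not on the loss.
print(theta.grad)      # -2 * (target - prediction) * features = [-16, -32]
print(features.grad)   # None, since features does not require gradients
```

Only tensors with "requires_grad" set to True receive a "grad" value, which is why `features.grad` stays `None`.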

The cell below demonstrates this. For the tensors defined above, we expect the gradient of the loss with respect to w to be (8, 8, 8).

In [3]:
Loss.backward()
w.grad

tensor([8., 8., 8.], dtype=torch.float64)
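
This result can be checked by hand. The derivation below uses the values from the cells above (w = (1, 2, 3), x = (1, 1, 1), y = 5) and writes the loss as L:

```latex
\begin{aligned}
\hat{y} &= w \cdot x + 3 = (1 + 2 + 3) + 3 = 9 \\
L &= (y - \hat{y})^2 = (5 - 9)^2 = 16 \\
\frac{\partial L}{\partial w_i} &= -2\,(y - \hat{y})\,x_i = -2\,(5 - 9)\cdot 1 = 8
\end{aligned}
```

Every component of x equals 1, so all three partial derivatives are 8, matching the output of `w.grad`.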