# Introduction to `torch.autograd`
`torch.autograd` is PyTorch's automatic differentiation engine that powers neural network training.

## Background
Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (consisting of `weights` and `biases`).\
Training a NN happens in two steps:\
\
**Forward Propagation:** In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.\
\
**Backward Propagation:** In backprop, the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (the gradients), and optimizing the parameters using `gradient descent`.
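The two steps above can be sketched on a toy problem with a single weight and a single bias. This is only an illustration under stated assumptions: the values of `w`, `b`, `x`, `target`, and the learning rate `lr` are arbitrary and chosen so the arithmetic is easy to check by hand.

```python
import torch

# Hypothetical one-neuron "network": prediction = w * x + b
w = torch.tensor(2.0, requires_grad=True)   # weight (assumed value)
b = torch.tensor(0.5, requires_grad=True)   # bias (assumed value)
x = torch.tensor(3.0)                       # input
target = torch.tensor(10.0)                 # desired output

# Forward propagation: run the input through the function to make a guess
prediction = w * x + b                      # 2.0 * 3.0 + 0.5 = 6.5
loss = (prediction - target) ** 2           # (6.5 - 10.0)^2 = 12.25

# Backward propagation: autograd collects d(loss)/dw and d(loss)/db
loss.backward()
print(w.grad)  # 2 * (prediction - target) * x = -21
print(b.grad)  # 2 * (prediction - target)     = -7

# Gradient descent: move each parameter against its gradient
lr = 0.01                                   # assumed learning rate
with torch.no_grad():                       # the update itself is not tracked
    w -= lr * w.grad                        # 2.0 + 0.21, about 2.21
    b -= lr * b.grad                        # 0.5 + 0.07, about 0.57
```

Both gradients are negative, so the update increases `w` and `b`, which moves the prediction toward the target.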

In [5]:
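import torch, torchvision
# `weights=` replaces the deprecated `pretrained=True` (torchvision 0.13+)
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
# one random image (3 channels, 64x64) and a random label vector of shape (1, 1000)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)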

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /Users/theaveasso/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth


In [15]:
# forward pass: run the input data through each of the model's layers to make a prediction
prediction = model(data)

In [16]:
# use the model's prediction and the corresponding label to calculate the error (loss)
loss = (prediction - labels).sum()
print(loss)

# backpropagate this loss through the network;
# autograd then calculates the gradient for each model parameter (weights and biases)
# and stores it in that parameter's .grad attribute
loss.backward()

tensor(-495.9272, grad_fn=<SumBackward0>)
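To see what `backward()` leaves behind, the sketch below repeats the same steps on a much smaller model. The `nn.Linear(4, 2)` layer is a hypothetical stand-in for the ResNet, chosen only so that the gradients are small enough to inspect. Because the loss is a plain sum of `prediction - labels`, the gradient with respect to each bias is 1 and each row of the weight gradient equals the input.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the ResNet: one linear layer, 4 inputs -> 2 outputs
layer = nn.Linear(4, 2)
data = torch.rand(1, 4)
labels = torch.rand(1, 2)

# Before any backward pass, no gradients have been stored
grads_before = [p.grad for p in layer.parameters()]
print(grads_before)

# Same pattern as above: forward pass, signed-sum loss, backward pass
prediction = layer(data)
loss = (prediction - labels).sum()
loss.backward()

# Every parameter now carries a .grad tensor with the parameter's own shape
for name, p in layer.named_parameters():
    print(name, tuple(p.shape), tuple(p.grad.shape))

# d(loss)/d(bias_j) = 1 and d(loss)/d(weight_ji) = data_i for this loss
print(layer.bias.grad)
print(layer.weight.grad)
```

The same holds for the ResNet: after `loss.backward()`, iterating over `model.parameters()` yields tensors whose `.grad` has the same shape as the parameter.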


In [17]:
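# load an SGD optimizer with a learning rate of 0.1 and a momentum of 0.9,
# and register all of the model's parameters with it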
optim = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
print(optim)

SGD (
Parameter Group 0
    dampening: 0
    lr: 0.1
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)


In [21]:
# step() initiates gradient descent: the optimizer adjusts each parameter using the gradient stored in its .grad attribute
optim.step()
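What `step()` does to a parameter can be checked by hand on a small model. The sketch below uses a hypothetical `nn.Linear(4, 2)` layer in place of the ResNet, with the same optimizer settings as above. It relies on one property of `torch.optim.SGD`: on the first step the momentum buffer is initialized to the gradient itself, so the first update reduces to `parameter - lr * gradient`. Later steps would also include the accumulated momentum term.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the ResNet
layer = nn.Linear(4, 2)
data = torch.rand(1, 4)
labels = torch.rand(1, 2)

# Same optimizer settings as in the notebook
optim = torch.optim.SGD(layer.parameters(), lr=0.1, momentum=0.9)

# Forward and backward pass to populate .grad
loss = (layer(data) - labels).sum()
loss.backward()

# Snapshot the weight and its gradient before the update
weight_before = layer.weight.detach().clone()
grad = layer.weight.grad.detach().clone()

# One gradient descent step
optim.step()

# First step with momentum: new_weight = old_weight - lr * grad
expected = weight_before - 0.1 * grad
matches = torch.allclose(layer.weight, expected)
print(matches)
```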