
## PyTorch Workflow

Let's explore an example PyTorch end-to-end workflow.

Resources:
* Ground truth notebook: https://www.learnpytorch.io/01_pytorch_workflow/
* Book version of notebook: https://www.learnpytorch.io/01_pytorch_workflow/
* Ask a question: GitHub

In [None]:
what_were_covering = {1: "data (prepare and load)", 2: "build model", 3: "fitting the model to data (training)", 4: "making predictions and evaluating a model (inference)", 5: "saving and loading a model", 6: "putting it all together"}
what_were_covering

In [None]:
import torch
from torch import nn # nn contains all of PyTorch's building blocks for neural networks
import matplotlib.pyplot as plt

# Check PyTorch version
torch.__version__

## 1. Data (preparing and loading)

Data can be almost anything in ML:

* Excel spreadsheet
* Images
* Videos
* Audio
* DNA
* Text

Machine learning is a game of two parts:
1. Get data into numerical representation (in tensors)
2. Build a model to learn patterns in that numerical representation
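
As a small illustration of part 1, the sketch below turns a short DNA string into numbers by giving each base an integer index. The sequence, the `BASE_TO_INDEX` mapping and the `encode_dna` helper are made up for this example; real projects typically use richer encodings.

```python
# Hypothetical mapping from DNA bases to integer indices (an assumption for this sketch)
BASE_TO_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_dna(sequence):
    """Turn a DNA string into a list of integers, one per base."""
    return [BASE_TO_INDEX[base] for base in sequence]

encoded = encode_dna("GATTACA")
print(encoded)  # [2, 0, 3, 3, 0, 1, 0]
```

A list like this can then be passed to `torch.tensor(encoded)` to get a tensor that a model can consume.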

To showcase this, let's create some *known* data using the linear regression formula.

We'll use the formula to make a straight line with *known* **parameters**. A parameter is a value that a model learns.





In [None]:
# Create *known* parameters
weight = 0.7 # m in y = mx + b
bias = 0.3 # b in y = mx + b

# Create the data
start = 0
end = 1
step = 0.02

# By convention, a capital X denotes a matrix or tensor and a lower-case y denotes a vector
X = torch.arange(start, end, step).unsqueeze(dim=1) # input numbers (features), shape [50, 1]
y = weight * X + bias # output numbers (labels)

X[:10], y[:10], len(X), len(y) # the model's job is to learn how the inputs map to the outputs



### Splitting data into training and test sets (very important in machine learning)

* Training - 60 - 80%
* Validation - 10 - 20% - often used but not always
* Test - 10 - 20%

Let's create a training and test set with our data.
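
The notebook itself only uses a training and a test set. For cases where a validation set is wanted as well, a three-way split could look like the sketch below. The `three_way_split` helper and the 60/20/20 fractions are assumptions for illustration, and the split is sequential (no shuffling), like the two-way split in this notebook.

```python
def three_way_split(samples, train_frac=0.6, val_frac=0.2):
    """Slice a sequence into train, validation and test parts, in order."""
    train_end = int(train_frac * len(samples))
    val_end = train_end + int(val_frac * len(samples))
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]

samples = list(range(50))  # stand-in for 50 data points
train, val, test = three_way_split(samples)
print(len(train), len(val), len(test))  # 30 10 10
```

Because the helper only relies on `len()` and slicing, it should also work on tensors such as `X`.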



In [None]:
# Create a train/test split
train_split = int(0.8 * len(X)) # the first 80% of samples train the model; the rest are held out for testing

X_train, y_train = X[:train_split], y[:train_split] # index up to the train split index
X_test, y_test = X[train_split:], y[train_split:]

len(X_train), len(y_train), len(X_test), len(y_test)

How might we better visualise our data?


In [None]:
def plot_predictions(train_data=X_train, train_labels=y_train, test_data=X_test, test_labels=y_test, predictions=None):
  """
  Plots training data, test data and compares predictions.
  """

  plt.figure(figsize=(10,7))

  #Plot training data in blue
  plt.scatter(train_data, train_labels, c="b", s=4, label="Training data")

  # Plot test data in green
  plt.scatter(test_data, test_labels, c="g", s=4, label="Testing data")

  # Are there predictions?
  if predictions is not None:
    # plot predictions if exist
    plt.scatter(test_data, predictions, c="r", s=4, label="Predictions")

  plt.legend(prop={"size": 14})




In [None]:
plot_predictions()

## 2. Build model

Our first PyTorch model.

What our model does:
* Starts with random values for weight and bias
* Looks at training data and adjusts the random values to better represent (or get closer to) the ideal values (the weight and bias values we used to create the data)

How does it do so?
Through two main algorithms:
1. Gradient descent - uses the gradients to update the parameters; PyTorch tracks gradients for every parameter created with `requires_grad=True`
2. Backpropagation - computes the gradient of the loss with respect to each parameter



Learning process:

* The model starts with random or pre-initialized weights and biases
* During training, it adjusts these values incrementally to minimize prediction errors
* This adjustment happens through algorithms like gradient descent
* The goal is to find weight and bias values that make the model fit the training data well
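
To make the learning process concrete, here is a minimal sketch of gradient descent written in plain Python, without PyTorch. It uses mean squared error as the loss because its gradients are easy to derive by hand; that choice, the starting values and the learning rate are assumptions for this sketch and differ from what the notebook does later (MAE loss with PyTorch computing the gradients through backpropagation).

```python
# Same known line as in the notebook: y = 0.7 * x + 0.3
xs = [i * 0.02 for i in range(50)]
ys = [0.7 * x + 0.3 for x in xs]

w, b = 0.0, 0.0  # arbitrary starting values (the notebook starts from random ones)
lr = 0.1         # learning rate, an assumption chosen for this sketch
n = len(xs)

for step in range(2000):
    # Forward pass: current predictions and their errors
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Gradients of the mean squared error with respect to w and b (derived by hand)
    grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2 / n * sum(errors)
    # Gradient descent: move each parameter a small step against its gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 3), round(b, 3))
```

After enough steps, `w` and `b` should end up close to the known values 0.7 and 0.3.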


In [None]:
import torch
from torch import nn

# Create linear regression model class
class LinearRegressionModel(nn.Module): # Almost everything in PyTorch inherits from nn.Module
  def __init__(self):
    super().__init__()
    # Start with random numbers, look at the training data, and update them to represent the pattern in the data
    self.weight = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
    self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))

  # Forward method to define the computation in the model
  def forward(self, x: torch.Tensor) -> torch.Tensor: # x is the input data
    return self.weight * x + self.bias # This is the linear regression formula




### PyTorch model building essentials

* torch.nn - contains all of the building blocks for computational graphs (a neural network can be considered a computational graph)
* torch.nn.Parameter - the parameters our model should try to learn; often a PyTorch layer from torch.nn will set these for us
* torch.nn.Module - the base class for all neural network modules; if you subclass it, you should override forward()
* torch.optim - where the optimisers live; they help with gradient descent
* def forward() - all nn.Module subclasses require you to override forward(); this method defines the computation the model performs

Cheat sheet - https://pytorch.org/tutorials/beginner/ptcheat.html

### Checking the contents of our PyTorch model

Now we've created a model let's see what's inside...

We can check out our model's parameters, or what's inside our model, using `.parameters()`.

In [None]:
# Create a random seed
torch.manual_seed(42)

# Create an instance of the model (this is a subclass of nn.Module)

model_0 = LinearRegressionModel()

# Check out the parameters
list(model_0.parameters())

In [None]:
# List named parameters. The goal now is to write code that moves these values closer to the weight and bias we set earlier (0.7, 0.3), using the data, gradient descent and backpropagation.
# Usually you won't know the ideal values for weight and bias; we know them here because we created the data ourselves.
model_0.state_dict()

### Making predictions using `torch.inference_mode()`

To check our model's predictive power, let's see how well it predicts `y_test` based on `X_test`.

When we pass data through our model, it's going to run it through the `forward` method.
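
As a rough illustration of what `torch.inference_mode()` changes, the sketch below runs the same forward pass with and without it. The `demo_model` (an `nn.Linear` layer) and `demo_inputs` are stand-ins made up for this example, not the model built above.

```python
import torch
from torch import nn

torch.manual_seed(42)
demo_model = nn.Linear(in_features=1, out_features=1)  # stand-in model for this sketch
demo_inputs = torch.arange(0, 1, 0.25).unsqueeze(dim=1)

# A normal forward pass records the operations needed for backpropagation
tracked_preds = demo_model(demo_inputs)

# Inside inference_mode, the same call skips gradient tracking
with torch.inference_mode():
    untracked_preds = demo_model(demo_inputs)

print(tracked_preds.requires_grad)    # True
print(untracked_preds.requires_grad)  # False
```

The predicted values are the same either way; only the bookkeeping for gradients differs.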


In [None]:
# Make predictions with the model
with torch.inference_mode(): # inference doesn't involve training, so gradient tracking is turned off; PyTorch keeps track of less data behind the scenes, which makes predictions faster
  y_preds = model_0(X_test) # pass X_test through the model's forward method

y_preds

In [None]:
plot_predictions(predictions=y_preds) # the model starts with a random weight and bias, so the initial predictions are poor; next we write code to bring them closer to the test data

## 3. Train model

The whole idea of training is for a model to move from some *unknown* parameters (these may be random), to some known parameters.

From a poor representation of the data to a better representation of the data.

One way to measure how poor our model's predictions are is to use a loss function.

Note: A loss function may also be called a cost function or criterion in different areas. For our case, we're going to refer to it as a loss function.

* **Loss function:** A function that measures how far your model's predictions are from the ideal outputs. Lower is better. Examples: mean absolute error (MAE), mean squared error (MSE).

* **Optimiser:** Takes into account the loss of the model and adjusts the model's parameters (e.g. weight and bias, in our case) to reduce the loss.
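
To show what the two example loss functions measure, here is a small plain-Python sketch. The helper names and the numbers are made up for illustration.

```python
def mean_absolute_error(predictions, targets):
    """Average of the absolute differences between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

predictions = [0.5, 1.0, 1.5]  # made-up model outputs
targets = [1.0, 1.0, 1.0]      # made-up ideal outputs

print(mean_absolute_error(predictions, targets))  # about 0.333
print(mean_squared_error(predictions, targets))   # about 0.167
```

In PyTorch, these correspond to `nn.L1Loss()` and `nn.MSELoss()` with their default mean reduction.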

And specifically for PyTorch we need:

* A training loop
* A testing loop



In [None]:
# Check out our model's parameters (a parameter is a value that the model sets itself)
model_0.state_dict()

In [None]:
# Set up a loss function
loss_fn = nn.L1Loss()

# Set up an optimiser (stochastic gradient descent) - increases or decreases the weight and bias to reduce loss
optimizer = torch.optim.SGD(model_0.parameters(), lr=0.01) #lr = learning rate, very important hyperparameter (value that we set ourselves), the higher the lr, the more it adjusts the parameters in one hit


### Building a training loop (and testing loop) in PyTorch

A couple of things we need in a training loop:

0. Loop through the data
1. Forward pass (this involves data moving through our model's `forward()` method) to make predictions on data - also called forward propagation
2. Calculate the loss (compare forward pass predictions to ground truth labels)
3. Optimizer zero grad
4. Loss backward - move backwards through the network to calculate the gradients of the loss with respect to each of the parameters of our model (**backpropagation**)
5. Optimizer step - use the optimizer to adjust our model's parameters to try to improve the loss (**gradient descent**)
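
Step 3 exists because PyTorch adds new gradients to any that are already stored on a parameter. The sketch below shows that accumulation on a single made-up parameter `w`, resetting the gradient by hand where a training loop would call `optimizer.zero_grad()`.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # a single made-up parameter

# First backward pass: d(3 * w)/dw = 3
(3 * w).backward()
first_grad = w.grad.item()

# Second backward pass without zeroing: the new gradient is added to the old one
(3 * w).backward()
accumulated_grad = w.grad.item()

# Reset the stored gradient by hand, then run backward again
w.grad.zero_()
(3 * w).backward()
fresh_grad = w.grad.item()

print(first_grad, accumulated_grad, fresh_grad)  # 3.0 6.0 3.0
```

Without the reset, each optimizer step would use a sum of gradients from earlier iterations rather than the gradient of the current loss.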



In [None]:
list(model_0.parameters())

In [None]:
torch.manual_seed(42)

# An epoch is one loop through the data (a hyperparameter, because we set it ourselves)
epochs = 200

# Track values so we can plot the loss curves later
epoch_count = []
loss_values = []
test_loss_values = []

### Training
# 0. Loop through the data
for epoch in range(epochs):
  # Set the model to training mode (the default; it affects layers such as dropout and batch norm)
  model_0.train()

  # 1. Forward pass - learn patterns on the training data, then evaluate the model on the test data
  y_pred = model_0(X_train)

  # 2. Calculate the loss (MAE) - the difference between the model's predictions on the training set and the ideal training values
  loss = loss_fn(y_pred, y_train)

  # 3. Optimizer zero grad - gradients accumulate across backward() calls by default, so reset them every iteration
  optimizer.zero_grad()

  # 4. Perform backpropagation on the loss with respect to the parameters of the model
  loss.backward()

  # 5. Step the optimizer (perform gradient descent)
  optimizer.step()

  ### Testing
  model_0.eval() # switches off behaviour not needed for evaluation/testing (e.g. dropout/batch norm layers)
  with torch.inference_mode(): # turns off gradient tracking and more behind the scenes - no learning happens when testing
    # 1. Do the forward pass on the test data
    test_pred = model_0(X_test)
    # 2. Calculate the loss on the test data
    test_loss = loss_fn(test_pred, y_test)

  if epoch % 10 == 0:
    epoch_count.append(epoch)
    loss_values.append(loss.item()) # store plain Python floats rather than tensors
    test_loss_values.append(test_loss.item())
    print(f"Epoch: {epoch} | Loss: {loss} | Test loss: {test_loss}")

    # Print out the model's state_dict
    print(model_0.state_dict())


In [None]:
# Plot the loss curves

epoch_count, loss_values, test_loss_values




In [None]:
plt.plot(epoch_count, torch.tensor(loss_values).numpy(), label="Train loss")
plt.plot(epoch_count, test_loss_values, label="Test loss")
plt.title("Training and test loss curves")
plt.ylabel("Loss")
plt.xlabel("Epochs")
plt.legend()

In [None]:
with torch.inference_mode():
  y_preds_new = model_0(X_test)

In [None]:
plot_predictions(predictions=y_preds)

In [None]:
plot_predictions(predictions=y_preds_new)

### Saving a model in PyTorch

There are three main methods you should know about for saving and loading models in PyTorch:

1. `torch.save()` - saves a PyTorch object in Python's pickle format.
2. `torch.load()` - loads a saved PyTorch object.
3. `torch.nn.Module.load_state_dict()` - loads a model's saved state dictionary.
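
The three methods fit together as a save/load round trip. The sketch below runs one in memory, using an `io.BytesIO` buffer instead of a file and a stand-in `nn.Linear` model; both are assumptions made to keep the example self-contained.

```python
import io

import torch
from torch import nn

torch.manual_seed(42)
source_model = nn.Linear(in_features=1, out_features=1)  # stand-in model for this sketch

# Save only the state_dict (the learned parameters), here into an in-memory buffer
buffer = io.BytesIO()
torch.save(source_model.state_dict(), buffer)
buffer.seek(0)

# Create a fresh instance of the same architecture and load the saved parameters into it
restored_model = nn.Linear(in_features=1, out_features=1)
restored_model.load_state_dict(torch.load(buffer))

# Check that every saved tensor matches the restored one
for name, value in source_model.state_dict().items():
    print(name, torch.equal(value, restored_model.state_dict()[name]))
```

Saving the `state_dict()` rather than the whole model means the class definition must be available when loading, which is why a new instance is created first.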


In [None]:
model_0.state_dict() # model params are stored here in a Python dictionary. Here we only have two params, but in the future you could be working with a model with millions of parameters.

In [None]:
from pathlib import Path
# 1. Create models directory
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create a model save path
MODEL_NAME = "01_pytorch_workflow_model_0.pth" # PyTorch objects are usually saved with a .pth or .pt extension
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME
MODEL_SAVE_PATH

# 3. Save model state_dict()
print(f"Saving model to {MODEL_SAVE_PATH}")
torch.save(obj=model_0.state_dict(), f=MODEL_SAVE_PATH)

In [None]:
# Loading a PyTorch model: create a new instance of the model class and load the saved state dict into it
loaded_model_0 = LinearRegressionModel()

# Load the saved state_dict
print(f"Loading model from {MODEL_SAVE_PATH}")
loaded_model_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH))

In [None]:
loaded_model_0.state_dict()

In [None]:
model_0.eval()
with torch.inference_mode():
  y_preds = model_0(X_test)
y_preds

In [None]:
# Make some predictions with the loaded model
loaded_model_0.eval()
with torch.inference_mode():
  loaded_model_preds = loaded_model_0(X_test)
loaded_model_preds

In [None]:
y_preds == loaded_model_preds

## 6. Putting it all together

Let's go back through the steps above and see it all in one place.

In [None]:
import torch
from torch import nn
import matplotlib.pyplot as plt

# Check PyTorch version
torch.__version__

Create device-agnostic code to switch between CPU and GPU.

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

print(f"Using device: {device}")

### 6.1 Data


In [None]:
# Create some data using the linear regression formula y = mx + c, or y = weight * X + bias
weight = 0.7
bias = 0.3

# Create range values
start = 0
end = 1
step = 0.02

# Create X and y (features and labels)
X = torch.arange(start, end, step).unsqueeze(dim=1) # unsqueeze to add another dimension
y = weight * X + bias
X[:10], y[:10]

In [None]:
# Split data
train_split = int(0.8 * len(X))
X_train, y_train = X[:train_split], y[:train_split]

X_test, y_test = X[train_split:], y[train_split:]
len(X_train), len(y_train), len(X_test), len(y_test)

In [None]:
# Plot the data
plot_predictions(X_train, y_train, X_test, y_test)

### 6.2 Building a PyTorch linear model
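
The model in this section swaps the hand-written parameters for `nn.Linear`. As a quick check, the sketch below suggests that a layer with one input and one output feature applies the same `weight * x + bias` formula. The `layer` and `inputs` names are made up for this example.

```python
import torch
from torch import nn

torch.manual_seed(42)
layer = nn.Linear(in_features=1, out_features=1)
inputs = torch.arange(0, 1, 0.25).unsqueeze(dim=1)  # shape [4, 1]

# nn.Linear creates its own weight and bias parameters...
weight, bias = layer.weight, layer.bias  # shapes [1, 1] and [1]

# ...and applies the linear formula to the inputs
manual_outputs = inputs @ weight.T + bias
print(torch.allclose(layer(inputs), manual_outputs))  # True
```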

In [None]:
class LinearRegressionModelV2(nn.Module):
  def __init__(self):
    super().__init__()
    # self.weight = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
    # self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
    # Use nn.Linear() for creating the model parameters instead this time
    self.linear_layer = nn.Linear(in_features=1, out_features=1) # input and output features - highly dependent on the data shape you work with in X_train and y_train

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.linear_layer(x)

torch.manual_seed(42)
model_1 = LinearRegressionModelV2()
model_1, model_1.state_dict()

In [None]:
model_1.to(device)
next(model_1.parameters()).device

### 6.3 Training
For training we need:

* Loss function
* Optimizer
* Training loop
* Testing loop


In [None]:
# Set up loss function
loss_fn = nn.L1Loss() # same as MAE

#Optimizer

optimizer = torch.optim.SGD(params = model_1.parameters(),lr = 0.01)



In [None]:
torch.manual_seed(42)

epochs = 200

# Put the data on the same device as the model (device-agnostic code for data)
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

for epoch in range(epochs):
  model_1.train()

  # 1. Forward pass
  y_pred = model_1(X_train) # data goes through the linear layer

  # 2. Calculate the loss
  loss = loss_fn(y_pred, y_train)

  # 3. Optimizer zero grad
  optimizer.zero_grad()

  # 4. Backpropagation
  loss.backward()

  # 5. Optimizer step
  optimizer.step()

  ### Testing
  model_1.eval()
  with torch.inference_mode():
    test_pred = model_1(X_test)
    test_loss = loss_fn(test_pred, y_test)

  # Print out what's happening
  if epoch % 10 == 0:
    print(f"Epoch: {epoch} | Loss: {loss} | Test loss: {test_loss}")

In [None]:
model_1.state_dict()

### 6.4 Making and evaluating predictions

In [None]:
model_1.eval()
with torch.inference_mode():
  y_preds = model_1(X_test)
y_preds

In [None]:
plot_predictions(predictions=y_preds.cpu()) # move predictions to the CPU first, since matplotlib can't plot tensors stored on the GPU