<a href="https://colab.research.google.com/github/xpdlaldam/PyTorch/blob/main/01_pytorch_workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Linear Regression

In [None]:
import torch
from torch import nn ## nn contains all of pytorch's building blocks for neural networks
import matplotlib.pyplot as plt

torch.__version__

In [None]:
# Create known parameters
weight = .7
bias = .3

# Create data
start = 0
end = 1
step = .02

## unsqueeze(dim=1)
# adds a second dimension: shape [50] becomes [50, 1], i.e., [ => [[
# we need 2-D (samples, features) for modeling
X = torch.arange(start, end, step).unsqueeze(dim=1)
y = weight * X + bias
y

In [None]:
torch.arange(start, end, step)
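
The effect of `unsqueeze(dim=1)` is easiest to see by comparing shapes. The following is a minimal sketch using the same `start`, `end` and `step` values as above; the names `flat` and `column` are illustrative only.

```python
import torch

flat = torch.arange(0, 1, 0.02)    # 1-D tensor: one long row of values
column = flat.unsqueeze(dim=1)     # 2-D tensor: one value per row

print(flat.shape)    # torch.Size([50])
print(column.shape)  # torch.Size([50, 1])
```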

# Split to train & test

In [None]:
split_ratio = int(.8 * len(X))
X_train, y_train = X[:split_ratio], y[:split_ratio]
X_test, y_test = X[split_ratio:], y[split_ratio:]

len(X_train), len(y_train), len(X_test), len(y_test)

# Plot

In [None]:
def plot_pred(
    train_data=X_train,
    train_labels=y_train,
    test_data=X_test,
    test_labels=y_test,
    pred=None,
):
  """
  Plots training & test data and compares against predictions
  """
  plt.figure(figsize=(10, 7))
  plt.scatter(train_data, train_labels, c="b", s=4, label="Train set") # plots train set
  plt.scatter(test_data, test_labels, c="g", s=4, label="Test set") # plots test set

  # are there predictions?
  # `if pred:` would raise an error for a multi-element tensor (its truth value is ambiguous)
  if pred is not None: # identity check against None, so it works for any tensor
    plt.scatter(test_data, pred, c="r", s=4, label="Predictions")

  # legend
  plt.legend(prop={"size": 14})


In [None]:
plot_pred()

# Build model

In [None]:
from torch import nn

## nn.Module:
  # the base class for almost every model and layer in pytorch
  # our model subclasses nn.Module to reuse its building blocks for neural networks
## torch.randn(1): start with a random weight and bias
## requires_grad=True: track gradients so this parameter can be updated via gradient descent
## dtype=torch.float: torch.float is an alias for torch.float32, pytorch's default dtype
class LinearRegressionModel(nn.Module):
  def __init__(self):
    super().__init__()

    ## Initialize model parameters
    self.weights = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float)) # default: float32

    self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))

  # Forward
  def forward(self, x: torch.Tensor) -> torch.Tensor: # x is the input data (tensor)
    return self.weights * x + self.bias

# pytorch model building essentials

* torch.nn - contains all the building blocks for computational graphs, i.e., neural networks
* torch.nn.Parameter - the parameters our model should try to learn; often a pytorch layer from torch.nn sets these for us
* torch.nn.Module - the base class for all neural network modules; if you subclass it, you should override forward()
* torch.optim - where pytorch's optimizers live; they use gradients to move parameters away from their random initial values toward values that lower the loss
* def forward() - every nn.Module subclass must override it; it defines the computation performed on the input
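
To make the role of `nn.Parameter` concrete, here is a small sketch. `TinyModel`, `scale` and `offset` are hypothetical names used only for illustration. It shows that only tensors wrapped in `nn.Parameter` are registered with the module, and therefore only they are visible to an optimizer through `parameters()`.

```python
import torch
from torch import nn

class TinyModel(nn.Module):  # hypothetical model, for illustration only
  def __init__(self):
    super().__init__()
    self.scale = nn.Parameter(torch.ones(1))  # registered as a learnable parameter
    self.offset = torch.zeros(1)              # plain tensor attribute: not registered

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    return self.scale * x + self.offset

tiny = TinyModel()
names = [name for name, _ in tiny.named_parameters()]
print(names)  # ['scale']
```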

In [None]:
torch.randn(1)

In [None]:
torch.manual_seed(42)

lin_reg = LinearRegressionModel()
list(lin_reg.parameters())

In [None]:
# list named parameters
lin_reg.state_dict()

## Predict using `torch.inference_mode()`

In [None]:
X_test

In [None]:
y_test

In [None]:
## * CRUCIAL CONCEPT & TIP: Benefits of using a context manager
# torch.inference_mode() disables gradient tracking: we're only doing inference, so we don't need gradients
# This matters more with larger datasets and models:
# predictions are typically faster because pytorch skips the bookkeeping needed for training
# Calling lin_reg(X_test) outside the context manager keeps gradient tracking on (the output has a grad_fn)
with torch.inference_mode():
  y_preds = lin_reg(X_test)

# Alternative: torch.no_grad() also disables gradient tracking
# with torch.no_grad():
#   y_preds = lin_reg(X_test)

y_preds
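
A quick way to check what the context manager changes is to inspect `requires_grad` on the outputs. The sketch below uses a stand-in `nn.Linear` layer rather than `lin_reg`, so it runs on its own; `layer`, `tracked` and `untracked` are illustrative names.

```python
import torch
from torch import nn

layer = nn.Linear(in_features=1, out_features=1)  # stand-in model
x = torch.ones(3, 1)

tracked = layer(x)              # normal call: output is part of the autograd graph
with torch.inference_mode():
  untracked = layer(x)          # same computation, no gradient tracking

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
```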

In [None]:
plot_pred(pred=y_preds)

# Train model

* note: loss function = cost function = criterion

Things we need to train:
* **Loss function**: Measures how far the model's predictions are from the ground truth (lower is better)
* **Optimizer**: Takes into account the loss of a model and adjusts the model's parameters (e.g. weight & bias) to reduce it
* A training loop
* A testing loop

In [None]:
## Setup a loss function
loss_fn = nn.L1Loss() # MAE

## parameter vs hyperparameter
# parameter - a value the model learns from the data
# hyperparameter - a value the data scientist sets

## Setup an optimizer
# params - the model parameters you'd like to optimize
# lr - learning rate, a hyperparameter that scales how much the optimizer changes the parameters with each step
optimizer = torch.optim.SGD(
    params=lin_reg.parameters(),
    lr=.01
) # SGD
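
`nn.L1Loss()` computes the mean absolute error (MAE). As a small worked example with made-up numbers (`preds` and `targets` are not from the notebook's data), the same value can be computed by hand:

```python
import torch
from torch import nn

preds = torch.tensor([0.5, 1.0, 2.0])
targets = torch.tensor([1.0, 1.0, 1.0])

# MAE by hand: mean of |prediction - target| = (0.5 + 0.0 + 1.0) / 3 = 0.5
manual_mae = (preds - targets).abs().mean()
builtin_mae = nn.L1Loss()(preds, targets)

print(manual_mae, builtin_mae)
```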

# Build a training loop in pytorch

## What we need in a training loop
0. Loop through the data
1. Forward pass/propagation to make predictions
2. Compute the loss: compare forward pass predictions vs ground truth
3. Optimizer zero grad
4. Loss backward / **backpropagation** - computes the gradients of each of the parameters with respect to the loss
5. Optimizer step / **gradient descent** - adjusts our model's parameters to improve the loss
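
Steps 3 to 5 can be seen in isolation on a single made-up parameter. This is a sketch under simple assumptions (one scalar `w`, a toy loss `3 * w`, learning rate `0.1`), not the training loop itself. It shows that gradients add up across `backward()` calls unless they are cleared, and that one SGD step moves the parameter by `lr * gradient`.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)  # toy parameter
optimizer = torch.optim.SGD([w], lr=0.1)

(w * 3).sum().backward()
first = w.grad.clone()        # gradient of 3*w with respect to w is 3

(w * 3).sum().backward()      # no zero_grad() in between
accumulated = w.grad.clone()  # 3 + 3 = 6: gradients accumulated

optimizer.zero_grad()         # step 3: clear old gradients
(w * 3).sum().backward()      # step 4: backpropagation
fresh = w.grad.clone()        # back to 3

optimizer.step()              # step 5: w = 1.0 - 0.1 * 3 = 0.7 (approximately, in float32)
print(first, accumulated, fresh, w)
```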

In [None]:
torch.manual_seed(42)

# An epoch: one full pass through the training data (here a single forward pass, since we use the whole training set at once); the number of epochs is a hyperparameter
epochs = 200

# Track different values
epoch_count = []
loss_values = []
test_loss_values = []

### Train
# 0.
for epoch in range(epochs):
  lin_reg.train() # set the model to training mode (affects layers such as dropout and batch norm; it does not change requires_grad)

  ## 1. forward pass
  y_pred = lin_reg(X_train)

  ## 2. loss
  loss = loss_fn(y_pred, y_train)
  # print(f"Loss: {loss}")

  ## 3. optimizer zero grad
  # clears gradients left over from the previous iteration
  optimizer.zero_grad()

  ## 4. backpropagation - computes the gradients of the loss with respect to the model's parameters
  loss.backward()

  ## 5. step the optimizer (perform gradient descent)
  # by default, gradients accumulate across backward() calls,
  # hence we zero them in step 3
  # before each new backward pass
  optimizer.step()

  ### Testing mode
  lin_reg.eval() # evaluation mode: changes the behavior of layers such as dropout and batch norm, which act differently during training
  with torch.inference_mode(): # turns off gradient tracking + a couple more things behind the scenes
    # 1. forward pass
    test_pred = lin_reg(X_test)

    # 2. compute the loss
    test_loss = loss_fn(test_pred, y_test)

  if epoch % 10 == 0: # log every 10th epoch (% is modulo; & would be bitwise AND)
    epoch_count.append(epoch)
    loss_values.append(loss)
    test_loss_values.append(test_loss)
    print(f"Epoch: {epoch} | Train loss: {loss} | Test loss: {test_loss}")
    print(lin_reg.state_dict())

In [None]:
loss_values

In [None]:
test_loss_values

In [None]:
## Plot loss curve
# loss_values holds tensors that track gradients, so convert them to a NumPy array before plotting
plt.plot(epoch_count, torch.tensor(loss_values).numpy(), label="Train Loss")
plt.plot(epoch_count, test_loss_values, label="Test Loss")
plt.title("Train & Test Loss Curves")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()

In [None]:
with torch.inference_mode():
  y_preds_new = lin_reg(X_test)

In [None]:
lin_reg.state_dict()

In [None]:
plot_pred(pred=y_preds)

In [None]:
plot_pred(pred=y_preds_new)

# Saving a model in pytorch

In [None]:
lin_reg.state_dict()

In [None]:
from pathlib import Path

# 1. Create model directory
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True) # exist_ok=True - if the dir already exists, it won't throw an error

# 2. Create model save path
MODEL_NAME = "01_pytorch_workflow_model_0.pth" # A common pytorch convention is to save models using either a .pt or .pth file extension
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME
MODEL_SAVE_PATH

# Save state_dict()
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=lin_reg.state_dict(), f=MODEL_SAVE_PATH)

In [None]:
!ls -l models

# Loading a model in pytorch

In [None]:
loaded_lin_reg = LinearRegressionModel() # a new instance of our nn.Module subclass, with fresh random parameters
loaded_lin_reg.state_dict()

In [None]:
loaded_lin_reg.load_state_dict(torch.load(f=MODEL_SAVE_PATH))

In [None]:
loaded_lin_reg.state_dict()

In [None]:
loaded_lin_reg.eval()
with torch.inference_mode():
  loaded_model_preds = loaded_lin_reg(X_test)
loaded_model_preds

In [None]:
lin_reg.eval()
with torch.inference_mode():
  model_preds = lin_reg(X_test)
model_preds

In [None]:
loaded_model_preds == model_preds
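
The save/load round trip can also be checked end to end with a single boolean. The sketch below is self-contained and makes two assumptions for illustration: it uses a stand-in `nn.Linear` model instead of `LinearRegressionModel`, and an in-memory `io.BytesIO` buffer instead of a file on disk. `torch.equal` returns one `True`/`False` rather than an element-wise tensor.

```python
import io
import torch
from torch import nn

torch.manual_seed(0)
original = nn.Linear(in_features=1, out_features=1)  # stand-in for a trained model

# Save the state_dict to an in-memory buffer instead of a file
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)

# A fresh instance starts with different random parameters until we load the saved ones
restored = nn.Linear(in_features=1, out_features=1)
restored.load_state_dict(torch.load(buffer))

x = torch.linspace(0, 1, 5).unsqueeze(dim=1)
original.eval()
restored.eval()
with torch.inference_mode():
  same = torch.equal(original(x), restored(x))

print(same)  # True
```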

# 6.1. Data

In [43]:
X = torch.arange(start, end, step).unsqueeze(dim=1)
y = weight * X + bias

In [44]:
# Split data
train_split = int(0.8 * len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]

len(X_train), len(y_train), len(X_test), len(y_test)

(40, 40, 10, 10)

# 6.2. Build pytorch Linear Model (this time using nn.Linear())

In [None]:
# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

In [45]:
from torch import nn

## nn.Module:
  # the base class for almost every model and layer in pytorch
  # our model subclasses nn.Module to reuse its building blocks for neural networks
## This time nn.Linear creates the weight and bias for us,
## randomly initialized, with requires_grad=True and dtype torch.float32 by default
class LinearRegressionModel(nn.Module):
  def __init__(self):
    super().__init__()

    ## Use nn.Linear() for creating model parameters
    # torch.nn.Linear() - applies a linear transformation to the incoming data:
    # y = xA^T + b
    # in_features=1:
      # number of input data dimensions
      # 1 input feature (x) as we're modeling a simple linear regression
    # out_features=1:
      # number of output data dimensions
      # 1 output feature (y) as we're modeling a univariate model
    self.linear_layer = nn.Linear(in_features=1, out_features=1)

  # Forward
  def forward(self, x: torch.Tensor) -> torch.Tensor: # x is the input data (tensor)
    return self.linear_layer(x)
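
To connect `nn.Linear` with the formula `y = xA^T + b`, the following sketch applies a layer and then repeats the computation by hand using the layer's own `weight` and `bias`. The input values and the names `layer`, `from_layer` and `by_hand` are illustrative.

```python
import torch
from torch import nn

torch.manual_seed(42)
layer = nn.Linear(in_features=1, out_features=1)
x = torch.tensor([[0.0], [0.5], [1.0]])  # shape (3, 1): 3 samples, 1 feature

with torch.inference_mode():
  from_layer = layer(x)
  by_hand = x @ layer.weight.T + layer.bias  # y = xA^T + b
  matches = torch.allclose(from_layer, by_hand)

print(layer.weight.shape, layer.bias.shape)  # torch.Size([1, 1]) torch.Size([1])
print(matches)  # True
```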

In [65]:
## 1. Need to execute this line to start over before the training loop starts
torch.manual_seed(42)
lin_reg = LinearRegressionModel()
lin_reg, lin_reg.state_dict()

(LinearRegressionModel(
   (linear_layer): Linear(in_features=1, out_features=1, bias=True)
 ),
 OrderedDict([('linear_layer.weight', tensor([[0.7645]])),
              ('linear_layer.bias', tensor([0.8300]))]))

In [None]:
next(lin_reg.parameters())

In [None]:
next(lin_reg.parameters()).device

In [47]:
# Set the model to use the target device
lin_reg.to(device)
next(lin_reg.parameters()).device

device(type='cpu')
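
The model and its input data must be on the same device before a forward pass. A minimal sketch, using a stand-in `nn.Linear` model and made-up data (`demo_model` and `demo_X` are hypothetical names): note that `.to(device)` moves a module in place, while for a tensor it returns a new tensor that has to be assigned.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

demo_model = nn.Linear(in_features=1, out_features=1)
demo_model.to(device)                           # modules are moved in place

demo_X = torch.arange(0, 1, 0.25).unsqueeze(dim=1)
demo_X = demo_X.to(device)                      # tensors: .to() returns a new tensor

param_device = next(demo_model.parameters()).device
print(param_device, demo_X.device)

with torch.inference_mode():
  demo_out = demo_model(demo_X)                 # works because both are on the same device
print(demo_out.shape)  # torch.Size([4, 1])
```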

# 6.3. Training

In [68]:
## 2. Need to execute this line to start over before the training loop starts
loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(params=lin_reg.parameters(), lr=0.01)

In [69]:
torch.manual_seed(42)

# An epoch: one full pass through the training data (here a single forward pass, since we use the whole training set at once); the number of epochs is a hyperparameter
epochs = 200

# Put data on the target device (device-agnostic code for data); on a GPU, skipping this raises a device mismatch error
X_train = X_train.to(device)
y_train = y_train.to(device)
X_test = X_test.to(device)
y_test = y_test.to(device)

for epoch in range(epochs):
  ### Training
  lin_reg.train()
  y_pred = lin_reg(X_train)
  loss = loss_fn(y_pred, y_train)

  optimizer.zero_grad()

  loss.backward()

  # Update the parameters with requires_grad=True w.r.t the loss gradients to improve them
  optimizer.step()

  ### Testing mode
  lin_reg.eval()
  with torch.inference_mode():
    test_pred = lin_reg(X_test)
    test_loss = loss_fn(test_pred, y_test)

  if epoch % 10 == 0:
    print(f"Epoch: {epoch} | Train loss: {loss} | Test loss: {test_loss}")


Epoch: 0 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 10 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 20 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 30 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 40 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 50 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 60 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 70 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 80 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 90 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 100 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 110 | Train loss: 0.0012645035749301314 | Test loss: 0.013801807537674904
Epoch: 120 | Train loss: 0.001264503574

In [70]:
lin_reg.state_dict()

OrderedDict([('linear_layer.weight', tensor([[0.6968]])),
             ('linear_layer.bias', tensor([0.3025]))])