# Stochastic Gradient Descent (SGD)

## Theory

Stochastic Gradient Descent (SGD) is an optimization algorithm commonly used in machine learning to train models. Instead of computing the gradient over the entire dataset, it updates the model's parameters iteratively using a gradient computed from a single randomly selected example or a small random subset (mini-batch) of the data.

The formula for the update step in SGD is:

$$ \theta_{\text{new}} = \theta_{\text{old}} - \eta \cdot \nabla_\theta J(\theta_{\text{old}}, x^{(i)}, y^{(i)}) $$

Here:
- $\theta_{\text{old}}$ represents the current parameters of the model.
- $\eta$ is the learning rate, a hyperparameter that determines the size of the steps taken towards the minimum of the loss function.
- $\nabla_\theta J(\theta_{\text{old}}, x^{(i)}, y^{(i)})$ is the gradient of the loss function $J$ with respect to the parameters $\theta$, evaluated at the current parameter values on a single training example $(x^{(i)}, y^{(i)})$ (or averaged over a small batch of examples in the mini-batch variant).
- $\theta_{\text{new}}$ represents the updated parameters after the current iteration.
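As a worked instance of the update rule, take a one-parameter model $\hat{y} = \theta x$ with the squared-error loss $J = \tfrac{1}{2}(\theta x - y)^2$ (an illustrative choice, not something defined above), so that $\nabla_\theta J = (\theta x - y)\,x$. The sketch below applies a single SGD step to one example; the function name `sgd_step` is hypothetical.

```python
def sgd_step(theta, lr, x, y):
    """One SGD update for J(theta) = 0.5 * (theta * x - y) ** 2 on a single example (x, y)."""
    grad = (theta * x - y) * x   # gradient of J evaluated at theta_old
    return theta - lr * grad     # theta_new = theta_old - eta * grad

theta_old = 0.5
theta_new = sgd_step(theta_old, lr=0.1, x=2.0, y=3.0)
print(theta_new)
```

Here the gradient is $(0.5 \cdot 2 - 3) \cdot 2 = -4$, so $\theta_{\text{new}} = 0.5 - 0.1 \cdot (-4) = 0.9$, and the loss on this example drops from $2$ to $0.72$.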

In each iteration, a single data point or a small batch of data points is sampled at random, and the gradient of the loss function with respect to the model parameters is computed using only this subset. The parameters are then moved a small step in the direction of the negative gradient, which locally decreases the loss on that subset, with the size of the step scaled by the learning rate.
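The procedure above can be sketched in plain Python for a linear model $\hat{y} = w x + b$ trained with mean squared error. The noise-free data generated from $y = 2x + 1$, the learning rate, the batch size, and the number of steps are all illustrative assumptions, and `sgd_linear_regression` is a hypothetical name.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.1, batch_size=8, steps=2000, seed=0):
    """Fit y ~ w * x + b with mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(steps):
        # Randomly select a small batch of data points
        batch = rng.sample(indices, batch_size)
        # Gradient of (1/B) * sum((w*x + b - y)^2), computed on this batch only
        grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / batch_size
        grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / batch_size
        # Step against the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [i / 100 for i in range(-100, 101)]
ys = [2 * x + 1 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

With these settings the fitted parameters should end up very close to $w = 2$ and $b = 1$. Each step touches only 8 of the 201 points, so the cost of an update does not grow with the dataset size.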

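SGD is particularly useful for large datasets: each update costs computation proportional to the batch size rather than to the size of the full dataset, so it usually makes much faster progress per unit of computation than batch (full-dataset) gradient descent. The price is randomness: because each gradient is computed from a random sample, it is only a noisy estimate of the full-dataset gradient, and the resulting updates fluctuate from step to step.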
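The cost and noise trade-off can be made concrete with the toy one-parameter model $J_i(\theta) = \tfrac{1}{2}(\theta x_i - y_i)^2$ and four made-up data points (both illustrative assumptions). Each single-example gradient touches one data point instead of all of them, and the individual estimates scatter widely; averaged over all examples, they recover the full-batch gradient exactly, which is why a uniformly sampled example gives an unbiased but noisy estimate.

```python
def per_example_grad(theta, x, y):
    # Gradient of J_i(theta) = 0.5 * (theta * x - y) ** 2 for one example
    return (theta * x - y) * x

def full_batch_grad(theta, xs, ys):
    # Gradient of the mean loss over the whole dataset: touches every example
    return sum(per_example_grad(theta, x, y) for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 7.0, 8.0]
theta = 1.0

single = [per_example_grad(theta, x, y) for x, y in zip(xs, ys)]
print(single)                          # [-1.0, -2.0, -12.0, -16.0]: noisy individual estimates
print(full_batch_grad(theta, xs, ys))  # -7.75: their mean
```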

The learning rate must be tuned carefully: a learning rate that is too high can make the algorithm overshoot the minimum of the loss function or even diverge, while one that is too low results in slow convergence.
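To isolate the effect of $\eta$ from sampling noise, the sketch below runs plain (deterministic) gradient descent on the one-dimensional quadratic $f(\theta) = \theta^2$, an illustrative assumption rather than a model from this notebook. Its gradient is $2\theta$, so each step multiplies $\theta$ by $(1 - 2\eta)$: the iterates shrink towards the minimum at $0$ when $|1 - 2\eta| < 1$ and grow without bound otherwise.

```python
def run_gd(lr, theta=1.0, steps=20):
    """Gradient descent on f(theta) = theta**2, whose gradient is 2 * theta."""
    for _ in range(steps):
        theta -= lr * 2 * theta   # theta_new = theta_old - eta * grad
    return theta

for lr in (0.01, 0.4, 1.1):
    print(lr, run_gd(lr))
```

With $\eta = 0.01$ the parameter is still about $0.67$ after 20 steps (slow convergence), with $\eta = 0.4$ it is essentially $0$, and with $\eta = 1.1$ each step overshoots the minimum, flipping the sign of $\theta$ and growing it by a factor of $1.2$, so the iterates diverge.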

SGD is widely used to train many kinds of machine learning models, including neural networks, linear regression, and logistic regression.



## Implementation from Scratch

import torch

The following code block implements `SGDScratch`, a minimal from-scratch version of SGD. Like PyTorch's built-in optimizers, it exposes a `step()` method that applies the update rule above to every parameter and a `zero_grad()` method that clears the accumulated gradients.

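class SGDScratch:
    """Minimal SGD optimizer applying theta <- theta - lr * grad to each parameter."""

    def __init__(self, parameters, lr=0.01):
        # Store the parameters as a list: model.parameters() returns a generator,
        # which would be exhausted after the first loop over it
        self.parameters = list(parameters)
        self.lr = lr

    def step(self):
        # Update parameters in place without recording the update in the autograd graph
        with torch.no_grad():
            for param in self.parameters:
                # Skip parameters that did not receive a gradient
                if param.grad is None:
                    continue
                param -= self.lr * param.grad

    def zero_grad(self):
        # PyTorch accumulates gradients across backward() calls, so clear them before each step
        for param in self.parameters:
            if param.grad is not None:
                param.grad.zero_()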
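One way to sanity-check `SGDScratch` is to compare it with PyTorch's built-in `torch.optim.SGD`, which with its defaults (no momentum, dampening, or weight decay) applies the same $\theta \leftarrow \theta - \eta \nabla_\theta J$ rule. The cell below repeats a compact copy of the class (materializing the parameters into a list, because `model.parameters()` returns a one-shot generator) so that it runs on its own; the model shape, random data, learning rate, and step count are arbitrary illustrative choices.

```python
import copy
import torch

class SGDScratch:
    # Compact copy of the optimizer so this cell runs on its own
    def __init__(self, parameters, lr=0.01):
        self.parameters = list(parameters)
        self.lr = lr

    def step(self):
        with torch.no_grad():
            for param in self.parameters:
                if param.grad is not None:
                    param -= self.lr * param.grad

    def zero_grad(self):
        for param in self.parameters:
            if param.grad is not None:
                param.grad.zero_()

torch.manual_seed(0)
model_a = torch.nn.Linear(3, 1)
model_b = copy.deepcopy(model_a)  # identical starting weights
x, y = torch.randn(16, 3), torch.randn(16, 1)
loss_fn = torch.nn.MSELoss()

opt_a = SGDScratch(model_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)  # plain SGD: no momentum, no weight decay

# Run the same five steps with both optimizers
for _ in range(5):
    for model, opt in ((model_a, opt_a), (model_b, opt_b)):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

print(torch.allclose(model_a.weight, model_b.weight), torch.allclose(model_a.bias, model_b.bias))
```

Both comparisons should print `True`, up to the small floating-point tolerance of `torch.allclose`.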

## Trainer Implementation from Scratch

The next code block defines a class called `TrainerScratch` that trains and evaluates a model. The trainer itself is optimizer-agnostic: it works with any optimizer that provides `zero_grad()` and `step()`, such as `SGDScratch` or a `torch.optim` optimizer. Here's a breakdown of the class and its methods:

- `__init__(self, model, train_dataloader, val_dataloader, optimizer, criterion, custom_metrics=None)`: This is the constructor method that initializes the `TrainerScratch` object. It takes the following parameters:
  - `model`: The model to be trained and evaluated.
  - `train_dataloader`: The dataloader for the training data.
  - `val_dataloader`: The dataloader for the validation data.
  - `optimizer`: The optimizer used for updating the model's parameters.
  - `criterion`: The loss function used for computing the loss.
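  - `custom_metrics`: Optional dictionary mapping metric names to metric objects, each providing `update(outputs, targets)`, `compute()`, and `reset()`; these metrics are computed during validation.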

- `train_epoch(self)`: This method performs one epoch of training. For each batch it zeroes the gradients, runs the forward pass, computes the loss, performs backpropagation, and updates the model's parameters. It returns the average of the per-batch losses for the epoch.

- `validate_epoch(self)`: This method performs one epoch of validation with gradient tracking disabled. It iterates over the validation data, computes the loss, and updates any custom metrics batch by batch. It returns the average validation loss and a dictionary of computed metric values.

- `fit(self, num_epochs)`: This method trains the model for the specified number of epochs. For each epoch it runs training and validation, prints the training and validation losses along with any custom metric values, and resets the metrics; after the final epoch it plots the training and validation losses using matplotlib.
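`train_epoch` calls `optimizer.zero_grad()` before each backward pass because PyTorch accumulates gradients: every call to `backward()` adds into `.grad` instead of overwriting it. A minimal sketch with a single scalar parameter (an illustrative choice) shows the effect.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

# Two backward passes without zeroing: d(w**2)/dw = 2w = 6 is added twice
(w ** 2).backward()
(w ** 2).backward()
accumulated = w.grad.item()
print(accumulated)   # 12.0

# Zeroing in place (as SGDScratch.zero_grad does) before the next backward pass
w.grad.zero_()
(w ** 2).backward()
print(w.grad.item())  # 6.0
```

Without the reset, each optimizer step would use the sum of all gradients computed so far rather than the gradient of the current batch.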

Together, these methods give a compact train/validate/plot loop. Because `fit` displays its loss plot with `plt.show()`, the class is most convenient to use in a Jupyter Notebook.
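The trainer only assumes that each entry in `custom_metrics` exposes `update(outputs, targets)`, `compute()`, and `reset()`, as called in `validate_epoch` and `fit`. A hypothetical mean-absolute-error metric following that interface might look like the sketch below; the class name and its internals are illustrative and not part of the original code. Metric objects from libraries such as TorchMetrics follow a similar update/compute/reset pattern.

```python
import torch

class MeanAbsoluteError:
    """Hypothetical metric with the update/compute/reset interface TrainerScratch expects."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Clear the running totals before a new validation epoch
        self.abs_error_sum = 0.0
        self.count = 0

    def update(self, outputs, targets):
        # Accumulate totals; called once per validation batch
        self.abs_error_sum += (outputs - targets).abs().sum().item()
        self.count += targets.numel()

    def compute(self):
        # Mean over every element seen since the last reset (a float, so ':.4f' formatting works)
        return self.abs_error_sum / max(self.count, 1)

mae = MeanAbsoluteError()
mae.update(torch.tensor([1.0, 2.0]), torch.tensor([1.5, 2.0]))
mae.update(torch.tensor([3.0]), torch.tensor([4.0]))
print(mae.compute())  # 0.5
```

Such an object would be passed as `custom_metrics={"MAE": MeanAbsoluteError()}`. Accumulating sums and counts, rather than averaging per-batch means, keeps the result an exact per-element mean even when batches have different sizes.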


import torch
import matplotlib.pyplot as plt

class TrainerScratch:
    """
    A minimal training loop that trains and validates a PyTorch model.

    Args:
        model (torch.nn.Module): The model to be trained.
        train_dataloader (torch.utils.data.DataLoader): The dataloader for training data.
        val_dataloader (torch.utils.data.DataLoader): The dataloader for validation data.
        optimizer: Any object with `zero_grad()` and `step()` methods, such as a
            `torch.optim.Optimizer` or `SGDScratch`.
        criterion (torch.nn.Module): The loss function used for training.
        custom_metrics (dict, optional): A dictionary of custom metrics to evaluate during validation.
            The keys are metric names and the values are objects providing
            `update(outputs, targets)`, `compute()`, and `reset()` methods.

    Attributes:
        model (torch.nn.Module): The model to be trained.
        train_dataloader (torch.utils.data.DataLoader): The dataloader for training data.
        val_dataloader (torch.utils.data.DataLoader): The dataloader for validation data.
        optimizer: The optimizer used for training.
        criterion (torch.nn.Module): The loss function used for training.
        custom_metrics (dict): A dictionary of custom metrics to evaluate during validation
            (empty if none were given).
    """

    def __init__(self, model, train_dataloader, val_dataloader, optimizer, criterion, custom_metrics=None):
        # Initialize the TrainerScratch object with the provided arguments
        self.model = model
        self.train_dataloader = train_dataloader
        self.val_dataloader = val_dataloader
        self.optimizer = optimizer
        self.criterion = criterion
        self.custom_metrics = custom_metrics if custom_metrics else {}

    def train_epoch(self):
        """
        Trains the model for one epoch using the training data.

        Returns:
            float: The average loss for the epoch.
        """
        # Set the model to training mode
        self.model.train()
        total_loss = 0
        for batch in self.train_dataloader:
            inputs, targets = batch

            # Zero the gradients
            self.optimizer.zero_grad()

            # Forward pass
            outputs = self.model(inputs)

            # Compute the loss
            loss = self.criterion(outputs, targets)

            # Backward pass
            loss.backward()

            # Update the weights
            self.optimizer.step()

            total_loss += loss.item()

        avg_loss = total_loss / len(self.train_dataloader)
        return avg_loss

    def validate_epoch(self):
        """
        Validates the model for one epoch using the validation data.

        Returns:
            tuple: A tuple containing the average loss for the epoch and a dictionary of metric results.
        """
        # Set the model to evaluation mode
        self.model.eval()
        total_loss = 0
        with torch.no_grad():
            for batch in self.val_dataloader:
                inputs, targets = batch

                # Forward pass
                outputs = self.model(inputs)

                # Compute the loss
                loss = self.criterion(outputs, targets)
                total_loss += loss.item()

                # Update the custom metrics
                for metric in self.custom_metrics.values():
                    metric.update(outputs, targets)

        avg_loss = total_loss / len(self.val_dataloader)

        # Compute the metric results
        metrics_results = {name: metric.compute() for name, metric in self.custom_metrics.items()}
        return avg_loss, metrics_results

    def fit(self, num_epochs):
        """
        Trains the model for the specified number of epochs.

        Args:
            num_epochs (int): The number of epochs to train the model for.
        """
        train_losses = []
        val_losses = []

        for epoch in range(num_epochs):
            # Train the model for one epoch
            train_loss = self.train_epoch()

            # Validate the model for one epoch
            val_loss, val_metrics = self.validate_epoch()
            
            # Append the losses to the lists
            train_losses.append(train_loss)
            val_losses.append(val_loss)

            # Print the epoch information
            print(f"Epoch {epoch+1}/{num_epochs}, Training Loss: {train_loss:.4f}, Validation Loss: {val_loss:.4f}")
            for name, value in val_metrics.items():
                print(f"Validation {name}: {value:.4f}")

            # Reset custom metrics for next epoch
            for metric in self.custom_metrics.values():
                metric.reset()
        
        # Plot the training and validation losses
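        # Number epochs from 1 so the plot matches the printed log
        epochs = range(1, num_epochs + 1)
        plt.plot(epochs, train_losses, label="Training Loss")
        plt.plot(epochs, val_losses, label="Validation Loss")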
        plt.xlabel("Epoch")
        plt.ylabel("Loss")
        plt.title("Training and Validation Loss")
        plt.legend()
        plt.show()
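Both `train_epoch` and `validate_epoch` divide the summed loss by `len(dataloader)`, so, assuming the criterion uses PyTorch's default `reduction='mean'`, they report the mean of per-batch mean losses. When the dataset size is not a multiple of the batch size, the smaller final batch counts as much as a full one, and the reported value can differ from the true per-sample mean. A small pure-Python sketch with made-up per-sample losses (illustrative only; `epoch_losses` is a hypothetical helper) shows the difference.

```python
def epoch_losses(per_sample_losses, batch_size):
    """Compare the mean of per-batch means (what the trainer reports) with the per-sample mean."""
    batches = [per_sample_losses[i:i + batch_size]
               for i in range(0, len(per_sample_losses), batch_size)]
    # Each batch contributes its own mean, as a criterion with reduction='mean' would
    mean_of_batch_means = sum(sum(b) / len(b) for b in batches) / len(batches)
    # Weight every sample equally instead
    per_sample_mean = sum(per_sample_losses) / len(per_sample_losses)
    return mean_of_batch_means, per_sample_mean

print(epoch_losses([1.0, 1.0, 1.0, 1.0, 5.0], batch_size=2))  # roughly (2.333, 1.8)
```

If an exact per-sample average is wanted for a map-style dataset, one option is to accumulate `loss.item() * inputs.size(0)` and divide by `len(dataloader.dataset)` instead.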