# Learnin_Rate_Schedules
* https://www.youtube.com/watch?v=lMMlbmfvKDQ&list=PLjy4p-07OYzuy_lHcRW8lPTLPTTOmUpmi&index=18
* https://github.com/jeffheaton/app_deep_learning/blob/main/t81_558_class_04_2_schedule.ipynb

In [5]:
import warnings
warnings.filterwarnings('ignore')

In [1]:
import copy
import torch

try:
    import google.colab

    COLAB = True
    print("Note: using Google CoLab")
except:
    print("Note: not using Google CoLab")
    COLAB = False

# Make use of a GPU or MPS (Apple) if one is available.  (see module 3.2)
import torch
has_mps = torch.backends.mps.is_built()
device = "mps" if has_mps else "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

Note: using Google CoLab
Using device: cpu


## Early Stopping Class

In [2]:
class EarlyStopping:
    def __init__(self, patience=5, min_delta=0, restore_best_weights=True):
        self.patience = patience
        self.min_delta = min_delta
        self.restore_best_weights = restore_best_weights
        self.best_model = None
        self.best_loss = None
        self.counter = 0
        self.status = ""

    def __call__(self, model, val_loss):
        if self.best_model is None:
            self.best_model = copy.deepcopy(model.state_dict())
            self.best_loss = val_loss
        elif self.best_loss - val_loss > self.min_delta:
            self.best_model = copy.deepcopy(model.state_dict())
            self.best_loss = val_loss
            self.counter = 0
            self.status = f"Improvement found, counter reset to {self.counter}"
        else:
            self.counter += 1
            self.status = f"No improvement in the last {self.counter} epochs"
            if self.counter >= self.patience:
                print(f"Early stopping after {self.counter} epochs")
                if self.restore_best_weights:
                    model.load_state_dict(self.best_model)
                return True
        return False

# Training Schedules for PyTorch
Learning rate schedules are mechanisms used during the training of neural networks to adjust the learning rate over time. They're designed to decrease the learning rate as the training progresses, allowing the network to make large adjustments in the initial stages of training, when the weights are likely far from their optimal values, and then make smaller adjustments as the training progresses, to fine-tune the weights. This adjustment helps mitigate the risk of overshooting the minimum point of the loss function and helps to reach convergence more smoothly.



In PyTorch, one of the learning rate scheduling tools is the **StepLR** class, found in the **torch.optim.lr_scheduler** module. **StepLR** is a type of learning rate schedule that decreases the learining rate by a certain factor every few epochs. This allows the learning rate to decrease in a step-wise fashion rather than continuoulsy, which can be beneficial in some cases, as it gives the model time to 'settle' into areas of the loss landscape before the learning rate is reduced further.


StepLR takes three parameters:
* **optimizer**: The optimizer you're using to train your model (e.g., SGD, Adam)
* **step_size**: This is the number of epochs after which you want to reduce the learining rate. For instance, if step_size=10, then the learning rate will be reduced every 10 epochs.
* **gamma**: This is the factor by which the learning rate will be reduced at each step. For instance, if gamma=0.1, the learning rate will be multiplied by 0.1 at each step. effectively reducing it by 90%.



The **StepLR** scheduler is used during the training loop. After each step of the optimizer (after **optimizer.step()**), you call **scheduler.step()** to adjust the learning rate according to the schedule.



It's worth noting that the choice of **step_size** and gamma can be important, and may need to be tuned based on your specific problem and dataset. <u>Too large a **step_size** and the learning may not reduce quickly enough</u>; <u>too small and it may reduce too quickly</u>. Similarly, a gamma <u>too close 1 may not reduce the learning rate significantly enough, while a gamma too small may reduce it too quickly</u>.



We now apply a learning rate to the k-fold cross validation example from the previous section.

In [3]:
import pandas as pd
from scipy.stats import zscore
from sklearn.model_selection import train_test_split

# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'],prefix="job",dtype=int)],axis=1)
df.drop('job', axis=1, inplace=True)

# Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area",dtype=int)],axis=1)
df.drop('area', axis=1, inplace=True)

# Generate dummies for product
df = pd.concat([df,pd.get_dummies(df['product'],prefix="product",dtype=int)],axis=1)
df.drop('product', axis=1, inplace=True)

# Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['subscriptions'] = zscore(df['subscriptions'])

df.head()

Unnamed: 0,id,income,aspect,subscriptions,dist_healthy,save_rate,dist_unhealthy,age,pop_dense,retail_dense,...,area_b,area_c,area_d,product_a,product_b,product_c,product_d,product_e,product_f,product_g
0,1,-0.60755,-0.664918,-0.208449,9.017895,-0.215764,11.738935,49,0.885827,0.492126,...,0,1,0,0,1,0,0,0,0,0
1,2,0.338053,-0.207748,0.839031,7.766643,0.196869,6.805396,51,0.874016,0.34252,...,0,1,0,0,0,1,0,0,0,0
2,3,-0.184205,1.127906,-0.208449,3.632069,-0.714362,13.671772,44,0.944882,0.724409,...,0,1,0,0,1,0,0,0,0,0
3,4,-0.526467,-0.440815,-0.208449,5.372942,-0.542432,4.333286,50,0.889764,0.444882,...,0,1,0,0,1,0,0,0,0,0
4,5,-2.851675,1.638861,1.886511,3.822477,-0.47366,5.967121,38,0.744094,0.661417,...,0,0,1,1,0,0,0,0,0,0


Now that the feature vector is created a 5-fold cross-validation can be performed to generate out-of-sample predictions. We will assume 500 epochs and not use early stopping. Later we will see how we can estimate a more optimal poch count.

In [4]:
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

In [12]:
# Convert to PyTorch tensors
x_columns = df.columns.drop(['age', 'id'])
x = torch.tensor(df[x_columns].values, dtype=torch.float32, device=device)
y = torch.tensor(df['age'].values, dtype=torch.float32, device=device)

torch.manual_seed(42)

# Cross-Validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Early stopping parameters
patience = 10

fold = 0
for train_index, test_index in kf.split(x):
    fold += 1
    print(f"Fold {fold}")

    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # PyTorch DataLoader
    train_dataset = TensorDataset(x_train, y_train)
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

    # Create the model and optimizer
    model = nn.Sequential(
        nn.Linear(x.shape[1], 64),
        nn.ReLU(),
        nn.Linear(64, 32),
        nn.ReLU(),
        nn.Linear(32, 1)
    )
    model = torch.compile(model, backend="aot_eager").to(device)

    optimizer = optim.Adam(model.parameters(), lr=0.01)
    # adjust learning rate every 50 epochs
    scheduler = StepLR(optimizer, step_size=50, gamma=0.90)
    loss_fn = nn.MSELoss()

    # Early Stopping variables
    best_loss = float('inf')
    early_stopping_counter = 0

    # Training loop
    EPOCHS = 500
    epoch = 0
    done = False
    es = EarlyStopping()

    while not done and epoch < EPOCHS:
        epoch += 1
        model.train()
        for x_batch, y_batch in train_loader:
            optimizer.zero_grad()
            output = model(x_batch)
            loss = loss_fn(output, y_batch)
            loss.backward()
            optimizer.step()

        scheduler.step() # apply learning rate schedule
        # Print learning rate
        print(f"Epoch {epoch}, LR: {scheduler.get_last_lr()}")

        # Validation
        model.eval()
        with torch.no_grad():
            val_output = model(x_test)
            val_loss = loss_fn(val_output, y_test)

        # Check Early Stopping
        if es(model, val_loss):
            done = True

    print(f"Epoch {epoch}/{EPOCHS}, Validation Loss: {val_loss.item()}, {es.status}")


Fold 1
Epoch 1, LR: [0.01]
Epoch 2, LR: [0.01]
Epoch 3, LR: [0.01]
Epoch 4, LR: [0.01]
Epoch 5, LR: [0.01]
Epoch 6, LR: [0.01]
Epoch 7, LR: [0.01]
Epoch 8, LR: [0.01]
Epoch 9, LR: [0.01]
Epoch 10, LR: [0.01]
Epoch 11, LR: [0.01]
Epoch 12, LR: [0.01]
Epoch 13, LR: [0.01]
Epoch 14, LR: [0.01]
Epoch 15, LR: [0.01]
Epoch 16, LR: [0.01]
Epoch 17, LR: [0.01]
Early stopping after 5 epochs
Epoch 17/500, Validation Loss: 14.946890830993652, No improvement in the last 5 epochs
Fold 2
Epoch 1, LR: [0.01]
Epoch 2, LR: [0.01]
Epoch 3, LR: [0.01]
Epoch 4, LR: [0.01]
Epoch 5, LR: [0.01]
Epoch 6, LR: [0.01]
Epoch 7, LR: [0.01]
Epoch 8, LR: [0.01]
Epoch 9, LR: [0.01]
Epoch 10, LR: [0.01]
Epoch 11, LR: [0.01]
Early stopping after 5 epochs
Epoch 11/500, Validation Loss: 17.777740478515625, No improvement in the last 5 epochs
Fold 3
Epoch 1, LR: [0.01]
Epoch 2, LR: [0.01]
Epoch 3, LR: [0.01]
Epoch 4, LR: [0.01]
Epoch 5, LR: [0.01]
Epoch 6, LR: [0.01]
Epoch 7, LR: [0.01]
Epoch 8, LR: [0.01]
Epoch 9, LR: [0

In [13]:
# Final evaluation
model.eval()
with torch.no_grad():
    oos_pred = model(x_test)
score = torch.sqrt(loss_fn(oos_pred, y_test))
print(f"Fold score (RMSE): {score.item()}")

Fold score (RMSE): 3.5678391456604004


In [11]:
scheduler.get_last_lr()

[0.01]

# explanation by gpt4
**StepLR** is a learning rate scheduler available in PyTorch's **torch.optim.lr_scheduler** module. It is used to adjust the learning rate during training by decreasing it at specified intervals. This can help in fine-tuning the model's training process, potentially leading to better performance and faster convergence.

## How `StepLR` Works
**StepLR** reduces the learning rate of each parameter group by a factor of **gamma** every **step_size** epochs. The idea is to decrease the learning rate by some factor after a certain number of epochs, which can help in getting coloser to the global minimum of the loss function.

## Basic Usage of `StepLR`
Here's how you typically set up and use **StepLR**
1. **Initialize Your Optimizer**: First, define an optimizer (like SGD, Adam, etc.) which will update the model's weights.
2. **Define the Scheduler**: Set up the **StepLR** shceduler by specifying the optimizer, the **step_size** (number of epochs after which to adjust the learning rate), and **gamma** (the factor by which the learning rate is reduced).
3. **Training Loop**: During the training loop, you execute the optimizer to update the weights, and then you step the scheduler ata the end of each epoch (or at another specified point).

Here is a simple example in code:

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

# Sample model
model = nn.Linear(10, 2)

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Scheduler
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

# Example training loop
num_epochs = 100
for epoch in range(num_epochs):
    # Training code here
    # model.train()
    # loss.backward()
    # optimizer.step()

    # Step the learning rate scheduler
    scheduler.step()

    # Print learning rate
    print(f"Epoch {epoch+1}, LR: {scheduler.get_last_lr()}")

In this example:
* Ther learning rate starts at 0.1.
* After every 30 epochs, ot is reduced to 10% of its previous value (**gamma=0.1**).
* **scheduler.get_last_lr()** is used to check the learning rate at each epoch.

## Example code

In [14]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
import numpy as np

# Setting random seed for reproducibility
torch.manual_seed(0)
np.random.seed(0)

In [15]:
# Synthetic data generation
# 100 data points, 1 feature each
x = np.random.rand(100, 1).astype(np.float32)  # Feature
y = 3 * x + np.random.randn(100, 1).astype(np.float32) * 0.1  # Target
display(x[:10])
display(y[:10])

array([[0.5488135 ],
       [0.71518934],
       [0.60276335],
       [0.5448832 ],
       [0.4236548 ],
       [0.6458941 ],
       [0.4375872 ],
       [0.891773  ],
       [0.96366274],
       [0.3834415 ]], dtype=float32)

array([[1.5299256],
       [2.2356505],
       [1.8548563],
       [1.4810251],
       [1.4197896],
       [2.1272714],
       [1.4306395],
       [2.6573265],
       [2.7839131],
       [1.2557697]], dtype=float32)

In [16]:
# Convert numpy array to torch tensors
x_train = torch.from_numpy(x)
y_train = torch.from_numpy(y)

# Model definition (simple linear regression)
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

model = LinearRegressionModel()

# Loss function
criterion = nn.MSELoss()

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Learning rate scheduler
scheduler = StepLR(optimizer, step_size=10, gamma=0.9)

In [18]:
# Training loop
num_epochs = 50
for epoch in range(num_epochs):
    # Forward pass
    output = model(x_train)
    loss = criterion(output, y_train)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Step the learning rate scheduler
    scheduler.step()

    if (epoch+1) % 5 == 0:
        current_lr = scheduler.get_last_lr()[0]
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item():.4f}, LR: {current_lr:.4f}")

# Check the final parameters of the model
print("Final parameters:")
for name, param in model.named_parameters():
    print(f"{name}: {param.data}")

Epoch 5/50, Loss: 0.4610, LR: 0.0900
Epoch 10/50, Loss: 0.4035, LR: 0.0900
Epoch 15/50, Loss: 0.3582, LR: 0.0810
Epoch 20/50, Loss: 0.3214, LR: 0.0810
Epoch 25/50, Loss: 0.2891, LR: 0.0729
Epoch 30/50, Loss: 0.2625, LR: 0.0729
Epoch 35/50, Loss: 0.2388, LR: 0.0656
Epoch 40/50, Loss: 0.2190, LR: 0.0656
Epoch 45/50, Loss: 0.2014, LR: 0.0590
Epoch 50/50, Loss: 0.1864, LR: 0.0590
Final parameters:
linear.weight: tensor([[1.5580]])
linear.bias: tensor([0.7498])


* The model is trained on a synthetic dataset where the true relationship is **y = 3x + noise**.
* **StepLR** decrease the learning rate by 10 % every 10 epochs, helping in potentially finer adjustments towards the end of training.
* The training loop points the loss and the current learning rate every 5 epochs.