# Early Stopping during Learning
Early stopping is a form of regularization used to avoid overfitting when training a machine learning model. It works by monitoring the model's performance on a validation dataset and stopping training once that performance has not improved for a set number of epochs. This technique not only helps prevent overfitting but can also save computational resources by ending training early when further epochs no longer lead to better results.


## Implementing Early Stopping in PyTorch
PyTorch doesn't have built-in early stopping support like some other libraries (e.g., Keras), but it can be implemented easily with a few lines of custom code. Here's a basic outline of how you can implement early stopping:

1. **Monitor a performance metric** on a validation set.
2. **Stop training** if this metric does not improve for a set number of epochs.
3. Optionally, **restore the weights** from when the validation loss was at its minimum.

Here's how you could implement this:

In [None]:
import copy

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data (y = 2x + 1 plus noise) so the loop below runs as-is;
# replace these two loaders with your own training and validation data
torch.manual_seed(0)
x = torch.randn(200, 1)
y = 2 * x + 1 + 0.1 * torch.randn(200, 1)
train_data_loader = DataLoader(TensorDataset(x[:160], y[:160]), batch_size=16, shuffle=True)
val_data_loader = DataLoader(TensorDataset(x[160:], y[160:]), batch_size=16)

# Define your model
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Average loss over all batches of a validation DataLoader
def validate(model, data_loader):
    model.eval()  # Set model to evaluation mode
    total_loss = 0.0
    with torch.no_grad():  # No need to track gradients
        for x_batch, y_batch in data_loader:
            y_pred = model(x_batch)
            loss = criterion(y_pred, y_batch)
            total_loss += loss.item()
    return total_loss / len(data_loader)

# Training setup
num_epochs = 100
patience = 10
best_loss = np.inf
best_model_wts = copy.deepcopy(model.state_dict())
trigger_times = 0

for epoch in range(num_epochs):
    model.train()  # Set model to training mode
    for x_batch, y_batch in train_data_loader:
        optimizer.zero_grad()
        loss = criterion(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()

    # Validate the model
    val_loss = validate(model, val_data_loader)
    print(f'Epoch {epoch}, Validation loss: {val_loss}')

    # Check if current validation loss is less than the best loss
    if val_loss < best_loss:
        print(f"Validation loss decreased {best_loss} -> {val_loss}")
        best_loss = val_loss
        # Deep-copy the weights: state_dict() alone returns references that later steps overwrite
        best_model_wts = copy.deepcopy(model.state_dict())
        trigger_times = 0  # reset trigger times
    else:
        trigger_times += 1
        print(f"Validation loss did not decrease, count: {trigger_times}")

    # Early stopping condition
    if trigger_times >= patience:
        print("Early stopping!")
        model.load_state_dict(best_model_wts)  # Restore best model weights
        break

# Further code, e.g., testing or saving the model


**Key Components of the Early Stopping Logic**:
* **Validation Loss Calculation**: The loss is evaluated on the validation dataset after each epoch.
* **Loss Improvement Check**: The current loss is compared with the best loss observed so far. If it is lower, the model weights are saved and the patience counter is reset.
* **Patience Counter**: If the validation loss doesn't improve for a given number of consecutive epochs (**patience**), training stops.
* **Model State Restoration**: If early stopping is triggered, the model's weights are restored to the state in which it achieved the best validation loss.


This approach allows you to integrate early stopping into your training loop, providing control over how long you allow training to continue without improvement and ensuring you keep the best performing model as judged by the validation loss.
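The components listed above can be traced without any model at all. The following is a minimal sketch that replays the patience logic on a precomputed list of validation losses; the function name `replay_early_stopping` and the loss values are hypothetical. It uses the strict `best_loss - val_loss > min_delta` comparison of the `EarlyStopping` class later in this notebook, which with `min_delta=0` is the same as the `val_loss < best_loss` check in the loop above.

```python
def replay_early_stopping(val_losses, patience, min_delta=0.0):
    """Replay the early stopping rule on a list of per-epoch validation losses.

    Returns (stop_epoch, best_epoch), both 1-based. stop_epoch is None if
    patience never ran out; best_epoch is the epoch whose weights would be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    counter = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if best_loss - val_loss > min_delta:  # loss improvement check
            best_loss, best_epoch, counter = val_loss, epoch, 0
        else:
            counter += 1  # patience counter
            if counter >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation losses: steady improvement until epoch 4, then a plateau
losses = [0.90, 0.60, 0.45, 0.40, 0.41, 0.42, 0.40, 0.43]
print(replay_early_stopping(losses, patience=3))                 # (7, 4)
print(replay_early_stopping(losses, patience=3, min_delta=0.1))  # (6, 3)
print(replay_early_stopping(losses, patience=10))                # (None, 4)
```

Note that a repeated value (0.40 at epoch 7) does not count as an improvement, and that a larger `min_delta` ignores small gains, so training stops earlier and an earlier epoch's weights are kept.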

# YouTube
* https://www.youtube.com/watch?v=lS0vvIWiahU&list=PLjy4p-07OYzuy_lHcRW8lPTLPTTOmUpmi&index=15

* https://github.com/jeffheaton/app_deep_learning/blob/main/t81_558_class_03_4_early_stop.ipynb

In [1]:
# Make use of a GPU if one is available
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


Because PyTorch does not include a built-in early stopping function, we must define one of our own. We will use the following **EarlyStopping** class throughout this course.

We can provide several parameters to the **EarlyStopping** object:

* **min_delta**: This value should be kept small; it specifies the minimum change that should be considered an improvement. Setting it even smaller will not likely have a great deal of impact.
* **patience**: How many epochs should training wait for the validation error to improve?
* **restore_best_weights**: You should usually set this to true, as it restores the weights to the values they had when the validation loss was lowest.

In [None]:
import copy

class EarlyStopping:
    def __init__(self, patience=5, min_delta=0, restore_best_weights=True):
        self.patience = patience
        self.min_delta = min_delta
        self.restore_best_weights = restore_best_weights
        self.best_model = None
        self.best_loss = None
        self.counter = 0
        self.status = ""

    def __call__(self, model, val_loss):
        if self.best_loss is None:
            self.best_loss = val_loss
            self.best_model = copy.deepcopy(model.state_dict())
        elif self.best_loss - val_loss > self.min_delta:
            self.best_model = copy.deepcopy(model.state_dict())
            self.best_loss = val_loss
            self.counter = 0
            self.status = f"Improvement found, counter reset to {self.counter}"
        else:
            self.counter += 1
            self.status = f"No improvement in the last {self.counter} epochs"
            if self.counter >= self.patience:
                self.status = f"Early stopping triggered after {self.counter} epochs"
                if self.restore_best_weights:
                    model.load_state_dict(self.best_model)
                return True
        return False


## Explanation of the above class (by GPT-4)
The **EarlyStopping** class is designed to halt the training process when the validation loss stops improving over a specified number of epochs. This is an important technique used to prevent overfitting, ensuring the model generalizes well to new, unseen data.

Let's break down the **EarlyStopping** class step-by-step:

### Constructor: `__init__`
* **patience**: Number of epochs to continue training without improvement in validation loss. If the validation loss does not decrease for this number of epochs, training is stopped.
* **min_delta**: The minimum change in the monitored quantity to qualify as an improvement, i.e., the decrease in loss must be greater than this value to count as an improvement.
* **restore_best_weights**: If set to **True**, the model weights will be reverted to the state when the validation loss was at its lowest.

### Instance Variables
* **best_model**: Stores the model weights when the validation loss was at its lowest.
* **best_loss**: The lowest recorded validation loss.
* **counter**: Counts the number of consecutive epochs without an improvement in validation loss.
* **status**: A message indicating the current status of early stopping (improvement found, no improvement, or early stopping triggered).
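The class saves **best_model** with `copy.deepcopy(model.state_dict())` rather than plain `model.state_dict()`, because the tensors in a state dict share storage with the model's live parameters. A minimal sketch, using a throwaway `nn.Linear` and an in-place update that stands in for an optimizer step:

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
with torch.no_grad():
    model.weight.fill_(1.0)

live = model.state_dict()                     # tensors share storage with the parameters
snapshot = copy.deepcopy(model.state_dict())  # independent copy of the current values

# Stand-in for optimizer.step(): the weights are updated in place
with torch.no_grad():
    model.weight.add_(5.0)

print(live["weight"].item())      # 6.0, follows the model
print(snapshot["weight"].item())  # 1.0, keeps the saved value
```

Without the deep copy, the stored "best" weights would silently track the latest weights, and restoring them would undo nothing.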

### Method: `__call__`
This method is called at the end of each epoch with the current model and the validation loss.

* **Initialization**: If **best_loss** is **None** (first epoch), initialize it with the current validation loss and save the model state.
* **Improvement Check**:
    * If the validation loss decreases by more than **min_delta** from **best_loss**, update **best_loss**, save the new best model state, and reset the **counter** to zero. Update **status** to indicate improvement.
    * If no improvement is seen, increase the **counter** by one and update the **status** to indicate no improvement.
* **Patience Check**:
    * If the **counter** reaches the **patience** limit, update the **status** to indicate that early stopping is triggered. If **restore_best_weights** is true, load the best model weights. Finally, return **True** to indicate training should stop.
* **Return `False`** if the training should continue.


### Example Usage:
1. **Instantiate EarlyStopping**: An instance of **EarlyStopping** is created with default parameters.
2. **Training Loop**: During the training loop, after every epoch, the model's performance is evaluated on the validation set.
3. **Early Stopping Check**: After evaluating, the **EarlyStopping** instance is called with the current model and the validation loss. It will return **True** if training should stop, otherwise **False**.

This mechanism helps in keeping the best performing model and stopping the training when the model starts to overfit. It's a widely used strategy in training deep learning models to ensure they perform well on unseen data.

## Early Stopping with Classification
We will now see an example of classification training with early stopping. We will train the neural network until the error no longer improves on the validation set.

In [None]:
import time

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
from torch.autograd import Variable
import tqdm

In [None]:
# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

def load_data():
    df = pd.read_csv(
        "https://data.heatonresearch.com/data/t81-558/iris.csv", na_values=["NA", "?"]
        )

    le = LabelEncoder()

    x = df[["sepal_l", "sepal_w", "petal_l", "petal_w"]].values
    y = le.fit_transform(df["species"])
    species = le.classes_

    # Split into validation and training sets
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.25, random_state=42
        )

    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)

    # Numpy to Torch Tensor
    x_train = torch.tensor(x_train, device=device, dtype=torch.float32)
    y_train = torch.tensor(y_train, device=device, dtype=torch.long)

    x_test = torch.tensor(x_test, device=device, dtype=torch.float32)
    y_test = torch.tensor(y_test, device=device, dtype=torch.long)

    return x_train, x_test, y_train, y_test, species

x_train, x_test, y_train, y_test, species = load_data()

In [None]:
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

torch.Size([112, 4])
torch.Size([112])
torch.Size([38, 4])
torch.Size([38])


In [None]:
x_train[-1,:]

tensor([ 1.5570, -0.0923,  1.2115,  1.1895])

In [None]:
# Create datasets
BATCH_SIZE = 16

dataset_train = TensorDataset(x_train, y_train)
dataloader_train = DataLoader(dataset_train, batch_size=BATCH_SIZE, shuffle=True)

dataset_test = TensorDataset(x_test, y_test)
dataloader_test = DataLoader(dataset_test, batch_size=BATCH_SIZE, shuffle=False)

# Create model using nn.Sequential
model = nn.Sequential(
    nn.Linear(x_train.shape[1], 50),
    nn.ReLU(),
    nn.Linear(50, 25),
    nn.ReLU(),
    nn.Linear(25, len(species)),
    nn.LogSoftmax(dim=1),
    )

model = torch.compile(model, backend="aot_eager").to(device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

es = EarlyStopping()

In [None]:
epoch = 0
done = False
while epoch < 1000 and not done:
    epoch += 1
    steps = list(enumerate(dataloader_train))
    pbar = tqdm.tqdm(steps)
    model.train()
    for i, (x_batch, y_batch) in pbar:
        y_batch_pred = model(x_batch.to(device))
        loss = loss_fn(y_batch_pred, y_batch.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        loss, current = loss.item(), (i + 1) * len(x_batch)
        if i == len(steps) - 1:  # After the last batch of the epoch, evaluate on the validation set
            model.eval()
            with torch.no_grad():
                pred = model(x_test)
                vloss = loss_fn(pred, y_test)

            if es(model, vloss):
                done = True
            pbar.set_description(
                f"Epoch {epoch:03d} | Loss: {loss:.4f}, vloss: {vloss:>7f}, {es.status}")
        else:
            pbar.set_description(
                f"Epoch {epoch:03d} | Loss: {loss:.4f}"
                )

Epoch 001 | Loss: 0.6039, vloss: 0.463542, : 100%|██████████| 7/7 [00:00<00:00, 14.27it/s]
Epoch 002 | Loss: 0.2977, vloss: 0.255949, Improvement found, counter reset to 0: 100%|██████████| 7/7 [00:00<00:00, 125.54it/s]
Epoch 003 | Loss: 0.0834, vloss: 0.191380, Improvement found, counter reset to 0: 100%|██████████| 7/7 [00:00<00:00, 129.71it/s]
Epoch 004 | Loss: 0.2334, vloss: 0.127771, Improvement found, counter reset to 0: 100%|██████████| 7/7 [00:00<00:00, 132.53it/s]
Epoch 005 | Loss: 0.1344, vloss: 0.081698, Improvement found, counter reset to 0: 100%|██████████| 7/7 [00:00<00:00, 146.73it/s]
Epoch 006 | Loss: 0.0627, vloss: 0.058430, Improvement found, counter reset to 0: 100%|██████████| 7/7 [00:00<00:00, 99.58it/s]
Epoch 007 | Loss: 0.1053, vloss: 0.047269, Improvement found, counter reset to 0: 100%|██████████| 7/7 [00:00<00:00, 121.06it/s]
Epoch 008 | Loss: 0.0103, vloss: 0.026243, Improvement found, counter reset to 0: 100%|██████████| 7/7 [00:00<00:00, 123.81it/s]
Epoch 0

In [None]:
pred = model(x_test)
vloss = loss_fn(pred, y_test)
print(f"Loss = {vloss}")

Loss = 0.00976768508553505


As you can see from above, we did not use the total number of requested epochs. The neural network training stopped once the validation loss no longer improved.

In [None]:
from sklearn.metrics import accuracy_score

pred = model(x_test)
_, predict_classes = torch.max(pred, 1)
correct = accuracy_score(y_test.cpu(), predict_classes.cpu())
print(f"Accuracy: {correct}")

Accuracy: 1.0


# Merits of early stopping

Early stopping is a regularization technique used to prevent overfitting in deep learning models. It involves monitoring the model's performance on a validation dataset and halting the training process when the performance stops improving. Here's how it works and its potential impact on predictive power:

### How Early Stopping Works

1. **Training and Validation Splits**: During training, the dataset is typically split into training and validation sets. The model is trained on the training set, and its performance is periodically evaluated on the validation set.
2. **Monitoring Performance**: A specific metric, such as validation loss or accuracy, is monitored. Early stopping keeps track of this metric over epochs (iterations of training).
3. **Patience Parameter**: A patience parameter is set, defining the number of epochs to wait for an improvement in the monitored metric before stopping the training. If no improvement is observed within the patience period, training is halted.

### Impact on Predictive Power

- **Preventing Overfitting**: Early stopping helps prevent overfitting, which occurs when a model performs well on the training data but poorly on unseen data. By stopping training before the model starts to overfit, early stopping aims to improve generalization to new, unseen data.
- **Optimal Performance**: The goal of early stopping is to find the point where the model performs best on the validation data, which is assumed to correlate with good performance on future data.

### Will Predictive Power Decrease?

The concern about predictive power on future data when using early stopping can be addressed as follows:

- **Improved Generalization**: By preventing overfitting, early stopping usually improves the model's ability to generalize to new data. This means that, in most cases, the predictive power on future data should not become weaker but rather more reliable.
- **Validation Set Quality**: The effectiveness of early stopping heavily depends on the quality and representativeness of the validation set. If the validation set is not representative of future data, the benefits of early stopping might not fully materialize.
- **Appropriate Patience**: Setting the patience parameter appropriately is crucial. If the patience is too low, the model might stop training too early, leading to underfitting. If it's too high, the model might still overfit despite early stopping.

### Practical Considerations

- **Cross-Validation**: Using cross-validation can provide a more robust estimate of the model's performance and help in better tuning of the early stopping parameters.
- **Multiple Runs**: Running the training process multiple times with different random seeds and averaging the results can provide a more reliable assessment of the model's predictive power.
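One way to combine cross-validation with early stopping is to run early stopping on every fold and look at the spread of the best epochs; their average can then serve as a training length for a final model. The sketch below assumes toy linear-regression data, full-batch updates (one step per "epoch"), and a hypothetical helper `best_epoch_with_early_stopping`. It illustrates the idea under those assumptions and is not a prescribed procedure.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy regression data: y = 3x - 1 plus noise
x = torch.randn(250, 1)
y = 3 * x - 1 + 0.2 * torch.randn(250, 1)


def best_epoch_with_early_stopping(x_tr, y_tr, x_va, y_va, patience=10, max_epochs=500):
    """Train a fresh linear model and return the epoch with the lowest validation loss."""
    model = nn.Linear(1, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    loss_fn = nn.MSELoss()
    best_loss, best_epoch, counter = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        model.train()
        optimizer.zero_grad()
        loss_fn(model(x_tr), y_tr).backward()  # one full-batch step per epoch
        optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(x_va), y_va).item()
        if val_loss < best_loss:
            best_loss, best_epoch, counter = val_loss, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                break
    return best_epoch


# 5-fold split of shuffled sample indices; each fold serves once as the validation set
folds = torch.randperm(len(x)).chunk(5)
best_epochs = []
for k, val_idx in enumerate(folds):
    train_idx = torch.cat([f for j, f in enumerate(folds) if j != k])
    best_epochs.append(
        best_epoch_with_early_stopping(x[train_idx], y[train_idx], x[val_idx], y[val_idx])
    )

print(best_epochs, sum(best_epochs) / len(best_epochs))
```

A large spread between folds is itself a useful signal that the chosen patience, or the validation split, is not giving a stable stopping point.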

In summary, when used appropriately, early stopping is designed to enhance the model's ability to generalize to future data, thus maintaining or even improving predictive power. However, careful tuning and validation are essential to ensure its effectiveness.

# Is it possible to train on both the training and validation data after the validation phase?
Your approach to re-train the model using both the training and validation data after the early stopping is a reasonable and sometimes practiced method, though it is **not always standard**. Let's break down the considerations and scenarios where this approach may or may not be used.

### Standard Practice
The typical workflow involves the following steps:
1. **Training Phase**: The model is trained using the training data, and its performance is monitored on the validation data.
2. **Validation Phase**: Hyperparameters and early stopping criteria are tuned based on the validation performance.
3. **Final Evaluation**: The best model configuration (based on validation performance) is evaluated on a separate test set to estimate its generalization performance.

### Using Validation Data for Final Training
Once the best hyperparameters and early stopping point are determined, re-training the model using both the training and validation data can potentially improve the model's performance. This is because the model can benefit from the additional data provided by the validation set.

#### Pros
- **Utilizing More Data**: By incorporating the validation data, you increase the amount of data available for training, which can lead to better model performance, especially if the dataset is small.
- **Improved Model**: The final model may generalize better because it has been trained on more data.

#### Cons
- **Validation Bias**: There's a risk of introducing bias because the model has been influenced by the validation data used during hyperparameter tuning and early stopping.
- **No True Validation**: By using the validation data for training, you lose the validation set's independent assessment. This makes it crucial to have a separate test set to evaluate the final model.

### Practical Scenarios
1. **When to Retrain**: Retraining using both training and validation data is more common when:
   - **Data is Limited**: When the dataset is small and the benefit of more training data outweighs the risk of validation bias.
   - **Final Model Deployment**: When deploying a final model in production, leveraging as much data as possible can be advantageous.

2. **When Not to Retrain**: It might not be ideal to retrain in scenarios where:
   - **Abundant Data**: If there is ample training data, the gains from adding validation data might be minimal.
   - **Evaluation Integrity**: If a separate, large, and representative test set is available, maintaining the integrity of the validation set for hyperparameter tuning and early stopping is more critical.

### Implementation Steps
1. **Initial Training**: Train the model using the training data and determine the optimal hyperparameters and early stopping criteria based on the validation set.
2. **Retraining with Combined Data**:
   - Combine the training and validation data.
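   - Retrain the model on the combined dataset with the previously determined hyperparameters. Since no validation data is left to monitor, a common choice is to train for the number of epochs at which the validation loss was best in the initial run.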
3. **Final Evaluation**: Use a separate test set to evaluate the final model's performance.
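The three implementation steps can be sketched end to end. This assumes hypothetical toy regression data split into train, validation, and test parts, full-batch updates (one step per "epoch"), and the choice of reusing the best epoch count from the first run as the training length for the retrained model; the helper names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy regression data: y = 3x - 1 plus noise, split 200 / 50 / 50
x = torch.randn(300, 1)
y = 3 * x - 1 + 0.2 * torch.randn(300, 1)
x_train, y_train = x[:200], y[:200]
x_val, y_val = x[200:250], y[200:250]
x_test, y_test = x[250:], y[250:]

loss_fn = nn.MSELoss()


def train_one_epoch(model, optimizer, xs, ys):
    """One full-batch gradient step, standing in for one epoch."""
    model.train()
    optimizer.zero_grad()
    loss_fn(model(xs), ys).backward()
    optimizer.step()


# Step 1: train on the training split with early stopping on the validation split
model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
best_loss, best_epoch, counter, patience = float("inf"), 0, 0, 10
for epoch in range(1, 501):
    train_one_epoch(model, optimizer, x_train, y_train)
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()
    if val_loss < best_loss:
        best_loss, best_epoch, counter = val_loss, epoch, 0
    else:
        counter += 1
        if counter >= patience:
            break

# Step 2: retrain a fresh model on train + validation for best_epoch epochs
x_full = torch.cat([x_train, x_val])
y_full = torch.cat([y_train, y_val])
final_model = nn.Linear(1, 1)
final_optimizer = torch.optim.SGD(final_model.parameters(), lr=0.05)
for _ in range(best_epoch):
    train_one_epoch(final_model, final_optimizer, x_full, y_full)

# Step 3: evaluate once on the untouched test split
final_model.eval()
with torch.no_grad():
    test_loss = loss_fn(final_model(x_test), y_test).item()
print(f"best_epoch={best_epoch}, test_loss={test_loss:.4f}")
```

The test split is never seen during either training run, so `test_loss` remains an unbiased estimate for the retrained model.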

### Summary
Re-training the model using both training and validation data can be beneficial in scenarios where additional data is needed to improve model performance. However, it should be done with caution, considering the potential introduction of validation bias. Ensuring a separate test set is available for the final evaluation is crucial to maintain an unbiased estimate of the model's generalization performance.