# train, validation, test in PyTorch

# v1. from blog
https://blog.paperspace.com/training-validation-and-accuracy-in-pytorch/

When working with neural networks, choosing a good architecture and good hyperparameters is essential. During training, the training loss usually keeps decreasing as long as the learning rate is reasonable. But the network should perform well not only on the data it was trained on but also on data it has never seen before. One way to measure this is to hold out a validation set and track the network's performance on it. In this article we'll see how to track the validation loss after each training epoch and save the model weights that give the best validation loss.

## import data

In [None]:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split

# Use a separate name so the `transforms` module is not shadowed
transform = transforms.Compose([
    transforms.ToTensor()
])

train = datasets.MNIST('', train=True, transform=transform, download=True)
train, valid = random_split(train,[50000,10000])

trainloader = DataLoader(train, batch_size=32)
validloader = DataLoader(valid, batch_size=32)


## Building Model

In [None]:
import torch
from torch import nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super(Network,self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten each image to 784 values, keeping the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Network()

if torch.cuda.is_available():
    model = model.cuda()

In the above code, we defined a neural network with the following architecture:

* Input Layer: 784 nodes. MNIST images are 28*28, i.e. 784 pixels, so each flattened image becomes the input to a network with 784 input nodes.
* Hidden Layer 1: 256 nodes
* Hidden Layer 2: 128 nodes
* Output Layer: 10 nodes, for 10 classes i.e. numbers 0-9
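
As a quick sanity check on the architecture above, the number of trainable parameters can be worked out by hand: each fully connected layer has `in * out` weights plus `out` biases. The helper name `linear_params` below is only for illustration and is not part of the original notebook.

```python
# Parameter count for the 784 -> 256 -> 128 -> 10 network described above.
def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [(28 * 28, 256), (256, 128), (128, 10)]
per_layer = [linear_params(n_in, n_out) for n_in, n_out in layer_sizes]
total_params = sum(per_layer)

print(per_layer)      # [200960, 32896, 1290]
print(total_params)   # 235146
```

Most of the parameters sit in the first layer, because that is where the 784 input pixels are connected to the 256 hidden nodes.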

## Defining Criterion and Optimizer

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)

## Training Neural Network with Validation
The training step in PyTorch is almost identical every time you write it. But before implementing it, let's learn about the two modes of the model object:
* **Training Mode**: Set by **`model.train()`**, it tells your model that you are training it. Layers such as dropout and batch normalization, which behave differently during training and evaluation, then behave accordingly.
* **Evaluation Mode**: Set by **`model.eval()`**, it tells your model that you are evaluating it (validating or testing).
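
To make the difference between the two modes concrete, here is a minimal sketch using a standalone dropout layer. The tensor of ones and the seed are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()            # training mode: elements are randomly zeroed
train_out = dropout(x)     # surviving elements are scaled by 1 / (1 - p) = 2

dropout.eval()             # evaluation mode: dropout does nothing
eval_out = dropout(x)

print(train_out.unique())          # only the values 0 and 2 appear
print(torch.equal(eval_out, x))    # True
```

The same switch applies to every submodule when `train()` or `eval()` is called on the whole model.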


The validation loop is the same as the training loop, but with only the forward pass and the loss calculation. However, <u>your last epoch isn't necessarily the one that gave the lowest validation loss</u>. To handle this, we initialize a minimum validation loss to **np.inf**, and whenever the current validation loss is lower than it, we save the model's **state dictionary**, which we can load later like a checkpoint. The **state dict** is an OrderedDict that maps each parameter name to its tensor.
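
As a small illustration of what a state dictionary holds, the sketch below inspects one for a tiny stand-in layer and round-trips it through an in-memory buffer. The layer size and the use of `io.BytesIO` instead of a file on disk are assumptions made only to keep the example self-contained.

```python
import io
import torch
from torch import nn

layer = nn.Linear(3, 2)                 # tiny stand-in for a real model
state = layer.state_dict()              # OrderedDict: parameter name -> tensor
names_and_shapes = {k: tuple(v.shape) for k, v in state.items()}
print(names_and_shapes)                 # {'weight': (2, 3), 'bias': (2,)}

buffer = io.BytesIO()                   # in-memory file instead of a path on disk
torch.save(state, buffer)
buffer.seek(0)

restored = nn.Linear(3, 2)              # same architecture, fresh random weights
restored.load_state_dict(torch.load(buffer))
weights_match = torch.equal(restored.weight, layer.weight)
print(weights_match)                    # True
```

Because only tensors are stored, the architecture has to be recreated in code before the weights can be loaded back.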

In [None]:
import numpy as np
epochs = 5
min_valid_loss = np.inf

for epoch in range(epochs):
    train_loss = 0.0
    model.train() # Optional when not using Model Specific layer
    for data, labels in trainloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()
        optimizer.zero_grad()
        target = model(data)
        loss = criterion(target, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    valid_loss = 0.0
    model.eval() # Optional when not using Model Specific layer
    with torch.no_grad(): # gradients are not needed for validation
        for data, labels in validloader:
            if torch.cuda.is_available():
                data, labels = data.cuda(), labels.cuda()

            target = model(data)
            loss = criterion(target, labels)
            valid_loss += loss.item()

    train_loss = train_loss / len(trainloader)
    valid_loss = valid_loss / len(validloader)
    print(f'Epoch {epoch+1} \t\t Training Loss: {train_loss} \t\t Validation Loss: {valid_loss}')

    # Save the model whenever the validation loss improves
    if min_valid_loss > valid_loss:
        print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving The Model')
        min_valid_loss = valid_loss
        # Saving State Dict
        torch.save(model.state_dict(), 'save_model.pt')

# train, valid, test explanation by gpt4
These three stages (train, validation, test) are pivotal in developing, tuning, and evaluating machine learning models to ensure they perform well on unseen data, not just on the data they were trained on.

## 1. Train Step
Role in PyTorch:
The training step involves using a subset of your data (the training set) to teach your model to make predictions or perform its intended task. In PyTorch, this step usually involves defining a model architecture (using the **torch.nn** module), a loss function (like MSE or CrossEntropyLoss), and an optimizer (such as SGD or Adam). During training, the model makes predictions on the training data, calculates the error (loss) by comparing these predictions to the actual targets, and then adjusts the model's weights through backpropagation to reduce this error.


Why It's Needed:
Training is crucial because it's how your model learns the relationship between inputs and outputs. It's the phase where your model iteratively improves its accuracy on the dataset provided.

## 2. Validation Step
Role in PyTorch:
Validation involves periodically evaluating the model on a separate subset of the data (the validation set) that is not used for training. This step is crucial for hyperparameter tuning and for avoiding overfitting. Overfitting occurs when the model performs well on the training data but poorly on unseen data, meaning it has learned the training data too well, including its noise and outliers. In PyTorch, you typically run the validation step after a certain number of training epochs to check the model's performance and decide which model to save, when to stop training, or whether it's time to <u>adjust learning rates or other hyperparameters</u>.
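
The "when to stop training" decision is often implemented as early stopping with a patience counter. The following is a minimal sketch in plain Python; the function name, the patience value, and the list of validation losses are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: this is where a checkpoint would be saved.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improving at first, then getting worse.
losses = [0.90, 0.70, 0.60, 0.65, 0.66, 0.50]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch, best_epoch)   # 4 2
```

Note that training stops at epoch 4 and never sees the later improvement at epoch 5, which is the usual trade-off when choosing the patience.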


Why It's Needed:
The validation step is essential for making sure your model generalizes well and doesn't just memorize the training data. It provides a reliable estimate of the performance of your model on unseen data without touching the test set.


## 3. Test Step
Role in PyTorch:
After training and validation, the final step is to evaluate your model on the test set, another separate subset of data that the model has never seen during training or validation. This step gives you the final measure of how well your model is expected to perform in the real world on completely unseen data. In PyTorch, you'd typically load your best model and run it on the test dataset to assess its performance using appropriate metrics (accuracy, precision, recall, F1 score, etc.), depending on your task.
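
For a binary classification task, the metrics named above can be computed from the confusion counts. This is a small sketch in plain Python; the labels and predictions are made up for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)   # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

For a regression task such as price prediction, a loss such as MSE is the more natural test metric.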


Why It's Needed:
The test step is critical for assessing the final model's performance. It's your best estimate of how the model will perform in real-world scenarios or on completely new, unseen data.


## Conclusion
Separating the work into training, validation, and testing phases allows you to train models effectively, tune hyperparameters without bias, and obtain an unbiased estimate of real-world performance. This approach helps in developing robust models that generalize well beyond the specific data they were trained on, ensuring reliability and effectiveness in practical applications.

# Tuning Parameters in validation for LSTM
Adjusting hyperparameters is a crucial part of tuning models, especially complex ones like LSTMs, for tasks such as predicting future stock prices based on features like opening price, closing price, volume, and high/low values. Here are some key hyperparameters you might consider adjusting during the validation step when working with LSTMs and linear layers in such a predictive model:


## For LSTM Layers
1. **Number of Layers**: The depth of the LSTM can significantly impact performance. Adding more layers can help the model learn complex patterns but also increases the risk of overfitting and requires more computational power.
2. **Hidden Units**: The number of units in the LSTM layers. More units can increase the model's capacity but also its complexity and likelihood of overfitting.
3. **Dropout Rate**: Dropout is a regularization technique where randomly selected neurons are ignored during training, reducing the risk of overfitting. Adjusting the dropout rate can help find a balance between model complexity and generalization.
4. **Learning Rate**: This controls how much the model's weights are adjusted with respect to the loss gradient. Too high a learning rate can cause training to diverge or to settle quickly on a suboptimal solution, while too low a rate can make training excessively slow.
5. **Batch Size**: The number of training examples used in one iteration. Smaller batches give noisier gradient estimates, which can have a regularizing effect, while larger batches give smoother gradient estimates and better computational efficiency.
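
The first three items map directly onto constructor arguments of `nn.LSTM`. The sketch below shows how they affect the output shapes; the specific values are illustrative only, not recommendations.

```python
import torch
from torch import nn

num_layers, hidden_size, dropout_rate = 2, 32, 0.2   # illustrative values only
lstm = nn.LSTM(input_size=4, hidden_size=hidden_size, num_layers=num_layers,
               dropout=dropout_rate, batch_first=True)

batch = torch.randn(8, 60, 4)           # (batch_size, seq_len, features)
output, (h_n, c_n) = lstm(batch)

print(output.shape)   # torch.Size([8, 60, 32]): top layer, every time step
print(h_n.shape)      # torch.Size([2, 8, 32]): final hidden state of each layer
```

In `nn.LSTM`, the `dropout` argument is applied between stacked layers, so it only has an effect when `num_layers` is greater than 1.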


## For Linear Layers
1. **Output Features**: In linear layers, particularly the final one, you need to adjust the output features to match your prediction target. For stock price prediction, this might be 1 (predicting the future price) or more if predicting multiple future points or features.


## For Both LSTM and Linear Layers
1. **Optimizer**: The choice of optimizer (e.g., Adam, SGD, RMSprop) can affect training dynamics and final model performance.
2. **Learning Rate Schedule**: Adjusting the learning rate over time (e.g., reducing it when the validation loss plateaus) can help fine-tune the model.
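
Reducing the learning rate when the validation loss plateaus is available in PyTorch as `torch.optim.lr_scheduler.ReduceLROnPlateau`. The sketch below feeds it a hypothetical list of validation losses instead of running real training; the toy model, the factor, and the patience are assumptions for illustration.

```python
import torch
from torch import nn

toy_model = nn.Linear(4, 1)             # stand-in model, never actually trained here
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=1)

fake_val_losses = [1.0, 0.9, 0.9, 0.9]  # hypothetical: improvement, then a plateau
lr_history = []
for val_loss in fake_val_losses:
    scheduler.step(val_loss)            # the scheduler watches the validation loss
    lr_history.append(optimizer.param_groups[0]["lr"])

print(lr_history)   # [0.1, 0.1, 0.1, 0.05]
```

With `patience=1`, the learning rate is halved only after the second consecutive epoch without improvement.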


## General Hyperparameters
* **Epochs**: While technically not adjusted during training, deciding when to stop training (early stopping) based on validation performance is critical.
* **Sequence Length**: For time-series data like stock prices, the length of the input sequences can impact model performance.


## Model Architecture Specifics
* **Combining LSTM and Linear Layers**: How you structure the model - such as the sequence of LSTM and linear layers, use of dropout or batch normalization between layers, and whether to stack multiple LSTM layers before feeding the output to linear layers - can all be considered hyperparameters to some extent.


Adjusting these hyperparameters requires a careful balance. Typically, this is done through a process of experimentation, guided by performance on the validation set. Tools and techniques like grid search, random search, or more advanced methods like Bayesian optimization can be used to explore the hyperparameter space efficiently.
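
A grid search can be sketched in a few lines of plain Python. The search space and the `validation_loss` function below are hypothetical placeholders: in practice that function would train a model with the given configuration and return its loss on the validation set.

```python
import itertools

search_space = {
    "hidden_size": [32, 64, 128],
    "learning_rate": [1e-2, 1e-3],
    "dropout": [0.0, 0.2],
}

def validation_loss(config):
    """Hypothetical stand-in for 'train with config, return validation loss'."""
    return (abs(config["hidden_size"] - 64) / 64
            + abs(config["learning_rate"] - 1e-3) * 100
            + abs(config["dropout"] - 0.2))

# Enumerate every combination of the listed values.
keys = list(search_space)
candidates = [dict(zip(keys, values))
              for values in itertools.product(*search_space.values())]
best_config = min(candidates, key=validation_loss)

print(len(candidates))   # 12
print(best_config)       # {'hidden_size': 64, 'learning_rate': 0.001, 'dropout': 0.2}
```

Random search follows the same pattern but samples a fixed number of configurations instead of enumerating all of them.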


Remember, the goal is to improve the model's ability to generalize to unseen data, not just to perform well on the training or validation set. Monitoring validation performance while adjusting these hyperparameters is key to finding a well-tuned model for predicting future stock prices.


# Example code of train, validation, and test step
Could you make a sample program that runs in a Jupyter notebook to help me understand each of the train, validation, and test steps? Please satisfy the following conditions when you write the code.

---condition---
1. Please make sample daily data having 1 target ("stock price") and 4 features ("yesterday price", "last change rate", "Oil price", "CPI").
2. Please separate data into train, validation, and test.
3. Use pytorch and create LSTM model to predict tomorrow stock price.
4. Sequence length of LSTM is 60 days.
5. Make code to run jupyter notebook.
6. Make code in accordance with general manner of pytorch.
7. Please write the code step by step, such as "step1-generate sample data", "step2-set dataset & dataloader", ....
8. Please add an evaluation step when training the model.
9. Please add a test step.
10. After finishing the test step, please predict tomorrow's stock price given the last 60 days of data. (make synthetic data of 60 days)

**Step1: Generate Sample Data**
First, we'll generate synthetic daily data for a hypothetical stock. This data will include 1 target variable (stock price) and 4 features ('yesterday price', 'last change rate', 'Oil price', 'CPI').


**Step2: Set Dataset & DataLoader**
We'll define a PyTorch **`Dataset`** to handle our synthetic stock data, then create **`DataLoader`** for training, validation, and testing.


**Step3: Define the LSTM Model**
We will design a simple LSTM model suitable for our task.


**Step4: Training and Validation**
We'll write the training loop, including validation within the loop to adjust hyperparameters or for early stopping.


**Step5: Evaluate on Test Set**
After training, we will evaluate the model on the test set.


**Step6: Prediction**
Finally, we'll predict tomorrow's stock price using the last 60 days of data.

In [None]:
import torch
from torch import nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import pandas as pd

## Step1: Generate Sample Data

In [None]:
# Step 1: Generate Sample Data
np.random.seed(0)  # For reproducibility
data_size = 365  # Let's generate data for one year as an example
dates = pd.date_range('2023-01-01', periods=data_size)
features = np.random.randn(data_size, 4)  # Generating random features
target = features[:, 0] * 0.5 + np.random.randn(data_size) * 0.1  # A simple relation to 'yesterday price' plus noise
print(features.shape, target.shape)

(365, 4) (365,)


In [None]:
features.shape[0]

365

## Step2: Set Dataset & DataLoader

In [None]:
# Step2: Set Dataset & DataLoader
class StockDataset(Dataset):
    def __init__(self, features, target, seq_length=60):
        self.features = features
        self.target = target
        self.length = seq_length

    def __len__(self):
        # Return the number of usable samples (windows that still have a following target)
        return len(self.target) - self.length

    def __getitem__(self, index):
        return self.features[index:index+self.length], self.target[index+self.length]
        # i.e. (a window of `length` consecutive feature rows, the target on the day right after that window)

# Splitting the dataset into train, validation, test
train_size = int(0.6 * data_size)
val_size = int(0.2 * data_size)
test_size = data_size - train_size - val_size

# make dataset object
train_dataset = StockDataset(features[:train_size+60], target[:train_size+60])
val_dataset = StockDataset(features[train_size:train_size+val_size+60], target[train_size:train_size+val_size+60])
test_dataset = StockDataset(features[-(test_size+60):], target[-(test_size+60):])

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=False)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [None]:
print(features[-(test_size+60):].shape, target[-(test_size+60):].shape)

(133, 4) (133,)
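
It can be worth checking which absolute row indices each split actually uses as targets. The sketch below redoes the slicing arithmetic from Step 2 in plain Python; the helper name `target_indices` is hypothetical.

```python
data_size, seq_length = 365, 60
train_size = int(0.6 * data_size)                 # 219
val_size = int(0.2 * data_size)                   # 73
test_size = data_size - train_size - val_size     # 73

def target_indices(start, stop):
    """Absolute target rows of a window dataset built on rows [start, stop)."""
    return range(start + seq_length, stop)

train_targets = target_indices(0, train_size + seq_length)
val_targets = target_indices(train_size, train_size + val_size + seq_length)
test_targets = target_indices(data_size - (test_size + seq_length), data_size)

for name, idx in [("train", train_targets), ("val", val_targets), ("test", test_targets)]:
    print(name, idx[0], idx[-1], len(idx))
# train 60 278 219
# val 279 351 73
# test 292 364 73

overlap = set(val_targets) & set(test_targets)
print(len(overlap))   # 60
```

With these slices the training targets are disjoint from the other two splits, but 60 of the 73 validation targets also appear in the test set. If strictly separate evaluation sets are wanted, one option is to start the test targets where the validation targets end, at the cost of fewer samples per split.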


## Step3: Define the LSTM Model

In [None]:
for x, y in train_loader:
    print(x.shape, y.shape)
    break

torch.Size([32, 60, 4]) torch.Size([32])


In [None]:
# Step3: Define the LSTM Model
class LSTMModel(nn.Module):
    def __init__(self, input_size=4, hidden_layer_size=100, output_size=1):
        super(LSTMModel, self).__init__()
        self.hidden_layer_size = hidden_layer_size
        self.lstm = nn.LSTM(input_size, hidden_layer_size, batch_first=True)
        self.linear = nn.Linear(hidden_layer_size, output_size)

    def forward(self, input_seq):
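        # With batch_first=True, the LSTM input is ordered (batch_size, sequence_length, features)
        lstm_out, _ = self.lstm(input_seq)
        predictions = self.linear(lstm_out.contiguous().view(-1, self.hidden_layer_size))
        # predictions has one row per (sample, time step); reshape back to
        # (batch_size, seq_len, output_size) and keep the last time step of each sample
        return predictions.view(input_seq.size(0), -1, predictions.size(-1))[:, -1, :]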

### lstm_out.contiguous().view(-1, self.hidden_layer_size) in self.linear()
**`lstm_out.contiguous().view(-1, self.hidden_layer_size)`** plays an important role in preparing the output of the LSTM layer for the linear layer that follows. Let's break down what each part of this line does and why it's necessary:

#### Understanding **`lstm_out`**
First, **lstm_out** is the output tensor from the LSTM layer. When **batch_first=True** is used, <u>the shape of **lstm_out** will be **[batch_size, seq_len, hidden_layer_size]**</u>. This tensor contains the output features from the LSTM layer for each time step in the sequence, for each sequence in the batch.


#### Purpose of **`.contiguous()`**
* **.contiguous()**: This method ensures that the tensor is stored in a contiguous block of memory. After certain operations, such as **permute** (reordering dimensions), the elements are no longer laid out in memory in the order the tensor's shape implies, so the tensor is non-contiguous. This can happen even if you don't explicitly use **permute**, after other operations that change how a tensor is indexed (for example **transpose**). Calling **.contiguous()** copies the data into a new block of memory laid out in element order (and does nothing if the tensor is already contiguous), which is required by operations like **.view()** that expect contiguous tensors.
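
A small sketch of this behavior, using a transposed tensor as an example of a non-contiguous one (the tensor values are arbitrary):

```python
import torch

t = torch.arange(6).reshape(2, 3)       # [[0, 1, 2], [3, 4, 5]]
transposed = t.t()                      # same memory, different indexing order

print(t.is_contiguous())                # True
print(transposed.is_contiguous())       # False

try:
    transposed.view(-1)                 # view needs a compatible memory layout
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)                      # True

flat = transposed.contiguous().view(-1) # copy into element order, then reshape
print(flat.tolist())                    # [0, 3, 1, 4, 2, 5]
```

`Tensor.reshape` is a common alternative: it returns a view when possible and copies otherwise.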

#### Function of **`.view(-1, self.hidden_layer_size)`**
* **.view(-1, self.hidden_layer_size)**: This method reshapes the tensor. The **-1** tells PyTorch to infer the size of that dimension from the total number of elements and the other specified dimensions. Given that **lstm_out** originally has the shape **[batch_size, seq_len, hidden_layer_size]**, reshaping it to <u>**[-1, self.hidden_layer_size]**</u> essentially <u>flattens the first two dimensions (batch and sequence length)</u> while preserving the **hidden_layer_size** dimension. This results in a 2D tensor where the first dimension is **batch_size * seq_len** and the second dimension is **hidden_layer_size**. The reshaping matches what the linear layer expects as the size of its last input dimension (in this case, **hidden_layer_size**), and it prepares the LSTM outputs (from all time steps of all sequences) to be processed by this layer.


#### In the Context of the Model
The reshaping is crucial for passing the LSTM's output to the linear layer because:
* It allows processing the outputs from all time steps of all sequences through the linear layer in a single batch operation, which is efficient and leverages parallelism.
* It prepares the data correctly if you want to make predictions or transformations on the outputs of each time step individually.


Here's a simplified representation to give you an idea:


Before **.view()**, you have a tensor shaped **[batch_size, seq_len, hidden_layer_size]** representing the LSTM's output for each time step of each sequence. After applying **.contiguous().view(-1, self.hidden_layer_size)**, you get a 2D tensor where each row corresponds to an output from a specific time step of a specific sequence, ready to be fed into the linear layer.



This operation effectively transforms the output for each time step into a format that can be individually processed by the linear layer, enabling the model to learn from the temporal features extracted by the LSTM for every single time step across all sequences.
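
The row order of the flattened tensor matters when selecting predictions afterwards. The sketch below uses a small integer tensor as a stand-in for a real LSTM output (the sizes are arbitrary) to show which row belongs to which sample and time step.

```python
import torch

batch_size, seq_len, hidden = 3, 4, 2
# Stand-in for a real LSTM output of shape (batch_size, seq_len, hidden).
lstm_out = torch.arange(batch_size * seq_len * hidden).view(batch_size, seq_len, hidden)

flat = lstm_out.contiguous().view(-1, hidden)
print(flat.shape)                       # torch.Size([12, 2])

# Row b * seq_len + t of `flat` is sample b at time step t.
b, t = 1, 2
same_row = torch.equal(flat[b * seq_len + t], lstm_out[b, t])
print(same_row)                         # True

# Last time step of every sample: reshape back and index the time dimension.
last_steps = flat.view(batch_size, seq_len, hidden)[:, -1, :]
last_ok = torch.equal(last_steps, lstm_out[:, -1, :])
print(last_ok)                          # True

# The final `batch_size` rows are NOT one row per sample:
# they are the last few time steps of the final sample only.
tail_is_last_sample = torch.equal(flat[-batch_size:], lstm_out[-1, -batch_size:])
print(tail_is_last_sample)              # True
```

So slicing the last `batch_size` rows of the flattened output only gives one prediction per sample when the batch contains a single sequence.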

## Step4: Training and Validation

In [None]:
# Step4: Training and Validation
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LSTMModel().to(device)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

epochs = 10
for epoch in range(epochs):
    # training step
    model.train()
    for inputs, targets in train_loader:
        inputs = inputs.float().to(device)
        targets = targets.float().to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = loss_function(outputs.squeeze(-1), targets)  # compare (batch,) with (batch,), not (batch, 1)
        loss.backward()
        optimizer.step()

    # Validation step
    with torch.no_grad():
        model.eval()
        val_loss = 0.0
        for inputs, targets in val_loader:
            inputs = inputs.float().to(device)
            targets = targets.float().to(device)
            outputs = model(inputs)
            val_loss += loss_function(outputs.squeeze(-1), targets).item()  # shapes must match: (batch,)
        val_loss /= len(val_loader)
        print(f"Epoch {epoch+1}, Train Loss: {loss.item():.4f}, Val Loss: {val_loss:.4f}")



Epoch 1, Train Loss: 0.1979, Val Loss: 0.2085
Epoch 2, Train Loss: 0.1885, Val Loss: 0.2072
Epoch 3, Train Loss: 0.1818, Val Loss: 0.2067
Epoch 4, Train Loss: 0.1784, Val Loss: 0.2065
Epoch 5, Train Loss: 0.1771, Val Loss: 0.2063
Epoch 6, Train Loss: 0.1763, Val Loss: 0.2059
Epoch 7, Train Loss: 0.1747, Val Loss: 0.2053
Epoch 8, Train Loss: 0.1717, Val Loss: 0.2044
Epoch 9, Train Loss: 0.1672, Val Loss: 0.2031
Epoch 10, Train Loss: 0.1626, Val Loss: 0.2025
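
One thing worth checking in a loop like this is that the model output and the targets have the same shape: the model returns `(batch, 1)` while the dataset yields targets of shape `(batch,)`. The sketch below, with made-up values, shows how `nn.MSELoss` behaves when the shapes differ.

```python
import warnings
import torch
from torch import nn

loss_function = nn.MSELoss()
outputs = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1), like the model output
targets = torch.tensor([1.0, 2.0, 3.0])         # shape (3,), like the dataset targets

with warnings.catch_warnings():
    warnings.simplefilter("ignore")             # silence the size-mismatch UserWarning
    broadcast_loss = loss_function(outputs, targets).item()

aligned_loss = loss_function(outputs.squeeze(-1), targets).item()

print(broadcast_loss)   # about 1.3333: mean over a 3x3 grid of all pairwise differences
print(aligned_loss)     # 0.0
```

Even though every prediction equals its target here, the mismatched shapes are broadcast to a 3x3 grid and give a non-zero loss, so squeezing the output (or unsqueezing the targets) before computing the loss avoids a silently wrong value.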


## Step5: Evaluate on Test Set

In [None]:
# Step5: Evaluate on Test Set
with torch.no_grad():
    model.eval()
    test_loss = 0.0
    for inputs, targets in test_loader:
        inputs = inputs.float().to(device)
        targets = targets.float().to(device)
        outputs = model(inputs)
        test_loss += loss_function(outputs.squeeze(-1), targets).item()  # shapes must match: (batch,)
    test_loss /= len(test_loader)

print(f"Test Loss: {test_loss:.4f}")

Test Loss: 0.2756


In [None]:
for x, y in test_loader:
    print(x.shape, y.shape)


torch.Size([32, 60, 4]) torch.Size([32])
torch.Size([32, 60, 4]) torch.Size([32])
torch.Size([9, 60, 4]) torch.Size([9])


In [None]:
outputs.shape
# One prediction is produced per sample in the batch

torch.Size([9, 1])

## Step6: Predict Tomorrow's Stock Price

In [None]:
# Step6: Predict Tomorrow's Stock Price
last_60_days_features = features[-60:]  # Assuming this is new unseen data
last_60_days_features.shape

(60, 4)

In [None]:
with torch.no_grad():
    model.eval()
    features_added_batch_dimension = torch.tensor(last_60_days_features).float().to(device).unsqueeze(0)
    # shape of last_60_days_features: (60, 4)
    # shape of features_added_batch_dimension: (1, 60, 4)
    prediction = model(features_added_batch_dimension)
    print(f"Predicted stock price for tomorrow: {prediction.item()}")

Predicted stock price for tomorrow: 0.07109205424785614


In [None]:
torch.tensor(last_60_days_features).float().to(device).unsqueeze(0).shape

torch.Size([1, 60, 4])

In [None]:
last_60_days_features.shape

(60, 4)

## add: Saving and Reloading the Model

### Saving the Model
After the training process, you can save the model to a file. PyTorch offers two main ways to save a model: saving the entire model object with **torch.save()** or saving just the model's state dictionary. Saving the state dictionary is the recommended way, as it allows for more flexibility if you need to change the model architecture later but still want to use the pretrained weights.


Here's how to save the model state dictionary:

In [None]:
# Save the model state
model_path = '/content/drive/MyDrive/study_DeepLearning/data/train_val_test_in_pytorch_model.pth'
torch.save(model.state_dict(), model_path)

This code will save the model's weights to a file named "train_val_test_in_pytorch_model.pth".

### Loading the Model
To load the model for future predictions, you first need to recreate the model structure and then load the state dictionary into this structure. Here's how:

In [None]:
# Define the model structure again and move it to the same device as the inputs
loaded_model = LSTMModel().to(device)

# Path of the saved state dictionary
model_path = '/content/drive/MyDrive/study_DeepLearning/data/train_val_test_in_pytorch_model.pth'

# Load the model state (map_location lets a GPU-saved checkpoint load on CPU as well)
loaded_model.load_state_dict(torch.load(model_path, map_location=device))

# Ensure the model is in evaluation mode
loaded_model.eval()

LSTMModel(
  (lstm): LSTM(4, 100, batch_first=True)
  (linear): Linear(in_features=100, out_features=1, bias=True)
)

### Making a Prediction

In [None]:
# Predict tomorrow's stock price using the loaded model
with torch.no_grad():
    loaded_model.eval()
    features_added_batch_dimension = torch.tensor(last_60_days_features).float().to(device).unsqueeze(0)
    prediction = loaded_model(features_added_batch_dimension)
    print(f"Predicted stock price for tomorrow: {prediction.item()}")

Predicted stock price for tomorrow: 0.07109205424785614
