# LSTM_having_features_for_TimeSeries_Forecasting
* This code was generated by ChatGPT

For time series forecasting, especially when predicting future electricity consumption based on multiple features, deep learning models can significantly benefit from proper feature vectorization. This process involves transforming your raw data into a format that the neural network can effectively learn from. Given your scenario with 10 features, here are several strategies to vectorize these features for deep learning models:

### 1. **Feature Scaling**

First and foremost, normalize or standardize your features. This is crucial for models like neural networks to converge quickly. You can use Min-Max scaling to normalize the data or Z-score normalization to standardize it.

- **Normalization (Min-Max Scaling)**: Scales the features to a fixed range, usually [0, 1].
- **Standardization (Z-score normalization)**: Scales the features so they have the properties of a standard normal distribution with a mean of 0 and a standard deviation of 1.
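
As a minimal sketch of the two options above, assuming scikit-learn is available (the small feature matrix and the variable names are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical feature matrix: 5 time steps, 2 features on very different scales
X = np.array([[10.0, 0.1],
              [20.0, 0.2],
              [30.0, 0.3],
              [40.0, 0.4],
              [50.0, 0.5]])

X_minmax = MinMaxScaler().fit_transform(X)      # each column mapped to [0, 1]
X_standard = StandardScaler().fit_transform(X)  # each column: mean 0, std 1

print(X_minmax.min(axis=0), X_minmax.max(axis=0))
print(X_standard.mean(axis=0).round(6), X_standard.std(axis=0).round(6))
```

In a real pipeline the scaler would be fitted on the training split only and then reused for validation and test data.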

### 2. **Sequence Windowing**

For time series data, it's important to structure your input data into sequences that the model can learn from. This is often done by creating "windows" of past observations to predict future values.

- **Fixed Windowing**: Create fixed-size input sequences (windows) of your 10 features. For instance, use the past 24 hours of data (assuming hourly sampling) to predict the next hour's electricity consumption.
- **Sliding Windows**: Similar to fixed windowing but the window slides by a certain step. For example, you might slide by one hour at a time, creating overlapping windows of data.
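
The windowing idea can be sketched with plain NumPy. The helper name `make_windows`, its parameters, and the toy array are assumptions for illustration, not part of the original notebook:

```python
import numpy as np

def make_windows(data, input_steps, forecast_steps, target_col, step=1):
    """Slice a (time, features) array into pairs of past and future windows."""
    X, y = [], []
    last_start = len(data) - input_steps - forecast_steps
    for start in range(0, last_start + 1, step):
        end = start + input_steps
        X.append(data[start:end])                             # past window, all features
        y.append(data[end:end + forecast_steps, target_col])  # future target values
    return np.array(X), np.array(y)

# Hypothetical data: 100 hourly rows, 10 features; column 0 plays the role of the target
data = np.arange(100 * 10, dtype=float).reshape(100, 10)
X, y = make_windows(data, input_steps=24, forecast_steps=1, target_col=0)
print(X.shape, y.shape)  # (76, 24, 10) (76, 1)
```

With `step=1` consecutive windows overlap by 23 hours; a larger `step` produces fewer, less correlated samples.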

### 3. **Time Embeddings**

If your data includes explicit time stamps (e.g., hour of the day, day of the week), you can convert these into cyclical features using sine and cosine transformations. This helps the model capture time-based patterns like daily or weekly cycles.
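
A short sketch of the sine/cosine encoding for the hour of the day (the helper `dist` is only there to illustrate the effect and is not part of the original):

```python
import numpy as np

hours = np.arange(24)                        # hour of day: 0..23
hour_sin = np.sin(2 * np.pi * hours / 24)
hour_cos = np.cos(2 * np.pi * hours / 24)

def dist(a, b):
    """Euclidean distance between two hours in the (sin, cos) plane."""
    return np.hypot(hour_sin[a] - hour_sin[b], hour_cos[a] - hour_cos[b])

# 23:00 and 00:00 are as close to each other as any other pair of adjacent hours,
# which is not the case for the raw values 23 and 0
print(dist(23, 0), dist(0, 1))
```

The same pattern applies to the day of the week (period 7) or the month (period 12).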

### 4. **Feature Embeddings for Categorical Data**

If any of your 10 features are categorical (e.g., type of day: holiday/weekend/workday), consider using embeddings to convert these categories into continuous vectors. This can be more effective than one-hot encoding for models to capture the nuances of categorical data.
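
A minimal sketch of this idea in PyTorch, assuming one categorical feature with three hypothetical categories and nine numeric features (all names and sizes are illustrative):

```python
import torch
import torch.nn as nn

# Hypothetical category codes: 0 = workday, 1 = weekend, 2 = holiday
day_type = torch.tensor([[0, 0, 1, 2]])   # shape (batch=1, seq_len=4), integer codes
numeric = torch.rand(1, 4, 9)             # 9 numeric features per time step

embedding = nn.Embedding(num_embeddings=3, embedding_dim=2)
day_type_vec = embedding(day_type)        # shape (1, 4, 2), learned during training

# Concatenate the learned vectors with the numeric features along the feature axis
lstm_input = torch.cat([numeric, day_type_vec], dim=-1)
print(lstm_input.shape)  # torch.Size([1, 4, 11])
```

An LSTM consuming this tensor would then be built with `input_size=11`.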

### 5. **Lag Features**

Create features that are lagged versions of the existing features. For instance, the electricity consumption from the previous day (or the same hour the previous day) can be a powerful feature for predicting future consumption.
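
With pandas, lag features can be sketched using `shift`. The column names and the toy series below are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly consumption series, three days long
df = pd.DataFrame({"consumption": np.arange(72, dtype=float)})

df["lag_1h"] = df["consumption"].shift(1)     # value one hour earlier
df["lag_24h"] = df["consumption"].shift(24)   # same hour on the previous day

# The first rows have no history, so the lag columns start with NaN
df = df.dropna().reset_index(drop=True)
print(df.head())
```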

### 6. **Rolling Window Statistics**

Generate statistical features based on rolling windows, such as the mean, median, variance, or sum of the past N hours/days. These features can capture trends and seasonality in the data.
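
A small pandas sketch of rolling statistics (the series and column names are illustrative). Shifting by one step first is a design choice made here so that each statistic only sees strictly past observations:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10, dtype=float), name="consumption")

# shift(1) keeps the current value out of its own window
past = s.shift(1)
features = pd.DataFrame({
    "roll_mean_3": past.rolling(window=3).mean(),
    "roll_std_3": past.rolling(window=3).std(),
})
print(features.tail(3))
```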

### 7. **Differencing**

For non-stationary time series data, differencing can help stabilize the mean of the time series by removing changes in the level of a time series, and thus eliminate (or reduce) trend and seasonality.
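
To illustrate, here is a sketch on a made-up series with a linear trend and a 24-hour cycle (both the series and the chosen lags are assumptions):

```python
import numpy as np
import pandas as pd

t = np.arange(96)
# Hypothetical series: linear trend plus a 24-hour cycle
s = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 24))

first_diff = s.diff(1)       # removes the linear trend: a constant 0.5 plus the cycle's hourly change
seasonal_diff = s.diff(24)   # removes the 24-hour cycle: only the trend increase over 24 hours remains

print(seasonal_diff.dropna().round(6).unique())  # [12.]
```

Predictions made on a differenced series have to be integrated back (for example with a cumulative sum plus the last known level) before they can be read in the original units.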

### Incorporating into a Deep Learning Model

Once you've vectorized your features using the strategies above, you can feed them into various types of deep learning models suitable for time series forecasting, such as:

- **Recurrent Neural Networks (RNNs)**: Good for capturing temporal dependencies.
- **Long Short-Term Memory (LSTM) networks**: A special kind of RNNs, effective in learning long-term dependencies.
- **Gated Recurrent Units (GRUs)**: Similar to LSTMs, but simpler and often faster to train.
- **Convolutional Neural Networks (CNNs)**: Not just for image data; 1D CNNs can be effective for sequence data.
- **Transformer models**: Though originally designed for natural language processing, transformers can be adapted for time series forecasting by treating the time series data as a sequence.

Remember, the effectiveness of each vectorization technique and model architecture can vary based on the specifics of your dataset and the nature of the forecasting problem. It's often beneficial to experiment with different approaches and combinations thereof.

## **Tips: When do we need to normalize the target variable?**
The need to normalize a target variable in time series (or any other type of data) largely depends on its characteristics and the modeling approach you're using. Here are types of target variables that often require normalization:

1. **Continuous Variables with Large Range**: If your target variable is a continuous variable that spans a large range of values, normalization can help to ensure that the optimization algorithm works efficiently. This is especially true for deep learning models, where having targets on a similar scale can significantly impact the convergence rate and stability of the learning process.

2. **Skewed Variables**: For target variables that are highly skewed, normalization (or even log transformation, which is a form of normalization) can help make the distribution more symmetric, improving model performance by making it easier for the model to learn the underlying patterns.

3. **Variables with Different Units and Scales**: In the context of multivariate time series forecasting, where you might be predicting multiple targets, normalization ensures that all variables contribute equally to the error term. Without normalization, a variable with a large scale can dominate the gradient updates, potentially leading to suboptimal performance.

4. **High Magnitude Variables**: Variables with values that have a high magnitude can lead to numerical instability in deep learning models due to the way floating-point arithmetic is handled in computers. Normalizing these variables to a lower range can help prevent issues like overflow, underflow, or vanishing/exploding gradients.

### When You Might Not Need to Normalize:
- **Binary or Categorical Targets**: For classification tasks where the target variable is binary or categorical (after being one-hot encoded or otherwise transformed), normalization of the target variable itself is not typically necessary. The focus would instead be on the features.

- **Targets with Narrow Range**: If the target variable inherently falls within a narrow range and you're using a model that's less sensitive to the scale of the input (like decision trees or certain ensemble methods), normalization might not be necessary.

- **Count Data with Low Variance**: If you're dealing with count data that doesn't vary widely, normalization might not offer significant benefits. However, for highly skewed count data, transformations like log scaling can still be beneficial.

It’s important to consider the nature of your target variable and the requirements of your modeling approach when deciding on normalization. Also, the decision to normalize should be guided by experimentation and validation on your specific dataset, as the benefits can vary depending on the context and the peculiarities of the data at hand.
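
If you do normalize the target, predictions come out in scaled units and must be mapped back. A minimal sketch, assuming scikit-learn and made-up target values (`y_pred_scaled` is only a stand-in for a model output):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical target values (e.g. consumption in kWh), split chronologically
y_train = np.array([[1200.0], [1500.0], [1800.0], [2100.0]])
y_test = np.array([[2400.0]])

target_scaler = StandardScaler().fit(y_train)    # statistics from the training split only
y_train_scaled = target_scaler.transform(y_train)
y_test_scaled = target_scaler.transform(y_test)

# A model trained on scaled targets predicts in scaled units;
# inverse_transform brings predictions back to the original units
y_pred_scaled = y_test_scaled                    # stand-in for a model output
y_pred = target_scaler.inverse_transform(y_pred_scaled)
print(y_pred)
```

Keeping a separate scaler for the target makes the inverse step independent of the feature scaler.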

# LSTM basic tips
An LSTM (Long Short-Term Memory) model is a type of recurrent neural network (RNN) that is well-suited for sequence prediction problems, including time series forecasting. LSTMs are specifically designed to overcome limitations of traditional RNNs such as the vanishing gradient problem, which makes traditional RNNs less effective at learning long-term dependencies in data sequences.

### Why LSTMs Are Good for Time Series Forecasting

1. **Learning Long-Term Dependencies:** LSTMs can learn and remember over long sequences of inputs, which is crucial for time series data that often contains long-term patterns and dependencies. For example, in financial time series, the effects of a particular event may be felt for a long duration.

2. **Handling Sequential Data:** Since time series data is inherently sequential, the LSTM's architecture is naturally suited for this format. LSTMs process data points in sequence, allowing for the prediction of future values based on learned patterns from past observations.

3. **Flexibility in Sequence Length:** LSTMs can handle variable-length input sequences, which is beneficial for time series forecasting where the relevant history length may vary.

4. **Capability to Process Multivariate Time Series:** LSTMs can handle multiple input variables (features) at each time step, making them ideal for multivariate time series forecasting, where you predict a variable based on its own past values and other covariates.

### LSTM Model Architecture

An LSTM unit typically consists of three gates that regulate the flow of information:

- **Forget Gate:** Decides what information is discarded from the cell state.
- **Input Gate:** Updates the cell state with new information from the current input.
- **Output Gate:** Determines what the next hidden state should be, which is used for predictions and transferred to the next time step.
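
To make the three gates concrete, the following sketch recomputes one step of `torch.nn.LSTMCell` by hand and compares it with PyTorch's own result. The sizes are arbitrary; the gate order (input, forget, cell candidate, output) is how PyTorch stacks its weight blocks:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=3, hidden_size=4)
x = torch.randn(1, 3)         # one time step of input
h_prev = torch.randn(1, 4)    # previous hidden state
c_prev = torch.randn(1, 4)    # previous cell state

# All four gate pre-activations are computed in one matrix product
gates = x @ cell.weight_ih.T + cell.bias_ih + h_prev @ cell.weight_hh.T + cell.bias_hh
i, f, g, o = gates.chunk(4, dim=1)
i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
g = torch.tanh(g)             # candidate values for the cell state

c_new = f * c_prev + i * g    # forget part of the old state, add gated new information
h_new = o * torch.tanh(c_new) # output gate decides what is exposed as the hidden state

h_ref, c_ref = cell(x, (h_prev, c_prev))
print(torch.allclose(h_new, h_ref, atol=1e-6), torch.allclose(c_new, c_ref, atol=1e-6))
```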

### Example Use Case in Time Series Forecasting

Consider a dataset where you're trying to predict future electricity demand based on past consumption patterns, weather conditions, and time indicators (like hour of the day, day of the week, etc.). An LSTM model can learn from the historical data, recognizing patterns (e.g., increased demand on hot days due to air conditioning use) and using these insights to make accurate future predictions.

### Implementation

In PyTorch, you can use the `torch.nn.LSTM` class to build your LSTM model. The key parameters to configure are:

- `input_size`: The number of features in each input timestep.
- `hidden_size`: The number of features in the hidden state.
- `num_layers`: Number of layers in the LSTM.
- `batch_first`: Whether the input and output tensors are provided with the batch dimension first.
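
A quick sketch of how these parameters show up in tensor shapes (the concrete sizes are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=64, num_layers=2, batch_first=True)

x = torch.randn(8, 24, 10)    # (batch, seq_len, input_size) because batch_first=True
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([8, 24, 64])  hidden state of the top layer at every time step
print(h_n.shape)     # torch.Size([2, 8, 64])   final hidden state of each layer
print(c_n.shape)     # torch.Size([2, 8, 64])   final cell state of each layer
```

Note that `batch_first` only affects the input and `output` tensors; `h_n` and `c_n` always have the layer dimension first.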

### Conclusion

LSTMs are powerful for time series forecasting because they can capture long-term dependencies, handle sequential data effectively, and process both univariate and multivariate time series. Their ability to remember and learn from historical data makes them superior for tasks where understanding the context and the sequence of events is crucial for making accurate predictions.

## Input Size and Sequence length in LSTM
In a PyTorch LSTM model, understanding the terms **input_size** and **sequence_length** is crucial for correctly configuring your model and preparing your data. Let's break down what each of these terms means:

<br>

**input_size**

* The **input_size** parameter in an LSTM model refers to <u>the number of features</u> in each input element of the sequence. For example, if you are working with time series data where each timestep's data point includes measurements like temperature, humidity, and wind speed, and you want to include all three in your model, your **input_size** would be 3.
* It's important to note that **input_size** is not related to the sequence length or the batch size. It strictly refers to the dimensionality of each timestep within your input sequence.

<br>

**sequence_length**

* The term **sequence_length** is not an explicit parameter you pass to the LSTM in PyTorch but is a concept you need to understand to structure your input data correctly. It refers to <u>the length of the input sequences that your model will process</u>. This can vary depending on your specific task and how you preprocess your data.
* When you feed a batch of sequences into an LSTM, PyTorch expects the input tensor to have a shape of **(seq_len, batch_size, input_size)** if you are using the default settings without setting **batch_first=True**. If **batch_first=True** is set, the input tensor is expected to be of shape **(batch_size, seq_len, input_size)**.
    * **seq_len**: is the sequence length, indicating how many timesteps are in each sequence.
    * **batch_size**: is the batch size, representing how many sequences are processed in parallel.
    * **input_size**: is as described above.

<br>

**Practical Example**

Imagine you're analyzing sensor data to predict future measurements, and your dataset includes features like temperature, humidity, and wind speed recorded every hour. If you decide to use the past 24 hours of data to predict the temperature in the next hour, your **input_size** would be 3 (assuming you use all three measurements as features), and your **sequence_length** would be 24. This means each input sequence fed into the LSTM would contain 24 timesteps, with each timestep containing a vector of 3 values.

<br>

When structuring your LSTM model and preparing your data, it's essential to align these dimensions correctly to ensure your model trains as expected.
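
The practical example above can be sketched in both layouts. The batch size of 16 and hidden size of 8 are arbitrary; the two LSTMs share weights so that their outputs can be compared:

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_size = 16, 24, 3   # 24 hourly steps of (temperature, humidity, wind speed)

x_batch_first = torch.randn(batch_size, seq_len, input_size)
x_seq_first = x_batch_first.transpose(0, 1).contiguous()   # (seq_len, batch_size, input_size)

lstm_bf = nn.LSTM(input_size, hidden_size=8, batch_first=True)
lstm_sf = nn.LSTM(input_size, hidden_size=8)    # default: batch_first=False
lstm_sf.load_state_dict(lstm_bf.state_dict())   # same weights in both modules

out_bf, _ = lstm_bf(x_batch_first)   # (16, 24, 8)
out_sf, _ = lstm_sf(x_seq_first)     # (24, 16, 8)
print(out_bf.shape, out_sf.shape)
print(torch.allclose(out_bf, out_sf.transpose(0, 1), atol=1e-6))
```

Both layouts compute the same values; passing a tensor in the wrong layout does not raise an error as long as the last dimension matches `input_size`, so shape mix-ups can go unnoticed.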



# GPT4 - using Dataset and DataLoader

## About Data Leakage
Data leakage occurs when information from outside the training dataset is used to create the model. This can happen during normalization if you normalize your entire dataset before splitting it into training and test sets, because the scaler's statistics then include test data. To prevent this, we'll split the dataset first, fit the scaler on the training split only, and apply that fitted scaler to both splits.


Normalization will be applied within the **Dataset** class, ensuring that it's based only on the statistics of the training set when preparing both training and validation/test sets.

## Step1: Generate Sample data & split the Dataset

In [None]:
import pandas as pd
import numpy as np

np.random.seed(42)  # For reproducibility

# Generate a DataFrame with datetime information
num_hours = 365 * 24  # A year's worth of hourly data
date_rng = pd.date_range(start='2020-01-01', end='2020-12-31', freq='h')  # ISO dates avoid day/month ambiguity
df = pd.DataFrame(date_rng, columns=['date'])
df['weekday'] = df['date'].dt.weekday
df['hour'] = df['date'].dt.hour
df['season'] = df['date'].dt.month % 12 // 3 + 1

# Generate synthetic features and target variable
for i in range(7):  # Additional 7 features
    df[f'feature_{i}'] = np.random.rand(len(df))
df['electricity_consumption'] = np.random.rand(len(df)) * 100  # Target variable

df.head()

| | date | weekday | hour | season | feature_0 | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | electricity_consumption |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-01 00:00:00 | 2 | 0 | 1 | 0.37454 | 0.671368 | 0.40998 | 0.421576 | 0.137686 | 0.120749 | 0.616654 | 1.923384 |
| 1 | 2020-01-01 01:00:00 | 2 | 1 | 1 | 0.950714 | 0.523158 | 0.838483 | 0.280547 | 0.260339 | 0.520433 | 0.003229 | 47.550482 |
| 2 | 2020-01-01 02:00:00 | 2 | 2 | 1 | 0.731994 | 0.898639 | 0.185176 | 0.895044 | 0.48954 | 0.095159 | 0.792586 | 26.352564 |
| 3 | 2020-01-01 03:00:00 | 2 | 3 | 1 | 0.598658 | 0.164393 | 0.554842 | 0.332239 | 0.061339 | 0.256357 | 0.243121 | 53.995885 |
| 4 | 2020-01-01 04:00:00 | 2 | 4 | 1 | 0.156019 | 0.804109 | 0.722233 | 0.578596 | 0.095686 | 0.451709 | 0.299217 | 17.865769 |


In [None]:
# Placeholder split logic (actual logic may vary based on time series considerations)
train_df = df[:int(0.8*len(df))]
test_df = df[int(0.8*len(df)):]

## Step2: Dataset and DataLoader
Apply Normalization Separately


Normalization is applied within the **Dataset** class. When you create instances of this class for training and test sets, you fit the **MinMaxScaler** on the training set and then apply this fitted scaler to transform the test set data:

In [None]:
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import MinMaxScaler

class TimeSeriesDataset(Dataset):
    def __init__(self, dataframe, input_steps, forecast_steps, scaler):
        """
        Initialization of the dataset with a pre-fitted scaler.
        input_steps: encoder length (number of past time steps fed to the model)
        forecast_steps: forecast length (number of future time steps to predict)
        scaler: a scaler that has already been fitted, e.g. with MinMaxScaler().fit(...)
        """
        self.input_steps = input_steps
        self.forecast_steps = forecast_steps
        self.scaler = scaler

        # Separate features and target
        features = dataframe.drop(columns=['electricity_consumption'])
        target = dataframe[['electricity_consumption']]

        # Transform features using the already fitted scaler (returns a NumPy array)
        self.features = self.scaler.transform(features)
        self.target = target.values  # NumPy array; note it is 2D with shape (n_records, 1)

    def __len__(self):
        # Number of complete (input window, forecast window) pairs
        return len(self.features) - self.input_steps - self.forecast_steps + 1

    def __getitem__(self, idx):
        # self.features is already a NumPy array, so plain slicing is enough
        X = self.features[idx:idx+self.input_steps]
        y = self.target[idx+self.input_steps:idx+self.input_steps+self.forecast_steps].flatten()
        # y is flattened to 1D here; the DataLoader later stacks these 1D samples into a 2D batch
        return torch.tensor(X, dtype=torch.float), torch.tensor(y, dtype=torch.float)


### flatten() in the __getitem__:
The use of **flatten()** in the **__getitem__** method of your **TimeSeriesDataset** class serves an important purpose: it ensures that the labels (targets for prediction) are in the correct shape for comparison against the model's predictions during the loss calculation phase of training.

<br>

**Understanding the Shapes**

* **Model's Prediction Shape**: In the modified **ElectricityConsumptionModel**, the output predictions have a shape of **[batch_size, forecast_length]**. For instance, if you're predicting electricity consumption for the next 24 hours **(forecast_length = 24)** for a batch of 20 samples **(batch_size = 20)**, the output predictions will have a shape of **[20, 24]**.
* **Target Labels Shape**: Ideally, the target labels should match this shape exactly for proper loss computation. However, slicing a 2D array such as **self.target** keeps its trailing dimension, so each sample would have the shape **(24, 1)** and the batched labels would end up as **[20, 24, 1]** instead of **[20, 24]**.

<br>

**The role of flatten()**

* **Flattening Labels**: **flatten()** is applied to each sample inside **__getitem__**, turning **(24, 1)** into **(24,)**. Once the **DataLoader** stacks the samples, the batched labels have the shape **[20, 24]** rather than **[20, 24, 1]**, so they are directly comparable to the model's output without dimension mismatch issues.
* **Why It's Necessary**: During the training phase, specifically in the loss calculation step, PyTorch expects the predictions and labels to have compatible shapes. A mismatch, such as an extra dimension in the labels, can lead to errors or, when the shapes happen to broadcast, to silently incorrect loss values. Using **flatten()** (or similarly **squeeze()**) standardizes the shapes, facilitating correct and efficient training.

<br>

**Example**

Suppose your labels tensor would have a shape of **[20, 24, 1]** because each sample was sliced as **(24, 1)**. The trailing **1** wraps each of the 24 forecasted hours in its own dimension, which is unnecessary for comparison with the model's output. Flattening each sample yields **[20, 24]** after batching, aligning the labels with the prediction shape and allowing for correct loss computation.
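
The shapes discussed above can be reproduced with a small sketch. The toy array stands in for `self.target`, and `torch.stack` imitates what the DataLoader does when it builds a batch:

```python
import numpy as np
import torch
import torch.nn as nn

target = np.arange(100, dtype=np.float32).reshape(-1, 1)   # shape (100, 1), like DataFrame.values

y_2d = target[10:34]              # one sample without flatten: shape (24, 1)
y_1d = target[10:34].flatten()    # one sample with flatten:    shape (24,)

batch_2d = torch.stack([torch.tensor(y_2d)] * 20)   # torch.Size([20, 24, 1])
batch_1d = torch.stack([torch.tensor(y_1d)] * 20)   # torch.Size([20, 24])

predictions = torch.zeros(20, 24)                   # stand-in for the model output
loss = nn.MSELoss()(predictions, batch_1d)          # shapes match, so the loss is well defined
print(batch_2d.shape, batch_1d.shape, loss.item())
```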

In [None]:
import torch
import torch.nn as nn

encoder_length = 168  # 7 days of hourly records (= sequence length)
forecast_length = 24  # Predicting the next 24 hours

# Fit scaler on training features
scaler = MinMaxScaler()

# Drop the 'date' column along with 'electricity_consumption' to prepare features for scaling
features_train = train_df.drop(columns=['date', 'electricity_consumption'])
scaler.fit(features_train)

# When initializing your datasets, ensure the 'date' column is also excluded from the features
train_dataset = TimeSeriesDataset(train_df.drop(columns=['date']), encoder_length, forecast_length, scaler)
test_dataset = TimeSeriesDataset(test_df.drop(columns=['date']), encoder_length, forecast_length, scaler)

# set DataLoader
train_loader = DataLoader(train_dataset, batch_size=20, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=20, shuffle=False)

In [None]:
for X, y in train_loader:
    print(X.shape)
    print(y.shape)
    break

# Using 168 hourly records with 10 features each, we predict the next 24 hours.
# 20 is the batch size.

torch.Size([20, 168, 10])
torch.Size([20, 24])


### Data Structure of `self.target` in `__getitem__`
The data structure of **self.target** in the **__getitem__** method is initially determined by how it's set in the **__init__** method. Since **self.target** is assigned as **target.values**, where **target** is a DataFrame containing only the **electricity_consumption** column, **self.target** will be a 2D numpy array with shape **(n, 1)**, where **n** is the number of rows in the DataFrame. This shape corresponds to the total number of data points in your dataset for the target variable.


When you access **self.target** within **__getitem__**, for each item, you're slicing this array to get a portion of it based on **idx**, **input_steps**, and **forecast_steps**. This slicing operation for **y**:

```
y = self.target[idx+self.input_steps:idx+self.input_steps+self.forecast_steps].flatten()
```

This line takes a slice of **self.target**, corresponding to the forecast period, and then flattens it. The flattening operation changes its **<u>shape from a 2D array to a 1D array</u>**. Therefore, after flattening, if **forecast_steps** were 24, for example, **y** would have a shape of **(24, )**. The flattening is done because your target variable (**y**) for each sample is expected to be a 1D tensor representing the series of electricity consumption values you're trying to predict for the forecast period.


To summarize: before flattening, each slice of **self.target** that corresponds to a single **y** in **__getitem__** has a shape like **(forecast_steps, 1)**; after flattening, its shape is **(forecast_steps, )**.


In [None]:
df[['electricity_consumption']].values.shape

(8761, 1)

In [None]:
df[['electricity_consumption']].values.flatten().shape

(8761,)

### Where is batch size data created?
The batch dimension is not explicitly created within the **__getitem__()** method of a PyTorch **Dataset** class. Instead, the batching logic is handled by the **DataLoader**, which wraps around the **Dataset**.

<br>

**__getitem__() method**:

* The **__getitem__()** method is responsible for retrieving a <u>single item</u> from the dataset. When you implement a custom dataset by subclassing PyTorch's **Dataset**, you define how a single sample of data is processed and returned by this method. In your case, for each index **idx**, **__getitem__()** returns a single sample (and its corresponding label or target) where both input (**X**) and target (**y**) are shaped according to the individual sample's requirements. For the target, this means a 1D tensor with the length equal to **forecast_steps**, as per your setup.

<br>

**DataLoader and Batching**:
* The **DataLoader** takes your **Dataset** instance and allows for easy iteration over the dataset in mini-batches. When you use a **DataLoader** with your dataset, it automatically gathers samples into batches. It does this by calling the **__getitem__()** method of your dataset multiple times to fetch individual samples and then stacking these samples together to form a batch.
* By default, <u>the **DataLoader** adds an extra dimension (the batch dimension) as the first dimension of the tensors</u> it creates. This means if your **__getitem__()** method returns a target tensor **y** with shape **(forecast_steps, )** for a single sample, and you set your **DataLoader**'s **batch_size** to **N**, the DataLoader will combine these individual samples into a batch where the shape of **y** in each batch will be **(N, forecast_steps)**. This is because it stacks **N** such 1D tensors along a new dimension, resulting in a 2D tensor.


For example, if you create a DataLoader with your **TimeSeriesDataset** like this:

```
dataset = TimeSeriesDataset(dataframe=df.drop(columns=['date']), input_steps=12, forecast_steps=24, scaler=scaler)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
```

For each iteration over the **dataloader**, it will yield batches where each **X** has the shape **(32, input_steps, number_of_features)** and each **y** has the shape **(32, forecast_steps)**, assuming **input_steps** is the length of the input sequence and **number_of_features** is the number of features per timestep.

<br>

This batching mechanism is crucial for training neural networks efficiently, as it allows for parallel processing of multiple data samples, reducing training time and leveraging optimization techniques like mini-batch gradient descent.
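
The stacking behavior can be checked with a toy dataset. The class `ToyWindows` and its sizes are hypothetical and only mimic the shapes returned by a windowed time series dataset:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyWindows(Dataset):
    """Hypothetical dataset: every item is one (X, y) pair without a batch dimension."""
    def __len__(self):
        return 100

    def __getitem__(self, idx):
        X = torch.full((12, 10), float(idx))   # (input_steps, number_of_features)
        y = torch.full((24,), float(idx))      # (forecast_steps,)
        return X, y

loader = DataLoader(ToyWindows(), batch_size=32, shuffle=False)
X_batch, y_batch = next(iter(loader))
print(X_batch.shape)  # torch.Size([32, 12, 10])
print(y_batch.shape)  # torch.Size([32, 24])

# The last batch is smaller because 100 is not a multiple of 32
batch_sizes = [len(yb) for _, yb in loader]
print(batch_sizes)    # [32, 32, 32, 4]
```

Passing `drop_last=True` to the `DataLoader` would discard the final partial batch if a constant batch size is required.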

## Step3: Define the Model
* Compared to the GPT4 (No Dataset) edition, the `ElectricityConsumptionModel` might need some modifications to work effectively with the **train_loader** and **test_loader** from the modified dataset and batching approach. Specifically, the modifications address how the model processes batches of data and integrates with the changed shape of input sequences and labels generated by the **DataLoader**.

### Key considerations:
1. **Batch Processing**: Ensure the model can handle batches of sequences as input. This involves correctly handling the input dimensions expected by the LSTM layer.
2. **Output Size**: The output size of the final layer must match the forecast horizon. The original remark here, about predicting the next 7 days (168 hours) while the model's output is set to 24 hours, refers to the model in the No Dataset edition below; if your intention is to predict a different time frame, adjust the output size accordingly.
3. **Sequence Dimensions**: LSTM in PyTorch expects inputs of the shape **(seq_len, batch, input_size)**, but if you're using batch_first=True, it expects **(batch, seq_len, input_size)**. Make sure your data conforms to these expectations.


Given these points, here is an updated version of your model assuming you wish to predict the next 24 hours and that your dataset sequences are prepared accordingly:

In [None]:
class ElectricityConsumptionModel(nn.Module):
    def __init__(self, input_size, hidden_layer_size, output_size=24):
        super(ElectricityConsumptionModel, self).__init__()
        self.hidden_layer_size = hidden_layer_size

        # Assuming your data is batch_first
        self.lstm = nn.LSTM(input_size, hidden_layer_size, batch_first=True)

        # The linear layer maps the last hidden state to one value per forecast step
        self.linear = nn.Linear(hidden_layer_size, output_size)

    def forward(self, input_seq):
        # No need to manually reshape if your DataLoader already structures batches correctly
        lstm_out, _ = self.lstm(input_seq)

        # lstm_out shape is [batch_size, seq_len, hidden_layer_size]
        # Mistaken earlier version, kept for reference: it applied the linear layer
        # to every time step, which yields [batch_size, seq_len, output_size]
        # instead of [batch_size, output_size]:
        # predictions = self.linear(lstm_out.contiguous().view(-1, self.hidden_layer_size))
        # predictions = predictions.view(input_seq.size(0), -1, self.linear.out_features)
        # Taking the output of the last step from LSTM which is relevant for the prediction
        # Note: lstm_out[:, -1, :] gives us the last step output for all batches
        predictions = self.linear(lstm_out[:, -1, :])

        return predictions


In [None]:
X.size()

torch.Size([20, 168, 10])

In [None]:
print('y.shape: ', y.shape)

y.shape:  torch.Size([20, 24])


In [None]:
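# batch_first=True matches the model above and the (batch, seq_len, features) layout of X
lstm_layer = nn.LSTM(input_size=10, hidden_size=100, batch_first=True)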
liner_layer = nn.Linear(100, 24)

In [None]:
lstm_out, _ = lstm_layer(X)
print(lstm_out.shape)

torch.Size([20, 168, 100])


In [None]:
predictions = liner_layer(lstm_out[:, -1, :])
print(predictions.shape)

torch.Size([20, 24])


In [None]:
### Below is the version that got the data shapes wrong in the forward() step
# Kept for reference

# predictions = self.linear(lstm_out.contiguous().view(-1, self.hidden_layer_size))
liner_input = lstm_out.contiguous().view(-1, 100)
print(liner_input.shape)
# 20 * 168 = 3360
'''
torch.Size([3360, 100])
'''

predictions = liner_layer(liner_input)
print(predictions.shape)
'''
torch.Size([3360, 24])
'''

print('input_seq.size(0)= X.size(0): ',X.size(0) ) # <- number of samples in the batch
print('self.linear.out_features: ',liner_layer.out_features) # <- number of output features of the linear layer
# predictions = predictions.view(input_seq.size(0), -1, self.linear.out_features)
print('shape before view(input_seq.size(0), -1, self.linear.out_features): ',predictions.shape)
predictions = predictions.view(X.size(0), -1, liner_layer.out_features)
print('shape after predictions.view(): ', predictions.shape)
'''
input_seq.size(0)= X.size(0):  20
self.linear.out_features:  24
shape before view(input_seq.size(0), -1, self.linear.out_features):  torch.Size([3360, 24])
shape after predictions.view():  torch.Size([20, 168, 24])

The result above is the faulty shape produced by the GPT-4 version: [20, 168, 24] instead of [20, 24].
'''

input_seq.size(0)= X.size(0):  20
self.linear.out_features:  24
shape before view(input_seq.size(0), -1, self.linear.out_features):  torch.Size([3360, 24])
shape after predictions.view():  torch.Size([20, 168, 24])


* Modifications Explained
    * **Batch Handling**: The model's LSTM layer is set with **batch_first=True**, meaning the input sequences should be shaped as **(batch, seq_len, input_size)**, which aligns with how the data is prepared by the **DataLoader**.
    * **Output Size**: The **output_size** parameter in the **__init__** method defaults to 24, assuming you want to predict the next 24 hours. The **Linear** layer is adjusted accordingly.
    * **Forward Pass Adjustments**: The forward pass doesn't need to reshape the input sequence if your **DataLoader** correctly batches the data. Only the LSTM output at the last time step, **lstm_out[:, -1, :]**, is passed to the **Linear** layer, so the returned predictions have the shape **(batch_size, output_size)**, matching the labels. The earlier version, which applied the linear layer to every time step and reshaped the result to **(batch_size, seq_len, output_size)**, is the mistake noted in the code comments.


This model is now prepared to work with your batched data for both training and validation steps, assuming the sequences and labels are structured correctly by your custom **Dataset** and **DataLoader** setup.
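
As a small check of the last-step selection, the sketch below confirms that for a single-layer, unidirectional LSTM the slice `lstm_out[:, -1, :]` is the same tensor as the final hidden state `h_n[-1]`. The sizes follow the notebook; the random input is a stand-in for a real batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=100, batch_first=True)
linear = nn.Linear(100, 24)

x = torch.randn(20, 168, 10)           # (batch, seq_len, input_size)
lstm_out, (h_n, c_n) = lstm(x)

last_step = lstm_out[:, -1, :]         # (20, 100): output at the final time step
print(torch.allclose(last_step, h_n[-1]))

predictions = linear(last_step)
print(predictions.shape)  # torch.Size([20, 24])
```

This equivalence does not hold for a bidirectional LSTM, where the backward direction finishes at the first time step.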

## Step4: Training Step & Test(Validation) Step

### Training Step
The training step involves iterating over the train_loader, passing each batch of data through the model, calculating the loss, and updating the model's parameters based on this loss.

In [None]:
def train_model(model, train_loader, loss_function, optimizer, epochs=10):
    model.train() # Set the model to training mode

    for epoch in range(epochs):
        total_loss = 0.0
        for seq, labels in train_loader:
            optimizer.zero_grad() # Clears existing gradients
            y_pred = model(seq) # Generate prediction
            loss = loss_function(y_pred, labels) # Calculate loss
            loss.backward() # Backpropagation
            optimizer.step() # Update model parameters

            total_loss += loss.item()
        average_loss = total_loss / len(train_loader)
        if epoch % 2 == 0:
            print(f"Epoch {epoch+1}/{epochs}, Loss: {average_loss:.4f}")


### Test (Validation) Step
The test step involves iterating over the test_loader to evaluate the model's performance on unseen data. It's crucial to disable gradient computations during this phase since we're only interested in assessing the model, not training it.

In [None]:
def validate_model(model, test_loader, loss_function):
    model.eval() # Set the model to evaluation mode
    total_loss = 0
    with torch.no_grad(): # Disable gradient computation
        for seq, labels in test_loader:
            y_pred = model(seq) # Generate prediction
            loss = loss_function(y_pred, labels) # Calculate loss
            total_loss += loss.item()

    average_loss = total_loss / len(test_loader)
    print(f"Test Loss: {average_loss:.4f}")

### Integrating the Training and Validation Steps
Now, let's integrate these steps with the rest of your setup. We assume the ElectricityConsumptionModel, loss_function, and optimizer are already defined and instantiated based on your provided model class and settings.

In [None]:
# Model instantiation
input_size = 10
hidden_layer_size = 100
output_size = forecast_length
model = ElectricityConsumptionModel(input_size, hidden_layer_size, output_size)

# Loss and optimizer
loss_function  = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training the model
train_model(model, train_loader, loss_function, optimizer, epochs=10)

# Validating the model
validate_model(model, test_loader, loss_function)

Epoch 1/10, Loss: 1978.8811
Epoch 3/10, Loss: 840.8022
Epoch 5/10, Loss: 834.6953
Epoch 7/10, Loss: 834.7721
Epoch 9/10, Loss: 834.8583
Test Loss: 834.9776
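
The loss stalls near 834 from epoch 3 onward. If the target in this section is drawn uniformly from [0, 100) (an assumption here, matching the `electricity_consumption` column generated later in this notebook), that is roughly what a constant prediction of the mean would score, because the variance of U(0, 100) is 100²/12 ≈ 833.3. A quick check under that assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.random(100_000) * 100       # assumed target distribution: U(0, 100)

baseline_pred = target.mean()            # best constant prediction under MSE
baseline_mse = np.mean((target - baseline_pred) ** 2)

print(f"analytic variance: {100**2 / 12:.1f}")   # 833.3
print(f"empirical baseline MSE: {baseline_mse:.1f}")
```

A loss that matches this baseline suggests the model has learned only the mean, which is what one would expect when the target is random noise unrelated to the features.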


# GPT4 - No Dataset and DataLoader edition
* **caution**
    * This section's code has data leakage! The features are scaled before the train/test split (see Step 3).

## Step 1: Environment Setup
First, ensure you have PyTorch installed in your environment. If not, you can install it using pip:

In [None]:
!pip install torch torchvision

## Step 2: Generate Sample Data
We'll create synthetic electricity consumption data with the specified features and hourly records. For simplicity, the calendar features (weekday, hour, season) are derived from the timestamp, while the remaining features and the target are randomly generated, so there is no real signal for a model to learn.

In [None]:
import pandas as pd
import numpy as np

np.random.seed(42)  # For reproducibility

# Generate a DataFrame with datetime information
num_hours = 365 * 24  # A year's worth of hourly data
date_rng = pd.date_range(start='2020-01-01', end='2020-12-31', freq='H')  # ISO dates avoid the day-first parsing warning
df = pd.DataFrame(date_rng, columns=['date'])
df['weekday'] = df['date'].dt.weekday
df['hour'] = df['date'].dt.hour
df['season'] = df['date'].dt.month % 12 // 3 + 1

# Generate synthetic features and target variable
for i in range(7):  # Additional 7 features
    df[f'feature_{i}'] = np.random.rand(len(df))
df['electricity_consumption'] = np.random.rand(len(df)) * 100  # Target variable

df.head()




| | date | weekday | hour | season | feature_0 | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | electricity_consumption |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-01 00:00:00 | 2 | 0 | 1 | 0.37454 | 0.671368 | 0.40998 | 0.421576 | 0.137686 | 0.120749 | 0.616654 | 1.923384 |
| 1 | 2020-01-01 01:00:00 | 2 | 1 | 1 | 0.950714 | 0.523158 | 0.838483 | 0.280547 | 0.260339 | 0.520433 | 0.003229 | 47.550482 |
| 2 | 2020-01-01 02:00:00 | 2 | 2 | 1 | 0.731994 | 0.898639 | 0.185176 | 0.895044 | 0.48954 | 0.095159 | 0.792586 | 26.352564 |
| 3 | 2020-01-01 03:00:00 | 2 | 3 | 1 | 0.598658 | 0.164393 | 0.554842 | 0.332239 | 0.061339 | 0.256357 | 0.243121 | 53.995885 |
| 4 | 2020-01-01 04:00:00 | 2 | 4 | 1 | 0.156019 | 0.804109 | 0.722233 | 0.578596 | 0.095686 | 0.451709 | 0.299217 | 17.865769 |


In [None]:
print(df['date'].min())
print(df['date'].max())

2020-01-01 00:00:00
2020-12-31 00:00:00


## Step 3: Data Preprocessing
We'll need to normalize our features and create sequences of data that our model can learn from. We will split the data into training and test sets as well.

In [None]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Normalize features
scaler = MinMaxScaler()
df.iloc[:,1:-1] = scaler.fit_transform(df.iloc[:,1:-1])

# Function to create sequences
def create_sequences(input_data, target_data, input_steps, forecast_steps):
    X, y = [], []
    for i in range(len(input_data) - input_steps - forecast_steps):
        X.append(input_data.iloc[i:(i+input_steps)].values) # take input_steps consecutive records
        y.append(target_data.iloc[i+input_steps:i+input_steps+forecast_steps].values)
    return np.array(X), np.array(y)

encoder_length = 168  # 7 days of hourly records (= sequence length)
forecast_length = 24  # Predicting the next 24 hours

# Creating sequences
X, y = create_sequences(df.iloc[:,1:-1], df[['electricity_consumption']], encoder_length, forecast_length)
# X, y = create_sequences(df.iloc[:,0:-1], df[['date']], encoder_length, forecast_length)

# Splitting dataset
# !!! Data leakage occurs here !!! The data should be split first; the scaler should then be fit on the training split only and applied to both splits.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
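
One way to avoid the leak flagged in the comment above is to split chronologically first, fit the scaler on the training rows only, and build the windows inside each split. The following is a minimal sketch on stand-in random data; `make_windows`, the 80/20 split and the array names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
features = rng.random((1000, 10))          # stand-in feature matrix (assumption)
target = rng.random(1000) * 100            # stand-in target (assumption)

# 1) Split by time, not at random: the first 80% of rows are used for training.
split = int(len(features) * 0.8)
f_tr, f_te = features[:split], features[split:]
t_tr, t_te = target[:split], target[split:]

# 2) Fit the scaler on the training rows only, then apply it to both parts.
scaler = MinMaxScaler().fit(f_tr)
f_tr, f_te = scaler.transform(f_tr), scaler.transform(f_te)

# 3) Build windows inside each split so that no window straddles the boundary.
def make_windows(x, y, input_steps, forecast_steps):
    n = len(x) - input_steps - forecast_steps + 1
    X = np.stack([x[i:i + input_steps] for i in range(n)])
    Y = np.stack([y[i + input_steps:i + input_steps + forecast_steps] for i in range(n)])
    return X, Y

X_tr, y_tr = make_windows(f_tr, t_tr, 168, 24)
X_te, y_te = make_windows(f_te, t_te, 168, 24)
print(X_tr.shape, y_tr.shape)   # (609, 168, 10) (609, 24)
print(X_te.shape, y_te.shape)   # (9, 168, 10) (9, 24)
```

With this ordering, the test rows influence neither the scaler's min/max nor any training window.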


In [None]:
df.shape

(8761, 12)

In [None]:
X.shape
# Each record has 10 features (df.iloc[:,1:-1]), and 168 records are used for one prediction

(8569, 168, 10)

In [None]:
y.shape

(8569, 24, 1)
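
The number of windows follows from the loop bound in `create_sequences`: `len(df) - input_steps - forecast_steps` = 8761 - 168 - 24 = 8569. The sketch below reproduces that arithmetic and, on a toy series, shows that this bound stops one window short of the last complete one. `count_windows` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def count_windows(n_rows, input_steps, forecast_steps):
    # Same bound as the loop in create_sequences above.
    return n_rows - input_steps - forecast_steps

print(count_windows(8761, 168, 24))   # 8569

# Toy series of 10 rows, 3 input steps, 2 forecast steps.
series = np.arange(10)
n = count_windows(len(series), 3, 2)  # 5 windows: i = 0..4
last_target = series[(n - 1) + 3:(n - 1) + 3 + 2]
print(last_target)                    # [7 8] -> row 9 is never used as a target
```

Adding `+ 1` to the loop bound would include that final window; for a year of hourly data the difference is a single sample.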

In [None]:
from PIL import Image
im = Image.open('/content/drive/MyDrive/study_DeepLearning/Pytorchによる時系列予測(forecasting使わない)/data_structure.jpg')
im


* Note: the figure above does not account for batch size, because this section does no batch processing!

In [None]:
df.iloc[:,1:-1].columns
# Select every column except the date and the electricity consumption (y) columns

Index(['weekday', 'hour', 'season', 'feature_0', 'feature_1', 'feature_2',
       'feature_3', 'feature_4', 'feature_5', 'feature_6'],
      dtype='object')

## Step 4: Define the Model
We'll create a simple LSTM model for demonstration. No pre-trained weights are loaded here; if you were adapting a pre-trained model (transfer learning), the adaptation would happen in the final layers, where the model is adjusted to predict the next 24 hours of electricity consumption.

CLASS **torch.nn.LSTM(self, input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False, proj_size=0, device=None, dtype=None)**
<br><br>
* Parameters
    * input_size – The number of **expected features** in the input x
    * hidden_size – The number of features in the hidden state h
    * num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
    * bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
    * batch_first – If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). Note that this does not apply to hidden or cell states. See the Inputs/Outputs sections below for details. Default: False
    * dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0
    * bidirectional – If True, becomes a bidirectional LSTM. Default: False
    * proj_size – If > 0, will use LSTM with projections of corresponding size. Default: 0

Data layout when calling the model and in its output
* Inputs: input, (h_0, c_0)
    * **input**: tensor of shape **(L, H_in)** for unbatched input, **(L, N, H_in)** when `batch_first=False` or **(N, L, H_in)** when `batch_first=True` containing features of the input sequence. The input can also be a packed variable length  sequence.
    * **h_0**: tensor of shape (D * num_layers, H_out) for unbatched input or (D * num_layers, N, H_out) containing the initial hidden state for each element in the input sequence. Defaults to zeros if (h_0, c_0) is not provided.
    * **c_0**: tensor of shape (D * num_layers, H_cell) for unbatched input or (D * num_layers, N, H_cell) containing the initial cell state for each element in the input sequence. Defaults to zeros if (h_0, c_0) is not provided.
    * where:
        * N = batch size
        * L = sequence length
        * D = 2 if bidirectional = True otherwise 1
        * H_in = input_size
        * H_cell = hidden_size
        * H_out = proj_size if proj_size > 0 otherwise hidden_size
* Outputs: output, (h_n, c_n)
    * **output**: tensor of shape (L, D * H_out) for unbatched input, (L, N, D * H_out) when `batch_first=False` or (N, L, D * H_out) when `batch_first=True` containing the output features (h_t) from the last layer of the LSTM, for each t. If a `torch.nn.utils.rnn.PackedSequence` has been given as the input, the output will also be a packed sequence. When `bidirectional=True`, output will contain a concatenation of the forward and reverse hidden states at each time step in the sequence.
    * **h_n**: omitted
    * **c_n**: omitted
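
The shapes listed above can be confirmed with a short sketch. The sizes below (batch of 4, 168 steps, 10 features, hidden size 100) are chosen to mirror this notebook and are otherwise arbitrary.

```python
import torch
import torch.nn as nn

N, L, H_in, H_out = 4, 168, 10, 100   # batch, sequence length, features, hidden size

x = torch.randn(L, N, H_in)           # default layout: (L, N, H_in)
lstm = nn.LSTM(input_size=H_in, hidden_size=H_out)
output, (h_n, c_n) = lstm(x)          # h_0 and c_0 default to zeros
print(output.shape)                   # torch.Size([168, 4, 100])
print(h_n.shape, c_n.shape)           # torch.Size([1, 4, 100]) torch.Size([1, 4, 100])

x_bf = x.transpose(0, 1)              # (N, L, H_in)
lstm_bf = nn.LSTM(input_size=H_in, hidden_size=H_out, batch_first=True)
output_bf, (h_n_bf, _) = lstm_bf(x_bf)
print(output_bf.shape)                # torch.Size([4, 168, 100])
print(h_n_bf.shape)                   # torch.Size([1, 4, 100]): batch_first does not apply to states
```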


In [None]:
import torch
import torch.nn as nn

class ElectricityConsumptionModel(nn.Module):
    def __init__(self, input_size, hidden_layer_size, output_size):
        super(ElectricityConsumptionModel, self).__init__()
        self.hidden_layer_size = hidden_layer_size

        self.lstm = nn.LSTM(input_size, hidden_layer_size)

        self.linear = nn.Linear(hidden_layer_size, output_size)

    def forward(self, input_seq):
        # input_seq is one item from X_train: shape (encoder_length, features) = (168, 10)
        # input_seq.view(len(input_seq), 1, -1) adds a batch dimension: (168, 10) -> (168, 1, 10)
        # batch_first is not set on the LSTM, so it expects input of shape
        # (sequence_length, batch_size, features), hence the 3-D reshape
        lstm_out, _ = self.lstm(input_seq.view(len(input_seq), 1, -1))
        # lstm_out has shape (168, 1, 100); the trailing 100 is hidden_layer_size
        # lstm_out.view(len(input_seq), -1) drops the batch dimension: (168, 1, 100) -> (168, 100)
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        # predictions has shape (168, 1)
        return predictions[-24:]  # We're interested in the last 24 hours

# Model instantiation
model = ElectricityConsumptionModel(input_size=10, hidden_layer_size=100, output_size=1)


## Step 5: Train the Model
For simplicity, we'll outline a basic training loop. In a real-world scenario, you'd include validation checks, possibly adjust learning rates, and perform more complex model evaluations.

In [None]:
def train_model(model, X_train, y_train, epochs=10):
    loss_function = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    model.train()
    for i in range(epochs):
        for seq, labels in zip(X_train, y_train):
            optimizer.zero_grad()
            y_pred = model(torch.tensor(seq, dtype=torch.float32))
            single_loss = loss_function(y_pred, torch.tensor(labels, dtype=torch.float32))
            single_loss.backward()
            optimizer.step()

        if i%2 == 0:
            print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')

# Training the model (This will not run here due to computational limitations)
train_model(model, X_train, y_train, epochs=2)


## Step 6: Validate the Model
After training, you should validate your model's performance on the test dataset to ensure it generalizes well.

In [None]:
def validate_model(model, X_test, y_test):
    model.eval()
    predictions = []
    actuals = []
    with torch.no_grad():
        for seq, labels in zip(X_test, y_test):
            y_pred = model(torch.tensor(seq, dtype=torch.float32))
            predictions.append(y_pred.numpy())
            actuals.append(labels)
    # Calculate accuracy or any other performance metrics
    # This is a placeholder for actual performance calculation
    print("Validation complete - model performance metrics here")
    return predictions, actuals

# Validate the model (This will not run here due to computational limitations)
predictions, actuals = validate_model(model, X_test, y_test)


In [None]:
print(len(predictions))
print(len(actuals))
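
The placeholder in `validate_model` can be filled with standard regression metrics. The following is a minimal sketch, assuming `predictions` and `actuals` are lists of arrays with matching shapes, as returned above; `regression_metrics` is a hypothetical helper and the sample values are hand-made.

```python
import numpy as np

def regression_metrics(predictions, actuals):
    # Stack the per-sample arrays into (n_samples, horizon, 1) and compare element-wise.
    pred = np.stack(predictions).astype(np.float64)
    true = np.stack(actuals).astype(np.float64)
    err = pred - true
    return {
        "mse": float(np.mean(err ** 2)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
    }

# Tiny hand-made example: two samples, horizon of 2 steps.
preds = [np.array([[1.0], [2.0]]), np.array([[3.0], [4.0]])]
trues = [np.array([[1.0], [4.0]]), np.array([[3.0], [2.0]])]
metrics = regression_metrics(preds, trues)
print(metrics["mse"])   # 2.0
print(metrics["mae"])   # 1.0
```

Calling `regression_metrics(predictions, actuals)` on the lists returned by `validate_model` would report the same quantities for the test set.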

# LSTM by gpt3.5

## Step 1: Data Preparation
First, you need to prepare your dataset. This includes loading your data, normalizing it, and creating input sequences and their corresponding labels.

### Generate Sample Data
This data will consist of 10 features, with each row representing an hourly record.

In [None]:
import numpy as np
import pandas as pd

def generate_sample_data(num_records=1000):
    # Generate random data for 10 features
    data = np.random.rand(num_records, 10)

    # Assume the last feature is related to electricity consumption
    # and use it to create a target variable
    # The actual consumption is some combination of the features plus noise
    consumption = data[:, -1] * 0.5 + np.random.normal(0, 0.02, size=num_records)

    return pd.DataFrame(data, columns=[f'feature{i}' for i in range(1, 11)]), consumption

features, consumption = generate_sample_data()

In [None]:
features

| | feature1 | feature2 | feature3 | feature4 | feature5 | feature6 | feature7 | feature8 | feature9 | feature10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.772502 | 0.481144 | 0.788958 | 0.073508 | 0.191594 | 0.551534 | 0.947088 | 0.074758 | 0.043084 | 0.156551 |
| 1 | 0.641153 | 0.398993 | 0.825552 | 0.071933 | 0.196414 | 0.300026 | 0.626730 | 0.685173 | 0.124203 | 0.943805 |
| 2 | 0.341044 | 0.713169 | 0.524208 | 0.222758 | 0.846228 | 0.818238 | 0.396072 | 0.588608 | 0.257826 | 0.689852 |
| 3 | 0.309027 | 0.844079 | 0.217385 | 0.014803 | 0.268508 | 0.191052 | 0.508286 | 0.203703 | 0.763190 | 0.241371 |
| 4 | 0.706968 | 0.578471 | 0.986250 | 0.999901 | 0.869526 | 0.759983 | 0.386456 | 0.753277 | 0.956676 | 0.023378 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 0.500430 | 0.394782 | 0.557026 | 0.298789 | 0.485754 | 0.334745 | 0.195208 | 0.649367 | 0.673436 | 0.829232 |
| 996 | 0.874752 | 0.704462 | 0.687498 | 0.056676 | 0.109779 | 0.812563 | 0.232951 | 0.355565 | 0.145297 | 0.195152 |
| 997 | 0.775417 | 0.261860 | 0.449035 | 0.151671 | 0.677930 | 0.728270 | 0.361692 | 0.784747 | 0.239907 | 0.691904 |
| 998 | 0.729965 | 0.968657 | 0.232322 | 0.093710 | 0.263035 | 0.122862 | 0.169694 | 0.334115 | 0.413991 | 0.903724 |


In [None]:
consumption[:20]

array([ 0.06933425,  0.46952594,  0.35327824,  0.15068412, -0.01577216,
        0.17310564,  0.32834341,  0.18863397,  0.32956524,  0.29764606,
        0.27623026,  0.30566959,  0.36660977,  0.00204014,  0.14995074,
        0.02140257,  0.2976937 ,  0.15681663,  0.47826454,  0.03816285])

### Data Preprocessing
For LSTM models, we need to format our data into sequences. We'll also split the data into training and testing sets.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import torch
from torch.utils.data import TensorDataset, DataLoader

In [None]:
# Create the sequence data
# With time_steps = 5 this builds a 3-D array: 10 values along the feature axis,
# 5 along the time axis, and 995 samples (len(features) - 5).
# For the LSTM, each target y gets a sequence of the 5 preceding time steps.

def create_sequences(features, targets, time_steps=1):
    Xs, ys = [], []
    for i in range(len(features) - time_steps):
        Xs.append(features[i:(i+time_steps)])
        ys.append(targets[i+time_steps])
    return np.array(Xs), np.array(ys)

# Normalize data
# Caution: the scaler is fit before the data is split into train and test, so this leaks data!
scaler = MinMaxScaler()
features_scaled = scaler.fit_transform(features)

# Create sequences
time_steps = 5
X, y = create_sequences(features_scaled, consumption, time_steps)

In [None]:
X.shape

(995, 5, 10)

In [None]:
y.shape

(995,)

In [None]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to Pytorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

# Create TensorDatasets and DataLoaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

## Step 2: Define the LSTM Model
Here's a simple LSTM model in PyTorch. The model takes sequences of data with 10 features and outputs a prediction for the future electricity consumption.

In [None]:
import torch.nn as nn

class ElectricityConsumptionLSTM(nn.Module):
    def __init__(self, input_size=10, hidden_layer_size=100, output_size=1):
        super(ElectricityConsumptionLSTM, self).__init__()
        self.hidden_layer_size = hidden_layer_size
        self.lstm = nn.LSTM(input_size, hidden_layer_size)
        self.linear = nn.Linear(hidden_layer_size, output_size)

    def forward(self, input_seq):
        # Get batch size
        batch_size = input_seq.size(0)

        # Set the initial hidden and cell states
        # h0 = torch.zeros(1, input_seq.size(0), self.hidden_layer_size)
        # c0 = torch.zeros(1, input_seq.size(0), self.hidden_layer_size)
        h0 = torch.zeros(1, batch_size, self.hidden_layer_size)
        c0 = torch.zeros(1, batch_size, self.hidden_layer_size)

        # lstm_out, _ = self.lstm(input_seq.view(len(input_seq), 1, -1), (h0, c0))
        lstm_out, _ = self.lstm(input_seq.transpose(0, 1), (h0, c0))
        # predictions = self.linear(lstm_out.view(len(input_seq), -1))
        predictions = self.linear(lstm_out[-1])  # Take the last output from the sequence

        return predictions.squeeze(-1)  # (batch, 1) -> (batch,); keeps the batch dimension even when the batch size is 1

# Instantiate the model, define the loss function and the optimizer
model = ElectricityConsumptionLSTM(input_size=10)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

In [None]:
hidden_layer_size = 100
h0 = torch.zeros(1, batch_size, hidden_layer_size)
h0.shape

torch.Size([1, 64, 100])

In [None]:
tmp_data = next(iter(train_loader))
tmp_data[0].shape

torch.Size([64, 5, 10])

In [None]:
tmp_data[0].transpose(0, 1).shape

torch.Size([5, 64, 10])

You can use `torch.transpose()` to rearrange the dimensions of a tensor according to your specific needs, such as converting a batched sequence tensor from **(batch_size, seq_len, input_size)** to **(seq_len, batch_size, input_size)** for compatibility with certain PyTorch modules like LSTM, as we did in the previous example.

In [None]:
model = ElectricityConsumptionLSTM(input_size=10)
lstm_out, _ = model.lstm(tmp_data[0].transpose(0, 1))

In [None]:
lstm_out.shape

torch.Size([5, 64, 100])

In [None]:
lstm_out[-1].shape # -> is the last of the 5 time steps the one the prediction is built from? (Yes: this output is passed to the linear layer.)

torch.Size([64, 100])

In this modification:

* We use `input_seq.transpose(0, 1)` to swap the first two dimensions of the input tensor, so that the sequence length comes first and the batch size second. This gives the input the shape `(seq_len, batch_size, input_size)`, which is what the LSTM layer expects by default.
* We take only the last output from the LSTM sequence `(lstm_out[-1])` since we're interested in predicting the next value based on the entire sequence.
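
Note that `transpose` swaps axes, whereas `view` or `reshape` (as in the commented-out `view(len(input_seq), 1, -1)` call) only reinterprets the same memory order, so the two are not interchangeable for a batched tensor. A small sketch with toy sizes, chosen only for illustration:

```python
import torch

batch, seq_len, feat = 2, 3, 4
x = torch.arange(batch * seq_len * feat).reshape(batch, seq_len, feat)

swapped = x.transpose(0, 1)                 # (seq_len, batch, feat), axes swapped
reshaped = x.reshape(seq_len, batch, feat)  # same shape, but memory order is kept

# Time step 1 of sample 0 should land at index [1, 0] after the swap.
print(torch.equal(swapped[1, 0], x[0, 1]))   # True
print(torch.equal(reshaped[1, 0], x[0, 1]))  # False
```

Both results have shape `(3, 2, 4)`, so a shape check alone would not catch the mix-up.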

## Step 3: Training the Model

In [None]:
epochs = 5

for epoch in range(epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        # Backward pass and optimize
        loss.backward()
        optimizer.step()

    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

Epoch 1, Loss: 0.0259
Epoch 2, Loss: 0.0212
Epoch 3, Loss: 0.0211
Epoch 4, Loss: 0.0148
Epoch 5, Loss: 0.0230


## Step 4: Evaluating the Model
After training, you should evaluate the model's performance on the test set: predict on the test set and compare the predictions against the true values using a suitable metric (e.g., MSE for regression tasks).

In [None]:
with torch.no_grad():
    predictions = []
    for inputs, _ in test_loader:
        predictions.append(model(inputs).numpy())

# flatten the list of predictions
predictions = np.concatenate(predictions, axis=0)

In [None]:
loss = criterion(torch.tensor(predictions), torch.tensor(y_test))
print(f"Test Loss: {loss.item():.4f}")

Test Loss: 0.0211
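
A useful reference point for this test loss is the MSE of always predicting the mean. For the synthetic target defined earlier, `consumption = 0.5 * feature10 + noise` with `feature10` uniform on [0, 1) and a noise standard deviation of 0.02, the variance is 0.25/12 + 0.02² ≈ 0.0212, which is close to the reported test loss. The sketch below recomputes that baseline; the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
feature10 = rng.random(n)
consumption = feature10 * 0.5 + rng.normal(0, 0.02, size=n)

analytic = 0.25 / 12 + 0.02 ** 2
baseline_mse = np.mean((consumption - consumption.mean()) ** 2)
print(f"analytic variance: {analytic:.4f}")      # 0.0212
print(f"mean-baseline MSE: {baseline_mse:.4f}")
```

A test loss at this level suggests the model has not picked up the relation between the features and the target. One likely reason is that each window covers rows `i` to `i + time_steps - 1`, while the target is taken from row `i + time_steps`, whose `feature10` value is not part of the window.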


In [None]:
len(predictions)

199

In [None]:
len(y_test)

199

### tips: about batch_first=True

In PyTorch's LSTM (and generally in RNN-based modules), the **batch_first=True** argument is important for defining how the input data should be structured. By default, PyTorch expects the input tensor to the LSTM layer to have the shape **(seq_len, batch, features)**, where:

* **seq_len** is the length of the sequence,
* **batch** is the batch size, and
* **features** is the number of features per time step.

<br><br>
When you set **batch_first=True**, it changes the expected input shape to **(batch, seq_len, features)**, which is a format more familiar to those who work with other types of neural networks (like CNNs) and might be more intuitive depending on how you preprocess or think about your data.
<br><br>
You should add **batch_first=True** to the LSTM initialization in the model definition if your input data is formatted with the batch size as the first dimension. Given the way we structured the input data in our example (especially in the DataLoader), you would typically want to set **batch_first=True** to align with the **(batch, seq_len, features)** format.
<br><br>
Here's a revised snippet of the LSTM model definition with **batch_first=True**:

In [None]:
# Code for the batch_first=True case
class ElectricityConsumptionLSTM(nn.Module):
    def __init__(self, input_size=10, hidden_layer_size=100, output_size=1):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size

        # Notice the batch_first=True here
        self.lstm = nn.LSTM(input_size, hidden_layer_size, batch_first=True)

        self.linear = nn.Linear(hidden_layer_size, output_size)

    def forward(self, input_seq):
        lstm_out, _ = self.lstm(input_seq)  # No need to reshape the input_seq here
        predictions = self.linear(lstm_out[:, -1, :])  # Adjusted for batch_first=True
        return predictions.squeeze(-1)  # (batch, 1) -> (batch,), matching the shape of the targets


With **batch_first=True**, note how the input to the LSTM no longer needs to be reshaped in the forward pass, and when accessing the output from the LSTM to pass to the linear layer, we use **lstm_out[:, -1, :]** to get the last time step's output for all elements in the batch. This change simplifies handling sequence data when your batches are the first dimension of the tensor.
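
To see that the two layouts are equivalent, the sketch below copies the weights of a default LSTM into a `batch_first=True` LSTM and compares the last-step outputs. The hidden size of 16 and the random input are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm_sf = nn.LSTM(input_size=10, hidden_size=16)                    # expects (seq, batch, feat)
lstm_bf = nn.LSTM(input_size=10, hidden_size=16, batch_first=True)  # expects (batch, seq, feat)
lstm_bf.load_state_dict(lstm_sf.state_dict())                       # share identical weights

x = torch.randn(64, 5, 10)                                          # (batch, seq, feat)
with torch.no_grad():
    out_sf, _ = lstm_sf(x.transpose(0, 1))
    out_bf, _ = lstm_bf(x)

last_sf = out_sf[-1]          # last time step, shape (64, 16)
last_bf = out_bf[:, -1, :]    # last time step, shape (64, 16)
print(torch.allclose(last_sf, last_bf, atol=1e-5))   # True
```

The choice between the two is therefore a matter of convenience: `batch_first=True` simply removes the need for the `transpose` call when batches come from a `DataLoader`.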