# Introduction

This model will serve as our first foray into time-series forecasting using LSTMs. We will be following [this tutorial](https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/).

The code will be broken into the following sections:

```{raw}
I. Data and Imports
III. Model Creation
IV. Model Training
V. Next Steps
```

# I. Data and Imports

In [1]:
import numpy as np
import tensorflow as tf
import pandas as pd
import h5py
from matplotlib import pyplot as plt

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

from tensorflow.keras.utils import pad_sequences

from sklearn.model_selection import train_test_split

In [7]:
BATCH_SIZE = 64
EPOCHS = 5

In [8]:
# Load the preprocessed inputs (X) and targets (y) from the HDF5 file
with h5py.File('preprocessed_data.h5', 'r') as hf:
    x = hf['X'][:]
    y = hf['y'][:]

# III. Model Creation

Here, we create a fairly standard two-layer LSTM model. For each input sequence it outputs a single vector of shape (35,), i.e. (batch_size, 35) for a batch, which is meant to match the next play in the sequence.

We would like to further explore our optimizer and loss functions, as well as various model architectures.

### III.a Normalization Layer

In [9]:
norm_layer = layers.Normalization(axis=-1)
norm_layer.adapt(x)
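
For intuition, `layers.Normalization(axis=-1)` learns one mean and one variance per feature during `adapt` and then standardizes inputs with them. The NumPy sketch below mimics that computation on synthetic data. It also illustrates a possible pitfall, under the assumption that `x` is padded with the `-1.1` value used by the `Masking` layer later on: padded timesteps would be counted in the statistics unless they are excluded. The array `x_demo` and both helper functions are hypothetical, not part of the notebook.

```python
import numpy as np

MASK_VALUE = -1.1  # assumed padding value, taken from the Masking layer

rng = np.random.default_rng(0)
# Synthetic stand-in for x: 8 drives, 21 plays, 35 features
x_demo = rng.normal(loc=5.0, scale=2.0, size=(8, 21, 35))
x_demo[:, 15:, :] = MASK_VALUE  # pretend the last 6 plays of each drive are padding

def feature_stats(data):
    """Per-feature mean and variance over every axis except the last."""
    flat = data.reshape(-1, data.shape[-1])
    return flat.mean(axis=0), flat.var(axis=0)

def masked_feature_stats(data, mask_value=MASK_VALUE):
    """Same statistics, but skipping timesteps that consist entirely of padding."""
    flat = data.reshape(-1, data.shape[-1])
    keep = ~np.all(flat == mask_value, axis=1)
    return flat[keep].mean(axis=0), flat[keep].var(axis=0)

mean_all, var_all = feature_stats(x_demo)
mean_real, var_real = masked_feature_stats(x_demo)

# Standardize each feature using statistics from real plays only
x_standardized = (x_demo - mean_real) / np.sqrt(var_real)

print(mean_all[:3])   # pulled toward -1.1 by the padded timesteps
print(mean_real[:3])  # close to the true mean of 5.0
```

If the padding does skew the statistics in the real data, one option would be to call `norm_layer.adapt` on the unpadded plays only.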

In [None]:
# NUM_DRIVES = 58279
NUM_PLAYS = 21      # timesteps (plays) per input sequence
NUM_FEATURES = 35   # features per play
hidden_size = 128   # units in each LSTM layer

# Basic two-layer LSTM
model = Sequential([
    layers.Input((NUM_PLAYS, NUM_FEATURES)),
    # Mark timesteps where every feature equals -1.1 as padding
    layers.Masking(mask_value=-1.1),
    norm_layer,
    layers.BatchNormalization(),
    # The first LSTM returns the full sequence so the second LSTM can consume it
    layers.LSTM(hidden_size, recurrent_activation="tanh", kernel_regularizer="l2", return_sequences=True),
    layers.BatchNormalization(),
    # The second LSTM returns only its final output
    layers.LSTM(hidden_size, recurrent_activation="tanh", kernel_regularizer="l2"),
    layers.BatchNormalization(),
    # One output per feature of the next play
    layers.Dense(NUM_FEATURES),
])

# TODO: Explore model params. Add momentum to optimizer? MSE because this feels like more of a regression problem.
model.compile(optimizer='adam',
              loss="mean_squared_error",
              metrics=['accuracy', "MSE"])

model.summary()

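
As a cross-check for what `model.summary()` should report, the trainable parameter counts of the LSTM and `Dense` layers can be worked out by hand. The sketch below uses the standard LSTM parameter formula (four gates, each with input weights, recurrent weights, and a bias) and assumes the Keras defaults such as `use_bias=True`. It leaves out the normalization layers, and the helper names are hypothetical.

```python
NUM_FEATURES = 35
HIDDEN_SIZE = 128

def lstm_params(input_dim, units):
    """4 gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weight matrix plus bias vector."""
    return input_dim * units + units

first_lstm = lstm_params(NUM_FEATURES, HIDDEN_SIZE)    # input: 35 features
second_lstm = lstm_params(HIDDEN_SIZE, HIDDEN_SIZE)    # input: 128 hidden units
output_dense = dense_params(HIDDEN_SIZE, NUM_FEATURES)

print(first_lstm, second_lstm, output_dense)  # 83968 131584 4515
```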

# IV. Model Training

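As the training log below shows, the loss drops sharply after the first epoch (MSE falls from roughly 59,090 to about 1,462), and the reported accuracy reaches about 66% by epoch 5.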
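
One caveat when reading the accuracy figure: with a 35-dimensional target, Keras appears, as far as we understand its handling of the `'accuracy'` string, to resolve it to categorical accuracy, i.e. whether the largest entry of the prediction sits at the same index as the largest entry of the target. For a regression target that can be high even when the predictions are poor. The sketch below reproduces that computation in NumPy on made-up arrays and contrasts it with mean absolute error. The arrays and helper functions are hypothetical.

```python
import numpy as np

def categorical_accuracy(y_true, y_pred):
    """Fraction of rows whose argmax index matches between target and prediction."""
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference over all entries."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up targets with 3 features; the first feature has a much larger scale
y_true = np.array([[50.0, 1.0, 0.0],
                   [30.0, 2.0, 1.0],
                   [75.0, 3.0, 0.0],
                   [20.0, 1.0, 1.0]])
# Poor predictions that still keep the first feature largest
y_pred = np.array([[10.0, 5.0, 3.0],
                   [90.0, 0.0, 4.0],
                   [12.0, 9.0, 2.0],
                   [60.0, 7.0, 5.0]])

acc = categorical_accuracy(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(acc)  # 1.0, although every prediction is far from its target
print(mae)
```

For this reason MSE, or an error metric such as MAE, is probably the more informative number to track here.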

We would like to add validation data to the model to ensure that it is not overfitting.

In [19]:
# Fit the model (BATCH_SIZE is not passed here, so Keras uses its default batch size of 32)
history = model.fit(x=x, y=y, epochs=EPOCHS)

Epoch 1/5
10022/10022 ━━━━━━━━━━━━━━━━━━━━ 264s 26ms/step - MSE: 59089.7773 - accuracy: 0.5743 - loss: 59092.7148
Epoch 2/5
10022/10022 ━━━━━━━━━━━━━━━━━━━━ 271s 27ms/step - MSE: 1497.5549 - accuracy: 0.6428 - loss: 1502.5139
Epoch 3/5
10022/10022 ━━━━━━━━━━━━━━━━━━━━ 287s 29ms/step - MSE: 1547.7394 - accuracy: 0.6487 - loss: 1553.9034
Epoch 4/5
10022/10022 ━━━━━━━━━━━━━━━━━━━━ 310s 31ms/step - MSE: 1463.5544 - accuracy: 0.6487 - loss: 1469.0878
Epoch 5/5
10022/10022 ━━━━━━━━━━━━━━━━━━━━ 339s 34ms/step - MSE: 1462.3384 - accuracy: 0.6557 - loss: 1467.4646
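
To add the validation data mentioned above, the samples first need to be split. The following is a minimal NumPy sketch of a shuffled hold-out split on synthetic arrays with the notebook's per-sample shapes. The function `split_train_val`, the 20% fraction, and the demo arrays are assumptions for illustration.

```python
import numpy as np

def split_train_val(x, y, val_fraction=0.2, seed=0):
    """Shuffle sample indices once, then carve off a validation slice."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]

# Synthetic stand-ins: 100 samples of (21 plays, 35 features) with 35-dim targets
x_demo = np.zeros((100, 21, 35), dtype=np.float32)
y_demo = np.zeros((100, 35), dtype=np.float32)

x_train, x_val, y_train, y_val = split_train_val(x_demo, y_demo)
print(x_train.shape, x_val.shape)  # (80, 21, 35) (20, 21, 35)
```

The held-out arrays could then be passed as `model.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=BATCH_SIZE, epochs=EPOCHS)`. The already imported `train_test_split` from scikit-learn would do the same job. If several samples are cut from the same drive (an assumption about the preprocessing), a random per-sample split can place plays from one drive in both sets, so splitting by drive may give a more honest validation score.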


# V. Next Steps

1. Experiment with various model architectures and frameworks
   1. LSTM
   2. GRU
   3. Transformer
   4. Encoder-Decoder
2. Hyperparameter optimization
   1. Loss function
   2. Optimizer
   3. Regularization
   4. Weight normalization
   5. Model architectures
3. Dataset preparation
   1. Normalization
   2. Revisit feature selection
   3. Look into time-series methods (`tf.keras.preprocessing.timeseries_dataset_from_array`)
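
To make the time-series item above concrete, here is a rough NumPy sketch of the kind of windowing that a utility like `timeseries_dataset_from_array` automates: each input is a fixed-length run of consecutive timesteps and its target is the timestep that follows. The helper `next_step_windows` and the toy series are hypothetical, and the sketch is not a drop-in replacement for the Keras utility.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def next_step_windows(series, window):
    """Return (inputs, targets): each input holds `window` consecutive rows,
    and its target is the row that immediately follows."""
    # sliding_window_view puts the window axis last, so move it to the middle
    inputs = sliding_window_view(series[:-1], window, axis=0).transpose(0, 2, 1)
    targets = series[window:]
    return inputs, targets

# Toy series: 10 timesteps, 3 features
series = np.arange(30, dtype=np.float32).reshape(10, 3)
inputs, targets = next_step_windows(series, window=4)
print(inputs.shape, targets.shape)  # (6, 4, 3) (6, 3)
```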

- Use `None` as the first input-shape dimension to allow variable-length input sequences?
- Use `preprocessing.pad_sequences` to pad sequences to a common length
- https://chatgpt.com/share/674f7ebb-778c-8011-a993-bd83320c73b8
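
The padding idea in the notes above can be sketched in plain NumPy. The version below post-pads (or truncates) each drive to `NUM_PLAYS` rows using the `-1.1` value that the `Masking` layer looks for. Whether the real preprocessing pads at the start or the end is not shown in this notebook, so post-padding is an assumption, as are the helper `pad_drives` and the demo drives.

```python
import numpy as np

MASK_VALUE = -1.1  # padding value, matching the Masking layer
NUM_PLAYS = 21
NUM_FEATURES = 35

def pad_drives(drives, max_len=NUM_PLAYS, value=MASK_VALUE):
    """Post-pad (or truncate) each (plays, features) array to max_len rows."""
    out = np.full((len(drives), max_len, drives[0].shape[1]), value, dtype=np.float32)
    for i, drive in enumerate(drives):
        n = min(len(drive), max_len)
        out[i, :n] = drive[:n]
    return out

# Hypothetical drives of 3, 21, and 25 plays
drives = [np.ones((n, NUM_FEATURES), dtype=np.float32) for n in (3, 21, 25)]
padded = pad_drives(drives)
print(padded.shape)  # (3, 21, 35)
```

Keras's `pad_sequences` covers the same ground through its `maxlen`, `padding`, `truncating`, and `value` arguments. Its default `dtype` is `'int32'`, so `dtype='float32'` would be needed for float features like these.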