# Introduction to Bidirectional LSTM

Bidirectional Long Short-Term Memory (BiLSTM) networks extend traditional Long Short-Term Memory (LSTM) networks. An LSTM is a type of recurrent neural network that can learn long-term dependencies in sequential data, which is crucial for many sequence tasks. A standard LSTM, however, processes a sequence in a single direction, from past to future, so its output at each step cannot draw on anything that comes later in the sequence. This is where Bidirectional LSTMs come into play.

A Bidirectional LSTM consists of two LSTMs: one processing the data from past to future (as a standard LSTM) and another one processing the data from future to past. By doing this, BiLSTMs are able to preserve information from both past and future, providing a richer representation of data.
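The two-pass idea can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: a plain tanh recurrent cell stands in for the LSTM cell, the weights are random, and the helper names `run_rnn` and `bidirectional` are hypothetical.

```python
import numpy as np

def run_rnn(x, w_in, w_rec):
    """Run a minimal tanh recurrent cell over x of shape (T, d); returns states of shape (T, h)."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)

def bidirectional(x, fwd_weights, bwd_weights):
    """Concatenate a past-to-future pass with a future-to-past pass."""
    forward = run_rnn(x, *fwd_weights)
    # Process the reversed sequence, then flip the result so row t lines up with time step t.
    backward = run_rnn(x[::-1], *bwd_weights)[::-1]
    return np.concatenate([forward, backward], axis=-1)

rng = np.random.default_rng(0)
T, d, h = 5, 1, 4  # time steps, input features, hidden units per direction
fwd = (rng.normal(size=(d, h)), rng.normal(size=(h, h)))
bwd = (rng.normal(size=(d, h)), rng.normal(size=(h, h)))
x = rng.normal(size=(T, d))

out = bidirectional(x, fwd, bwd)
print(out.shape)  # (5, 8): each time step carries h forward + h backward features
```

At the first time step, the forward half has only seen `x[0]`, while the backward half has already seen the whole sequence. That second half is what a unidirectional model lacks.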

## Pros and Cons of Bidirectional LSTM compared to standard LSTM

### Pros:
1. **Better Performance**: By accessing long-range information in both directions, BiLSTMs often outperform standard LSTMs, especially in tasks that benefit from context around each data point.
2. **Richer Representations**: BiLSTMs can generate richer representations by capturing relationships in the data that may be missed by a unidirectional LSTM.
3. **Improved Sequence Labeling**: In sequence labeling tasks, BiLSTMs have been shown to perform significantly better because they have access to future context as well as past context.

### Cons:
1. **Increased Computational Complexity**: Because two LSTMs run over every sequence, training and inference take roughly twice as long as with a standard LSTM of the same size.
2. **Memory Usage**: BiLSTMs require more memory because they must store intermediate states for both the forward and the backward pass.
3. **Potential Overfitting**: With roughly twice as many parameters to learn, BiLSTMs can be more prone to overfitting, especially on small datasets.
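To put a number on the extra parameters, the sketch below counts the trainable weights of one layer. It assumes the standard LSTM formulation with biases (four weight blocks: input, forget and output gates plus the candidate cell update). The helper names are hypothetical, and the sizes 1 and 128 match the single input feature and the `size_layer` used later in this notebook.

```python
def lstm_params(input_dim, units):
    """Weights in one LSTM: 4 blocks, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def bilstm_params(input_dim, units):
    """A bidirectional layer holds two independent LSTMs of the same size."""
    return 2 * lstm_params(input_dim, units)

print(lstm_params(1, 128))    # 66560
print(bilstm_params(1, 128))  # 133120
```

The dense layer after a concatenating BiLSTM also sees twice as many input features, so the total grows slightly more than twofold.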

In conclusion, BiLSTMs provide a powerful tool for tasks that can benefit from understanding data in both temporal directions. However, they come at the cost of increased computational and memory requirements.


In [2]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from datetime import datetime
from datetime import timedelta
from tqdm import tqdm


In [3]:
df = pd.read_csv('../ml-models/dataset/SPY_2020-01-01_2022-01-01.csv')
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2020-01-02,323.540009,324.890015,322.529999,324.869995,306.295227,59151200
1,2020-01-03,321.160004,323.640015,321.100006,322.410004,303.975922,77709700
2,2020-01-06,320.48999,323.730011,320.359985,323.640015,305.135651,55653900
3,2020-01-07,323.019989,323.540009,322.23999,322.730011,304.277649,40496400
4,2020-01-08,322.940002,325.779999,322.670013,324.450012,305.899261,68296000


In [4]:
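# Fit the scaler on the Close column (column index 4) and rescale it to [0, 1].
# The scaler is fitted on the full series, including the rows later held out for testing.
# Note: despite its name, df_log holds min-max scaled prices, not log prices.
close = df.iloc[:, 4:5].astype('float32')
minmax = MinMaxScaler().fit(close)
df_log = pd.DataFrame(minmax.transform(close))
df_log.head()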


Unnamed: 0,0
0,0.400424
1,0.390759
2,0.395592
3,0.392017
4,0.398774
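For reference, with its default settings `MinMaxScaler` maps each value to `(x - min) / (max - min)`, and `inverse_transform` reverses that mapping. The sketch below reproduces the arithmetic in plain Python on the five Close prices shown earlier. The results differ from the table above because the notebook fits the scaler on the whole column rather than on these five rows.

```python
# Close prices from the first five rows of the dataset.
close = [324.869995, 322.410004, 323.640015, 322.730011, 324.450012]

lo, hi = min(close), max(close)

# Forward transform: the smallest value maps to 0, the largest to 1.
scaled = [(c - lo) / (hi - lo) for c in close]

# Inverse transform: recover the original prices from the scaled values.
restored = [s * (hi - lo) + lo for s in scaled]

print([round(s, 3) for s in scaled])
print([round(r, 2) for r in restored])
```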


## Split and train the dataset
The dataset is split into a training set and a test set:
1. The training set runs from the first timestamp up to the last `test_size` trading days.
2. The test set is the last `test_size` trading days.

The model forecasts `test_size` days ahead (15 in the cell below), and the experiment is meant to be repeated `number_of_simulations` times (1 in the cell below).

Try changing the tuning parameters!

In [5]:
test_size = 15
number_of_simulations = 1

df_train = df_log.iloc[:-test_size]
df_test = df_log.iloc[-test_size:]

df.shape, df_train.shape, df_test.shape

((505, 7), (490, 1), (15, 1))

In [11]:
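# HYPERPARAMETERS
num_layers = 1          # number of stacked bidirectional LSTM layers
size_layer = 128        # hidden units per direction
timestamp = 5           # window length (time steps) per training batch
epoch = 300             # passes over the training set
dropout_rate = 0.8      # passed to the model, which stores it but applies no dropout
future_day = test_size  # forecast horizon in days
learning_rate = 0.01    # Adam step size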
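The `timestamp` value sets how the training series is cut into windows: each window of inputs is paired with the same window shifted one step ahead, so the model learns to predict the next value at every position. A minimal sketch of that pairing follows. The helper `make_windows` is hypothetical, and integers stand in for scaled prices.

```python
def make_windows(series, timestamp):
    """Split a series into (inputs, next-step targets) pairs of at most `timestamp` steps."""
    last = len(series) - 1  # final index that still has a "next" value
    windows = []
    for k in range(0, last, timestamp):
        end = min(k + timestamp, last)
        windows.append((series[k:end], series[k + 1:end + 1]))
    return windows

pairs = make_windows(list(range(12)), timestamp=5)
for x, y in pairs:
    print(x, "->", y)
# [0, 1, 2, 3, 4] -> [1, 2, 3, 4, 5]
# [5, 6, 7, 8, 9] -> [6, 7, 8, 9, 10]
# [10] -> [11]
```

Capping `end` at the last index keeps inputs and targets the same length, even in the final, shorter window.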



In [15]:
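class BiLSTM_Model(tf.keras.Model):
    """Stack of bidirectional LSTM layers followed by a per-time-step dense layer."""

    def __init__(self, num_layers, size_layer, output_size, dropout_rate):
        super().__init__()
        self.num_layers = num_layers
        self.size_layer = size_layer
        self.output_size = output_size
        # Stored only; no dropout layer is applied anywhere in this model.
        self.dropout_rate = dropout_rate

        # Each layer runs a forward and a backward LSTM and concatenates their
        # outputs, so every time step carries 2 * size_layer features.
        self.bilstm_layers = [
            tf.keras.layers.Bidirectional(
                tf.keras.layers.LSTM(self.size_layer, return_sequences=True),
                merge_mode='concat'
            )
            for _ in range(self.num_layers)
        ]
        # Maps the concatenated features of each time step to output_size values.
        self.dense = tf.keras.layers.Dense(self.output_size)

    def call(self, data, states=None):
        # `states` is accepted for interface compatibility and returned unchanged.
        x = data
        for layer in self.bilstm_layers:
            x = layer(x)
        output = self.dense(x)  # shape: (batch, time steps, output_size)
        return output, states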

def forecast(df, df_log, df_train, learning_rate, num_layers, size_layer, dropout_rate, epoch, timestamp, test_size):
    def anchor(signal, weight):
        buffer = []
        last = signal[0]
        for i in signal:
            smoothed_val = last * weight + (1 - weight) * i
            buffer.append(smoothed_val)
            last = smoothed_val
        return buffer
    
    modelnn = BiLSTM_Model(num_layers, size_layer, df_log.shape[1], dropout_rate)
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    
    date_ori = pd.to_datetime(df.iloc[:, 0]).tolist()

    for i in tqdm(range(epoch), desc='train loop'):
        total_loss = []
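        for k in range(0, (df_train.shape[0] // timestamp) * timestamp, timestamp):
            # Stop one row early so that every input row has a next-step target;
            # otherwise the last window's targets are one row shorter than its inputs.
            index = min(k + timestamp, df_train.shape[0] - 1)
            batch_x = np.expand_dims(df_train.iloc[k:index, :].values, axis=0)
            batch_y = df_train.iloc[k + 1:index + 1, :].values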

            with tf.GradientTape() as tape:
                logits, _ = modelnn(batch_x, states=None)
                loss = tf.reduce_mean(tf.square(batch_y - logits))
                total_loss.append(loss.numpy())
            
            grads = tape.gradient(loss, modelnn.trainable_variables)
            optimizer.apply_gradients(zip(grads, modelnn.trainable_variables))
        
        print(f'Epoch {i}, Loss: {np.mean(total_loss)}')
    
    future_day = test_size

    output_predict = np.zeros((df_train.shape[0] + future_day, df_train.shape[1]))
    output_predict[0] = df_train.iloc[0]
    upper_b = (df_train.shape[0] // timestamp) * timestamp

    for k in range(0, (df_train.shape[0] // timestamp) * timestamp, timestamp):
        out_logits, _ = modelnn(
            np.expand_dims(df_train.iloc[k:k + timestamp], axis=0),
            states=None
        )
        output_predict[k + 1:k + timestamp + 1] = out_logits

    if upper_b != df_train.shape[0]:
        out_logits, _ = modelnn(
            np.expand_dims(df_train.iloc[upper_b:], axis = 0),
            states=None
        )
        output_predict[upper_b + 1 : df_train.shape[0] + 1] = out_logits
        future_day -= 1
        date_ori.append(date_ori[-1] + timedelta(days = 1))

    for i in range(future_day):
        o = output_predict[-future_day - timestamp + i:-future_day + i]
        out_logits, _ = modelnn(
            np.expand_dims(o, axis = 0),
            states=None
        )
        # out_logits has shape (1, timestamp, 1): take the last time step of the single batch item.
        output_predict[-future_day + i] = out_logits[0, -1]
        date_ori.append(date_ori[-1] + timedelta(days = 1))

    # Undo the scaling with the MinMaxScaler `minmax` fitted earlier in the notebook (a global here).
    output_predict = minmax.inverse_transform(output_predict)

    # Smooth the predictions with the `anchor` helper defined at the top of this function.
    deep_future = anchor(output_predict[:, 0], 0.3)
    
    return deep_future[-test_size:]
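The nested `anchor` helper is an exponential smoothing filter: each output is `weight * previous_output + (1 - weight) * current_value`, so a higher `weight` yields a smoother but more lagged curve. The standalone copy below shows its behaviour on a tiny made-up signal. The input values are illustrative only.

```python
def anchor(signal, weight):
    """Exponentially smooth `signal`; `weight` is the share kept from the previous output."""
    smoothed = []
    last = signal[0]
    for value in signal:
        last = last * weight + (1 - weight) * value
        smoothed.append(last)
    return smoothed

print(anchor([10.0, 20.0, 20.0], 0.5))  # [10.0, 15.0, 17.5]
print(anchor([10.0, 20.0, 20.0], 0.0))  # [10.0, 20.0, 20.0], i.e. no smoothing
```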

In [16]:
deep_future = forecast(df, df_log, df_train, learning_rate, num_layers, size_layer, dropout_rate, epoch, timestamp, test_size)

train loop:   0%|          | 0/300 [00:00<?, ?it/s]2023-10-09 17:12:12.114276: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:521] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.8
  /usr/local/cuda
  /home/andrea/.local/lib/python3.10/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
  /home/andrea/.local/lib/python3.10/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-10-09 17:12:12.121003: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory

UnknownError: Exception encountered when calling layer 'forward_lstm_1' (type LSTM).

{{function_node __wrapped__CudnnRNN_device_/job:localhost/replica:0/task:0/device:GPU:0}} Fail to find the dnn implementation. [Op:CudnnRNN]

Call arguments received by layer 'forward_lstm_1' (type LSTM):
  • inputs=tf.Tensor(shape=(1, 5, 1), dtype=float32)
  • mask=None
  • training=None
  • initial_state=None
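The traceback above ends in `Fail to find the dnn implementation` from the `CudnnRNN` op, which usually points at the GPU setup (TensorFlow could not load a usable cuDNN) and not at the model code. The log is truncated, so the cause cannot be confirmed here. One way to check whether the rest of the notebook runs is to hide the GPU so that TensorFlow falls back to the CPU. The snippet below is a sketch of that workaround, not a fix for the underlying installation.

```python
import os

# Assumption: the failure comes from the GPU/cuDNN setup, not from the model code.
# Hiding all CUDA devices makes TensorFlow fall back to the CPU.
# This must run before `import tensorflow`, so restart the kernel first.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```

Training on the CPU will be slower, but with a single layer and windows of 5 time steps it should stay manageable.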