# Section 6. Modeling with Deep Learning 

In this section, we introduce a seq2seq model that leverages the time-series structure of the features in the data set. To keep the demonstration simple, we use only the normalized price features.

### CONTENTS

* <a href='00 - DSC 2022 Welcome and Logistics.ipynb#top'>**Section 0. Welcome and Logistics**</a> 
* <a href='01 - DSC 2022 Problem Definition.ipynb#top'>**Section 1. Problem Definition**</a> 
* <a href='02 - DSC 2022 Exploratory Data Analysis.ipynb#top'>**Section 2. Exploratory Data Analysis**</a> 
* <a href='03 - DSC 2022 Hypothesis testing.ipynb#top'>**Section 3. Hypothesis Testing**</a> 
* <a href='04 - DSC 2022 Feature Engineering.ipynb#top'>**Section 4. Feature Engineering**</a> 
* <a href='05 - DSC 2022 Modeling.ipynb#top'>**Section 5. Modeling**</a>
* <a href='06 - DSC 2022 Modeling with Deep Learning.ipynb#top'>**Section 6. Modeling with Deep Learning**</a>
  * [1. Model configuration](#configure)
  * [2. Model training](#train)
  * [3. Model inference](#inference)
* <a href='07 - DSC 2022 Submission.ipynb#top'>**Section 7. Submission**</a>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from feature_engineering import feature_engineering
from evaluation import evaluation
import os
# hide TensorFlow INFO log messages; must be set before keras/tensorflow is imported (In [3])
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

In [2]:
cmg = pd.read_excel('cmg.xlsx', index_col = 'offeringId')
X_train, X_test, y_train, y_test = feature_engineering(cmg, test_frac = 0.3, normalize = True, random_state = 42)

<a id='configure'></a>
## 1. Model configuration

One problem with our previous machine learning models is that they disregard the time-series characteristics of the features in the data set. The predictors (pre15_Price_Normalized, pre14_Price_Normalized, ..., pre1_Price_Normalized) form a time-series sequence, and the outcomes we try to predict (post1_Price_Normalized, post7_Price_Normalized, ..., post180_Price_Normalized) also form a sequence.
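Because the LSTM input is later built by reshaping the price columns left to right, the pre-deal columns need to be in chronological order (pre15 first, pre1 last). A minimal sketch, assuming the column names follow the `preN_Price_Normalized` / `postN_Price_Normalized` pattern listed above, that builds both sequences explicitly (the names `PRE_DAYS`, `POST_DAYS`, `pre_cols` and `post_cols` are hypothetical):

```python
# Hypothetical helper: explicitly ordered column lists, so the sequence order
# does not depend on the column order in the spreadsheet.
PRE_DAYS = list(range(15, 0, -1))   # 15 days before the deal down to 1 day before
POST_DAYS = [1, 7, 30, 90, 180]     # post-deal horizons used as targets

pre_cols = [f"pre{d}_Price_Normalized" for d in PRE_DAYS]
post_cols = [f"post{d}_Price_Normalized" for d in POST_DAYS]

print(pre_cols[0], "...", pre_cols[-1])  # pre15_Price_Normalized ... pre1_Price_Normalized
print(post_cols)
```

For example, `X_train[pre_cols]` would then select the predictors in chronological order, assuming all of these columns exist in `X_train`.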

Seq2seq is a type of encoder-decoder model that uses recurrent neural networks (__RNN__) to convert a sequence from one domain into a sequence in another domain. Long short-term memory (__LSTM__) is an RNN architecture widely used in deep learning; at each time step, an LSTM layer carries two states: the hidden state (h) and the memory cell state (c). You can find a more detailed introduction to RNNs <a href='https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks'>**here**</a>. In our problem setting, the input sequence is the normalized pre-deal prices, and the output sequence is the post-deal returns.
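For reference, the standard LSTM cell update is given below. This is the common textbook formulation, with sigmoid gates and tanh activations, which are also the defaults of the Keras `LSTM` layer. Here $x_t$ is the input at step $t$, $\sigma$ is the logistic sigmoid and $\odot$ is element-wise multiplication:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The pair $(h_t, c_t)$ is what gets passed from one step to the next, and from the encoder to the decoder.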

Our first step is to configure the model. Here we implement a **seq2seq model with teacher forcing**. Teacher forcing is a strategy for training recurrent neural networks in which the decoder receives the ground truth from the previous time step as input, instead of the model's own output from that step. The figure below illustrates our model design.
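Writing the decoder as a step function $g$ with state $s_t$ makes the difference between training and inference explicit. Here $s_0$ is the final encoder state $(h, c)$, $t = 1, \dots, 5$ indexes the horizons of 1, 7, 30, 90 and 180 days, and $y_0$ is the last pre-deal price (pre1_Price_Normalized), which the training code feeds as the first decoder input:

```latex
\begin{aligned}
\text{training (teacher forcing):}\quad & (\hat{y}_t, s_t) = g(y_{t-1}, s_{t-1}) \\
\text{inference (free running):}\quad & (\hat{y}_t, s_t) = g(\hat{y}_{t-1}, s_{t-1}), \qquad \hat{y}_0 = y_0
\end{aligned}
```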

Essentially, our model consists of two parts: __encoder__ and __decoder__.  

In the encoder, the normalized pre-deal price at each time step is fed into the encoder LSTM cell together with the previous cell state c and hidden state h. The process repeats over all 15 time steps, and only the final cell state c and hidden state h are kept.
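The encoder loop can be sketched in plain NumPy to show exactly what is carried between steps. This is an illustrative re-implementation with random weights, not the Keras internals: the function names `lstm_step` and `encode`, the stacked weight layout and the toy input are all assumptions. Only the final `(h, c)` is returned, mirroring `return_state=True` without `return_sequences`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. x_t: (n_features,), h_prev / c_prev: (units,).
    W: (n_features, 4*units), U: (units, 4*units), b: (4*units,).
    Gate order in the stacked weights is assumed to be i, f, c~, o."""
    z = x_t @ W + h_prev @ U + b
    i, f, c_tilde, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_t = f * c_prev + i * np.tanh(c_tilde)   # update the memory cell
    h_t = o * np.tanh(c_t)                    # expose part of it as the hidden state
    return h_t, c_t

def encode(sequence, W, U, b, units):
    """Run the cell over a (timesteps, n_features) sequence; keep only the final states."""
    h = np.zeros(units)
    c = np.zeros(units)
    for x_t in sequence:   # pre15, pre14, ..., pre1
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h, c

rng = np.random.default_rng(0)
units, n_features, timesteps = 32, 1, 15
W = rng.normal(scale=0.1, size=(n_features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

# stand-in for one row of normalized pre-deal prices
toy_prices = rng.normal(size=(timesteps, n_features))
state_h, state_c = encode(toy_prices, W, U, b, units)
print(state_h.shape, state_c.shape)  # (32,) (32,)
```

Note that `W`, `U` and `b` together hold 4352 numbers, the same count Keras reports for `encoder_lstm` with 32 units and one input feature.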

In the decoder, the final cell state c and hidden state h from the encoder serve as the initial states of the decoder LSTM. Because we use teacher forcing, the decoder takes ground-truth values as inputs: the input at each step is the true value from the previous step, and the first step receives the last pre-deal price (pre1_Price_Normalized). The decoder outputs a hidden state for each of the 5 time steps (days 1, 7, 30, 90 and 180), and each hidden state is passed through the same dense layer to produce the final prediction.
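`TimeDistributed(Dense(1))` applies one shared set of dense weights at every decoder step. A small NumPy sketch with random toy hidden states (`W_dense` and `b_dense` are illustrative names) shows that this is the same as a single matrix product over the last axis:

```python
import numpy as np

rng = np.random.default_rng(1)
batch, post_timesteps, latent_dimension = 4, 5, 32

# Toy decoder hidden states: one 32-d vector per sample and per horizon (1, 7, 30, 90, 180 days)
decoder_hidden = rng.normal(size=(batch, post_timesteps, latent_dimension))

# One shared dense layer: 32 weights + 1 bias, regardless of the number of steps
W_dense = rng.normal(size=(latent_dimension, 1))
b_dense = rng.normal(size=(1,))

# Vectorized: matmul over the last axis gives shape (batch, post_timesteps, 1)
outputs = decoder_hidden @ W_dense + b_dense

# Equivalent explicit loop over time steps, reusing the same weights at each step
looped = np.stack([decoder_hidden[:, t, :] @ W_dense + b_dense
                   for t in range(post_timesteps)], axis=1)

print(outputs.shape)                 # (4, 5, 1)
print(np.allclose(outputs, looped))  # True
```

A Keras `Dense` layer applied to a 3D tensor already operates on the last axis, so here the `TimeDistributed` wrapper mainly makes the per-step intent explicit.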

<img src="fig/seq2seq.png" width=600 height=400 />

In [3]:
import keras
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed 

In [4]:
n_features = 1          # one feature (normalized price) per time step
latent_dimension = 32   # number of LSTM units, i.e. the size of h and c

# the encoder part
encoder_inputs = Input(shape=(None, n_features), name='encoder_inputs')
# return_state=True returns (last output, h, c); we keep only the final states
encoder_lstm = LSTM(latent_dimension, return_state=True, name='encoder_lstm')
_, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# the decoder part, initialized with the final encoder states
decoder_inputs = Input(shape=(None, n_features), name='decoder_inputs')
decoder_lstm = LSTM(latent_dimension, return_sequences=True, return_state=True, name='decoder_lstm')
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
# the same dense layer maps the hidden state at every time step to one output value
decoder_dense = Dense(n_features, name='decoder_dense')
decoder_outputs = TimeDistributed(decoder_dense)(decoder_outputs)

# putting them together 
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 encoder_inputs (InputLayer)    [(None, None, 1)]    0           []                               
                                                                                                  
 decoder_inputs (InputLayer)    [(None, None, 1)]    0           []                               
                                                                                                  
 encoder_lstm (LSTM)            [(None, 32),         4352        ['encoder_inputs[0][0]']         
                                 (None, 32),                                                      
                                 (None, 32)]                                                      
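The 4352 parameters reported for `encoder_lstm` can be checked by hand: each of the four gates has an input kernel of size `n_features * units`, a recurrent kernel of size `units * units` and a bias of size `units`. A short sketch of that arithmetic (the helper names are hypothetical):

```python
def lstm_param_count(units, n_features):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (n_features * units + units * units + units)

def dense_param_count(units_in, units_out):
    # weight matrix + bias
    return units_in * units_out + units_out

print(lstm_param_count(32, 1))   # 4352
print(dense_param_count(32, 1))  # 33
```

The decoder LSTM has the same input size and number of units, so it also has 4352 parameters, which matches the `decoder_lstm` row in the inference model summary further down.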
                                                                                              

<a id='train'></a>
## 2. Model training 

Now that the network is configured, we can train the model. An LSTM layer expects 3D input of shape (samples, timesteps, features), so we first reshape our data into that form.
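A toy NumPy example of the reshape used below: a 2D table of shape `(samples, 15)` becomes `(samples, 15, 1)`, with one feature per time step, and each row keeps its left-to-right (chronological) order. The array is made up for illustration.

```python
import numpy as np

n_samples, pre_timesteps = 3, 15
# Toy table: row i holds the values 100*i + 0 ... 100*i + 14, standing in for pre15 ... pre1 prices
table = np.arange(pre_timesteps)[None, :] + 100 * np.arange(n_samples)[:, None]

lstm_input = table.reshape(-1, pre_timesteps, 1)

print(lstm_input.shape)      # (3, 15, 1)
print(lstm_input[1, :3, 0])  # [100 101 102], the first three time steps of sample 1
```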

In [5]:
pre_timesteps = 15
post_timesteps = 5

In [6]:
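# prepare data for the encoder: pre15 ... pre1 prices, shape (samples, 15, 1)
X_train_lstm = X_train.filter(like = 'pre').to_numpy().reshape(-1, pre_timesteps, 1)

# prepare data for the decoder (teacher forcing): the last pre-deal price followed by
# the true post1 ... post90 values, i.e. the targets shifted one step to the right
y_train_input_lstm = pd.concat([X_train.filter(like = 'pre1_'), y_train.drop(columns = ['post180_Price_Normalized'])], axis = 1)\
                    .to_numpy()\
                    .reshape(-1, post_timesteps, 1)
# decoder targets: post1, post7, post30, post90, post180, shape (samples, 5, 1)
y_train_target_lstm = y_train.to_numpy().reshape(-1, post_timesteps, 1)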
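The decoder inputs built above are the targets shifted one step to the right, with the last pre-deal price as the first input. A toy NumPy illustration of that alignment for a single deal (the numbers are invented):

```python
import numpy as np

last_pre_price = 1.00                               # stands in for pre1_Price_Normalized
targets = np.array([1.02, 1.05, 0.98, 1.10, 1.20])  # post1, post7, post30, post90, post180

# Teacher forcing: the input at step t is the ground truth at step t-1
decoder_input = np.concatenate([[last_pre_price], targets[:-1]])

for step, (x, y) in enumerate(zip(decoder_input, targets), start=1):
    print(f"step {step}: input {x:.2f} -> target {y:.2f}")
```

The post180 value never appears as an input, which is why the training code drops that column when building the decoder inputs.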

In [7]:
%%time
model.compile(optimizer = keras.optimizers.Adam(), loss = keras.losses.MeanSquaredError())
model.fit([X_train_lstm, y_train_input_lstm], y_train_target_lstm, epochs = 500, verbose = 0)

CPU times: user 22min 51s, sys: 7min 16s, total: 30min 7s
Wall time: 10min 52s


<keras.callbacks.History at 0x7fc2beaaf400>

<a id='inference'></a>
## 3. Model inference

With a trained model, the next step is inference. Recall that during training we fed ground-truth observations into the decoder. At inference time, however, the ground truth for the test set is not available. Instead, we predict the output sequence one step at a time: the model's output from the previous time step becomes the decoder input for the current time step, and the decoder states are carried forward in the same way.
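Because every operation in this loop works on a batch, the decoding can also run for all samples at once. The sketch below is framework-agnostic: `greedy_decode` and `step_fn` are hypothetical names, `step_fn` stands in for a call to `decoder_model.predict` (previous outputs and states in, new outputs and states out), and a toy linear step function replaces the trained network so the example runs without Keras.

```python
import numpy as np

def greedy_decode(step_fn, init_states, first_input, n_steps):
    """Autoregressive decoding for a whole batch.

    step_fn(prev_output, states) -> (output, new_states), where prev_output and
    output have shape (batch, 1, 1). Each prediction is fed back as the next
    input instead of the ground truth.
    """
    prev, states = first_input, init_states
    outputs = []
    for _ in range(n_steps):
        prev, states = step_fn(prev, states)
        outputs.append(prev[:, 0, 0])
    return np.stack(outputs, axis=1)  # (batch, n_steps)

# Toy step function (assumption, for illustration only):
# output = 0.5 * input + state, and the state is passed through unchanged.
def toy_step(prev, states):
    (s,) = states
    return 0.5 * prev + s[:, None, None], states

batch = 3
first_input = np.ones((batch, 1, 1))       # e.g. the last pre-deal price of each sample
init_states = [np.array([0.0, 1.0, 2.0])]  # e.g. the final encoder state of each sample
preds = greedy_decode(toy_step, init_states, first_input, n_steps=5)
print(preds.shape)  # (3, 5)
```

With the models in this notebook, `init_states` could come from a single `encoder_model.predict(X_test_lstm)` call, and `step_fn` could wrap `decoder_model.predict([prev] + states)`, splitting its `[output, h, c]` result into `output` and `[h, c]`. This should avoid most of the per-call overhead of decoding one sample at a time, though we have not benchmarked it.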

In [8]:
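# inference encoder: maps a pre-deal price sequence to its final states [h, c]
encoder_model = Model(encoder_inputs, encoder_states)

# inference decoder: takes the previous output plus explicit h and c states and returns
# the next output and the updated states (reuses the trained decoder_lstm and decoder_dense)
decoder_state_input_h = Input(shape=(latent_dimension,), name = 'h state')
decoder_state_input_c = Input(shape=(latent_dimension,), name = 'c state')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]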

decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = TimeDistributed(decoder_dense)(decoder_outputs)

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

decoder_model.summary()

Model: "model_2"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 decoder_inputs (InputLayer)    [(None, None, 1)]    0           []                               
                                                                                                  
 h state (InputLayer)           [(None, 32)]         0           []                               
                                                                                                  
 c state (InputLayer)           [(None, 32)]         0           []                               
                                                                                                  
 decoder_lstm (LSTM)            [(None, None, 32),   4352        ['decoder_inputs[0][0]',         
                                 (None, 32),                      'h state[0][0]',          

In [9]:
def decode_sequence(input_seq):
    # encode the (15, 1) pre-deal sequence into the initial decoder states [h, c]
    states_values = encoder_model.predict(input_seq.reshape(-1, pre_timesteps, 1), verbose = 0)
    # the first decoder input is the last pre-deal price, as during training
    target_seq = input_seq[-1].reshape(1, 1, 1)
    result = []

    while len(result) < post_timesteps:
        output, h, c = decoder_model.predict([target_seq] + states_values, verbose = 0)
        result.append(output.reshape(-1))
        # feed the prediction and the updated states back in for the next step
        target_seq = output.reshape(1, 1, 1)
        states_values = [h, c]
    return np.array(result).reshape(-1)

In [10]:
%%time
train_results = []
for i in range(X_train_lstm.shape[0]):
    train_results.append(decode_sequence(X_train_lstm[i]))
print('Seq2seq on train set:\n', evaluation(y_train, train_results))

Seq2seq on train set:
 {'MSE': 9.335408849757176, 'ACC': 0.6357418800972996}
CPU times: user 26min 2s, sys: 1min 4s, total: 27min 6s
Wall time: 26min 28s


In [11]:
%%time
X_test_lstm = X_test.filter(like = 'pre').to_numpy().reshape(-1, pre_timesteps, 1)
test_results = []
for i in range(X_test_lstm.shape[0]):
    test_results.append(decode_sequence(X_test_lstm[i]))
print('Seq2seq on test set:\n', evaluation(y_test, test_results))

Seq2seq on test set:
 {'MSE': 23.133951086840046, 'ACC': 0.6321094793057369}
CPU times: user 10min 52s, sys: 28 s, total: 11min 20s
Wall time: 10min 58s


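In the example above, we showcased a very basic seq2seq design. The test MSE (23.13) is roughly 2.5 times the training MSE (9.34), which suggests that the model overfits the training data, even though ACC is similar on both sets (0.636 vs. 0.632). It is now your turn to design, train and improve your own deep learning model!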