# Section 6. Modeling with Deep Learning 

### CONTENTS

* <a href='05- DSC 2022 Modeling .ipynb#top'>**Section 5. Modeling**</a>
* <a href='06- DSC 2022 Modeling with deep learning.ipynb#top'>**Section 6 - Modeling with deep learning**</a>
  * [1. Model configuration](#configure)
  * [2. Model training](#train)
  * [3. Model inference](#inference)
* <a href='07- DSC 2022 Submission #top'>**Section 7 - Submission**</a>

In [61]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from feature_engineering import feature_engineering
from evaluation import evaluation

In [2]:
cmg = pd.read_excel('cmg.xlsx', index_col = 'offeringId')
X_train, X_test, y_train, y_test = feature_engineering(cmg)

In [44]:
from sklearn.metrics import mean_absolute_error, accuracy_score

def eval(true, pred):
    
    # convert DataFrame or list inputs to numpy arrays 
    if isinstance(true, (pd.DataFrame, list)): 
        true = np.array(true)
    if isinstance(pred, (pd.DataFrame, list)): 
        pred = np.array(pred)
        
    # mean absolute error over all observations and time steps 
    mae = mean_absolute_error(true, pred)
    
    # direction: share of time steps where the sign of the prediction 
    # matches the sign of the true value, averaged over observations
    n = len(true)
    true, pred = (true >= 0), (pred >= 0)
    score = 0 
    for i in range(n):
        score += accuracy_score(true[i], pred[i])
    acc = score/n
    
    return {'MAE':mae, 'ACC': acc}

<a id='configure'></a>
## 1. Model configuration

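The first step is to configure the model. Here we implement a seq2seq (encoder-decoder) model trained with teacher forcing: during training, the decoder receives the true value from the previous time step as its input rather than its own prediction.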
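To make the teacher forcing idea concrete, here is a minimal sketch on toy numbers. The arrays `last_pre` and `targets` are made up for illustration and are not taken from the dataset. The decoder input is simply the target sequence shifted right by one step, with the last observed value placed in front.

```python
import numpy as np

# Hypothetical toy data: 2 samples, each with the last observed value
# before the event and 5 target values after it.
last_pre = np.array([[1.0], [2.0]])                      # shape (2, 1)
targets = np.array([[1.1, 1.2, 1.3, 1.4, 1.5],
                    [2.1, 2.2, 2.3, 2.4, 2.5]])          # shape (2, 5)

# Teacher forcing: at step t the decoder sees the TRUE value from step t-1.
# So the decoder input is [last observed value, targets without the final step].
decoder_input = np.concatenate([last_pre, targets[:, :-1]], axis=1)

print(decoder_input[0])  # [1.  1.1 1.2 1.3 1.4]
print(targets[0])        # [1.1 1.2 1.3 1.4 1.5]
```

Each position of `decoder_input` lines up with the target one step ahead of it, which is the pairing the decoder learns from during training.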

In [5]:
import keras
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed 

<img src="fig/seq2seq.png" width=600 height=400 />

In [7]:
n_features = 1
latent_dimension = 32

# the encoder part 
encoder_inputs= Input(shape=(None, n_features), name = 'encoder_inputs')
encoder_lstm=LSTM(latent_dimension, return_state=True, name = 'encoder_lstm') # we only want the output from the last cell 
_, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# the decoder part 
decoder_inputs = Input(shape=(None, n_features), name='decoder_inputs')
decoder_lstm = LSTM(latent_dimension, return_sequences=True, return_state=True, name='decoder_lstm')
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,initial_state=encoder_states)
decoder_dense = Dense(n_features, name='decoder_dense')
decoder_outputs = TimeDistributed(decoder_dense)(decoder_outputs)

# putting them together 
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()


Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 encoder_inputs (InputLayer)    [(None, None, 1)]    0           []                               
                                                                                                  
 decoder_inputs (InputLayer)    [(None, None, 1)]    0           []                               
                                                                                                  
 encoder_lstm (LSTM)            [(None, 32),         4352        ['encoder_inputs[0][0]']         
                                 (None, 32),                                                      
                                 (None, 32)]                                                      
                                                                                              

<a id='train'></a>
## 2. Model training 

With the network configured, we can go ahead and train the model. An LSTM expects 3D input of shape (# observations, # timesteps, # features). In our case, the number of timesteps is 15, since we use data from the 15 days prior to the deal announcement, and the number of features is 1, since we use only the stock price. We could of course include more features; a feature that is not a time series can simply be repeated at every timestep. To keep the tutorial easy to follow, we work only with the stock price feature. 
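As a rough illustration of the shapes involved, the sketch below builds a 3D array from a toy price matrix and then attaches one static feature by repeating it at every timestep. The arrays `prices` and `static` are hypothetical stand-ins, not columns from `cmg`.

```python
import numpy as np

n_obs, timesteps = 4, 15

# Hypothetical stand-ins: one price series per deal, one static value per deal
prices = np.arange(n_obs * timesteps, dtype=float).reshape(n_obs, timesteps)
static = np.array([0.5, 1.5, 2.5, 3.5])  # assumed deal-level attribute

# A single time-series feature: (observations, timesteps, 1)
price_3d = prices.reshape(-1, timesteps, 1)

# Repeat the static value at every timestep: (observations, timesteps, 1)
static_3d = np.repeat(static[:, None, None], timesteps, axis=1)

# Stack along the feature axis: (observations, timesteps, 2)
X = np.concatenate([price_3d, static_3d], axis=2)
print(X.shape)  # (4, 15, 2)
```

If an array like `X` were fed to the encoder, the encoder `Input` would need a feature dimension of 2 instead of 1.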

In [54]:
pre_timesteps = 15
post_timesteps = 5

In [13]:
# prepare data for the encoder part 
X_train_lstm = X_train.filter(like = 'pre').to_numpy().reshape(-1, pre_timesteps, 1)

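# prepare data for the decoder part (teacher forcing): 
# the decoder input is the pre1_ price followed by the targets with 
# post180_Price_Normalized dropped, so it has post_timesteps columns 
y_train_input_lstm = pd.concat([X_train.filter(like = 'pre1_'), y_train.drop(columns = ['post180_Price_Normalized'])], axis = 1)\
                    .to_numpy()\
                    .reshape(-1, post_timesteps, 1)
y_train_target_lstm = y_train.copy(deep = True)\
                        .to_numpy()\
                        .reshape(-1, post_timesteps, 1)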

In [14]:
model.compile(optimizer = keras.optimizers.Adam(), loss = keras.losses.MeanAbsoluteError())
model.fit([X_train_lstm, y_train_input_lstm], y_train_target_lstm, epochs = 500, verbose = 0)

<keras.callbacks.History at 0x7fbda4a99130>

<a id='inference'></a>
## 3. Model inference

With a trained model, the next step is to use it for inference. Recall that during training we fed the true observations into the decoder. At inference time those observations are no longer available, so we predict the output one step at a time and feed each prediction back in as the next decoder input.
The decoder takes the hidden state and cell state from the last LSTM cell of the encoder, as well as $X_{-1}$, the last value of the input sequence.

1) Encode the input sequence and retrieve the initial decoder state  
2) Run one step of the decoder with this initial state and $X_{-1}$ as the first input. The output is the prediction for the next time step.  
3) Append the prediction, feed it back into the decoder together with the updated states, and repeat.
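The loop in these steps can be sketched without Keras. In the snippet below, `toy_step` is a hypothetical stand-in for one decoder step (it is not the trained model); the point is only the control flow, where each prediction and the updated state are fed back into the next step.

```python
import numpy as np

def toy_step(x, state):
    """Hypothetical stand-in for one decoder step.

    Returns (prediction, new_state) from the current input and state.
    """
    new_state = 0.5 * state + x
    return 0.9 * new_state, new_state

def rollout(last_obs, init_state, n_steps):
    """Predict n_steps values one at a time, feeding each prediction back in."""
    preds, x, state = [], last_obs, init_state
    for _ in range(n_steps):
        y, state = toy_step(x, state)  # one decoder step
        preds.append(y)                # append the prediction
        x = y                          # the prediction becomes the next input
    return np.array(preds)

preds = rollout(last_obs=1.0, init_state=0.0, n_steps=5)
print(preds.shape)  # (5,)
```

In the real model, `init_state` corresponds to the encoder states and `toy_step` to a call of the inference decoder.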

In [26]:
# the decoder for inference 
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dimension,), name = 'h state')
decoder_state_input_c = Input(shape=(latent_dimension,), name = 'c state')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = TimeDistributed(decoder_dense)(decoder_outputs)

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

decoder_model.summary()

Model: "model_6"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 decoder_inputs (InputLayer)    [(None, None, 1)]    0           []                               
                                                                                                  
 h state (InputLayer)           [(None, 32)]         0           []                               
                                                                                                  
 c state (InputLayer)           [(None, 32)]         0           []                               
                                                                                                  
 decoder_lstm (LSTM)            [(None, None, 32),   4352        ['decoder_inputs[0][0]',         
                                 (None, 32),                      'h state[0][0]',          

In [57]:
def decode_sequence(input_seq):
    # encode the input sequence into the initial decoder states [h, c]
    states_values = encoder_model.predict(input_seq.reshape(-1, pre_timesteps, 1), verbose = 0)
    # the first decoder input is the last value of the input sequence
    decoder_input = input_seq[-1].reshape(1, 1, 1)
    result = []
    
    # predict one step at a time, feeding each prediction back into the decoder
    while len(result) < post_timesteps:
        output, h, c = decoder_model.predict([decoder_input] + states_values, verbose = 0)
        result.append(output.reshape(-1))
        decoder_input = output.reshape(1, 1, 1)
        states_values = [h, c]
    return np.array(result).reshape(-1)

In [60]:
train_results = []
for i in range(X_train_lstm.shape[0]):
    if i % 500 == 0:
        print(i)
    train_results.append(decode_sequence(X_train_lstm[i]))
print('Seq2seq on train set:\n', evaluation(y_train, train_results))

Seq2seq on train set:
 {'MAE': 0.5060702792890398, 'ACC': 0.6481445120754007}


In [46]:
X_test_lstm = np.array(X_test.filter(like = 'pre')).reshape(-1, pre_timesteps, 1)
test_results = []
for i in range(X_test_lstm.shape[0]):
    test_results.append(decode_sequence(X_test_lstm[i]))
print('Seq2seq on test set:\n', evaluation(y_test, test_results))

Seq2seq on test set:
 {'MAE': 0.5903408798393653, 'ACC': 0.6249705535924588}


In the example above, we showcased a very basic seq2seq model. There are quite a few aspects left for you to improve: 

1. Model configuration
2. Training 
3. Including non-time-series features 