# Section 6. Modeling with Deep Learning 

### CONTENTS

* <a href='05- DSC 2022 Modeling .ipynb#top'>**Section 5. Modeling**</a>
* <a href='06- DSC 2022 Modeling with deep learning.ipynb#top'>**Section 6 - Modeling with deep learning**</a>
  * [1. Model configuration](#configure)
  * [2. Model training](#train)
  * [3. Model inference](#inference)
* <a href='07- DSC 2022 Submission #top'>**Section 7 - Submission**</a>

In [66]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from feature_engineering import feature_engineering

In [67]:
cmg = pd.read_excel('cmg_final.xlsx', index_col = 'offeringId')
X_train, X_test, y_train, y_test = feature_engineering(cmg)

<a id='configure'></a>
## 1. Model configuration

The first step is to configure the model. Here we implement a seq2seq (encoder-decoder) model trained with teacher forcing.
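To make the teacher forcing idea concrete, here is a minimal NumPy sketch of how the decoder input relates to the target. The numbers and the names `targets` and `start_value` are made up for illustration; they are not taken from the dataset.

```python
import numpy as np

# Toy target sequence: 5 future prices for one observation (made-up values).
targets = np.array([10.0, 11.0, 12.0, 11.5, 12.5])
# Last known value before the forecast window (hypothetical start value).
start_value = 9.5

# Teacher forcing: at step t the decoder is fed the TRUE value from step t-1,
# so the decoder input is the target shifted right by one step,
# with the start value placed in front.
decoder_input = np.concatenate(([start_value], targets[:-1]))

# Keras LSTMs expect (observations, timesteps, features).
decoder_input_3d = decoder_input.reshape(1, -1, 1)
targets_3d = targets.reshape(1, -1, 1)

print(decoder_input)
print(decoder_input_3d.shape, targets_3d.shape)
```

During training the model therefore never sees its own predictions, which is why inference needs a separate step-by-step loop.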

In [68]:
import keras
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed 

In [69]:
from IPython.display import Image
Image(url="fig/seq2seq.png", width=1000, height=618)

In [70]:
n_features = 1
latent_dimension = 32

# the encoder part 
encoder_inputs= Input(shape=(None, n_features), name = 'encoder_inputs')
encoder_lstm=LSTM(latent_dimension, return_state=True, name = 'encoder_lstm') # we only want the output from the last cell 
_, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# the decoder part 
decoder_inputs = Input(shape=(None, n_features), name='decoder_inputs')
decoder_lstm = LSTM(latent_dimension, return_sequences=True, return_state=True, name='decoder_lstm')
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,initial_state=encoder_states)
decoder_dense = Dense(n_features, name='decoder_dense')
decoder_outputs = TimeDistributed(decoder_dense)(decoder_outputs)

# putting them together 
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 encoder_inputs (InputLayer)    [(None, None, 1)]    0           []                               
                                                                                                  
 decoder_inputs (InputLayer)    [(None, None, 1)]    0           []                               
                                                                                                  
 encoder_lstm (LSTM)            [(None, 32),         4352        ['encoder_inputs[0][0]']         
                                 (None, 32),                                                      
                                 (None, 32)]                                                      
                                                                                              

<a id='train'></a>
## 2. Model training 

Now that the network is configured, we can train it. An LSTM expects 3D input of shape (# observations, # timesteps, # features). In our case the series covers the 15 days prior to the deal announcement (the code below also appends the offering price as a 16th timestep), and the number of features is 1, since we use only the stock price. We could include more features: any that are not time series can simply be replicated across all timesteps. To keep the tutorial simple, we work with the stock price feature only.
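As a small illustration of the 3D shape, the sketch below reshapes a toy 2D table (one row per observation, one column per day) into the layout an LSTM expects. The array `prices_2d` is invented for the example.

```python
import numpy as np

# Toy table: 3 observations, 4 daily prices each (made-up numbers).
prices_2d = np.arange(12, dtype=float).reshape(3, 4)

# With a single feature per timestep, we only need to add a trailing axis.
# Resulting layout: (observations, timesteps, features).
prices_3d = prices_2d.reshape(-1, 4, 1)

print(prices_3d.shape)
# Row i of the table becomes the sequence of observation i.
print(prices_3d[1, :, 0])
```

Because NumPy reshapes in row-major order, each observation keeps its own values in their original time order.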

In [71]:
X_train_lstm = np.array(pd.concat([X_train.filter(like = 'Pre_'), X_train[['offeringPrice']]], axis = 1)).reshape(-1, 16, 1)
y_train_input = pd.concat([X_train['offeringPrice'], y_train.drop(columns = ['Post_180SharePrice'])], axis = 1)
y_train_target = y_train.copy(deep = True)
y_train_input_lstm = np.array(y_train_input).reshape(-1, 5, 1)
y_train_target_lstm = np.array(y_train_target).reshape(-1, 5, 1)

In [74]:
model.compile(optimizer = keras.optimizers.Adam(), loss = keras.losses.MeanAbsoluteError())
model.fit([X_train_lstm, y_train_input_lstm], y_train_target_lstm, epochs = 500, verbose = 0)

<keras.callbacks.History at 0x7f82ed49caf0>

In [10]:
# teacher-forced predictions: the decoder is fed the true previous values
train_pred = model.predict([X_train_lstm, y_train_input_lstm]).reshape(-1, 5)
print(mean_absolute_error(y_train_target, train_pred))
print(r2_score(y_train_target, train_pred))

4.78148773045604
0.5911242724914765


<a id='inference'></a>
## 3. Model inference

With a trained model, the next step is inference. Recall that during training we fed the true observations into the decoder. At inference time those observations are not available, so we predict the output one step at a time and feed each prediction back in as the next decoder input.
The decoder takes the hidden state and cell state from the last LSTM cell of the encoder, together with $X_{-1}$, the last value of the input sequence.

1) Encode the input sequence and retrieve the initial decoder state.  
2) Run one step of the decoder with this initial state and $X_{-1}$ as the first input (it plays the role of the "start of sequence" token). The output is the next predicted value.  
3) Append the predicted value, feed it back in as the next decoder input together with the updated states, and repeat.
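The loop described in the steps above can be sketched without Keras. In the snippet below, `toy_decoder_step` is a hypothetical stand-in for one call to the decoder model (it is not the trained LSTM), and the starting values are made up. The point is only the control flow: each prediction becomes the next input.

```python
import numpy as np

def toy_decoder_step(x, state):
    """Hypothetical stand-in for one decoder step: returns (prediction, new_state)."""
    new_state = 0.5 * state + 0.5 * x
    prediction = new_state + 1.0
    return prediction, new_state

def autoregressive_decode(last_observed, initial_state, n_steps):
    # Start from the last observed value and the state produced by the encoder.
    x, state = last_observed, initial_state
    predictions = []
    for _ in range(n_steps):
        y, state = toy_decoder_step(x, state)
        predictions.append(y)
        x = y  # feed the prediction back in as the next decoder input
    return np.array(predictions)

preds = autoregressive_decode(last_observed=10.0, initial_state=10.0, n_steps=5)
print(preds)
```

Since every step consumes the previous prediction, errors can compound along the horizon, which is one reason inference scores tend to be worse than teacher-forced training scores.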

In [11]:
# the decoder for inference 
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dimension,), name = 'h state')
decoder_state_input_c = Input(shape=(latent_dimension,), name = 'c state')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = TimeDistributed(decoder_dense)(decoder_outputs)

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

decoder_model.summary()

Model: "model_2"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 decoder_inputs (InputLayer)    [(None, None, 1)]    0           []                               
                                                                                                  
 h state (InputLayer)           [(None, 32)]         0           []                               
                                                                                                  
 c state (InputLayer)           [(None, 32)]         0           []                               
                                                                                                  
 decoder_lstm (LSTM)            [(None, None, 32),   4352        ['decoder_inputs[0][0]',         
                                 (None, 32),                      'h state[0][0]',          

In [12]:
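def decode_sequence(input_seq):
    # encode the input sequence; reshape to (1, timesteps, 1) whatever its length
    states_values = encoder_model.predict(input_seq.reshape(1, -1, 1), verbose=0)
    # the first decoder input is the last value of the input sequence
    decoder_inputs = input_seq[-1].reshape(1, 1, 1)
    result = []

    # predict the 5 target steps one at a time, feeding each prediction back in
    while len(result) < 5:
        output, h, c = decoder_model.predict([decoder_inputs] + states_values, verbose=0)
        result.append(output.reshape(-1))
        decoder_inputs = output.reshape(1, 1, 1)
        states_values = [h, c]
    return np.array(result).reshape(-1)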

In [13]:
results_train = []
for i in range(X_train_lstm.shape[0]):
    results_train.append(decode_sequence(X_train_lstm[i]))
print(r2_score(y_train, results_train))
print(mean_absolute_error(y_train, results_train))

0.3469905050009358
8.205962701581416


In [14]:
X_test_lstm = np.array(X_test.iloc[:, 2:17]).reshape(-1, 15, 1)
results = []
for i in range(X_test_lstm.shape[0]):
    results.append(decode_sequence(X_test_lstm[i]))
print(r2_score(y_test, results))
print(mean_absolute_error(y_test, results))

0.31761195500106776
10.300143398341358


In the example above, we showcased a very basic seq2seq model. There are quite a lot of aspects left for you to improve:

1. Model configuration
2. Training
3. Including non-time-series features

In [109]:
def create_multivariate(X):
    # empty accumulator of shape (0, # timesteps, # features); observations are appended below
    result = np.empty((0, 16, 21), float)
    # this will be of shape (# observations, # timesteps, 1)
    time_seq = np.array(pd.concat([X.filter(like = 'Pre_'), X[['offeringPrice']]], axis = 1)).reshape(-1, 16, 1)
    # this will be of shape (# observations, # non time series features)
    non_time = np.array(X.drop(list(X.filter(like = 'Price')), axis =1))
    for i in range(X.shape[0]):
        # replicate the non time series features across the 16 timesteps and place them next to the price
        new = np.hstack((time_seq[i], np.tile(non_time[i], (16,1)))).reshape(-1, 16, 21)
        result = np.append(result, new, axis = 0)
    return result

In [110]:
create_multivariate(X_train).shape

(5093, 16, 21)
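The same replication can also be done without a Python loop. The sketch below is one possible vectorized variant, not the notebook's own method. The helper name `tile_static_features` and the random toy arrays are assumptions; 20 static features is inferred from the 21-feature output shape above.

```python
import numpy as np

def tile_static_features(time_seq, non_time):
    """Hypothetical helper.

    time_seq: (n_obs, n_steps, 1) time series feature
    non_time: (n_obs, n_static) features that do not vary over time
    returns:  (n_obs, n_steps, 1 + n_static)
    """
    n_obs, n_steps, _ = time_seq.shape
    # Add a time axis and broadcast the static features across all timesteps.
    tiled = np.broadcast_to(non_time[:, None, :], (n_obs, n_steps, non_time.shape[1]))
    # Stack along the feature axis.
    return np.concatenate([time_seq, tiled], axis=2)

rng = np.random.default_rng(0)
time_seq = rng.normal(size=(4, 16, 1))   # toy data, 4 observations
non_time = rng.normal(size=(4, 20))      # assumed 20 static features
out = tile_static_features(time_seq, non_time)
print(out.shape)
```

This avoids calling `np.append` once per observation, which copies the whole accumulated array on every iteration.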