# Applying deep learning algorithms (neural networks) suited to time series

Several types of model are suited to time series. What sets them apart is that they do not treat the data as independent events: they keep a "memory" of previous events in order to better analyse a given time T.

This is particularly useful for detecting longer-term trend patterns. The main models are:
- RNN  : Recurrent Neural Network
- LSTM : Long Short-Term Memory
- GRU  : Gated Recurrent Unit
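
As a rough illustration of what "memory" means for these models, the minimal sketch below implements a single scalar recurrent unit in plain Python. The function names and the weights `w_x`, `w_h` and `b` are arbitrary choices made for this example (they are not trained parameters), and real RNN, LSTM and GRU layers work on vectors, with gates in the last two cases.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a scalar recurrent cell: the new state mixes the input and the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_sequence(xs):
    """Feed a sequence one value at a time and return the final hidden state."""
    h = 0.0  # empty memory before the first observation
    for x_t in xs:
        h = rnn_step(x_t, h)
    return h

# Same three values, opposite order: the final state differs,
# which a model treating rows as independent events could not capture.
h_up = run_sequence([0.1, 0.2, 0.9])
h_down = run_sequence([0.9, 0.2, 0.1])
print(h_up, h_down)
```

The final state depends on the order of the inputs, which is the property the models listed above exploit.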

## Building the datasets

We build several datasets of different depth (number of variables) so that we can compare, in particular, the impact of the indicators on the quality of the results.

In [1]:
# pip install psycopg2-binary

In [2]:
import time
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import psycopg2
from sqlalchemy import create_engine

In [3]:
import warnings
warnings.filterwarnings('ignore')

In [4]:
from sklearn.model_selection import train_test_split, ShuffleSplit
from sklearn.metrics import *
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Convolution1D, MaxPooling1D, Flatten
from tensorflow.keras.layers import LSTM, GRU, TimeDistributed, Conv1D, ConvLSTM2D, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold


### Datasets : EURUSD H1

In [5]:
conn_string = 'postgresql://postgres:Juw51000@localhost/tradingIA'

db = create_engine(conn_string)
conn = db.connect()

In [6]:
df = pd.read_sql("select * from fex_eurusd_h1", conn);
df.head()

Unnamed: 0,epoch,mopen,mclose,mhigh,mlow,mvolume,mspread,ima,ima2,ima4,...,istos4,imom,imom2,imom4,rProfitBuy,rSwapBuy,rProfitBTrigger,rProfitSell,rSwapSell,rProfitSTrigger
0,946861200,1.0073,1.0128,1.0132,1.0073,194,50,1.008242,1.007963,1.006779,...,70.12987,100.536033,100.615935,100.565982,3.64,0.0,TO,-3.07,0.0,SL
1,946864800,1.0129,1.0137,1.0141,1.012,113,50,1.008733,1.008175,1.006973,...,72.331461,100.67534,100.815515,100.495688,2.56,0.0,TO,-3.15,0.0,SL
2,946868400,1.014,1.0171,1.0173,1.0134,149,50,1.009517,1.008588,1.007215,...,76.041667,101.073239,101.002979,100.902778,-0.1,0.0,TO,-0.88,0.0,TO
3,946872000,1.017,1.0175,1.019,1.017,214,50,1.01035,1.008958,1.007462,...,78.688525,100.87241,100.962493,100.882411,-2.36,0.0,TO,1.38,0.0,TO
4,946875600,1.0173,1.0167,1.0177,1.0164,162,50,1.010975,1.009296,1.007677,...,78.51153,100.703249,100.893123,100.813089,-2.95,0.0,SL,5.74,0.0,TP


In [7]:
df['targetBuy'] = df['rProfitBuy'] + df['rSwapBuy']
df['targetSell'] = df['rProfitSell'] + df['rSwapSell']

In [8]:
dfNotNa = df[df['rProfitBTrigger'].notna()]
dfCleanRow = dfNotNa[dfNotNa['epoch'] < 1690484400]
dfClean = dfCleanRow.drop(['rProfitBuy', 'rSwapBuy', 'rProfitSell', 'rSwapSell', 'rProfitSTrigger', 'rProfitBTrigger'], axis=1)
dfClean.shape

(145559, 27)

### Recasting as a binary classification problem

The original question, which is to determine when a trade (Buy/Sell) is profitable, can be simplified into a binary one: will a trade opened at time T (Buy and Sell) result in a loss (0) or a gain (1)?

In [9]:
dfCleanBin = dfClean.copy()  # work on a copy so that the new columns are not added to dfClean

In [10]:
dfCleanBin['targetProfitBuy'] = dfCleanBin['targetBuy'].apply(lambda x: 1 if x > 0 else 0)
dfCleanBin['targetProfitSell'] = dfCleanBin['targetSell'].apply(lambda x: 1 if x > 0 else 0)
dfCleanBin.shape

(145559, 29)

In [11]:
sum(dfCleanBin['targetBuy'])

-33065.310000000005

In [12]:
sum(dfCleanBin['targetProfitBuy']) / dfCleanBin.shape[0]

0.37148510226093884

In [13]:
sum(dfCleanBin['targetSell'])

-32935.02000000026

In [14]:
sum(dfCleanBin['targetProfitSell']) / dfCleanBin.shape[0]

0.37439801042876086

For both Buy and Sell, about 37% of the trades are profitable and 63% are losing. The classes are therefore moderately imbalanced rather than severely skewed.
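
With roughly 37% positives, a model that always predicts a loss would already reach about 63% accuracy, which is a useful baseline to keep in mind when reading accuracy figures. The sketch below computes that baseline and a set of class weights using the common "balanced" heuristic, `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` and the toy label list are assumptions made for this example, not part of the notebook.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy labels reproducing the observed 37% / 63% proportions
labels = [1] * 37 + [0] * 63
weights = balanced_class_weights(labels)

# Accuracy of a trivial model that always predicts the majority class
majority_baseline = max(Counter(labels).values()) / len(labels)
print(weights, majority_baseline)
```

The minority class (profitable trades) receives the larger weight, so errors on it cost more during training.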

### Shifting the target values (forecasting)

For forecasting, the values to predict (the profit of the trade) concern the upcoming period of the trade (T+1), as a function of the features observed over the current period (T). The target values must therefore be shifted from T+1 to T.

In [15]:
dfCleanBin['targetProfitBuy'] = dfCleanBin['targetProfitBuy'].shift(-1)
dfCleanBin['targetProfitSell'] = dfCleanBin['targetProfitSell'].shift(-1)
dfCleanBin['targetSell'] = dfCleanBin['targetSell'].shift(-1)
dfCleanBin['targetBuy'] = dfCleanBin['targetBuy'].shift(-1)

In [16]:
dfCleanBin = dfCleanBin[dfCleanBin['targetProfitSell'].notna()]
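
The effect of `shift(-1)` followed by dropping the rows without a target can be checked on a toy frame. The column values below are made up for this example; only the mechanics matter.

```python
import pandas as pd

toy = pd.DataFrame({
    "epoch": [0, 3600, 7200, 10800],
    "mclose": [1.0128, 1.0137, 1.0171, 1.0175],
    "target": [1, 0, 0, 1],
})

# Row T now carries the target that was observed at T+1
toy["target"] = toy["target"].shift(-1)

# The last row has no T+1 value, so its target is NaN and the row is dropped
toy = toy[toy["target"].notna()]
print(toy)
```

Note that the shifted column becomes floating point because of the temporary NaN, and that exactly one row is lost, which matches the row count going from 145559 to 145558 in the notebook.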

In [17]:
dfCleanBin.set_index('epoch', inplace=True)

#### Basis dataset
This dataset contains only the raw data (plus the targets), without any technical indicator.

In [18]:
dfBasisB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy']]
dfBasisS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell']]

#### Intermediate low dataset
This dataset contains the raw data (plus the targets) together with the indicators computed over the shortest period.

In [19]:
dfIntLowB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima', 'iatr', 'irsi', 'imacd', 'istos', 'imom']]
dfIntLowS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima', 'iatr', 'irsi', 'imacd', 'istos', 'imom']]

#### Intermediate medium dataset
This dataset contains the raw data (plus the targets) together with the indicators computed over the intermediate period.

In [20]:
dfIntMedB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima2', 'iatr2', 'irsi2', 'imacd2', 'istos2', 'imom2']]
dfIntMedS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima2', 'iatr2', 'irsi2', 'imacd2', 'istos2', 'imom2']]

#### Intermediate high dataset
This dataset contains the raw data (plus the targets) together with the indicators computed over the longest period.

In [21]:
dfIntHigB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima4', 'iatr4', 'irsi4', 'imacd4', 'istos4', 'imom4']]
dfIntHigS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima4', 'iatr4', 'irsi4', 'imacd4', 'istos4', 'imom4']]

#### Full dataset
This dataset contains the raw data (plus the targets) together with all the indicators over all computation periods.

In [22]:
dfFullB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima', 'iatr', 'irsi', 'imacd','ima2', 'iatr2', 'irsi2', 'imacd2','ima4', 'iatr4', 'irsi4', 'imacd4',
                   'istos', 'istos2', 'istos4', 'imom', 'imom2', 'imom4']]
dfFullS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima', 'iatr', 'irsi', 'imacd','ima2', 'iatr2', 'irsi2', 'imacd2','ima4', 'iatr4', 'irsi4', 'imacd4',
                   'istos', 'istos2', 'istos4', 'imom', 'imom2', 'imom4']]

## Applying the deep learning models

#### Using the baseline dataset: dfBasisB

In [23]:
dfBasisB.shape

(145558, 7)

#### Defining the Features / Target datasets

In [24]:
df = dfBasisB

In [25]:
dfTarget = df['targetProfitBuy']
dfFeatures = df.drop(columns=['targetProfitBuy'])

#### Train / Test split

In [26]:
def getTrainTestDatasets(dfFeatures, dfTarget, testSize=.2):
    # ShuffleSplit draws a random permutation of the rows: the returned
    # subsets are shuffled and no longer in chronological order.
    rs = ShuffleSplit(n_splits=1, test_size=testSize)
    train_index, test_index = next(rs.split(dfFeatures, dfTarget))
    dX_train, dX_test = dfFeatures.iloc[train_index], dfFeatures.iloc[test_index]
    dy_train, dy_test = dfTarget.iloc[train_index], dfTarget.iloc[test_index]
    return dX_train, dX_test, dy_train, dy_test
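
Because the lookback windows built later in the notebook are meant to contain rows that directly precede each other in time, a chronological split may be worth considering as an alternative to a shuffled one. The sketch below only computes the slice boundaries. The function name `chronological_split_bounds` and its rounding choices are assumptions made for this example, not the notebook's method.

```python
def chronological_split_bounds(n_rows, test_size=0.2, val_size=0.1):
    """Return (start, stop) row ranges for train, validation and test, in time order.

    The test set is the most recent `test_size` share of the rows; the validation
    set is the most recent `val_size` share of what remains.
    """
    n_test = int(round(n_rows * test_size))
    n_val = int(round((n_rows - n_test) * val_size))
    n_train = n_rows - n_test - n_val
    train = (0, n_train)
    val = (n_train, n_train + n_val)
    test = (n_train + n_val, n_rows)
    return train, val, test

train, val, test = chronological_split_bounds(100)
print(train, val, test)
```

With a DataFrame sorted by time, each range could then be applied with positional slicing, for example `dfFeatures.iloc[train[0]:train[1]]`, so that every subset stays contiguous in time.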

Split into (Train + Valid) / Test datasets :

In [27]:
dfFeaturesT, dX_test, dfTargetT, dy_test = getTrainTestDatasets(dfFeatures, dfTarget, .2)

Split into Train / Valid datasets

In [28]:
dX_train, dX_val, dy_train, dy_val = getTrainTestDatasets(dfFeaturesT, dfTargetT, .1)

#### Data normalisation

In [29]:
scaler = StandardScaler()
X_train = scaler.fit_transform(dX_train)
X_test = scaler.transform(dX_test)
X_val = scaler.transform(dX_val)

In [30]:
y_train = dy_train.to_numpy()
y_test = dy_test.to_numpy()
y_val = dy_val.to_numpy()

In [31]:
X_train.shape

(104801, 6)

#### LSTM / GRU specifics: splitting the data into sub-sequences

LSTMs work on sequences (subsets of rows) that determine, for a given instance, which preceding instances are associated with it.

In a trading context, each data sample at time T is given a number n (a parameter) of samples that directly precede it in time [T-1 ... T-n], which the LSTM uses to interpret the data at time T. In the function below, each window holds `lookback` rows in total, the row at time T included.

In [32]:
def spliSequencesWithSamples(xdata, ydata, lookback):
    """Build, for each row i, a window of `lookback` rows ending at i, paired with ydata[i]."""
    X, y = list(), list()
    for i in range(len(xdata)):
        if i >= lookback - 1:  # Rows without enough previous values cannot be used
            # Input window (lookback rows, row i included) and matching target
            seq_x, seq_y = xdata[i+1-lookback:i+1, :], ydata[i]
            X.append(seq_x)
            y.append(seq_y)
    return np.array(X), np.array(y)
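
The same windows can be built without a Python loop. The sketch below is one possible vectorised variant based on NumPy's `sliding_window_view` (available from NumPy 1.20). The function name `make_windows` and the toy arrays are assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(xdata, ydata, lookback):
    """Return X with shape (samples, lookback, features) and the aligned targets y."""
    # sliding_window_view appends the window axis last: (samples, features, lookback)
    windows = sliding_window_view(xdata, window_shape=lookback, axis=0)
    X = windows.transpose(0, 2, 1)  # reorder to (samples, timesteps, features)
    y = ydata[lookback - 1:]        # target of the last row of each window
    return X, y

xdata = np.arange(10, dtype=float).reshape(5, 2)  # 5 time steps, 2 features
ydata = np.array([0, 1, 0, 1, 1])
X, y = make_windows(xdata, ydata, lookback=3)
print(X.shape, y.shape)
```

The result is a read-only view on the original array, so it uses little extra memory; calling `X.copy()` would be needed before modifying it in place.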

## Computing scores and profits

In [33]:
def calculateRandomProfit(dfCleanRow, target='targetBuy'):
    profit = dfCleanRow[target].sum()
    profitPerTrade = profit / len(dfCleanRow)
    return profit, profitPerTrade

In [34]:
def calculateProfit(dfCleanRow, dX_test, yTestLbk, pred, lookback=100, specificity=.8, target='targetBuy'):
    """Sum the `target` profit over the test periods whose predicted probability
    exceeds the ROC threshold (seuil) matching the requested specificity."""
    fpr, tpr, thr = roc_curve(yTestLbk, pred, pos_label=1)
    # Last ROC point (lowest threshold) whose specificity (1 - fpr) is still above the requested value
    idx = np.max(np.where((1-fpr) > specificity))
    seuil = thr[idx]
    dfPred = pd.DataFrame(pred, columns = ['proba'])
    # Get rows index with positive proba (proba > seuil)
    xRows = dfPred[dfPred['proba']>seuil].index.to_numpy()
    # Get matching index (epoch timestamp) from dX_test => Periods with proba > seuil
    xEpochs = dX_test.iloc[lookback-1:,:].iloc[xRows].index.to_numpy()
    dfCleanEpochIdx = dfCleanRow.set_index('epoch')
    profit = dfCleanEpochIdx.loc[xEpochs][target].sum()
    profitPerTrade = profit / len(xRows)
    return profit, profitPerTrade
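
The idea of picking a decision threshold from a target specificity can also be illustrated without the ROC curve, by taking the threshold directly from the sorted scores of the negative class. This is a different, simplified technique from the `roc_curve` approach used in the notebook; the function name `threshold_for_specificity` and the synthetic scores are assumptions made for this example.

```python
import numpy as np

def threshold_for_specificity(scores, labels, specificity=0.8):
    """Smallest negative-class score such that at least `specificity` of the
    negatives score at or below it (and are therefore rejected by `score > threshold`)."""
    neg = np.sort(scores[labels == 0])
    k = int(np.ceil(specificity * len(neg)))
    return neg[k - 1]

# Synthetic scores: positives tend to score higher than negatives
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
scores = 0.3 * labels + rng.normal(0.4, 0.2, size=200)

thr = threshold_for_specificity(scores, labels, specificity=0.9)
selected = scores > thr  # trades that would be taken
measured_specificity = np.mean(~selected[labels == 0])
print(thr, selected.sum(), measured_specificity)
```

Raising the specificity reduces the number of selected trades but also the share of losing trades among them, which is the trade-off the `specificity` parameter controls.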

### Scores and profits (100% random model)

In [35]:
profitRandom, profitPerTradeRandom = calculateRandomProfit(dfCleanRow, target='targetBuy')

In [36]:
profitRandom

-33065.30999999999

In [37]:
profitPerTradeRandom

-0.2271608763456742

## GRU

Expected input format: 3D tensor (batch_size, timesteps, features)

In [38]:
lookback = 1 * 24     # Nb hours  (T-n) to look back for a time prediction (T)

In [39]:
xTrainLbk, yTrainLbk = spliSequencesWithSamples(X_train, y_train, lookback)
print(xTrainLbk.shape, yTrainLbk.shape)

(104778, 24, 6) (104778,)


In [40]:
xValLbk, yValLbk = spliSequencesWithSamples(X_val, y_val, lookback)
print(xValLbk.shape, yValLbk.shape)

(11622, 24, 6) (11622,)


In [41]:
xTrainLbkCNN = xTrainLbk

In [42]:
xTrainLbkCNN.shape

(104778, 24, 6)

In [43]:
xValLbkCNN = xValLbk

In [44]:
xValLbkCNN.shape

(11622, 24, 6)

### Training

Custom Metric functions :

In [45]:
def mape(y_true, y_pred):
    """
    Returns an absolute percentage error.
    Note: the denominator is y_pred, not y_true as in the textbook MAPE.
    For examples on losses see:
    https://github.com/keras-team/keras/blob/master/keras/losses.py
    """
    # tf.abs is used instead of keras.backend.abs, which not every Keras version exposes
    return (tf.abs(y_true - y_pred) / tf.abs(y_pred)) * 100

def smape(y_true, y_pred):
    """
    Returns the symmetric absolute percentage error.
    For examples on losses see:
    https://github.com/keras-team/keras/blob/master/keras/losses.py
    """
    return (tf.abs(y_pred - y_true) / (tf.abs(y_true) + tf.abs(y_pred))) * 100
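
To get a feel for how these two formulas behave on binary labels and sigmoid outputs, the sketch below evaluates them on a few hand-picked pairs in plain Python. The names `mape_like` and `smape_like` are assumptions made for this example; they mirror the formulas above element by element, without Keras.

```python
def mape_like(y_true, y_pred):
    """Same formula as the notebook's mape: |y_true - y_pred| / |y_pred| * 100."""
    return abs(y_true - y_pred) / abs(y_pred) * 100

def smape_like(y_true, y_pred):
    """Same formula as the notebook's smape: |y_pred - y_true| / (|y_true| + |y_pred|) * 100."""
    return abs(y_pred - y_true) / (abs(y_true) + abs(y_pred)) * 100

# (label, predicted probability) pairs
examples = [(1, 0.8), (1, 0.2), (0, 0.2), (0, 0.8)]
for y_true, y_pred in examples:
    print(y_true, y_pred, round(mape_like(y_true, y_pred), 2), round(smape_like(y_true, y_pred), 2))
```

When the label is 0, both formulas return 100 whatever the predicted probability, so on this binary target they mostly carry information about the positive class; accuracy and the loss remain the more informative figures.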

In [46]:
def create_LSTM01(lookback, n_features):
    model = Sequential()
    # LSTM input : [timesteps, features] 
    model.add(LSTM(64, name = 'LSTM_1', return_sequences=True, input_shape=(lookback, n_features), activation='tanh'))
    model.add(Flatten(name = 'Flatten_1'))
    model.add(Dense(32, name = 'Dense_1', activation='relu'))
    model.add(Dense(8, name = 'Dense_2', activation='relu'))
    model.add(Dense(1, name = 'Dense_3', activation='sigmoid'))
    #model.compile(loss='mse', optimizer='adam', metrics=[mape, smape])
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy',mape, smape])
    return model

In [47]:
def create_LSTM02(lookback, n_features):
    model = Sequential()
    # LSTM input : [timesteps, features] 
    model.add(LSTM(256, name = 'LSTM_1', return_sequences=True, input_shape=(lookback, n_features), activation='tanh'))
    #model.add(BatchNormalization(name = 'batch_norm_1'))
    model.add(LSTM(128, name = 'LSTM_2', return_sequences=True, activation='tanh'))
    #model.add(Dropout(0.20, name = 'dropout_2'))
    #model.add(TimeDistributed(Dense(64, name = 'TimeDense_1', activation='relu')))
    model.add(Flatten(name = 'Flatten_1'))
    #model.add(BatchNormalization(name = 'batch_norm_2'))
    model.add(Dense(64, name = 'Dense_1', activation='relu'))
    model.add(Dense(8, name = 'Dense_2', activation='relu'))
    model.add(Dense(1, name = 'Dense_3', activation='sigmoid'))
    #model.compile(loss='mse', optimizer='adam', metrics=[mape, smape])
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy',mape, smape])
    return model

In [48]:
PATIENCE = 4
EPOCHS = 20
LOOP = 1
BATCH_SIZE = 32 # Default used by model.fit is 32
steps_per_epoch = xTrainLbkCNN.shape[0] * LOOP / EPOCHS // BATCH_SIZE    # Spread LOOP pass(es) over the training data across all epochs
validation_steps = xValLbkCNN.shape[0] // BATCH_SIZE                    # Use all validation data at each epoch

In [49]:
print(steps_per_epoch, validation_steps)

163.0 363
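
The two printed values follow directly from the window counts shown earlier (104778 training windows and 11622 validation windows). The sketch below redoes the arithmetic with those numbers hard-coded; the variable names `n_train`, `n_val`, `windows_per_epoch` and `windows_total` are assumptions made for this example.

```python
n_train, n_val = 104778, 11622
EPOCHS, LOOP, BATCH_SIZE = 20, 1, 32

# True division followed by floor division yields a float: 163.0
steps_per_epoch = n_train * LOOP / EPOCHS // BATCH_SIZE
# Pure integer floor division yields an int: 363
validation_steps = n_val // BATCH_SIZE

# Number of training windows consumed per epoch and over the whole run
windows_per_epoch = int(steps_per_epoch) * BATCH_SIZE
windows_total = windows_per_epoch * EPOCHS
print(steps_per_epoch, validation_steps, windows_per_epoch, windows_total)
```

Each epoch therefore consumes 5216 windows, about 5% of the training set, and the 20 epochs together amount to roughly one pass' worth of windows (104320 out of 104778). Wrapping the expression in `int(...)` would give an integer step count.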


In [50]:
early_stopping = EarlyStopping(monitor='val_loss', patience = PATIENCE, restore_best_weights=True)

In [51]:
CLASS_WEIGHT = {0: .37, 1 : .63} # Used to counter class imbalance

In [52]:
modelLSTM02 = create_LSTM02(lookback, xTrainLbk.shape[2])

In [53]:
modelstart = time.time()
history = modelLSTM02.fit(
                    xTrainLbkCNN,
                    yTrainLbk,
                    epochs = EPOCHS,
                    batch_size = BATCH_SIZE,
                    #class_weight = CLASS_WEIGHT,
                    validation_data=(xValLbkCNN,yValLbk),
                    validation_steps=validation_steps,
                    steps_per_epoch=steps_per_epoch,
                    callbacks=[early_stopping])  # apply the EarlyStopping callback defined above
modelLSTM02.save('LSTM_02.h5')
print("\nModel Runtime: %0.2f Minutes"%((time.time() - modelstart)/60))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20

KeyboardInterrupt: 

In [None]:
modelLSTM02.summary()

### Test

In [None]:
xTestLbk, yTestLbk = spliSequencesWithSamples(X_test, y_test, lookback)

In [None]:
xTestLbkCNN = xTestLbk

In [None]:
pred = modelLSTM02.predict(xTestLbkCNN)

### Profit

In [None]:
profit, profitPerTrade = calculateProfit(dfCleanRow, dX_test, yTestLbk, pred, lookback=lookback, specificity=.9, target='targetBuy')

In [None]:
profit

In [None]:
profitPerTrade

In [None]:
pred