# Applying Deep Learning (NN) Algorithms Suited to Time Series

Several types of model are suited to time series. Their distinguishing feature is that they do not treat the data as independent events: they keep a "memory" of previous events in order to better analyze a given instant T.

This is particularly useful for detecting longer-term trend patterns. The main models are:
- RNN  : Recurrent Neural Network
- LSTM : Long Short-Term Memory
- GRU  : Gated Recurrent Unit
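
To make the notion of "memory" concrete, here is a minimal NumPy sketch of a plain recurrent cell: the hidden state computed at one step is fed back at the next, so the state at time T depends on every earlier input. The weights, the sizes and the function name `rnn_forward` are illustrative assumptions and are not part of the models trained in this notebook.

```python
import numpy as np

def rnn_forward(x_seq, w_x, w_h, b):
    """Run a plain (Elman-style) recurrent cell over a sequence.

    x_seq: array of shape (timesteps, n_features)
    Returns the hidden state after each timestep, shape (timesteps, n_hidden).
    """
    h = np.zeros(w_h.shape[0])          # initial memory is empty
    states = []
    for x_t in x_seq:
        # the new state mixes the current input with the previous state
        h = np.tanh(x_t @ w_x + h @ w_h + b)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
n_features, n_hidden = 6, 4             # hypothetical sizes
w_x = rng.normal(size=(n_features, n_hidden))
w_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

x_seq = rng.normal(size=(5, n_features))
states = rnn_forward(x_seq, w_x, w_h, b)

# Changing only the first input also changes the last state: the cell "remembers".
x_changed = x_seq.copy()
x_changed[0] += 1.0
states_changed = rnn_forward(x_changed, w_x, w_h, b)
```

LSTM and GRU cells follow the same recurrence but add gates that control what is kept in, and what is dropped from, the state.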

## Building the datasets

We build several datasets of different width (number of variables), five in total in the sections that follow, so that we can compare, in particular, the impact of the technical indicators on the quality of the results.

In [1]:
# pip install psycopg2-binary

In [2]:
import time
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import psycopg2
from sqlalchemy import create_engine

In [3]:
import warnings
warnings.filterwarnings('ignore')

In [4]:
from sklearn.model_selection import train_test_split, ShuffleSplit
from sklearn.metrics import *
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Convolution1D, MaxPooling1D, Flatten
from tensorflow.keras.layers import LSTM, TimeDistributed, Conv1D, ConvLSTM2D
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold


### Datasets: EURUSD H1

In [5]:
conn_string = 'postgresql://postgres:Juw51000@localhost/tradingIA'

db = create_engine(conn_string)
conn = db.connect()

In [6]:
df = pd.read_sql("select * from fex_eurusd_h1", conn);
df.head()

|   | epoch | mopen | mclose | mhigh | mlow | mvolume | mspread | ima | ima2 | ima4 | ... | istos4 | imom | imom2 | imom4 | rProfitBuy | rSwapBuy | rProfitBTrigger | rProfitSell | rSwapSell | rProfitSTrigger |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 946861200 | 1.0073 | 1.0128 | 1.0132 | 1.0073 | 194 | 50 | 1.008242 | 1.007963 | 1.006779 | ... | 70.12987 | 100.536033 | 100.615935 | 100.565982 | 3.64 | 0.0 | TO | -3.07 | 0.0 | SL |
| 1 | 946864800 | 1.0129 | 1.0137 | 1.0141 | 1.012 | 113 | 50 | 1.008733 | 1.008175 | 1.006973 | ... | 72.331461 | 100.67534 | 100.815515 | 100.495688 | 2.56 | 0.0 | TO | -3.15 | 0.0 | SL |
| 2 | 946868400 | 1.014 | 1.0171 | 1.0173 | 1.0134 | 149 | 50 | 1.009517 | 1.008588 | 1.007215 | ... | 76.041667 | 101.073239 | 101.002979 | 100.902778 | -0.1 | 0.0 | TO | -0.88 | 0.0 | TO |
| 3 | 946872000 | 1.017 | 1.0175 | 1.019 | 1.017 | 214 | 50 | 1.01035 | 1.008958 | 1.007462 | ... | 78.688525 | 100.87241 | 100.962493 | 100.882411 | -2.36 | 0.0 | TO | 1.38 | 0.0 | TO |
| 4 | 946875600 | 1.0173 | 1.0167 | 1.0177 | 1.0164 | 162 | 50 | 1.010975 | 1.009296 | 1.007677 | ... | 78.51153 | 100.703249 | 100.893123 | 100.813089 | -2.95 | 0.0 | SL | 5.74 | 0.0 | TP |


In [7]:
df['targetBuy'] = df['rProfitBuy'] + df['rSwapBuy']
df['targetSell'] = df['rProfitSell'] + df['rSwapSell']

In [8]:
dfNotNa = df[df['rProfitBTrigger'].notna()]
dfCleanRow = dfNotNa[dfNotNa['epoch'] < 1690484400]
dfClean = dfCleanRow.drop(['rProfitBuy', 'rSwapBuy', 'rProfitSell', 'rSwapSell', 'rProfitSTrigger', 'rProfitBTrigger'], axis=1)
dfClean.shape

(145559, 27)

### Recasting as a binary classification problem

The underlying question, namely how much profit a Buy or Sell trade yields, can be simplified into a binary one: will a trade opened at time T (Buy and Sell) result in a loss (0) or a gain (1)?

In [9]:
# Work on a copy so that the new target columns are not written back into dfClean
dfCleanBin = dfClean.copy()

In [10]:
dfCleanBin['targetProfitBuy'] = dfCleanBin['targetBuy'].apply(lambda x: 1 if x > 0 else 0)
dfCleanBin['targetProfitSell'] = dfCleanBin['targetSell'].apply(lambda x: 1 if x > 0 else 0)
dfCleanBin.shape

(145559, 29)

In [11]:
sum(dfCleanBin['targetBuy'])

-33065.310000000005

In [12]:
sum(dfCleanBin['targetProfitBuy']) / dfCleanBin.shape[0]

0.37148510226093884

In [13]:
sum(dfCleanBin['targetSell'])

-32935.02000000026

In [14]:
sum(dfCleanBin['targetProfitSell']) / dfCleanBin.shape[0]

0.37439801042876086

For both Buy and Sell, about 37% of trades are profitable against 63% that lose. The classes are therefore only moderately imbalanced.
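
With roughly 37% positives, one common way to compensate is to weight each class inversely to its frequency. The sketch below computes such weights with NumPy using the usual `n_samples / (n_classes * class_count)` heuristic. The toy label vector and the helper name `balanced_class_weights` are assumptions for illustration; the notebook itself does not apply any weighting.

```python
import numpy as np

def balanced_class_weights(y):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Toy labels with the same 37% / 63% proportions as the Buy target
y_toy = np.array([1] * 37 + [0] * 63)
class_weight = balanced_class_weights(y_toy)
# class 0 -> 100 / (2 * 63), class 1 -> 100 / (2 * 37)
```

A dictionary of this form can be passed to the Keras `fit` method through its `class_weight` argument.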

### Shifting the target values (forecasting)

For forecasting, the values to predict (the trade's profit) concern the upcoming period (T+1), based on the features observed during the current period (T). The target values must therefore be shifted from T+1 to T.
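
The effect of the shift is easier to see on a toy frame. This sketch assumes a hypothetical three-row DataFrame: `shift(-1)` moves each target up one row, and the last row, which has no successor, becomes NaN and has to be dropped.

```python
import pandas as pd

toy = pd.DataFrame({
    "epoch": [0, 3600, 7200],      # hypothetical hourly timestamps
    "mclose": [1.01, 1.02, 1.03],  # feature observed at T
    "target": [0, 1, 1],           # outcome attached to the same row
})

# Align the outcome of row T+1 with the features of row T
toy["target"] = toy["target"].shift(-1)

# The last row has no T+1 outcome, so it is removed
toy = toy[toy["target"].notna()]
```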

In [15]:
dfCleanBin['targetProfitBuy'] = dfCleanBin['targetProfitBuy'].shift(-1)
dfCleanBin['targetProfitSell'] = dfCleanBin['targetProfitSell'].shift(-1)
dfCleanBin['targetSell'] = dfCleanBin['targetSell'].shift(-1)
dfCleanBin['targetBuy'] = dfCleanBin['targetBuy'].shift(-1)

In [16]:
dfCleanBin = dfCleanBin[dfCleanBin['targetProfitSell'].notna()]

In [17]:
dfCleanBin.set_index('epoch', inplace=True)

#### Dataset basis
This dataset contains only the raw data (plus the targets), without any technical indicator.

In [18]:
dfBasisB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy']]
dfBasisS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell']]

#### Dataset intermediate low
This dataset contains the raw data (plus the targets) together with the indicators computed over the shortest period.

In [19]:
dfIntLowB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima', 'iatr', 'irsi', 'imacd', 'istos', 'imom']]
dfIntLowS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima', 'iatr', 'irsi', 'imacd', 'istos', 'imom']]

#### Dataset intermediate Medium
This dataset contains the raw data (plus the targets) together with the indicators computed over the intermediate period.

In [20]:
dfIntMedB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima2', 'iatr2', 'irsi2', 'imacd2', 'istos2', 'imom2']]
dfIntMedS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima2', 'iatr2', 'irsi2', 'imacd2', 'istos2', 'imom2']]

#### Dataset intermediate High
This dataset contains the raw data (plus the targets) together with the indicators computed over the longest period.

In [21]:
dfIntHigB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima4', 'iatr4', 'irsi4', 'imacd4', 'istos4', 'imom4']]
dfIntHigS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima4', 'iatr4', 'irsi4', 'imacd4', 'istos4', 'imom4']]

#### Full dataset
This dataset contains the raw data (plus the targets) together with all the indicators over all computation periods.

In [22]:
dfFullB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima', 'iatr', 'irsi', 'imacd','ima2', 'iatr2', 'irsi2', 'imacd2','ima4', 'iatr4', 'irsi4', 'imacd4',
                   'istos', 'istos2', 'istos4', 'imom', 'imom2', 'imom4']]
dfFullS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima', 'iatr', 'irsi', 'imacd','ima2', 'iatr2', 'irsi2', 'imacd2','ima4', 'iatr4', 'irsi4', 'imacd4',
                   'istos', 'istos2', 'istos4', 'imom', 'imom2', 'imom4']]

## Applying the Deep Learning models

#### Using the base dataset: dfBasisB

In [23]:
dfBasisB.shape

(145558, 7)

#### Defining the Features / Target datasets

In [24]:
df = dfBasisB

In [25]:
dfTarget = df['targetProfitBuy']
dfFeatures = df.drop(columns=['targetProfitBuy'])

#### Splitting the dataset into Train / Test

In [26]:
def getTrainTestDatasets(dfFeatures, dfTarget, testSize=.2):
    rs = ShuffleSplit(n_splits=1, test_size=testSize)
    train_index, test_index = next(rs.split(dfFeatures, dfTarget)) 
    dX_train, dX_test = dfFeatures.iloc[train_index], dfFeatures.iloc[test_index] 
    dy_train, dy_test = dfTarget.iloc[train_index], dfTarget.iloc[test_index]
    return dX_train, dX_test, dy_train, dy_test

In [27]:
dX_train, dX_test, dy_train, dy_test = getTrainTestDatasets(dfFeatures, dfTarget, .2)
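
`ShuffleSplit` draws a random permutation of the rows, so the train and test sets it returns are no longer contiguous in time. Because the lookback windows built further down rely on consecutive rows being consecutive hours, a chronological split is a common alternative for time series. The sketch below shows one such variant; it is not the method used in this notebook, and the name `chronological_split` and the toy data are assumptions.

```python
import pandas as pd

def chronological_split(dfFeatures, dfTarget, testSize=.2):
    """Keep the first (1 - testSize) share of rows for training, the rest for testing.

    Assumes the rows are already sorted by time.
    """
    cut = int(len(dfFeatures) * (1 - testSize))
    dX_train, dX_test = dfFeatures.iloc[:cut], dfFeatures.iloc[cut:]
    dy_train, dy_test = dfTarget.iloc[:cut], dfTarget.iloc[cut:]
    return dX_train, dX_test, dy_train, dy_test

# Toy data: 10 hourly rows indexed by epoch
toy_features = pd.DataFrame({"mclose": range(10)}, index=range(0, 36000, 3600))
toy_target = pd.Series([0, 1] * 5, index=toy_features.index)

tX_train, tX_test, ty_train, ty_test = chronological_split(toy_features, toy_target)
```

With this variant every test row is later than every training row, so the evaluation never uses information from the future of the training period.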

#### Normalizing the data

In [28]:
scaler = StandardScaler()
X_train = scaler.fit_transform(dX_train)
X_test = scaler.transform(dX_test)

In [29]:
y_train = dy_train.to_numpy()
y_test = dy_test.to_numpy()

In [30]:
X_train.shape

(116446, 6)

#### LSTM specifics: splitting the data into sub-sequences

LSTMs work on windows (subsets of rows) that determine, for a given instance, which previous instances are associated with it.

In the trading context, each data point at time T is given the n (a parameter) data points that immediately precede it in time [T-1 ... T-n], which the LSTM uses to interpret the data at time T. In the function below, the window of length `lookback` includes T itself, so it covers the `lookback - 1` preceding rows plus the current one.

In [31]:
def spliSequencesWithSamples(xdata, ydata, lookback):
    X, y = list(), list()
    for i in range(len(xdata)):
        if (i>=lookback-1): # Rows with not enough prev values cannot be taken
            # gather input and output parts of the pattern
            seq_x, seq_y = xdata[i+1-lookback:i+1, :], ydata[i]
            X.append(seq_x)
            y.append(seq_y)  
    return(np.array(X), np.array(y))

In [32]:
xTrainLbk, yTrainLbk = spliSequencesWithSamples(X_train, y_train, 100)

In [33]:
print(xTrainLbk.shape, yTrainLbk.shape)

(116347, 100, 6) (116347,)
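
The same windows can be built without a Python loop. The sketch below is an alternative based on NumPy's `sliding_window_view` (available from NumPy 1.20), intended to return arrays with the same layout as `spliSequencesWithSamples`. The name `windows_vectorized` is hypothetical, and the toy input at the end only serves to make the resulting shapes easy to inspect.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows_vectorized(xdata, ydata, lookback):
    """Build (n - lookback + 1, lookback, n_features) windows and their targets."""
    # sliding_window_view appends the window axis last: (n_windows, n_features, lookback)
    win = sliding_window_view(xdata, window_shape=lookback, axis=0)
    X = win.transpose(0, 2, 1)       # -> (n_windows, lookback, n_features)
    y = ydata[lookback - 1:]         # target of the last row of each window
    return X, y

# Toy check: 5 rows, 2 features, lookback of 3
x_toy = np.arange(10).reshape(5, 2)
y_toy = np.array([0, 1, 0, 1, 1])
X_win, y_win = windows_vectorized(x_toy, y_toy, 3)
```

The result is a read-only view on the original array, so it avoids copying each window; call `np.ascontiguousarray` on it if a writable copy is needed.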


## Vanilla LSTM (basic)

### Training

In [34]:
def create_VanillaLSTM(lookback, nbFeatures):
    model = Sequential()
    model.add(LSTM(50, input_shape=(lookback, nbFeatures)))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    return model

The LSTM input is a 3D array of shape (number of samples, lookback length, number of features).

In [35]:
modelLSTM = create_VanillaLSTM(xTrainLbk.shape[1], xTrainLbk.shape[2])

In [36]:
modelLSTM.fit(xTrainLbk, yTrainLbk, epochs=1)



<keras.callbacks.History at 0x15227c298d0>

### Test

In [37]:
xTestLbk, yTestLbk = spliSequencesWithSamples(X_test, y_test, 100) 

In [38]:
pred = modelLSTM.predict(xTestLbk)



### Scores and profit (100% random model)

In [39]:
def calculateRandomProfit(dfCleanRow, target='targetBuy'):
    profit = dfCleanRow[target].sum()
    profitPerTrade = profit / len(dfCleanRow)
    return profit, profitPerTrade

In [40]:
def calculateProfit(dfCleanRow, dX_test, yTestLbk, pred, lookback=100, specificity=.8, target='targetBuy'):
    [fpr, tpr, thr] = roc_curve(yTestLbk, pred, pos_label=1)
    idx = np.max(np.where((1-fpr) > specificity)) 
    seuil = thr[idx]  
    dfPred = pd.DataFrame(pred, columns = ['proba'])
    #Get rows index with positive proba (proba > seuil)
    xRows = dfPred[dfPred['proba']>seuil].index.to_numpy()
    #Get matching index (epoch timestamp) from dX_test => Periods with proba > seuil
    xEpochs = dX_test.iloc[lookback-1:,:].iloc[xRows].index.to_numpy()
    dfCleanEpochIdx = dfCleanRow.set_index('epoch')
    profit = dfCleanEpochIdx.loc[xEpochs][target].sum()
    profitPerTrade = profit / len(xRows)
    return profit, profitPerTrade
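
The threshold selection in `calculateProfit` looks for the score cut-off at which specificity, the share of losing trades correctly rejected, stays above the requested level. The sketch below illustrates the same idea with a different and simpler technique: it takes an order statistic of the scores of the negative class instead of reading the ROC curve. The toy data and the name `threshold_for_specificity` are assumptions for illustration.

```python
import numpy as np

def threshold_for_specificity(y_true, scores, specificity=.8):
    """Return a cut-off such that at least `specificity` of negatives score <= cut-off."""
    neg_scores = np.sort(scores[y_true == 0])
    # index of the smallest negative score that leaves enough negatives at or below it
    k = max(int(np.ceil(specificity * len(neg_scores))) - 1, 0)
    return neg_scores[k]

y_toy = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
s_toy = np.array([.1, .2, .3, .4, .9, .35, .6, .7, .8, .95])

seuil = threshold_for_specificity(y_toy, s_toy, specificity=.8)
selected = s_toy > seuil                      # trades the model would take
achieved = np.mean(~selected[y_toy == 0])     # share of negatives rejected
```

A higher specificity means fewer trades are taken, but a smaller share of them are losing trades wrongly accepted.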

In [41]:
profitRandom, profitPerTradeRandom = calculateRandomProfit(dfCleanRow, target='targetBuy')

In [42]:
profitRandom

-33065.30999999999

In [43]:
profitPerTradeRandom

-0.2271608763456742

### Scores and profit

In [44]:
profit, profitPerTrade = calculateProfit(dfCleanRow, dX_test, yTestLbk, pred, lookback=100, specificity=.8, target='targetBuy')

In [45]:
profit

-466.88

In [46]:
profitPerTrade

-0.07690331082194037

## Stacked LSTM

### Training

In [47]:
def create_StackedLSTM(lookback, nbFeatures):
    model = Sequential()    
    model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(lookback, nbFeatures)))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    return model

The LSTM input is a 3D array of shape (number of samples, lookback length, number of features).

In [48]:
modelLSTM = create_StackedLSTM(xTrainLbk.shape[1], xTrainLbk.shape[2])

In [49]:
modelLSTM.fit(xTrainLbk, yTrainLbk, epochs=1)



<keras.callbacks.History at 0x1522996f220>

### Test

In [50]:
xTestLbk, yTestLbk = spliSequencesWithSamples(X_test, y_test, 100)

In [51]:
pred = modelLSTM.predict(xTestLbk)



### Scores and profit

In [52]:
profit, profitPerTrade = calculateProfit(dfCleanRow, dX_test, yTestLbk, pred, lookback=100, specificity=.8, target='targetBuy')

In [53]:
profit

-621.9899999999999

In [54]:
profitPerTrade

-0.10211623707108848

## CNN LSTM

In [55]:
print(xTrainLbk.shape, yTrainLbk.shape)

(116347, 100, 6) (116347,)


In [56]:
n_seq = 10
n_steps = 10

In [57]:
xTrainLbkCNN = xTrainLbk.reshape((xTrainLbk.shape[0], n_seq, n_steps, xTrainLbk.shape[2]))
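
The reshape splits each 100-step window into `n_seq` = 10 consecutive sub-sequences of `n_steps` = 10 rows, which requires `n_seq * n_steps` to equal the lookback. A small NumPy sketch with toy sizes (assumed for illustration) shows that the time order is preserved:

```python
import numpy as np

lookback, n_features = 6, 2          # toy sizes, the notebook uses 100 and 6
n_seq, n_steps = 2, 3                # n_seq * n_steps must equal lookback

# One sample whose values encode the timestep: row t holds [10*t, 10*t + 1]
window = np.array([[10 * t, 10 * t + 1] for t in range(lookback)])
batch = window[np.newaxis, ...]      # shape (1, 6, 2)

sub = batch.reshape((batch.shape[0], n_seq, n_steps, n_features))
# sub[0, 0] holds timesteps 0..2 and sub[0, 1] holds timesteps 3..5
```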

### Training

model = Sequential()
model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

In [58]:
def create_CNNyLSTM(n_steps, n_features):
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    return model

In [59]:
modelLSTM = create_CNNyLSTM(n_steps, xTrainLbk.shape[2])

In [60]:
modelLSTM.fit(xTrainLbkCNN, yTrainLbk, epochs=1)



<keras.callbacks.History at 0x1522dff83d0>

### Test

In [61]:
xTestLbk, yTestLbk = spliSequencesWithSamples(X_test, y_test, 100)

In [62]:
xTestLbkCNN = xTestLbk.reshape((xTestLbk.shape[0], n_seq, n_steps, xTestLbk.shape[2]))

In [63]:
pred = modelLSTM.predict(xTestLbkCNN)



### Scores and profit

In [64]:
profit, profitPerTrade = calculateProfit(dfCleanRow, dX_test, yTestLbk, pred, lookback=100, specificity=.8, target='targetBuy')

In [65]:
profit

-745.92

In [66]:
profitPerTrade

-0.12286608466479987

## ConvLSTM

In [67]:
print(xTrainLbk.shape, yTrainLbk.shape)

(116347, 100, 6) (116347,)


In [68]:
n_seq = 10
n_steps = 10

In [69]:
xTrainLbkCNN = xTrainLbk.reshape((xTrainLbk.shape[0], n_seq, 1, n_steps, xTrainLbk.shape[2]))

### Training

In [70]:
def create_ConvLSTM(n_seq, n_steps, n_features):
    model = Sequential()
    model.add(ConvLSTM2D(filters=128, kernel_size=(1,2), activation='relu', input_shape=(n_seq, 1, n_steps, n_features)))
    model.add(Flatten())
    model.add(Dense(32, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    return model

In [71]:
modelLSTM = create_ConvLSTM(n_seq, n_steps, xTrainLbk.shape[2])

In [72]:
modelLSTM.fit(xTrainLbkCNN, yTrainLbk, epochs=1)



<keras.callbacks.History at 0x1523067edd0>

### Test

In [73]:
xTestLbk, yTestLbk = spliSequencesWithSamples(X_test, y_test, 100)

In [74]:
xTestLbkCNN = xTestLbk.reshape((xTestLbk.shape[0], n_seq, 1, n_steps, xTestLbk.shape[2]))

In [75]:
pred = modelLSTM.predict(xTestLbkCNN)



In [76]:
profit, profitPerTrade = calculateProfit(dfCleanRow, dX_test, yTestLbk, pred, lookback=100, specificity=.8, target='targetBuy')

In [77]:
profit

-368.30999999999995

In [78]:
profitPerTrade

-0.06033912188728702