# Applying Deep Learning Algorithms (NN)

## Building the datasets

We build datasets at three levels of depth (number of variables): raw data only, raw data plus one set of indicators, and raw data plus all indicators. The main goal is to compare how the indicators affect the quality of the results.

In [1]:
# pip install psycopg2-binary

In [2]:
import time
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import psycopg2
from sqlalchemy import create_engine

In [3]:
import warnings
warnings.filterwarnings('ignore')

In [4]:
from scipy.stats import spearmanr
from sklearn.model_selection import train_test_split
from sklearn.metrics import *
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold

### Datasets: EURUSD D1

In [5]:
conn_string = 'postgresql://postgres:Juw51000@localhost/tradingIA'

db = create_engine(conn_string)
conn = db.connect()

In [6]:
df = pd.read_sql("select * from fex_eurusd_d1", conn);
df.head()

|   | epoch | mopen | mclose | mhigh | mlow | mvolume | mspread | ima | ima2 | ima4 | ... | irsi4 | iatr | iatr2 | iatr4 | rProfitBuy | rSwapBuy | rProfitBTrigger | rProfitSell | rSwapSell | rProfitSTrigger |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 946857600 | 1.0073 | 1.0243 | 1.0278 | 1.0054 | 6572 | 50 | 1.011008 | 1.012496 | 1.023587 | ... | 48.887713 | 0.009387 | 0.00975 | 0.010237 | 7.65 | -0.48 | TO | -9.13 | 0.0 | SL |
| 1 | 946944000 | 1.0243 | 1.0296 | 1.034 | 1.0213 | 7253 | 50 | 1.012825 | 1.013387 | 1.023129 | ... | 50.520967 | 0.009625 | 0.010206 | 0.01035 | 2.81 | -0.48 | TO | -9.31 | 0.0 | SL |
| 2 | 947030400 | 1.0295 | 1.032 | 1.0402 | 1.0284 | 6548 | 50 | 1.014383 | 1.014633 | 1.022656 | ... | 51.24914 | 0.010375 | 0.010181 | 0.010562 | -4.47 | -0.24 | TO | 3.5 | 0.08 | TO |
| 3 | 947116800 | 1.0327 | 1.0327 | 1.0415 | 1.0272 | 7288 | 50 | 1.0164 | 1.015867 | 1.022267 | ... | 51.464196 | 0.011575 | 0.0106 | 0.010762 | -11.55 | -0.12 | SL | 6.43 | 0.08 | TO |
| 4 | 947203200 | 1.0329 | 1.0295 | 1.0334 | 1.026 | 5765 | 50 | 1.018083 | 1.016154 | 1.021787 | ... | 50.414735 | 0.011138 | 0.01025 | 0.010591 | 4.26 | -0.24 | TO | -5.22 | 0.08 | TO |


In [7]:
df['targetBuy'] = df['rProfitBuy'] + df['rSwapBuy']
df['targetSell'] = df['rProfitSell'] + df['rSwapSell']

In [8]:
dfNotNa = df[df['rProfitBTrigger'].notna()]
dfCleanRow = dfNotNa[dfNotNa['epoch'] < 1689811200]
dfClean = dfCleanRow.drop(['rProfitBuy', 'rSwapBuy', 'rProfitSell', 'rSwapSell', 'rProfitSTrigger', 'rProfitBTrigger'], axis=1)
dfClean.shape

(5963, 21)

### Recasting as a binary classification problem

The original question, which is when a position (Buy/Sell) turns a profit, can be simplified into a binary one: will a trade opened at time T (Buy and Sell) result in a loss (0) or a gain (1)?

In [9]:
dfCleanBin = dfClean.copy()  # copy, so the new columns are not written to a slice of df

In [10]:
dfCleanBin['targetProfitBuy'] = dfCleanBin['targetBuy'].apply(lambda x: 1 if x > 0 else 0)
dfCleanBin['targetProfitSell'] = dfCleanBin['targetSell'].apply(lambda x: 1 if x > 0 else 0)
dfCleanBin.shape

(5963, 23)

In [11]:
sum(dfCleanBin['targetBuy'])

-2267.709999999994

In [12]:
sum(dfCleanBin['targetProfitBuy']) / dfCleanBin.shape[0]

0.46050645648163674

In [13]:
sum(dfCleanBin['targetSell'])

-983.0399999999954

In [14]:
sum(dfCleanBin['targetProfitSell']) / dfCleanBin.shape[0]

0.4650343786684555

For both Buy and Sell, roughly 46% of the targets are profitable and 54% are losses. The classes are therefore fairly balanced.
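
Since the split is not exactly 50/50, it can help to know what accuracy a trivial classifier would reach by always predicting the most frequent class. The following is a minimal sketch on a toy series; `majority_baseline` and `toy_target` are illustrative names that do not exist in the notebook.

```python
import pandas as pd


def majority_baseline(target: pd.Series) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    return float(target.value_counts(normalize=True).max())


# Toy 0/1 target with 46% of ones, mirroring the proportions measured above
toy_target = pd.Series([1] * 46 + [0] * 54)
baseline = majority_baseline(toy_target)
print(f"Majority-class baseline: {baseline:.2%}")
```

Applied to `dfCleanBin['targetProfitBuy']`, the same function would give the reference score that any model should beat.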

### Shifting the target values (forecasting)

For forecasting, the values to predict (the trade's profit) belong to the upcoming period (T+1), while the features are observed over the current period (T). The target values must therefore be shifted from T+1 back to T.

In [15]:
dfCleanBin['targetProfitBuy'] = dfCleanBin['targetProfitBuy'].shift(-1)
dfCleanBin['targetProfitSell'] = dfCleanBin['targetProfitSell'].shift(-1)
dfCleanBin['targetSell'] = dfCleanBin['targetSell'].shift(-1)
dfCleanBin['targetBuy'] = dfCleanBin['targetBuy'].shift(-1)

In [16]:
dfCleanBin = dfCleanBin[dfCleanBin['targetProfitSell'].notna()]
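
To make the effect of `shift(-1)` concrete, here is a small sketch on a toy frame (the `toy` data is invented for illustration). Each target moves one row up, and the last row, which has no following period, ends up with a missing target and is dropped.

```python
import pandas as pd

toy = pd.DataFrame({
    "epoch": [1, 2, 3, 4],
    "mclose": [1.10, 1.11, 1.09, 1.12],
    "target": [0, 1, 1, 0],  # outcome stored on the same row as its period
})

# Move each target one row up, so that row T carries the outcome of T+1
toy["target"] = toy["target"].shift(-1)

# The last row has no following period: its target is NaN and is removed
toy = toy[toy["target"].notna()]
print(toy)
```

Note that the shifted column becomes floating point because of the temporary NaN.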

In [17]:
dfCleanBin.set_index('epoch', inplace=True)

#### Basis dataset
This dataset contains only the raw data (plus the targets), without any technical indicator.

In [18]:
dfBasisB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy']]
dfBasisS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell']]

#### Intermediate low dataset
This dataset contains the raw data (plus the targets) together with the indicators computed over the shortest period.

In [19]:
dfIntLowB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima', 'iatr', 'irsi', 'imacd']]
dfIntLowS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima', 'iatr', 'irsi', 'imacd']]

#### Intermediate medium dataset
This dataset contains the raw data (plus the targets) together with the indicators computed over the intermediate period.

In [20]:
dfIntMedB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima2', 'iatr2', 'irsi2', 'imacd2']]
dfIntMedS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima2', 'iatr2', 'irsi2', 'imacd2']]

#### Intermediate high dataset
This dataset contains the raw data (plus the targets) together with the indicators computed over the longest period.

In [21]:
dfIntHigB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima4', 'iatr4', 'irsi4', 'imacd4']]
dfIntHigS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima4', 'iatr4', 'irsi4', 'imacd4']]

#### Full dataset
This dataset contains the raw data (plus the targets) together with all the indicators over all computation periods.

In [22]:
dfFullB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima', 'iatr', 'irsi', 'imacd','ima2', 'iatr2', 'irsi2', 'imacd2','ima4', 'iatr4', 'irsi4', 'imacd4']]
dfFullS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima', 'iatr', 'irsi', 'imacd','ima2', 'iatr2', 'irsi2', 'imacd2','ima4', 'iatr4', 'irsi4', 'imacd4']]

## Applying the Deep Learning models

#### Using the basis dataset: dfBasisB

In [23]:
dfBasisB.shape

(5962, 7)

Defining the Features / Target datasets

In [24]:
df = dfBasisB

In [25]:
dfTarget = df['targetProfitBuy']
dfFeatures = df.drop(columns=['targetProfitBuy'])

Splitting the dataset into Train / Test

In [26]:
X_train, X_test, y_train, y_test = train_test_split(dfFeatures, dfTarget, train_size=0.8)
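
By default `train_test_split` shuffles the rows, so the training set mixes observations from before and after the test period. As a hedged alternative, and not what the notebook does, a chronological split can be obtained with `shuffle=False`. The sketch below uses a synthetic, time-ordered frame; `features`, `target` and the `X_tr`/`X_te` names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, time-ordered stand-in for the feature and target frames
index = pd.RangeIndex(10, name="epoch")
features = pd.DataFrame({"mclose": np.linspace(1.0, 1.2, 10)}, index=index)
target = pd.Series([0, 1] * 5, index=index)

# shuffle=False keeps the row order: first 80% for training, last 20% for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    features, target, train_size=0.8, shuffle=False
)
print(X_tr.index.max(), X_te.index.min())
```

With this variant every test row is later in time than every training row.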

Splitting the training set into Train / Val (done by the cross-validation folds further down)

#### Normalizing the dataset features for training (Train / Val)

In [27]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
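
Fitting the scaler once before cross-validation means that each validation fold has already contributed to the mean and standard deviation used for scaling. One way to avoid this, sketched here under assumptions, is to place the scaler inside a scikit-learn pipeline so that it is refit on the training part of every fold. `LogisticRegression` is only a lightweight stand-in for the Keras model, and the data is synthetic; since `KerasClassifier` follows the scikit-learn estimator interface, it should be usable in the same position.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))     # synthetic stand-in for the 6 raw features
y = rng.integers(0, 2, size=200)  # synthetic 0/1 target

# The scaler is refit on the training part of each fold, so validation
# rows never influence the statistics used to scale the training rows
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv)
print(f"{scores.mean():.3f} ({scores.std():.3f})")
```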

#### Defining the Deep Learning model

In [39]:
# baseline model
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(12, input_shape=(6,), activation='relu'))
    model.add(Dense(4, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [44]:
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline, epochs=4, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X_train, y_train, cv=kfold)

In [45]:
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Baseline: 52.95% (0.85%)


#### Using the datasets with feature engineering

dfIntLowB

In [50]:
dfIntLowB.shape

(5962, 11)

In [51]:
df = dfIntLowB
dfTarget = df['targetProfitBuy']
dfFeatures = df.drop(columns=['targetProfitBuy'])

#### Normalizing the dataset features for training (Train / Val)

In [53]:
# Note: the scaler is fit on the whole dataset here (no train/test split, unlike the basis run)
scaler = StandardScaler()
X_train = scaler.fit_transform(dfFeatures)
y_train = dfTarget.to_numpy()

In [61]:
# baseline model
def create_baseline_Int():
    # create model
    model = Sequential()
    model.add(Dense(16, input_shape=(10,), activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [62]:
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline_Int, epochs=10, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X_train, y_train, cv=kfold)

In [63]:
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Baseline: 53.77% (1.58%)


#### Using the full dataset

In [64]:
dfFullB.shape

(5962, 19)

In [65]:
df = dfFullB
dfTarget = df['targetProfitBuy']
dfFeatures = df.drop(columns=['targetProfitBuy'])

#### Normalizing the dataset features for training (Train / Val)

In [66]:
# Note: the scaler is fit on the whole dataset here (no train/test split, unlike the basis run)
scaler = StandardScaler()
X_train = scaler.fit_transform(dfFeatures)
y_train = dfTarget.to_numpy()

In [101]:
# baseline model
def create_baseline_Full():
    # create model
    model = Sequential()
    model.add(Dense(64, input_shape=(18,), activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [102]:
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline_Full, epochs=4, batch_size=20, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X_train, y_train, cv=kfold)

In [103]:
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Baseline: 53.57% (1.18%)


## Conclusion

### Findings

The results are very weak: close to 50%, which is what a purely random algorithm would be expected to score, and slightly below the roughly 54% that always predicting a loss would reach given the class balance measured earlier.
They do not even match the scores of the other ML algorithms (57%), which were already very weak.
- The small amount of data (about 6,000 rows) is probably not enough for deep learning to train properly
- Keep in mind also that classic NNs are not necessarily the best fit for time series

### Optimization

1- Increasing the data volume
- Daily -> Hourly -> 5 min?: over the same period, this would multiply the size of the datasets
- EUR_USD + other pairs?: open to discussion, perhaps at a later stage, since nothing shows that the pairs behave the same way

2- Increasing the depth of the data (feature engineering)
- Adding new financial technical indicators
- ML techniques for building combined indicators
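
As one possible illustration of combined indicators, the sketch below derives two ratios from columns that already exist in the table (`mclose`, `ima`, `iatr`), using the first three rows shown earlier. The derived column names `close_vs_ma` and `atr_pct` are hypothetical and are not part of the notebook.

```python
import pandas as pd

# First three rows of the EURUSD D1 sample shown earlier
toy = pd.DataFrame({
    "mclose": [1.0243, 1.0296, 1.0320],
    "ima": [1.011008, 1.012825, 1.014383],
    "iatr": [0.009387, 0.009625, 0.010375],
})

# Relative distance between the close and its moving average
toy["close_vs_ma"] = toy["mclose"] / toy["ima"] - 1

# Volatility (ATR) expressed as a fraction of the price
toy["atr_pct"] = toy["iatr"] / toy["mclose"]

print(toy[["close_vs_ma", "atr_pct"]])
```

Ratios like these are scale-free, which may make them easier to compare across periods or currency pairs than raw price levels.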

3- Using neural networks better suited to time series
- RNN
- LSTM
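
Recurrent layers in Keras expect input of shape `(samples, timesteps, features)` rather than the flat `(samples, features)` used above. The following sketch shows one way to build such windows with NumPy; `make_windows` and the choice of `timesteps=3` are illustrative assumptions, and the data is synthetic.

```python
import numpy as np


def make_windows(features: np.ndarray, target: np.ndarray, timesteps: int):
    """Stack `timesteps` consecutive rows into samples of shape (timesteps, n_features)."""
    n_samples = len(features) - timesteps + 1
    X = np.stack([features[i:i + timesteps] for i in range(n_samples)])
    # Each window is labelled with the target of its last row
    y = target[timesteps - 1:]
    return X, y


features = np.arange(20, dtype=float).reshape(10, 2)  # 10 periods, 2 features
target = np.arange(10) % 2
X, y = make_windows(features, target, timesteps=3)
print(X.shape, y.shape)
```

Each sample then carries the last few periods of history, which is the information an RNN or LSTM layer is designed to exploit.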