# Applying Deep Learning Algorithms (NN)

## Building the datasets

We will build 3 different datasets of differing depth (number of variables) so that we can compare, in particular, the impact of the indicators on the quality of the results.

In [1]:
# pip install psycopg2-binary

In [2]:
import time
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import psycopg2
from sqlalchemy import create_engine

In [3]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from scipy.stats import spearmanr
from sklearn.model_selection import train_test_split
from sklearn.metrics import *
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation


### Datasets : EURUSD D1

In [5]:
conn_string = 'postgresql://postgres:Juw51000@localhost/tradingIA'

db = create_engine(conn_string)
conn = db.connect()

In [6]:
df = pd.read_sql("select * from fex_eurusd_d1", conn);
df.head()

| | epoch | mopen | mclose | mhigh | mlow | mvolume | mspread | ima | ima2 | ima4 | ... | irsi4 | iatr | iatr2 | iatr4 | rProfitBuy | rSwapBuy | rProfitBTrigger | rProfitSell | rSwapSell | rProfitSTrigger |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 946857600 | 1.0073 | 1.0243 | 1.0278 | 1.0054 | 6572 | 50 | 1.011008 | 1.012496 | 1.023587 | ... | 48.887713 | 0.009387 | 0.00975 | 0.010237 | 7.65 | -0.48 | TO | -9.13 | 0.0 | SL |
| 1 | 946944000 | 1.0243 | 1.0296 | 1.034 | 1.0213 | 7253 | 50 | 1.012825 | 1.013387 | 1.023129 | ... | 50.520967 | 0.009625 | 0.010206 | 0.01035 | 2.81 | -0.48 | TO | -9.31 | 0.0 | SL |
| 2 | 947030400 | 1.0295 | 1.032 | 1.0402 | 1.0284 | 6548 | 50 | 1.014383 | 1.014633 | 1.022656 | ... | 51.24914 | 0.010375 | 0.010181 | 0.010562 | -4.47 | -0.24 | TO | 3.5 | 0.08 | TO |
| 3 | 947116800 | 1.0327 | 1.0327 | 1.0415 | 1.0272 | 7288 | 50 | 1.0164 | 1.015867 | 1.022267 | ... | 51.464196 | 0.011575 | 0.0106 | 0.010762 | -11.55 | -0.12 | SL | 6.43 | 0.08 | TO |
| 4 | 947203200 | 1.0329 | 1.0295 | 1.0334 | 1.026 | 5765 | 50 | 1.018083 | 1.016154 | 1.021787 | ... | 50.414735 | 0.011138 | 0.01025 | 0.010591 | 4.26 | -0.24 | TO | -5.22 | 0.08 | TO |


In [7]:
df['targetBuy'] = df['rProfitBuy'] + df['rSwapBuy']
df['targetSell'] = df['rProfitSell'] + df['rSwapSell']

In [8]:
dfNotNa = df[df['rProfitBTrigger'].notna()]
dfCleanRow = dfNotNa[dfNotNa['epoch'] < 1689811200]
dfClean = dfCleanRow.drop(['rProfitBuy', 'rSwapBuy', 'rProfitSell', 'rSwapSell', 'rProfitSTrigger', 'rProfitBTrigger'], axis=1)
dfClean.shape

(5963, 21)

### Recasting as a binary classification problem

The basic question, namely when a trade (Buy/Sell) is profitable, can be simplified into a binary one: will a trade opened at time T (Buy or Sell) result in a loss (0) or a gain (1)?
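As a minimal sketch of this labelling, the snippet below rebuilds the net Buy result (profit plus swap) for the five rows shown in the `df.head()` output and labels each one with a vectorized comparison. The variable names are illustrative; a net result of exactly 0 counts as a loss, matching the `x > 0` rule used in this notebook.

```python
import pandas as pd

# rProfitBuy and rSwapBuy for the five rows displayed by df.head().
profit_buy = pd.Series([7.65, 2.81, -4.47, -11.55, 4.26])
swap_buy = pd.Series([-0.48, -0.48, -0.24, -0.12, -0.24])

# Net result of each Buy trade.
target_buy = profit_buy + swap_buy

# 1 = gain, 0 = loss (a net result of exactly 0 is treated as a loss).
label = (target_buy > 0).astype(int)

print(label.tolist())
```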

In [9]:
dfCleanBin = dfClean.copy()

In [10]:
dfCleanBin['targetProfitBuy'] = dfCleanBin['targetBuy'].apply(lambda x: 1 if x > 0 else 0)
dfCleanBin['targetProfitSell'] = dfCleanBin['targetSell'].apply(lambda x: 1 if x > 0 else 0)
dfCleanBin.shape

(5963, 23)

In [11]:
sum(dfCleanBin['targetBuy'])

-2267.709999999994

In [12]:
sum(dfCleanBin['targetProfitBuy']) / dfCleanBin.shape[0]

0.46050645648163674

In [13]:
sum(dfCleanBin['targetSell'])

-983.0399999999954

In [14]:
sum(dfCleanBin['targetProfitSell']) / dfCleanBin.shape[0]

0.4650343786684555

For both Buy and Sell trades, roughly 46% of the targets are profitable against 54% losses. The classes are therefore fairly balanced.
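One practical consequence of these proportions: a classifier that always predicts the majority class (loss) would already be right about 54% of the time, so that figure, rather than 50%, is the accuracy a model has to beat. The sketch below derives it from the two profit rates printed above; the variable names are illustrative.

```python
# Positive-class (profit) rates as printed by the cells above.
positive_rate = {"Buy": 0.46050645648163674, "Sell": 0.4650343786684555}

# Always predicting the majority class is correct with probability max(p, 1 - p).
baseline = {side: max(p, 1 - p) for side, p in positive_rate.items()}

for side, acc in baseline.items():
    print(f"{side}: majority-class baseline accuracy = {acc:.3f}")
```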

### Shifting the target values (forecasting)

For forecasting, the values to predict (the trade's profit) belong to the upcoming trade period (T+1), while the features are observed over the current period (T). The target values must therefore be shifted from T+1 to T.
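The effect of this shift can be seen on a toy frame (the values and column names below are made up for illustration): `shift(-1)` moves each target up by one row, and the last row, which has no following period, ends up with a missing target and is dropped.

```python
import pandas as pd

# Toy frame: one feature and the outcome of the trade for the same period.
toy = pd.DataFrame(
    {"feature": [10, 11, 12, 13], "target": [0, 1, 1, 0]},
    index=pd.Index([1, 2, 3, 4], name="period"),
)

# Pull the target of period T+1 back onto the row of period T.
toy["target_next"] = toy["target"].shift(-1)

# The last row has no following period, so its shifted target is NaN and is dropped.
toy = toy[toy["target_next"].notna()]
print(toy)
```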

In [15]:
dfCleanBin['targetProfitBuy'] = dfCleanBin['targetProfitBuy'].shift(-1)
dfCleanBin['targetProfitSell'] = dfCleanBin['targetProfitSell'].shift(-1)
dfCleanBin['targetSell'] = dfCleanBin['targetSell'].shift(-1)
dfCleanBin['targetBuy'] = dfCleanBin['targetBuy'].shift(-1)

In [16]:
dfCleanBin = dfCleanBin[dfCleanBin['targetProfitSell'].notna()].copy()

In [17]:
dfCleanBin.set_index('epoch', inplace=True)

#### Basis dataset
This dataset contains only the raw data (plus the targets), without any technical indicators.

In [18]:
dfBasisB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy']]
dfBasisS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell']]

#### Intermediate low dataset
This dataset contains the raw data (plus the targets) together with the indicators computed over the shortest calculation period.

In [19]:
dfIntLowB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima', 'iatr', 'irsi', 'imacd']]
dfIntLowS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima', 'iatr', 'irsi', 'imacd']]

#### Intermediate medium dataset
This dataset contains the raw data (plus the targets) together with the indicators computed over the intermediate calculation period.

In [20]:
dfIntMedB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima2', 'iatr2', 'irsi2', 'imacd2']]
dfIntMedS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima2', 'iatr2', 'irsi2', 'imacd2']]

#### Intermediate high dataset
This dataset contains the raw data (plus the targets) together with the indicators computed over the longest calculation period.

In [21]:
dfIntHigB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima4', 'iatr4', 'irsi4', 'imacd4']]
dfIntHigS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima4', 'iatr4', 'irsi4', 'imacd4']]

#### Full dataset
This dataset contains the raw data (plus the targets) together with all the indicators over all calculation periods.

In [22]:
dfFullB = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitBuy', 
                   'ima', 'iatr', 'irsi', 'imacd','ima2', 'iatr2', 'irsi2', 'imacd2','ima4', 'iatr4', 'irsi4', 'imacd4']]
dfFullS = dfCleanBin[['mopen', 'mclose', 'mhigh', 'mlow', 'mvolume', 'mspread', 'targetProfitSell', 
                   'ima', 'iatr', 'irsi', 'imacd','ima2', 'iatr2', 'irsi2', 'imacd2','ima4', 'iatr4', 'irsi4', 'imacd4']]
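The column selections above all follow one pattern: the six raw columns, one target, and the indicators for a given set of period suffixes. A hypothetical helper (`feature_columns` is not part of the notebook) could generate these lists and reduce copy-paste mistakes. This is only a sketch that assumes the column naming shown above.

```python
RAW = ["mopen", "mclose", "mhigh", "mlow", "mvolume", "mspread"]
INDICATORS = ["ima", "iatr", "irsi", "imacd"]

def feature_columns(target, suffixes=()):
    """Return raw columns + target + indicators for each period suffix.

    suffixes: ("",) for the shortest period, ("2",) or ("4",) for the
    longer ones, ("", "2", "4") for the full dataset; () gives the basis dataset.
    """
    cols = RAW + [target]  # new list, so RAW itself is never modified
    for suffix in suffixes:
        cols += [name + suffix for name in INDICATORS]
    return cols

print(feature_columns("targetProfitBuy", ("2",)))
```

With such a helper, a selection would read for instance `dfCleanBin[feature_columns('targetProfitBuy', ('2',))]`.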

## Applying Deep Learning models

Based on the GitHub examples:
https://github.com/stefan-jansen/machine-learning-for-trading/blob/main/17_deep_learning/04_optimizing_a_NN_architecture_for_trading.ipynb

In [23]:
from utils import MultipleTimeSeriesCV, format_time
from itertools import product
from sklearn.preprocessing import StandardScaler
from pathlib import Path
from time import time

In [24]:
results_path = Path('results')
if not results_path.exists():
    results_path.mkdir()
    
checkpoint_path = results_path / 'logs'

In [25]:
n_splits = 1
train_period_length=5000
test_period_length=800

In [26]:
def make_model(dense_layers, activation, dropout, input_dim):
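    '''Creates a multi-layer perceptron model for binary classification
    
    dense_layers: List of layer sizes; one number per layer
    activation: Activation applied after every hidden layer
    dropout: Dropout rate, applied once after the last hidden layer
    input_dim: Number of input features
    '''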

    model = Sequential()
    for i, layer_size in enumerate(dense_layers, 1):
        if i == 1:
            model.add(Dense(layer_size, input_dim=input_dim))
            model.add(Activation(activation))
        else:
            model.add(Dense(layer_size))
            model.add(Activation(activation))
    model.add(Dropout(dropout))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                  optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                 metrics=['accuracy'])
    model.summary()
    return model

In [27]:
cv = MultipleTimeSeriesCV(n_splits=n_splits,
                          train_period_length=train_period_length,
                          test_period_length=test_period_length,
                          lookahead=1, 
                          date_idx='epoch')
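`MultipleTimeSeriesCV` is imported from the `utils` module of the referenced repository and is not reproduced here. To give an idea of what a single chronological split with these window lengths looks like, the sketch below uses a hypothetical helper (`last_window_split`, with a simplified `gap` argument standing in for the lookahead handling). It is an illustration of the idea, not the implementation in `utils`.

```python
import numpy as np

def last_window_split(n_rows, train_length, test_length, gap=0):
    """Hypothetical helper: one chronological train/validation split.

    The newest `test_length` rows form the validation window; the
    `train_length` rows that end `gap` rows before it form the training window.
    """
    test_start = n_rows - test_length
    train_end = test_start - gap
    train_start = train_end - train_length
    if train_start < 0:
        raise ValueError("not enough rows for the requested windows")
    train_idx = np.arange(train_start, train_end)
    test_idx = np.arange(test_start, n_rows)
    return train_idx, test_idx

# 5962 rows: the 5963 cleaned rows minus the last one dropped after the target shift.
train_idx, test_idx = last_window_split(5962, train_length=5000, test_length=800)
print(len(train_idx), len(test_idx), train_idx.max() < test_idx.min())
```

The point of splitting this way is that the validation rows are always more recent than the training rows, so the model is never evaluated on data that precedes what it was trained on.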

Defining the different parameters for cross-validation

In [49]:
# dense_layer_opts = [(4, 2), (8, 4), (16, 8), (16, 16)]
dense_layer_opts = [(32, 16),(64, 32)]
activation_opts = ['tanh']
dropout_opts = [0, .1, .2]

In [50]:
param_grid = list(product(dense_layer_opts, activation_opts, dropout_opts))
np.random.shuffle(param_grid)
len(param_grid)

6

To run the parameter search, we loop over the parameter combinations by hand rather than through a GridSearchCV object: for each combination we build the Keras model, fit it on the training window of each fold, and predict on the matching validation window.

In [51]:
def get_train_valid_data(X, y, train_idx, test_idx):
    x_train, y_train = X.iloc[train_idx, :], y.iloc[train_idx]
    x_val, y_val = X.iloc[test_idx, :], y.iloc[test_idx]
    return x_train, y_train, x_val, y_val

In [52]:
X_cv = dfBasisB.drop('targetProfitBuy', axis=1)
y_cv = dfBasisB['targetProfitBuy']

In [53]:
param_grid[0]

((32, 16), 'tanh', 0)

In [54]:
ic = []
scaler = StandardScaler()
#for params in param_grid:
params = param_grid[0]
dense_layers, activation, dropout = params
for batch_size in [64]:
#for batch_size in [64, 256]:
    # batch_size = 64
    # print(dense_layers, activation, dropout, batch_size)
    checkpoint_dir = checkpoint_path / str(dense_layers) / activation / str(dropout) / str(batch_size)
    if not checkpoint_dir.exists():
        checkpoint_dir.mkdir(parents=True, exist_ok=True)
    start = time()
    for fold, (train_idx, test_idx) in enumerate(cv.split(X_cv)):
        # get train & validation data
        x_train, y_train, x_val, y_val = get_train_valid_data(X_cv, y_cv, train_idx, test_idx)
        # scale features
        x_train = scaler.fit_transform(x_train)
        x_val = scaler.transform(x_val)
        # set up dataframes to log results
        preds = y_val.to_frame('actual')
        r = pd.DataFrame(index=y_val.groupby(level='epoch').size().index)
        # create model based on validation parameters
        model = make_model(dense_layers, activation, dropout, x_train.shape[1])
        # single outer pass; the fit call below trains for 4 epochs
        for epoch in range(1):            
            model.fit(x_train,
                      y_train,
                      batch_size=batch_size,
                      epochs=4,
                      verbose=1,
                      shuffle=True,
                      validation_data=(x_val, y_val))
            model.save_weights((checkpoint_dir / f'ckpt_{fold}_{epoch}').as_posix())
            preds[epoch] = model.predict(x_val).squeeze()
            
            
            

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_9 (Dense)             (None, 32)                224       
                                                                 
 activation_6 (Activation)   (None, 32)                0         
                                                                 
 dense_10 (Dense)            (None, 16)                528       
                                                                 
 activation_7 (Activation)   (None, 16)                0         
                                                                 
 dropout_3 (Dropout)         (None, 16)                0         
                                                                 
 dense_11 (Dense)            (None, 1)                 17        
                                                                 
Total params: 769
Trainable params: 769
Non-trainable 

In [55]:
preds.head(50)

| epoch | actual | 0 |
| --- | --- | --- |
| 1588204800 | 0.0 | 0.456728 |
| 1588291200 | 0.0 | 0.449426 |
| 1588550400 | 0.0 | 0.455329 |
| 1588636800 | 1.0 | 0.456678 |
| 1588723200 | 0.0 | 0.458998 |
| 1588809600 | 1.0 | 0.458089 |
| 1588896000 | 1.0 | 0.454523 |
| 1589155200 | 0.0 | 0.456649 |
| 1589241600 | 0.0 | 0.455886 |
| 1589328000 | 1.0 | 0.460299 |
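Every probability displayed here is below 0.5, so a 0.5 decision threshold would label all ten rows as losses. The sketch below checks this on the ten rows shown above (values copied from the output). The 0.5 threshold is an assumption, not something the notebook sets, and only NumPy is used.

```python
import numpy as np

# The ten (actual, predicted probability) pairs displayed above.
actual = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 1])
proba = np.array([0.456728, 0.449426, 0.455329, 0.456678, 0.458998,
                  0.458089, 0.454523, 0.456649, 0.455886, 0.460299])

# Assumed decision rule: predict a gain when the probability exceeds 0.5.
predicted = (proba > 0.5).astype(int)

accuracy = (predicted == actual).mean()
print("predicted positives:", predicted.sum())
print("accuracy on these rows:", accuracy)
```

On these rows the accuracy simply equals the share of losses, which is what a majority-class predictor would achieve. Ten rows are far too few to say anything about the full validation window.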


In [None]:
ic = []
scaler = StandardScaler()
for params in param_grid:
    dense_layers, activation, dropout = params
    # for batch_size in [64, 256]:
    
    batch_size = 64
    print(dense_layers, activation, dropout, batch_size)
    checkpoint_dir = checkpoint_path / str(dense_layers) / activation / str(dropout) / str(batch_size)
    if not checkpoint_dir.exists():
        checkpoint_dir.mkdir(parents=True, exist_ok=True)
    start = time()
    for fold, (train_idx, test_idx) in enumerate(cv.split(X_cv)):
        # get train & validation data
        x_train, y_train, x_val, y_val = get_train_valid_data(X_cv, y_cv, train_idx, test_idx)
        # scale features
        x_train = scaler.fit_transform(x_train)
        x_val = scaler.transform(x_val)
        # set up dataframes to log results
        preds = y_val.to_frame('actual')
        r = pd.DataFrame(index=y_val.groupby(level='epoch').size().index)
        # create model based on validation parameters
        model = make_model(dense_layers, activation, dropout, x_train.shape[1])
        # cross-validate for 20 epochs
        for epoch in range(20):            
            model.fit(x_train,
                      y_train,
                      batch_size=batch_size,
                      epochs=1,
                      verbose=0,
                      shuffle=True,
                      validation_data=(x_val, y_val))
            model.save_weights((checkpoint_dir / f'ckpt_{fold}_{epoch}').as_posix())
            preds[epoch] = model.predict(x_val).squeeze()
            r[epoch] = preds.groupby(level='epoch').apply(lambda x: spearmanr(x.actual, x[epoch])[0]).to_frame(epoch)

            # print(format_time(time()-start), f'{fold + 1:02d} | {epoch + 1:02d} | {r[epoch].mean():7.4f} | {r[epoch].median():7.4f}')

        ic.append(r.assign(dense_layers=str(dense_layers), 
                           activation=activation, 
                           dropout=dropout,
                           batch_size=batch_size,
                           fold=fold))    
        
        t = time()-start
        pd.concat(ic).to_hdf(results_path / 'scores.h5', key='ic_by_day')
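One caveat about the loop above: it groups the predictions by `epoch`, and with a single instrument (EURUSD) each timestamp presumably holds exactly one row, which leaves a per-timestamp rank correlation with nothing to rank. A possible alternative, sketched here on made-up data, is to compute a single Spearman correlation over the whole validation window. The helper name `fold_ic` is hypothetical; it relies on the fact that Spearman's coefficient is the Pearson correlation of the ranks.

```python
import numpy as np
import pandas as pd

def fold_ic(actual, predicted):
    """Spearman rank correlation between labels and predicted probabilities."""
    # Convert to plain arrays first so both series share the same default index.
    actual = pd.Series(np.asarray(actual, dtype=float))
    predicted = pd.Series(np.asarray(predicted, dtype=float))
    # Spearman = Pearson correlation computed on the ranks (ties get average ranks).
    return actual.rank().corr(predicted.rank())

# Made-up labels and scores, for illustration only.
y_true = [0, 1, 0, 1, 1, 0]
y_score = [0.41, 0.62, 0.45, 0.58, 0.40, 0.47]
print(round(fold_ic(y_true, y_score), 4))
```

Inside the training loop, this would be called once per fold and epoch, for example `fold_ic(preds['actual'], preds[epoch])`.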