<div class="alert alert-block alert-info">
    <h2 align="center" style="color: black;">Tradeset Starter Notebook</h2>
    <h3 align="center" style="color: black;">Build your Machine Learning-based Algorithmic Trading System</h3>
    <h5 align="center" style="color: black;"><em>Intelligence Can Solve Complexity</em></h5>
    <h5 align="center"><a href="http://tradeset.ai" style="color: blue;">Tradeset.ai</a></h5>
</div>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/tradeset/tradeset_notebooks/blob/main/notebooks/tradeset_notebook.ipynb)  [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/tradeset/tradeset_notebooks/blob/main/notebooks/tradeset_notebook.ipynb)

In this notebook, we'll build a profitable ML-based trading system using tradeset. We will use the tradeset API to define a classification target that identifies upward movements of USDJPY in the Forex market and label the data accordingly. Next, we'll get ML-ready features for USDJPY, train an ML prediction model, and assess potential profits through various backtesting strategies. At tradeset, you can run historical experiments for free! If you haven't signed up yet, [sign up](http://tradeset.ai) for free and get an API key. Let's dive in!

First, install the __tradeset__ package using _pip_.

In [None]:
!pip install --upgrade tradeset

Get an API key from your [tradeset profile](http://tradeset.ai/profilesetting).

In [None]:
API_KEY = "" # Paste your API key here

# Define Target
Some important assumptions and specifications of our system are as follows:

| Factor            | Specification    |
|-------------------|------------------|
| Markets           | _Forex, Crypto_  |
| Horizon           | _Intraday_       |
| Feature Frequency | _5 Minutes_      |
| Problem Type      | _Classification_ |

In this section, you specify the classification problem you are trying to solve and get the target dataframe. For example, let's say we want to train a prediction model that identifies upward movements of 35 pips in USDJPY within the next 5 hours, without a downward movement of 5 pips. Using our API, you can do this with `create_target`. The Beta version of tradeset only provides services for `USDJPY`.
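
To make the target definition concrete, here is a minimal sketch of how such a label could be computed from price bars. It is an illustration under stated assumptions, not tradeset's implementation: the function `label_long_target` and its inputs (the highs and lows of the bars inside the look-ahead window) are hypothetical, a pip is taken as 0.01 for a JPY-quoted pair, and a bar that touches both levels is counted pessimistically as a stop-loss hit.

```python
PIP_SIZE = 0.01  # one pip for JPY-quoted pairs such as USDJPY


def label_long_target(entry_price, highs, lows,
                      take_profit=35, stop_loss=5, pip_size=PIP_SIZE):
    """Return 1 if price rises `take_profit` pips before falling `stop_loss` pips.

    `highs` and `lows` hold the high/low of each bar in the look-ahead
    window (5 hours of 5-minute bars would be 60 values each).
    """
    tp_level = entry_price + take_profit * pip_size
    sl_level = entry_price - stop_loss * pip_size
    for high, low in zip(highs, lows):
        if low <= sl_level:   # stop-loss is checked first: pessimistic tie-break
            return 0
        if high >= tp_level:
            return 1
    return 0  # neither level was reached inside the window


# Hypothetical bars: price climbs about 40 pips without dipping 5 pips below entry
highs = [150.10, 150.25, 150.41]
lows = [149.98, 150.05, 150.20]
print(label_long_target(150.00, highs, lows))
```

With these made-up bars the third bar crosses the take-profit level before any bar touches the stop-loss level, so the label is 1.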

In [None]:
import pandas as pd
from tradeset import create_target

forex_pair = 'USDJPY' # Define the forex pair for the trade. Crypto coins will be added in future
trade_mode = 'long' # Specify the trade mode (long or short)
target_look_ahead = 300 # Set the look-ahead period in minutes. It should be more than 5 minutes
target_take_profit = 35 # Specify the take profit in pips
target_stop_loss = 5 # Specify the stop loss in pips

target_token, target_name = create_target(
    forex_pair,
    trade_mode,
    target_look_ahead,
    target_take_profit,
    target_stop_loss,
    API_KEY
    )
df_target = pd.read_parquet(f"./{target_name}.parquet")
df_target.head()

# Get Features
An information advantage is what you need to predict events in financial markets. Informative features are therefore a vital part of an ML-based algorithmic trading system. At tradeset, we provide special features for each asset, designed to be good predictors on which you can train valuable models. Use `get_features` with `feature_type = 'train'` to get ML-ready features.

### ML-ready features

Please note: This may take a short while.

In [None]:
from tradeset import get_features

get_features(forex_pair, api_key = API_KEY, feature_type = 'train')
df_features = pd.read_parquet(f"{forex_pair}_train.parquet")
df_features.head()

### Raw Features
You can also get raw features, which are 5-minute OHLC data.

In [None]:
get_features(forex_pair, api_key = API_KEY, feature_type = 'raw')
df_raw_features = pd.read_parquet(f"{forex_pair}_raw.parquet")
df_raw_features.head()

### Merging features and Target

In [None]:
df_all = df_features.merge(df_target,on="_time",how="inner")
# Rename the Target column 
df_all.rename(columns={f"{target_name}":"target"},inplace=True)
df_all.set_index("_time",inplace=True)
print(f'Min date:{df_all.index.min()} Max date:{df_all.index.max()}')
df_all.head()

# EDA
You can do some EDA here: visualize the data, inspect correlations, and engineer features. We will skip this step!

# Time-Series Cross Validation

In the field of quantitative finance, robust validation over an extended period is essential. Given the time series nature of the data, it's crucial not to shuffle during the train-test split. Introducing a gap between train and test sets (`train_test_gap_size`) is also advisable to prevent biases. To simplify this process, we've built `create_TS_cross_val_folds` and `run_model_on_folds` into the [tradeset public package](https://github.com/tradeset/tradeset-public/). However, you're free to employ your custom train-test split if preferred.
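
As a rough illustration of the idea, the sketch below lays out rolling train/test windows with a gap between them. It is not the implementation of `create_TS_cross_val_folds`; the helper `rolling_folds` is hypothetical, and it assumes that test windows are consecutive blocks at the end of the data and that each training window is capped at `max_train_size` rows.

```python
def rolling_folds(n_rows, n_splits, max_train_size, test_size, gap):
    """Return (train_range, test_range) pairs in chronological order.

    Test windows are consecutive, non-overlapping blocks at the end of the
    data; each training window ends `gap` rows before its test window.
    """
    folds = []
    for k in range(n_splits):
        test_start = n_rows - (n_splits - k) * test_size
        test_end = test_start + test_size
        train_end = test_start - gap
        train_start = max(0, train_end - max_train_size)
        if train_end <= train_start:
            raise ValueError("not enough rows for the requested folds")
        folds.append((range(train_start, train_end), range(test_start, test_end)))
    return folds


# Toy sizes instead of the notebook's 288-row days
for train, test in rolling_folds(n_rows=100, n_splits=3,
                                 max_train_size=40, test_size=10, gap=5):
    print(f"train [{train.start}, {train.stop})  test [{test.start}, {test.stop})")
```

Rows are never shuffled, and the rows between the end of each training window and the start of its test window are left unused, which is the role `train_test_gap_size` plays in the configuration.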

In [None]:
from tradeset import create_TS_cross_val_folds, run_model_on_folds

early_stopping_rounds = None
cross_val_config = {
  "n_splits": 18, # Using 18 folds for time-series cross-validation
  "max_train_size": 288 * 350, # Considering the frequency of dataset, which is 5 minutes, each day is 288 rows of data so the train size is 350 days
  "test_size": 288 * 30, # The test size is 30 days
  "early_stopping_rounds": early_stopping_rounds,
  "train_test_gap_size": 288 * 30 , # The gap between train and test on each fold is 30 days
}

folds = create_TS_cross_val_folds(
  df_all = df_all,
  max_train_size = cross_val_config["max_train_size"],
  n_splits = cross_val_config["n_splits"],
  test_size = cross_val_config["test_size"],
  early_stopping_rounds = cross_val_config["early_stopping_rounds"],
  train_test_gap_size = cross_val_config["train_test_gap_size"],
)

# Model

Now, let's train an XGBoost model and perform cross-validation. The parameters below are the ones we found to work reasonably well over a few experiments. You can modify these and any other XGBoost parameters not listed here.

In [None]:
from xgboost import XGBClassifier

xgboost_params =  {
        # 'tree_method':'hist',
        # 'device' : 'cuda',#  None,
        "objective": "binary:logistic",
        "max_depth": 5,
        "learning_rate": 0.05,
        "n_estimators": 200,
        "early_stopping_rounds" : early_stopping_rounds,
        "min_child_weight": 1,
        "subsample": 0.5,
        "colsample_bytree": 0.8,
        "scale_pos_weight" :1,
        'random_state': 42,
    }

In [None]:
model = XGBClassifier(**xgboost_params)

# Run Model on Folds
Note: the `model` returned by `run_model_on_folds` is the model trained on the last fold, which we will use for the test data. Using the last-fold model is important because, unlike in many other domains, data drift is critical here due to changing market dynamics. For that reason, we should not train the model on all historical data; instead, we should tune `max_train_size`.

In [None]:
evals, df_prediction, model = run_model_on_folds(
    df = df_all,
    folds = folds,
    model = model,
    early_stopping_rounds = early_stopping_rounds,
    )

### Save model

In [None]:
import joblib
#save model
joblib.dump(model, 'usdjpy_long_xgb.joblib') 

In [None]:
# Precision: share of model signals (prediction == 1) whose target is also 1
signals = df_prediction[df_prediction.model_prediction == 1]
num_signals = signals.shape[0]
overal_precision = (signals.target == 1).sum() / num_signals
print(f"Overall Precision: {overal_precision*100:.1f}%")
print(f"Number of Signals: {num_signals}")

## Estimate profit without backtest
If we assume that every `False Positive` (FP) signal loses the full stop-loss, we get a pessimistic estimate of profit.
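
Under this assumption, each signal earns on average `precision * take_profit - (1 - precision) * stop_loss - spread` pips, so the estimate is zero when precision equals `(stop_loss + spread) / (take_profit + stop_loss)`. The sketch below works through that arithmetic. The function names are hypothetical, and the precision and signal count in the example are made-up values, not results from this notebook.

```python
def pessimistic_profit_pips(precision, num_signals, take_profit, stop_loss, spread):
    """Profit in pips if every false positive loses the full stop-loss."""
    per_signal = precision * take_profit - (1 - precision) * stop_loss - spread
    return num_signals * per_signal


def break_even_precision(take_profit, stop_loss, spread):
    """Precision at which the pessimistic estimate is exactly zero."""
    return (stop_loss + spread) / (take_profit + stop_loss)


# Target settings from this notebook; precision and signal count are made up
print(break_even_precision(take_profit=35, stop_loss=5, spread=2))  # 0.175
print(round(pessimistic_profit_pips(0.30, 100, 35, 5, 2)))
```

With a take-profit of 35 pips, a stop-loss of 5 pips, and a spread of 2 pips, the pessimistic estimate turns positive once precision exceeds 17.5%.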

In [None]:
spread = 2
pess_est_profit = overal_precision*num_signals*target_take_profit - (1-overal_precision)*num_signals*target_stop_loss - num_signals*spread 
print(f"Pessimistic Estimation of Profit: {pess_est_profit:.0f} pips")

Most of the time, however, that is not the case. Many FP signals are false positives only because they did not reach the take-profit level during the look-ahead period, and some of them are even profitable. Later in the notebook, the Backtest section calculates the profit accurately. The animated GIF below shows examples of four different types of signals.

![SegmentLocal](target_signal_type.gif "segment")

# Backtest

## (I) Using default strategy
In this section, we backtest with the same strategy that was used in the target definition. First, we need a dataframe containing the model signals.

In [None]:
# ####### TEMP
# df_prediction.to_parquet('df_prediction.parquet')
# df_prediction = pd.read_parquet('df_prediction.parquet')
# TEMP ########

It is time to backtest our default strategy. We should also specify the `volume` of each trade, the `initial_balance`, and the `spread`. Use tags to organize your experiments, for example by mentioning the ML model and its parameters, as in the `tag` _XGB Param1 Strg1_. __NOTE__: A tag should be less than 30 characters.
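
One way to keep tags consistent is to build them from their parts and check the length before calling the API. This is only a convenience sketch: `make_tag` is a hypothetical helper, not part of the tradeset package, and the three-part naming scheme is an assumption modelled on the example tag.

```python
MAX_TAG_LENGTH = 29  # the notebook states a tag should be less than 30 characters


def make_tag(model_name, param_set, strategy):
    """Join the parts into a tag such as 'XGB Param1 Strg1' and check its length."""
    tag = " ".join([model_name, param_set, strategy])
    if len(tag) > MAX_TAG_LENGTH:
        raise ValueError(f"tag is {len(tag)} characters; keep it under 30")
    return tag


print(make_tag("XGB", "Param1", "Strg1"))  # XGB Param1 Strg1
```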

In [None]:
from tradeset import backtest_strategy

strategy_config = {
    'target_token': target_token,
    'volume': 0.1,
    'initial_balance': 3000,
    'spread': 2,
    'tag' : "XGB Param1 Strg1",
}
backtest_results, backtest_df = backtest_strategy(df_prediction[["model_prediction"]], strategy_config, API_KEY)

In [None]:
backtest_df.net_profit.cumsum().plot()
backtest_results

In [None]:
backtest_df

## (II) Modify Strategy

In [None]:
strategy_config_modified = {
    'target_token': target_token,
    'volume': 0.1,
    'initial_balance': 3000,
    'spread': 2,
    "look_ahead": 400,
    "take_profit": 40,
    "stop_loss": 20,
    "tag" : "XGB Param1 Strg2",
}
backtest_results_modified, backtest_df_modified = backtest_strategy(df_prediction[["model_prediction"]],
                                                                     strategy_config_modified,
                                                                     API_KEY)

In [None]:
backtest_df_modified.net_profit.cumsum().plot()
backtest_results_modified

# NOTE
These results serve as baselines and can be enhanced by developing more accurate models, employing better strategies, and refining target definitions. Now it's your opportunity to create a more profitable algorithmic trading system.

## Compare Profits

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(8,6))
by = (pess_est_profit/strategy_config['initial_balance']*100,
        backtest_results['profit_percent'],
        backtest_results_modified['profit_percent'])
bx = range(len(by))
x_ticks_labels = ('No Backtest (pessimistic estimation)', 'Default Strategy', 'Modified Strategy')
plt.xticks(bx, x_ticks_labels, size='small')
plt.title('Profit Comparison')
plt.ylabel('Profit Percent')
plt.bar(bx,by, color = (0.1,0.1,0.7,0.6))
plt.show()

In [None]:
import numpy as np

fig, (ax1,ax2) = plt.subplots(2,figsize=(5,6))
by = (
         backtest_results['profit_percent'],
        backtest_results_modified['profit_percent'])
bydd = (
         backtest_results['max_draw_down']*-1,
        backtest_results_modified['max_draw_down']*-1)
bx = range(len(by))

ax1.bar(bx, bydd, color=(0.7,0.2,0,0.5), width=0.35)
ax2.bar(bx, by, color=(0.1,0.1,0.7,0.6), width = 0.35)
x_ticks_labels = ('Default Strategy', 'Modified Strategy')
plt.xticks(bx, x_ticks_labels, size='small')
ax2.set_ylabel('profit percent')
ax1.set_ylabel('maximum draw down')

plt.show()
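
The maximum drawdown plotted above comes from the backtest results. As a rough cross-check, a drawdown figure can also be recomputed from per-trade net profits, as in the sketch below. The helper `max_drawdown` is hypothetical, the trade values are made up, and it assumes drawdown is measured as a fraction of the running peak balance; the definition behind `max_draw_down` in the backtest results may differ.

```python
from itertools import accumulate


def max_drawdown(net_profits, initial_balance):
    """Largest peak-to-trough drop of the balance, as a fraction of the peak."""
    peak = initial_balance
    worst = 0.0
    # Running balance after each trade, starting from the initial balance
    for balance in accumulate(net_profits, initial=initial_balance):
        peak = max(peak, balance)
        worst = max(worst, (peak - balance) / peak)
    return worst


# Hypothetical per-trade net profits on a 3000 starting balance
print(max_drawdown([150, -90, -60, 200, -30], initial_balance=3000))
```

Assuming `net_profit` is expressed in the same units as the balance, the same function could be applied to `backtest_df.net_profit` together with `strategy_config['initial_balance']`.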


## Plot Backtest 

In [None]:
# !pip install plotly

In [None]:
# Keep the first signal timestamp of each trading day
signal_time = []
signal_days = []
for t in backtest_df_modified["_time"]:
    day = t.date()
    if day not in signal_days:
        signal_days.append(day)
        signal_time.append(t)

### Plot first 5 days of trading

In [None]:
import plotly.graph_objects as go
import pandas as pd
idx = list(df_raw_features[df_raw_features._time.isin(signal_time)].index)
backtest_df_modified['date'] = [d.date() for d in backtest_df_modified['_time']]

for i in range(5): # to plot all days, use range(len(signal_days))
    
    df_plt = df_raw_features[idx[i]-5:idx[i]+288]
                    
    fig = go.Figure(data=[go.Candlestick(x=df_plt['_time'],
                    open=df_plt[f'{forex_pair}_M5_OPEN'], high=df_plt[f'{forex_pair}_M5_HIGH'],
                    low=df_plt[f'{forex_pair}_M5_LOW'], close=df_plt[f'{forex_pair}_M5_CLOSE'])
                          ])
    _time = signal_time[i]
    print('date: ',_time,' weekday: ',_time.weekday())
    print('N.o. Signals: ',backtest_df_modified[backtest_df_modified.date == _time.date()].shape[0])
    print('day net profit: ',int(backtest_df_modified[backtest_df_modified.date == _time.date()].net_profit.sum()))
    shapes = []
    annotations = []
    for _time in list(backtest_df_modified[backtest_df_modified.date == _time.date()]._time):
        
        shapes.append(dict(x0=_time, x1=_time, y0=0, y1=1, xref='x', yref='paper',line_width=2))
        annotations.append(dict(x=_time, y=0.05, xref='x', yref='paper',
            showarrow=False, xanchor='right', text=f'{int(backtest_df_modified[backtest_df_modified._time == _time].net_profit.iloc[0])}'))
            
    fig.update_layout(
#         title=target_info["target_symbol"],
        shapes = shapes,
        annotations = annotations)
    fig.show()

## Submit Test Predictions
For the sake of comparison, a recent subset of historical data is used for the competition. You can submit your prediction and compare your model and strategy performance with other data scientists, and see the potential results you can achieve. The results will be displayed in the `Leaderboard` section of your dashboard.

In [None]:
import pandas as pd 
from tradeset import get_features

get_features(forex_pair, api_key = API_KEY, feature_type = 'test')
df_features_test = pd.read_parquet(f"{forex_pair}_test.parquet")

In [None]:
#load saved model
import joblib
model = joblib.load('usdjpy_long_xgb.joblib')

In [None]:
df_features_test["model_prediction"] = model.predict(df_features_test)

In [None]:
test_preds = df_features_test[["model_prediction"]]
test_preds

It appears the baseline XGBoost model generates few signals. You can improve it with your data science skills. As motivation: with a single model, we could achieve up to 30% profit on this test set.

Please note: You should use the `target_token` you generated with `create_target`.

In [None]:
from tradeset import submit_to_leaderboard

submission_strategy = {
    'target_token': target_token,
    'volume': 1,
    "look_ahead": 480,
    "take_profit": 40,
    "stop_loss": 15,
}
submission_results, _ = submit_to_leaderboard(test_preds, submission_strategy, API_KEY)

In [None]:
submission_results