# Bacbacktesting Monte Carlo Based

"Backtesting is used in technical analysis to assess the viability of trading strategies without real capital by simulating past market conditions, considering factors such as historical data quality, strategy accuracy, and risk management." [1]

[1] [Quantified Strategies, Backtesting Technical Analysis (Results)](https://tradingstrategy.medium.com/backtesting-technical-analysis-bb34ec4b423c)

**Issue:** The historical series may have a noisy component that can lead to poor analysis.

**Solution:** By decomposing the series into trend, seasonality and residual (also known as noise), we can simulate many possible historical series. The noise can be composed of many types of interferences in stock prices: political and economic transition effects, speculative attacks, etc. So, the residual can well represent the transitory effects.

## Backtesting Analysis

Overview about the proposed method:

![](bt.png)


## Backtesting Analysis Monte Carlo Based

Instead of doing just one simulation, we run many simulations, many variations of the past. 

![](btmc.png)


### Monte Carlo Simulations

Using the historical series, we replicate the series N times, doing N simulations. The optimizer step will rank the best strategies (and parameters) for each company.

![](mcsim.png)

### Replication Step

We use the parts of the decomposed series to reconstruct a main series (clean signal), without the residual. With the residual, we generate N noisy signals (series) with the same standard deviation and mean (similar noisy properties) combining with the clean signal. This is the core of the method.

![](repstep.png)



- Author: Israel Oliveira [\[e-mail\]](mailto:'Israel%20Oliveira%20'<prof.israel@gmail.com>)

In [1]:
%load_ext watermark

In [2]:
import pandas as pd
from src.helpers import get_data_history, company_code_list, get_data_history_BackMC
import src.advanced_strategies as advanced_strategies
import src.momentum as momentum
import src.overlap as overlap
import json
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose
from tqdm.contrib.concurrent import process_map

In [11]:
from tqdm.notebook import tqdm

# from glob import glob

import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import rcParams
from cycler import cycler

rcParams['figure.figsize'] = 12, 8 # 18, 5
rcParams['axes.spines.top'] = False
rcParams['axes.spines.right'] = False
rcParams['axes.grid'] = True
rcParams['axes.prop_cycle'] = cycler(color=['#365977'])
rcParams['lines.linewidth'] = 2.5

# import seaborn as sns
# sns.set_theme()

# pd.set_option("max_columns", None)
# pd.set_option("max_rows", None)
# pd.set_option('display.max_colwidth', None)

from IPython.display import Markdown, display
def md(arg):
    display(Markdown(arg))

# from pandas_profiling import ProfileReport
# #report = ProfileReport(#DataFrame here#, minimal=True)
# #report.to

# import pyarrow.parquet as pq
# #df = pq.ParquetDataset(path_to_folder_with_parquets, filesystem=None).read_pandas().to_pandas()

# import functools
# import operator
# def flat(a):
#     return functools.reduce(operator.iconcat, a, [])


######### LoadDicts

# import json
# from glob import glob
# from typing import Any
# from typing import NewType

# def np_encoder(object):
#     if isinstance(object, np.generic):
#         return object.item()


# DictsPathType = NewType("DictsPath", str)


# def load_file_json(path: DictsPathType):
#     with open(path, "r") as f:
#         return json.load(f)


# def dump_file_json(path: DictsPathType, var: Any):
#     with open(path, "w") as f:
#         return json.dump(var, f, indent=4, default=np_encoder)


# class LoadDicts:
#     def __init__(
#         self, dict_path: DictsPathType = "./data", ignore_errors: bool = False
#     ):
#         Dicts_glob = Path().glob(f"{dict_path}/*.json")
#         self.List = []
#         self.Dict = {}
#         self.not_attr = []
#         for path_json in Dicts_glob:
#             try:
#                 name = path_json.as_posix().split("/")[-1].replace(".json", "")
#                 self.List.append(name)
#                 self.Dict[name] = load_file_json(path_json)
#                 if name.isidentifier() and not iskeyword(name):
#                     setattr(self, name, self.Dict[name])
#                 else:
#                     self.not_attr.append(name)
#             except Exception as e:
#                 print(f"Error trying to load the file: {path_json.absolute()}: ")
#                 if not ignore_errors:
#                     raise e
#                 print(e)
                
#     def __len__(self):
#         return len(self.List)
    
#     def items(self):
#         for item in self.List:
#             yield item, self.Dict[item]

#     def __repr__(self) -> str:
#         return "LoadDicts: {}".format(", ".join(self.List))

In [4]:
# Run this cell before close.
%watermark -d --iversion -b -r -g -m -v
!cat /proc/cpuinfo |grep 'model name'|head -n 1 |sed -e 's/model\ name/CPU/'
!free -h |cut -d'i' -f1  |grep -v total

Python implementation: CPython
Python version       : 3.11.10
IPython version      : 8.27.0

Compiler    : GCC 13.2.0
OS          : Linux
Release     : 6.8.0-49-generic
Machine     : x86_64
Processor   : x86_64
CPU cores   : 20
Architecture: 64bit

Git hash: c2de496bb1695943dcf959fcf4b242b5034b7398

Git repo: https://github.com/ysraell/forward-testing-mc.git

Git branch: main

numpy     : 2.1.3
pandas    : 2.2.3
json      : 2.0.9
matplotlib: 3.9.2

CPU	: 12th Gen Intel(R) Core(TM) i7-12700
Mem:            31G
Swap:          3.7G


# 1. Simulation Step

In [5]:
data_path = '/home/israel/tmp'
intraday_data = f"{data_path}/intraday_data"

In [None]:
strategies = [
    advanced_strategies.ADXRSI,
    advanced_strategies.WilliansppRMACD,
    overlap.SMA,
    overlap.SuperTrend,
    momentum.AwesomeOscillator,
    momentum.CommodityChannelIndex,
    momentum.CoppockCurve,
    momentum.WilliansppR,
    momentum.MACD,
]


In [None]:
{class_obj.__name__ : [] for class_obj in strategies}

### Methods and parameters

In [None]:
lookback_days_list = list(range(5, 65, 5))
strategies_params = {
    'ADXRSI': [
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list,
            'RSI': True,
            'ADX': True,
        },
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list,
            'RSI': False,
            'ADX': True,
        },
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list,
            'RSI': True,
            'ADX': False,
        },
    ],
    'MACD': [
        {
            'slow': 26,
            'fast': 12,
            'smooth': 9
        },
        {
            'slow': 20,
            'fast': 10,
            'smooth': 5
        },
        {
            'slow': 25,
            'fast': 15,
            'smooth': 5
        },
        {
            'slow': 25,
            'fast': 15,
            'smooth': 10
        },
    ],
    'WilliansppRMACD': [
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list,
            'slow': 26,
            'fast': 12,
            'smooth': 9
        },
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list,
            'slow': 20,
            'fast': 10,
            'smooth': 5
        },
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list,
            'slow': 25,
            'fast': 15,
            'smooth': 5
        },
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list,
            'slow': 25,
            'fast': 15,
            'smooth': 10
        },
    ],
    'SMA': [
        {
            'SHORT_WINDOW': 20,
            'LONG_WINDOW': 50,
        },
        {
            'SHORT_WINDOW': 10,
            'LONG_WINDOW': 30,
        },
        {
            'SHORT_WINDOW': 5,
            'LONG_WINDOW': 20,
        },
        {
            'SHORT_WINDOW': 5,
            'LONG_WINDOW': 15,
        },
        {
            'SHORT_WINDOW': 5,
            'LONG_WINDOW': 10,
        },
    ],
    'SuperTrend': [
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list,
            'MULTIPLIER': 2
        },
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list,
            'MULTIPLIER': 3
        },
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list,
            'MULTIPLIER': 5
        }
    ],
    'AwesomeOscillator': [
        {
            'short_period': 5,
            'long_period': 15
        },
        {
            'short_period': 5,
            'long_period': 25
        },
        {
            'short_period': 5,
            'long_period': 35
        },
        {
            'short_period': 10,
            'long_period': 35
        },
    ],
    'CommodityChannelIndex': [
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list
        },
    ],
    'CoppockCurve': [
        {
            'shortROC': 5,
            'longROC': 15,
            'lookbackWMA': 10,
        },
        {
            'shortROC': 10,
            'longROC': 20,
            'lookbackWMA': 15,
        },
        {
            'shortROC': 15,
            'longROC': 30,
            'lookbackWMA': 20,
        },
        {
            'shortROC': 20,
            'longROC': 35,
            'lookbackWMA': 25,
        },
    ],
    'WilliansppR': [
        {
            'LOOKBACK_STRATEGY_PARAM': lookback_days_list
        }
    ]
}

# 1.1 Backtesting Traditional

In [None]:
results = []
periods = ["1st", "2nd", "year"]
total = len(periods) * len(company_code_list) * len(strategies)
with tqdm(total=total) as pbar:
    for period in periods:
        for comp_code in company_code_list:
            df_past, df_future = get_data_history(comp_code, intraday_data, period)
            for class_obj in strategies:
                class_params = strategies_params[class_obj.__name__]
                run_params = []
                for params in class_params:
                    if 'LOOKBACK_STRATEGY_PARAM' not in params:
                        run_params.append(params)
                    else:
                        for lb in params['LOOKBACK_STRATEGY_PARAM']:
                            tmp = dict(params)
                            tmp['LOOKBACK_STRATEGY_PARAM'] = lb
                            run_params.append(tmp)
                for params in run_params:
                    tmp = [
                        period,
                        comp_code,
                        class_obj.__name__,
                        json.dumps(params),
                    ]   
                    algo = class_obj(params)
                    algo.apply_strategy(df_past, 1.0)
                    M_final, M_diffs = algo.metrics()
                    tmp.extend([M_final, sum(np.array(M_diffs) > 0), sum(np.array(M_diffs) < 0)])
                    algo.apply_strategy(df_future, 1.0)
                    M_final, M_diffs = algo.metrics()
                    tmp.extend([M_final, sum(np.array(M_diffs) > 0), sum(np.array(M_diffs) < 0)])
                    results.append(tmp)
                pbar.update(1)

In [None]:
df = pd.DataFrame(results, columns=['period', 'comp_code', 'strategy', 'params', 'M_past', 'L_past', 'W_past', 'M_future', 'L_future', 'W_future'])
df.to_csv(f"{data_path}/results_backtesting.csv", sep=';', index=False)

In [None]:
df.describe()

# 1.2 Backstesting MC

In [None]:
N_MC_Sims = 100

In [None]:
results = []
periods = ["1st", "2nd", "year"]
total = len(periods) * len(company_code_list) * len(strategies)
with tqdm(total=total, smoothing=0) as pbar:
    for period in periods:
        for comp_code in company_code_list:
            df_past, df_future = get_data_history(comp_code, intraday_data, period)
            future_prev_list = get_data_history_BackMC(comp_code, intraday_data, period, N_MC_Sims)
            for class_obj in strategies:
                class_params = strategies_params[class_obj.__name__]
                run_params = []
                for params in class_params:
                    if 'LOOKBACK_STRATEGY_PARAM' not in params:
                        run_params.append(params)
                    else:
                        for lb in params['LOOKBACK_STRATEGY_PARAM']:
                            tmp = dict(params)
                            tmp['LOOKBACK_STRATEGY_PARAM'] = lb
                            run_params.append(tmp)
                for params in run_params:
                    algo = class_obj(params)
                    algo.apply_strategy(df_future, 1.0)
                    M_final_future, M_diffs_future = algo.metrics()
                    for sim_n, df_future_prev in enumerate(future_prev_list):
                        tmp = [
                            period,
                            comp_code,
                            class_obj.__name__,
                            json.dumps(params),
                            sim_n+1
                        ]
                        algo.apply_strategy(df_future_prev, 1.0)
                        M_final, M_diffs = algo.metrics()
                        tmp.extend([M_final, sum(np.array(M_diffs) > 0), sum(np.array(M_diffs) < 0)])
                        M_final, M_diffs = M_final_future, M_diffs_future
                        tmp.extend([M_final, sum(np.array(M_diffs) > 0), sum(np.array(M_diffs) < 0)])
                        results.append(tmp)
                pbar.update(1)

In [None]:
df = pd.DataFrame(results, columns=['period', 'comp_code', 'strategy', 'params', 'sim_n', 'M_past', 'L_past', 'W_past', 'M_future', 'L_future', 'W_future'])
df.to_csv(f"{data_path}/results_backtestingMC.csv", sep=';', index=False)

In [None]:
df.describe()

In [None]:
df

# 2. Analysis step

In [6]:
df_bt = pd.read_csv(f"{data_path}/results_backtesting.csv", sep=';')
df_btmc = pd.read_csv(f"{data_path}/results_backtestingMC.csv", sep=';')

# 2.1 Backtesting Traditional

In [26]:
group_cols = ['period', 'comp_code']
target_col = 'M_future'
top_cols = ['M_past'] # ['M_past', '-L_past', 'W_past', 'WL_past', 'M*WL_past']
perf_rounding = 4

def gen_new_metrics(df):
    df['-L_past'] = - df['L_past']
    df['WL_past'] = df['W_past'] - df['L_past']
    df['M*WL_past'] = df['W_past'] * df['WL_past']
    return df

def gen_results(df, title, group_cols=group_cols):
    #md(f'### {title}')
    results = {}
    target_series = {}
    for top_col in top_cols:
        top_index = df.sort_values(top_col, ascending=False).groupby(group_cols)[top_col].head(1).index
        df_top = df.loc[top_index]
        top_desc = df_top.describe()[target_col]
        sum_M = df_top[target_col].sum()
        top_mean = top_desc['mean']
        top_std = round(top_desc['std'], perf_rounding)
        top_min = round((top_desc['min'] -1), perf_rounding)
        top_max = round((top_desc['max'] -1), perf_rounding)
        overall_perf = "+{:.2f}%".format(100*float(round((top_mean - 1), perf_rounding)))
        md(f"## Overall perf.: {overall_perf}")
        target_series[top_col] = sorted(df_top[target_col])
    return target_series

def clean_patter(patt):
    patt = "".join(patt.to_list()).replace(' ','').replace('\"','')
    for col in hash_cols:
        patt = patt.replace(col, '')
    return patt

In [27]:
df = df_bt.copy()
df = gen_new_metrics(df_bt)
ts_bt = gen_results(df_bt, 'BackTesting (Trad)')

### BackTesting (Trad)

## Overall perf.: +19.48%

# 1.2 Backstesting MC

In [29]:
df = df_btmc.copy()
df = gen_new_metrics(df)
sim_col = 'sim_n'
hash_cols = ['period', 'comp_code', 'strategy', 'params']
hash_col = 'hash'
#df[hash_col] = df[hash_cols].apply(lambda x: clean_patter(x), axis=1)

In [31]:
df_agg = df.groupby(hash_cols).sum().reset_index()

results = {}
mc_ts = {}
for top_col in top_cols:
    top_index = df_agg.sort_values(top_col, ascending=False).groupby(group_cols)[top_col].head(1).index
    df_top_sim = df_agg.loc[top_index][['period', 'comp_code', 'strategy', 'params']].reset_index(drop=True)
    df_top = pd.merge(df_top_sim, df_bt, on=list(df_top_sim.columns))
    df_top = df.loc[top_index]
    top_desc = df_top.describe()[target_col]
    sum_M = df_top[target_col].sum()
    top_mean = top_desc['mean']
    top_std = round(top_desc['std'], perf_rounding)
    top_min = round((top_desc['min'] -1), perf_rounding)
    top_max = round((top_desc['max'] -1), perf_rounding)
    overall_perf = "+{:.2f}%".format(100*float(round((top_mean - 1), perf_rounding)))
    md(f"## Overall perf.: {overall_perf}")


## Overall perf.: +26.40%

# 3. Statistic validation

In [None]:
Results = []
for col in top_cols:
    a = ts_bt[col]
    b = mc_ts[col]
    result = stats.ttest_ind(a,b)
    Results.append(['ttest_ind', col, result.pvalue, result.statistic])
    result = stats.mannwhitneyu(a, b)
    Results.append(['mannwhitneyu', col, result.pvalue, result.statistic])
    result = stats.bws_test(a, b)
    Results.append(['bws_test', col, result.pvalue, result.statistic])
    result = stats.ranksums(a, b)
    Results.append(['ranksums', col, result.pvalue, result.statistic])
    result = stats.brunnermunzel(a, b)
    Results.append(['brunnermunzel', col, result.pvalue, result.statistic])
    result = stats.mood(a, b)
    Results.append(['mood', col, result.pvalue, result.statistic])
    result = stats.ansari(a, b)
    Results.append(['ansari', col, result.pvalue, result.statistic])
    result = stats.cramervonmises_2samp(a, b)
    Results.append(['cramervonmises_2samp', col, result.pvalue, result.statistic])
    result = stats.epps_singleton_2samp(a, b)
    Results.append(['epps_singleton_2samp', col, result.pvalue, result.statistic])
    result = stats.ks_2samp(a, b)
    Results.append(['ks_2samp', col, result.pvalue, result.statistic])

In [58]:
df_stats_results = pd.DataFrame(Results, columns=['stats_method', 'col','p_value','t_value']).sort_values('p_value')

# 3.1 Main stats tests

In [59]:
df_stats_results.groupby('stats_method').head(1)

Unnamed: 0,stats_method,col,p_value,t_value
4,brunnermunzel,M_past,0.000133,4.27409
2,bws_test,M_past,0.0003,7.925112
7,cramervonmises_2samp,M_past,0.000549,1.272135
3,ranksums,M_past,0.000597,-3.433172
1,mannwhitneyu,M_past,0.000619,121.5
9,ks_2samp,M_past,0.001401,0.541667
0,ttest_ind,M_past,0.01213,-2.611585
8,epps_singleton_2samp,M_past,0.05177,9.403401
16,ansari,-L_past,0.060378,345.5
15,mood,-L_past,0.070148,-1.810954


# Test-t

In [60]:
df_stats_results.query('stats_method == "ttest_ind"')

Unnamed: 0,stats_method,col,p_value,t_value
0,ttest_ind,M_past,0.01213,-2.611585
10,ttest_ind,-L_past,0.289874,1.070737
30,ttest_ind,WL_past,0.365385,0.914189
40,ttest_ind,M*WL_past,0.411836,0.828197
20,ttest_ind,W_past,0.502833,0.675352


## 3.1 Conclusions

- With the MC approach, it is possible to obtain a greater financial return.
- Have stistic significance (test-t, p-value <5%)

## 3.2 Next Steps:

- Implement more strategies.
- Use more companies.- Use the forecast and try to predict the next day (or another period) to enrich the strategy.