# Strategy Backtesting
---

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import tushare as ts
import os

## Define stats data parsing functions
Here we take the fundamental analysis code from the previous section and wrap it in functions so that it can be called easily.

In [None]:
def get_stats(year,quarter):
    '''
    This function returns the stats data for all stocks at given year-quarter combinations
    year = yyyy
    quarter = 1,2,3 or 4
    '''
    df_profit = ts.get_profit_data(year,quarter)
    df_op = ts.get_operation_data(year,quarter)
    df_growth = ts.get_growth_data(year,quarter)
    df_debt = ts.get_debtpaying_data(year,quarter)
    df_cash = ts.get_cashflow_data(year,quarter)
    
    return df_profit,df_op,df_growth,df_debt,df_cash

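def process_stats(df):
    '''
    Keep only Shanghai main-board stocks (codes starting with '600')
    whose names do not contain 'ST' (this excludes both ST and *ST stocks)
    '''
    mask = df['code'].str.startswith('600') & ~df['name'].str.contains('ST', na=False)
    # set_index returns a new frame, so we never modify a slice of df in place
    new_df = df[mask].set_index('code')
    
    return new_df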

def drop_stats(df):
    '''
    Drop useless columns and na rows
    '''
    new_df = df.drop(['net_profits','business_income','arturndays','inventory_days','currentasset_days'],axis=1)
    new_df.dropna(inplace=True)
    
    return new_df
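Because the Tushare calls need a network connection, here is an offline check of the filtering rule used in `process_stats`. The rows are made up for illustration only, and `filter_sh_non_st` is a hypothetical stand-in that applies the same mask: keep codes starting with `'600'` and drop any name containing `'ST'`.

```python
import pandas as pd

def filter_sh_non_st(df):
    """Keep codes starting with '600' whose name does not contain 'ST'."""
    mask = df['code'].str.startswith('600') & ~df['name'].str.contains('ST', na=False)
    return df[mask].set_index('code')

# Made-up rows for illustration only (not real Tushare output)
toy = pd.DataFrame({
    'code': ['600001', '600002', '000001', '600003'],
    'name': ['Alpha', '*ST Beta', 'Gamma', 'Delta'],
    'roe': [1.2, -3.4, 2.0, 0.8],
})
filtered = filter_sh_non_st(toy)
print(filtered)
```

Only `600001` and `600003` survive: `000001` is not a `600` code and `*ST Beta` is flagged by its name.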

## Read and process stats data

Suppose we want to decide which stocks are likely to beat the market based on 2020Q1 data. We need to fetch the data from Tushare, reshape it into the format our model expects, and then feed it into the model to make predictions.

In [None]:
year = 2020
quarter = 1
stats_dfs = get_stats(year,quarter)

main_df = pd.DataFrame()

for each_df in stats_dfs:
    each_df_new = process_stats(each_df)
    if main_df.empty:
        main_df = each_df_new
    else:
        each_df_new.drop('name',axis=1,inplace=True)
        main_df = main_df.join(each_df_new,how='inner')
    
main_df = drop_stats(main_df)
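The loop above inner-joins the five indicator tables on the stock code, keeping `name` only from the first one. The same idea can be written with `functools.reduce`; the sketch below uses three small made-up tables (illustrative values, not Tushare data) to show that an inner join keeps only codes present in every table.

```python
from functools import reduce

import pandas as pd

# Three made-up indicator tables keyed by stock code (illustrative values only)
profit = pd.DataFrame({'name': ['A', 'B', 'C'], 'roe': [1.0, 2.0, 3.0]},
                      index=pd.Index(['600001', '600002', '600003'], name='code'))
growth = pd.DataFrame({'name': ['A', 'C'], 'mbrg': [5.0, 6.0]},
                      index=pd.Index(['600001', '600003'], name='code'))
debt = pd.DataFrame({'name': ['A', 'B', 'C'], 'currentratio': [1.5, 1.1, 0.9]},
                    index=pd.Index(['600001', '600002', '600003'], name='code'))

tables = [profit, growth, debt]
# Keep 'name' from the first table only, then inner-join the rest on the code index
merged = reduce(lambda left, right: left.join(right.drop(columns='name'), how='inner'),
                tables[1:], tables[0])
print(merged)
```

`600002` drops out because it is missing from `growth`; an inner join trades coverage for complete feature rows, which the model needs.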

## Load model and predict
Load the machine learning model we trained earlier and make predictions on the new data.

In [None]:
import pickle

with open('beat_clf.pkl','rb') as f:
    clf_svm = pickle.load(f)

X = scale(main_df.drop('name',axis=1))
y = clf_svm.predict(X)

y = pd.Series(y,index=main_df.index,name='beat')
main_df = main_df.join(y)
pool_df = main_df[main_df['beat'] == 1]

`pool_df` is the pool of stocks we can invest in, since the model predicts that they are likely to beat the market.
Once we have narrowed down our investment targets, we can apply the algo trading techniques discussed earlier to this pool and see what returns we get.
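One detail in the prediction cell: `scale()` standardizes `main_df` with that frame's own column means and standard deviations (population std, ddof=0). If the model was trained on features standardized with the training set's statistics, reusing those statistics at prediction time keeps both inputs on the same footing. A minimal numpy sketch, assuming the training features (called `X_train` here, a hypothetical name) are still available; random numbers stand in for real data, and `fit_standardizer`/`apply_standardizer` are hypothetical helpers.

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-column mean and population std (ddof=0), the same statistics scale() uses."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std

def apply_standardizer(X, mean, std):
    """Standardize X with statistics computed elsewhere (e.g. on the training set)."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 3))  # stand-in for training features
X_new = rng.normal(loc=6.0, scale=2.0, size=(50, 3))     # stand-in for the 2020Q1 features

mean, std = fit_standardizer(X_train)
X_new_scaled = apply_standardizer(X_new, mean, std)
```

With training statistics, the new data keeps its shift relative to the training period instead of being re-centered to zero, as `scale(X_new)` would do.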

## Define price data processing functions
These functions all come from the algo trading notebook.
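For reference, the two indicators in the next cell can be written as follows, with $P_t$ the closing price on day $t$ and $n$ the window (`time_frame`, default 10). pandas' `rolling(...).std()` uses the sample standard deviation, hence the $n-1$.

```latex
\begin{aligned}
\mathrm{MTM}_t &= \frac{P_t}{P_{t-n}} - 1 \\
\mathrm{SMA}_t &= \frac{1}{n}\sum_{k=0}^{n-1} P_{t-k} \\
\sigma_t &= \sqrt{\frac{1}{n-1}\sum_{k=0}^{n-1}\left(P_{t-k} - \mathrm{SMA}_t\right)^2} \\
\mathrm{Upper}_t &= \mathrm{SMA}_t + 2\sigma_t, \qquad \mathrm{Lower}_t = \mathrm{SMA}_t - 2\sigma_t \\
\mathrm{BOLLI}_t &= \frac{P_t - \mathrm{SMA}_t}{2\sigma_t}
\end{aligned}
```

So, for $\sigma_t > 0$, $\mathrm{BOLLI}_t \le -1$ exactly when $P_t \le \mathrm{Lower}_t$, and $\mathrm{BOLLI}_t \ge 1$ exactly when $P_t \ge \mathrm{Upper}_t$. This is how the buy condition `BOLLI <= -1` maps onto the lower band.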

In [None]:
def mtm_func(stock_price,time_frame=10):
    '''
    mtm(momentum)
    
    (current stock price/stock price n days ago) - 1
    
    '''
    mtm = stock_price / (stock_price.shift(time_frame)) - 1
    
    return mtm

def boll_func(stock_price,time_frame=10):
    '''
    boll(bollinger bands)
    
    mid line is n-day SMA
    upper line is SMA+2*std(std of the stock price in the past n days)
    lower line is SMA-2*std
    
    once the stock price goes beyond the upper line (or lower line), it signifies a sell (or buy) opportunity
    
    '''
    sma = stock_price.rolling(time_frame).mean()
    sigma = stock_price.rolling(time_frame).std()
    upper = sma + 2*sigma
    lower = sma - 2*sigma
    bolli = (stock_price - sma) / (2*sigma)
    
    return bolli, sma, upper, lower
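A quick numerical check of the band logic on a synthetic random-walk price series (made-up data, not a real stock). The sketch re-implements the two indicators as `bollinger` and `momentum` (hypothetical names mirroring `boll_func` and `mtm_func`). It then confirms that `BOLLI <= -1` flags exactly the days on which the close is at or below the lower band.

```python
import numpy as np
import pandas as pd

def bollinger(price, n=10):
    sma = price.rolling(n).mean()
    sigma = price.rolling(n).std()  # pandas default: sample std (ddof=1)
    upper, lower = sma + 2 * sigma, sma - 2 * sigma
    bolli = (price - sma) / (2 * sigma)
    return bolli, upper, lower

def momentum(price, n=10):
    return price / price.shift(n) - 1

# Synthetic random-walk prices for illustration only
rng = np.random.default_rng(42)
price = pd.Series(100 + rng.normal(0, 1, 120).cumsum())
bolli, upper, lower = bollinger(price)
mtm = momentum(price)

valid = bolli.notna()
# BOLLI <= -1 should flag exactly the days where the price is at or below the lower band
same = ((bolli[valid] <= -1) == (price[valid] <= lower[valid])).all()
# The entry rule used in test_strategy: price at/below the lower band and non-negative momentum
buy = (bolli <= -1) & (mtm >= 0)
print(same, int(buy.sum()))
```

The first `n - 1` BOLLI values and the first `n` MTM values are NaN, and comparisons with NaN are `False`, so no trade can fire before both indicators are defined.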


In [None]:
def test_strategy(stockCode):
    '''
    Backtest the BOLL + MTM strategy on one stock's daily price CSV.
    Returns the average daily return over the days the stock is held
    (NaN if it is never held) and the processed DataFrame.
    '''
    # the CSVs were saved with encoding='gbk', so read them back the same way
    df = pd.read_csv(stockCode, encoding='gbk')
    # initialize new columns
    df['close T-1'] = df['close'].shift(1)
    df['BOLLI'] = boll_func(df['close'])[0]
    df['MTM'] = mtm_func(df['close'])
    df['Daily profit'] = df['close'] - df['close T-1']
    # a daily return is measured against the previous day's close
    df['Daily return'] = df['Daily profit'] / df['close T-1']
    df['Cost'] = 0.0  # float column, since it stores prices
    df['Hold'] = 0
    df['Action'] = 0
    df['My daily profit'] = 0.0
    
    for i in range(0,len(df)-1):
        # when we are not holding the stock and both BOLL and MTM meet our conditions, buy at today's close:
        # from the next day on we hold the stock, and our cost is today's close
        if df.loc[i,'Hold'] == 0 and df.loc[i,'BOLLI'] <= -1 and df.loc[i,'MTM'] >= 0:
            df.loc[i+1,'Hold'] = 1
            df.loc[i+1,'Action'] = 1
            df.loc[i+1,'Cost'] = df.loc[i+1,'close T-1']

        # when we are holding the stock and the price hits our stop-loss (-8%) or take-profit (+8%) level, sell
        elif df.loc[i,'Hold'] == 1 and ((df.loc[i,'close'] - df.loc[i,'Cost']) / df.loc[i,'Cost'] <= -0.08
                                        or (df.loc[i,'close'] - df.loc[i,'Cost']) / df.loc[i,'Cost'] >= 0.08):
            df.loc[i+1,'Hold'] = 0
            df.loc[i+1,'Action'] = -1
            df.loc[i+1,'Cost'] = 0.0

        # if neither a buy nor a sell is triggered, carry the holding status and cost forward
        else:
            df.loc[i+1,'Hold'] = df.loc[i,'Hold']
            df.loc[i+1,'Cost'] = df.loc[i,'Cost']

    df['My daily profit'] = df['Daily profit'] * df['Hold']
    df['My cumulative profit'] = df['My daily profit'].cumsum()
    df['My daily return'] = df['My daily profit'] / df['close T-1']
    # average daily return over holding days; return NaN explicitly
    # (instead of dividing 0 by 0) when the strategy never holds the stock
    holding_days = df['Hold'].sum()
    daily_return_avg = df['My daily return'].sum() / holding_days if holding_days > 0 else np.nan
    
    return daily_return_avg, df
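The buy/sell bookkeeping in `test_strategy` is a small two-state machine with a one-day lag: a signal on day `i` changes the holding status from day `i+1`. The sketch below isolates that logic in plain Python so it can be checked by hand. `simulate_positions` and `holding_profit` are hypothetical helpers, not part of the notebook, and the buy signal is passed in as a list instead of being computed from BOLL/MTM. The sell test `<= -0.08 or >= 0.08` is written here as one absolute-value check.

```python
def simulate_positions(closes, buy_signal, threshold=0.08):
    """Per-day holding flags (0/1) and costs, following test_strategy's rules.

    A buy signal on day i (while flat) opens a position at closes[i], held from day i+1.
    A close on day i that is at least `threshold` away from the cost (either direction)
    exits the position, flat from day i+1.
    """
    n = len(closes)
    hold = [0] * n
    cost = [0.0] * n
    for i in range(n - 1):
        if hold[i] == 0 and buy_signal[i]:
            hold[i + 1], cost[i + 1] = 1, closes[i]
        elif hold[i] == 1 and abs(closes[i] - cost[i]) / cost[i] >= threshold:
            hold[i + 1], cost[i + 1] = 0, 0.0
        else:
            hold[i + 1], cost[i + 1] = hold[i], cost[i]
    return hold, cost

def holding_profit(closes, hold):
    """Sum of close-to-close changes on the days the position is held."""
    return sum((closes[i] - closes[i - 1]) * hold[i] for i in range(1, len(closes)))

closes = [10, 10, 10.5, 11, 10]
signal = [False, True, False, False, False]
hold, cost = simulate_positions(closes, signal)
print(hold)                            # [0, 0, 1, 1, 0]
print(holding_profit(closes, hold))    # 1.0
```

Here the position opens at the day-1 close of 10, the +10% move on day 3 triggers the exit, and the accumulated holding profit equals 11 - 10 = 1.0. The day-4 drop is not counted because the position is already flat.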

## Download price data

In [57]:
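start_date = '2020-03-01'
end_date = '2020-05-31'
os.makedirs('stock_pool', exist_ok=True)  # make sure the output folder exists
for each_stock in pool_df.index:
    each_stock_data = ts.get_hist_data(each_stock, start_date, end_date)
    if each_stock_data is None or each_stock_data.empty:
        continue  # skip stocks for which no price data was returned
    each_stock_data.sort_index(inplace=True)  # the index is the trading date; oldest first
    each_stock_data.to_csv('stock_pool/' + each_stock + '.csv', encoding='gbk')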

This interface will soon stop being updated; please switch to the Pro interface as soon as possible: https://tushare.pro/document/2

## Process price data and test strategy

In [58]:
data_path = 'stock_pool/'
stock_codes = next(os.walk(data_path))[2]  # file names in data_path
stock_returns = []
for each_stock in stock_codes:
    stock_return, stock_df = test_strategy(data_path + each_stock)
    stock_returns.append(stock_return)
#     print(each_stock,stock_return)
#     stock_df.plot(x='date',y='My cumulative profit',label=each_stock)

# drop stocks the strategy never held (their average return is NaN)
ls = np.array([i for i in stock_returns if not np.isnan(i)])
print(ls.mean())



-0.0027119918138284885


In [59]:
ls

array([ 0.01196986, -0.00179906, -0.0015171 , -0.00294083, -0.00279432,
       -0.01770736,  0.00317805, -0.01008518])
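A stock the strategy never holds produces a NaN average, which is why the list comprehension filters NaNs before taking the mean. `np.nanmean` does the same in one call; a quick check on made-up numbers:

```python
import numpy as np

returns = np.array([0.01, np.nan, -0.02, 0.005])  # made-up per-stock averages
filtered_mean = np.array([r for r in returns if not np.isnan(r)]).mean()
nan_mean = np.nanmean(returns)
print(np.isclose(filtered_mean, nan_mean))  # True
```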

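The SH index fell from 2970 on March 2 to 2852 on May 29, a return of -0.03973 (about -3.97%), whereas the mean printed above for our pool is about -0.0027. Strictly, these two figures are not directly comparable: the index figure is a return over the whole period, while ours is the mean of each stock's average daily return on the days it was held. With that caveat, we can say that our pool appears to have beaten the market, since its loss is smaller.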
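The index figure is a simple period return, while the strategy metric is an average daily return. One way to put them on a similar footing is to compound a daily return over a number of holding days. A minimal sketch, assuming a constant daily return (a simplification); `period_return` and `compound` are hypothetical helpers.

```python
def period_return(start_level, end_level):
    """Simple return between two index levels."""
    return end_level / start_level - 1

def compound(daily_return, days):
    """Period return from a constant daily return held for `days` days."""
    return (1 + daily_return) ** days - 1

index_return = period_return(2970, 2852)
print(round(index_return, 5))  # -0.03973
```

For example, `compound(r, N)` with `N` set to a stock's number of holding days gives a rough period figure to compare against `index_return`.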

## *What if we don't have a pool?

In [60]:
start_date = '2020-03-01'
end_date = '2020-05-31'
os.makedirs('stock_all', exist_ok=True)  # make sure the output folder exists
for each_stock in main_df.index:
    each_stock_data = ts.get_hist_data(each_stock, start_date, end_date)
    if each_stock_data is None or each_stock_data.empty:
        continue  # skip stocks for which no price data was returned
    each_stock_data.sort_index(inplace=True)  # the index is the trading date; oldest first
    each_stock_data.to_csv('stock_all/' + each_stock + '.csv', encoding='gbk')

data_path = 'stock_all/'
stock_codes = next(os.walk(data_path))[2]  # file names in data_path
stock_returns = []
for each_stock in stock_codes:
    stock_return, stock_df = test_strategy(data_path + each_stock)
    stock_returns.append(stock_return)
#     print(each_stock,stock_return)
#     stock_df.plot(x='date',y='My cumulative profit',label=each_stock)

# drop stocks the strategy never held (their average return is NaN)
ls = np.array([i for i in stock_returns if not np.isnan(i)])
print(ls.mean())



0.0017520406260295538


In [61]:
ls

array([-0.00173409,  0.01196986, -0.00146262, -0.00179906, -0.0015171 ,
       -0.00017095, -0.0019615 , -0.00294083, -0.00279432, -0.01770736,
        0.0074322 ,  0.00317805,  0.04130973, -0.00986611,  0.00051289,
        0.02189103,  0.00241915, -0.0316235 ,  0.02560894, -0.01008518,
        0.0010019 ,  0.00807439,  0.00056142])

The results suggest that, over this period, running the strategy on all stocks (a mean of about 0.00175) yields a better return than running it only on the model-selected pool (about -0.0027)!
<br>Disappointing as this may be, the stock market really is a place where an invisible hand is at work!