# Strategy Details

### Code Author: Beryl ZHENG

Reference: Gil Cohen " Polynomial Moving Regression Band Stocks Trading System"

Link to Reference: 

### The Main Ideas

The study develops a trading system using polynomial moving regression models to analyze Nasdaq100 stocks from 2017 to March 2024, demonstrating that these models can effectively identify stock trends and generate profitable trading signals. Among the polynomial models, the fourth-degree polynomial MRB achieved the highest average net profit. Thus, in the code below, I will use the fourth-degree MRB to first test its profitability and then to see if some simple approaches can improve the performances.

To avoid using future data, in this paper, the strategies will be open at open price and close at open price as well. 

In [3]:
# Collect the list of the S&P 500 companies from Wikipedia and save it to a file
import os
import requests
import pandas as pd

# ignore warnings
import warnings
warnings.filterwarnings("ignore")

# Get the list of S&P 500 companies from Wikipedia
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
response = requests.get(url)
html = response.content
df = pd.read_html(html, header=0)[0]

tickers = df['Symbol'].tolist()

In [4]:
# Load the data from yahoo finance
import os
import yfinance as yf

def load_data(symbol):

    direc = 'data_2010_2024/'
    os.makedirs(direc, exist_ok=True)

    file_name = os.path.join(direc, symbol + '.csv')

    if not os.path.exists(file_name):

        ticker = yf.Ticker(symbol)
        df = ticker.history(start='2010-01-01', end='2024-10-31')

        df.to_csv(file_name)

    df = pd.read_csv(file_name, index_col=0)
    df.index = pd.to_datetime(df.index, utc=True).tz_convert('US/Eastern')
    df['date'] = df.index

    if len(df) == 0:
        os.remove(file_name)
        return None

    return df

holder = {}
ticker_with_data = []
for symbol in tickers:
    df = load_data(symbol)
    if df is not None:
        holder[symbol] = df
        ticker_with_data.append(symbol)

tickers = ticker_with_data[:]


print (f'Loaded data for {len(tickers)} companies')

BRK.B: No timezone found, symbol may be delisted
BF.B: No price data found, symbol may be delisted (1d 2010-01-01 -> 2024-10-31)


Loaded data for 501 companies


In [5]:
# Keep columns of 'Open' and 'Close' only
for ticker in tickers:
   holder[ticker] = holder[ticker][['Open', 'Close']]
   holder[ticker].columns = ['open', 'close']
   holder[ticker].index = holder[ticker].index.date

In [6]:
# Add the 50 days factors using the 'close' price and 50 days standard deviation
for ticker in tickers:
    holder[ticker]['price change'] = holder[ticker]['open'].pct_change()
    holder[ticker]['50 MA'] = holder[ticker]['close'].rolling(window=50).mean()
    holder[ticker]['50 STD'] = holder[ticker]['close'].rolling(window=50).std()
    holder[ticker]['50 MA^2'] = holder[ticker]['50 MA']**2
    holder[ticker]['50 MA^3'] = holder[ticker]['50 MA']**3
    holder[ticker]['50 MA^4'] = holder[ticker]['50 MA']**4
    holder[ticker].dropna(inplace=True)

In [7]:
# Update the list of tickers
tickers = [ticker for ticker in tickers if len(holder[ticker]) > 0]

# Regression Analysis

First, test the significance between the 

In [8]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import statsmodels.api as sm

regression_holder = []
for ticker in tickers:
    regression_holder.append(holder[ticker][['close', '50 MA']])
df_for_regression = pd.concat(regression_holder, axis=0, ignore_index=True)

# Calculate the correlation between the 50 days moving average and the close price
correlation = df_for_regression.corr()
print(f'Correlation between 50 days moving average and close price: {correlation.loc["50 MA", "close"]}')

# Perform the linear regression
X = df_for_regression['50 MA']
y = df_for_regression['close']

X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())

Correlation between 50 days moving average and close price: 0.9975261390134735
                            OLS Regression Results                            
Dep. Variable:                  close   R-squared:                       0.995
Model:                            OLS   Adj. R-squared:                  0.995
Method:                 Least Squares   F-statistic:                 3.511e+08
Date:                Wed, 27 Nov 2024   Prob (F-statistic):               0.00
Time:                        17:41:35   Log-Likelihood:            -7.2019e+06
No. Observations:             1743579   AIC:                         1.440e+07
Df Residuals:                 1743577   BIC:                         1.440e+07
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------

## Analysis on the Regression Results

From the ...

### Next Step

Now, we are going to test the profitability of the 4th-degree MRB and how to make some improvement based on the results.

#  Polynomial Moving Regression Band Stocks Trading System

We first apply the model without spliting training and test set to have an overview of the model.

In [9]:
# Develop the second degree polynomial regression of the 50 days moving average and the close price. The formula is y = a + bX + cX^2
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

model = LinearRegression() 

In [10]:
fourth_regression_holder = {}
for ticker in tickers:
    #print(df.shape)
    Y = holder[ticker]['open']
    X = holder[ticker][['50 MA', '50 MA^2','50 MA^3','50 MA^4']]    
    model.fit(X, Y)
    fourth_regression_holder[ticker] = model

# Caculate the upper and lower bounds of the 50 days moving average with the model and the 2 standard deviation from the regression model
for ticker in tickers:
    holder[ticker]['4th 50 MA Average'] = fourth_regression_holder[ticker].predict(holder[ticker][['50 MA', '50 MA^2','50 MA^3','50 MA^4']])
    holder[ticker]['4th 50 MA Upper'] =  holder[ticker]['4th 50 MA Average'] + holder[ticker]['50 STD']
    holder[ticker]['4th 50 MA Lower'] =  holder[ticker]['4th 50 MA Average'] - holder[ticker]['50 STD']

In [11]:
# Based on the model prediction, we can assign the signal to the stock data. 1 for buy, -1 for sell, 0 for no action
def assign_signal(df):
    df['signal'] = 0
    df['signal'] = np.where(df['close'] > df['4th 50 MA Upper'], 1, df['signal'])
    df['signal'] = np.where(df['close'] < df['4th 50 MA Lower'], -1, df['signal'])
    return df

In [12]:
# update the position based on the signal, 1 means holding long position, 0 means holding no position
def assign_position(df):
    df['position'] = df['signal']
    df['position'] = np.where(df['position'] == 0, np.nan, df['position'])
    df['position'] = df['position'].ffill()
    df['position'] = np.where(df['position'] == -1, 0, df['position'])
    df['position'] = df['position'].shift(1)
    return df

In [13]:
# Assign the signal and position to the data
for ticker in tickers:
    holder[ticker] = assign_signal(holder[ticker])
    holder[ticker]['signal'] = holder[ticker]['signal'].shift(1)
    holder[ticker] = assign_position(holder[ticker])
    holder[ticker]['profit/loss'] = 1000*(1+holder[ticker]['position']*holder[ticker]['price change']).cumprod()

In [14]:
# Count the loss and profit
profit_loss = []
count_profit = 0
count_loss = 0
for ticker in tickers:
    profit_loss.append(holder[ticker]['profit/loss'][-1] - 1000)
    if holder[ticker]['profit/loss'][-1] > 1000:
        count_profit += 1
    else:
        count_loss += 1

print(f'Total profit/loss: {sum(profit_loss)}')
print(f'Number of profitable stock: {count_profit}')
print(f'Number of loss stock: {count_loss}')

Total profit/loss: 1547748.569717605
Number of profitable stock: 434
Number of loss stock: 66


In [15]:
# Calculate the profit and loss for the S&P 500 index
profit_loss_sp500 = 0

for ticker in tickers:
    profit_loss_sp500 += holder[ticker]['open'][-1]/holder[ticker]['open'][0]*1000 - 1000
print(f'Total profit/loss for S&P 500: {profit_loss_sp500}')

Total profit/loss for S&P 500: 4828541.977913449


## Apply Training and Test Set to the Polynominal Band

In [16]:
# save the stock data after 2017  holder based on the date index
from datetime import datetime

test_start_date = datetime.strptime('2017-01-01', '%Y-%m-%d').date()
test_end_date = datetime.strptime('2024-10-31', '%Y-%m-%d').date()
train_start_date = datetime.strptime('2010-01-01', '%Y-%m-%d').date()
train_end_date = datetime.strptime('2016-12-31', '%Y-%m-%d').date()

holder_test = {}
for ticker in tickers:
    holder_test[ticker] = holder[ticker].loc[test_start_date:test_end_date]

holder_train = {}
for ticker in tickers:
    holder_train[ticker] = holder[ticker].loc[train_start_date:train_end_date]
    

In [17]:
# Check if all dataframe is non-empty in train and test. Create a list of non-empty tickers
tickers_with_data = []
for ticker in tickers:
    if not holder_train[ticker].empty and not holder_test[ticker].empty:
        tickers_with_data.append(ticker)

In [18]:
# Use the trainning data to predict the test data
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

model = LinearRegression()

fourth_regression_holder_train = {}

for ticker in tickers_with_data:
    Y = holder_train[ticker]['open']
    X = holder_train[ticker][['50 MA', '50 MA^2','50 MA^3','50 MA^4']]

    model.fit(X, Y)
    fourth_regression_holder_train[ticker] = model

for ticker in tickers_with_data:
    holder_test[ticker]['4th 50 MA Average'] = fourth_regression_holder_train[ticker].predict(holder_test[ticker][['50 MA', '50 MA^2','50 MA^3','50 MA^4']])
    holder_test[ticker]['4th 50 MA Upper'] =  holder_test[ticker]['4th 50 MA Average'] + holder_test[ticker]['50 STD']
    holder_test[ticker]['4th 50 MA Lower'] =  holder_test[ticker]['4th 50 MA Average'] - holder_test[ticker]['50 STD']

for ticker in tickers_with_data:
    holder_test[ticker] = assign_signal(holder_test[ticker])
    holder_test[ticker]['signal'] = holder_test[ticker]['signal'].shift(1)

for ticker in tickers_with_data:
    holder_test[ticker] = assign_position(holder_test[ticker])

    holder_test[ticker]['profit/loss'] = 1000*(1+holder_test[ticker]['position']*holder_test[ticker]['price change']).cumprod()

profit_loss_test = []
count_profit_test = 0
count_loss_test = 0

for ticker in tickers_with_data:
    profit_loss_test.append(holder_test[ticker]['profit/loss'][-1] - 1000)
    if holder_test[ticker]['profit/loss'][-1] > 1000:
        count_profit_test += 1
    else:
        count_loss_test += 1

print(f'Total profit/loss for test data: {sum(profit_loss_test)}')
print(f'Number of profitable stock for test data: {count_profit_test}')
print(f'Number of loss stock for test data: {count_loss_test}')


Total profit/loss for test data: 41294.97480625666
Number of profitable stock for test data: 174
Number of loss stock for test data: 302


In [19]:
holder_test['AAPL'].tail()

Unnamed: 0,open,close,price change,50 MA,50 STD,50 MA^2,50 MA^3,50 MA^4,4th 50 MA Average,4th 50 MA Upper,4th 50 MA Lower,signal,position,profit/loss
2024-10-24,229.727257,230.31662,-0.017515,226.418694,4.504401,51265.4252,11607450.0,2628144000.0,483527.967026,483532.471427,483523.462625,-1.0,0.0,1441.089396
2024-10-25,229.487523,231.155685,-0.001044,226.552347,4.544456,51325.966118,11628020.0,2634355000.0,484893.151361,484897.695817,484888.606905,-1.0,0.0,1441.089396
2024-10-28,233.063595,233.143494,0.015583,226.699186,4.637367,51392.520838,11650640.0,2641191000.0,486396.345782,486400.983149,486391.708415,-1.0,0.0,1441.089396
2024-10-29,232.843827,233.413193,-0.000943,226.854615,4.730503,51463.016149,11674620.0,2648442000.0,487991.283683,487996.014186,487986.55318,-1.0,0.0,1441.089396
2024-10-30,232.354358,229.847122,-0.002102,226.926336,4.748471,51495.561859,11685700.0,2651793000.0,488728.573754,488733.322225,488723.825283,-1.0,0.0,1441.089396


In [20]:
holder_test['AAPL'].iloc[200:210]

Unnamed: 0,open,close,price change,50 MA,50 STD,50 MA^2,50 MA^3,50 MA^4,4th 50 MA Average,4th 50 MA Upper,4th 50 MA Lower,signal,position,profit/loss
2017-10-18,37.603784,37.449074,0.004005,37.059215,0.782116,1373.385395,50896.584215,1886187.0,39.248193,40.030309,38.466077,-1.0,0.0,1143.643933
2017-10-19,36.743514,36.563019,-0.022877,37.038352,0.781141,1371.839551,50810.676767,1881944.0,39.228948,40.010089,38.447807,-1.0,0.0,1143.643933
2017-10-20,36.710692,36.626305,-0.000893,37.042712,0.778156,1372.162531,50828.621789,1882830.0,39.232974,40.01113,38.454819,-1.0,0.0,1143.643933
2017-10-23,36.776334,36.607559,0.001788,37.036571,0.780396,1371.707589,50803.345472,1881582.0,39.227303,40.007698,38.446907,-1.0,0.0,1143.643933
2017-10-24,36.635676,36.82555,-0.003825,37.023678,0.778408,1370.752765,50750.309585,1878963.0,39.215382,39.99379,38.436973,-1.0,0.0,1143.643933
2017-10-25,36.781023,36.663818,0.003967,36.999347,0.770051,1368.951656,50650.316924,1874029.0,39.19283,39.962881,38.422779,-1.0,0.0,1143.643933
2017-10-26,36.856021,36.898216,0.002039,36.982751,0.762935,1367.723838,50582.189528,1870668.0,39.177408,39.940343,38.414472,-1.0,0.0,1143.643933
2017-10-27,37.338895,38.220272,0.013102,37.007082,0.782759,1369.524111,50682.090968,1875596.0,39.200007,39.982766,38.417248,-1.0,0.0,1143.643933
2017-10-30,38.417186,39.080563,0.028879,37.050307,0.835697,1372.725221,50859.890352,1884375.0,39.239982,40.075678,38.404285,-1.0,0.0,1143.643933
2017-10-31,39.357164,39.62439,0.024468,37.105768,0.91086,1376.837994,51088.630737,1895683.0,39.29095,40.201809,38.38009,0.0,0.0,1143.643933


## Apply the Validation Set into the Model

In [21]:
# Train, test and validate the model
validate_start_date = datetime.strptime('2017-01-01', '%Y-%m-%d').date()
validate_end_date = datetime.strptime('2018-12-31', '%Y-%m-%d').date()

test_start_date = datetime.strptime('2019-01-01', '%Y-%m-%d').date()
test_end_date = datetime.strptime('2024-10-31', '%Y-%m-%d').date()

holder_validate = {}
for ticker in tickers_with_data:
    holder_validate[ticker] = holder_test[ticker].loc[validate_start_date:validate_end_date]
   # print(holder_validate[ticker])

holder_test_update = {}
for ticker in tickers_with_data:
    holder_test_update[ticker] = holder_test[ticker].loc[test_start_date:test_end_date]

# Select the stocks that generate profit in the validation period
profitable_tickers = []
for ticker in tickers_with_data:
    #print(ticker)
    if holder_validate[ticker]['profit/loss'][-1] > 1000:
        profitable_tickers.append(ticker)

print(f'Number of profitable stocks in the validation period: {len(profitable_tickers)}')

# Use the profitable stocks to predict the test data
profit_loss_test_validate = []
count_profit_test_validate = 0
count_loss_test_validate = 0
for ticker in profitable_tickers:
    profit_loss_test_validate.append(holder_test_update[ticker]['profit/loss'][-1] - 1000)
    if holder_test_update[ticker]['profit/loss'][-1] > 1000:
        count_profit_test_validate += 1
    else:
        count_loss_test_validate += 1

print(f'Total profit/loss for test data: {sum(profit_loss_test_validate)}')
print(f'Number of profitable stock for test data: {count_profit_test_validate}')
print(f'Number of loss stock for test data: {count_loss_test_validate}')


Number of profitable stocks in the validation period: 98
Total profit/loss for test data: 27930.86891751044
Number of profitable stock for test data: 89
Number of loss stock for test data: 9
