# Strategy Details

### Code Author: Beryl ZHENG

Reference: Gil Cohen " Polynomial Moving Regression Band Stocks Trading System"

Link to Reference: 

### The Main Ideas

The study develops a trading system using polynomial moving regression models to analyze Nasdaq100 stocks from 2017 to March 2024, demonstrating that these models can effectively identify stock trends and generate profitable trading signals. Among the polynomial models, the fourth-degree polynomial MRB achieved the highest average net profit. Thus, in the code below, I will use the fourth-degree MRB to first test its profitability and then to see if some simple approaches can improve the performances.

To avoid using future data, in this paper, the strategies will be open at open price and close at open price as well. 

In [1]:
# Collect the list of the S&P 500 companies from Wikipedia and save it to a file
import os
import requests
import pandas as pd

# ignore warnings
import warnings
warnings.filterwarnings("ignore")

# Get the list of S&P 500 companies from Wikipedia
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
response = requests.get(url)
html = response.content
df = pd.read_html(html, header=0)[0]

tickers = df['Symbol'].tolist()

In [2]:
# Load the data from yahoo finance
import os
import yfinance as yf

def load_data(symbol):

    direc = 'data_2010_2024/'
    os.makedirs(direc, exist_ok=True)

    file_name = os.path.join(direc, symbol + '.csv')

    if not os.path.exists(file_name):

        ticker = yf.Ticker(symbol)
        df = ticker.history(start='2010-01-01', end='2024-10-31')

        df.to_csv(file_name)

    df = pd.read_csv(file_name, index_col=0)
    df.index = pd.to_datetime(df.index, utc=True).tz_convert('US/Eastern')
    df['date'] = df.index

    if len(df) == 0:
        os.remove(file_name)
        return None

    return df

holder = {}
ticker_with_data = []
for symbol in tickers:
    df = load_data(symbol)
    if df is not None:
        holder[symbol] = df
        ticker_with_data.append(symbol)

tickers = ticker_with_data[:]


print (f'Loaded data for {len(tickers)} companies')

BRK.B: No timezone found, symbol may be delisted
BF.B: No price data found, symbol may be delisted (1d 2010-01-01 -> 2024-10-31)


Loaded data for 501 companies


In [3]:
# Keep columns of 'Open' and 'Close' only
for ticker in tickers:
   holder[ticker] = holder[ticker][['Open', 'Close']]
   holder[ticker].columns = ['open', 'close']
   holder[ticker].index = holder[ticker].index.date

In [4]:
# Add the 50 days factors using the 'close' price and 50 days standard deviation
for ticker in tickers:
    holder[ticker]['price change'] = holder[ticker]['open'].pct_change()
    holder[ticker]['50 MA'] = holder[ticker]['close'].rolling(window=50).mean()
    holder[ticker]['50 STD'] = holder[ticker]['close'].rolling(window=50).std()
    holder[ticker]['50 MA^2'] = holder[ticker]['50 MA']**2
    holder[ticker]['50 MA^3'] = holder[ticker]['50 MA']**3
    holder[ticker]['50 MA^4'] = holder[ticker]['50 MA']**4
    holder[ticker].dropna(inplace=True)

In [5]:
# Update the list of tickers
tickers = [ticker for ticker in tickers if len(holder[ticker]) > 0]

# Regression Analysis

First, test the significance between the 50 day's moving average and the close prices.

In [6]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import statsmodels.api as sm

regression_holder = []
for ticker in tickers:
    regression_holder.append(holder[ticker][['close', '50 MA']])
df_for_regression = pd.concat(regression_holder, axis=0, ignore_index=True)

# Calculate the correlation between the 50 days moving average and the close price
correlation = df_for_regression.corr()
print(f'Correlation between 50 days moving average and close price: {correlation.loc["50 MA", "close"]}')

# Perform the linear regression
X = df_for_regression['50 MA']
y = df_for_regression['close']

X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())

Correlation between 50 days moving average and close price: 0.9975263586477868
                            OLS Regression Results                            
Dep. Variable:                  close   R-squared:                       0.995
Model:                            OLS   Adj. R-squared:                  0.995
Method:                 Least Squares   F-statistic:                 3.511e+08
Date:                Sun, 01 Dec 2024   Prob (F-statistic):               0.00
Time:                        18:28:17   Log-Likelihood:            -7.2018e+06
No. Observations:             1743579   AIC:                         1.440e+07
Df Residuals:                 1743577   BIC:                         1.440e+07
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------

## Analysis on the Regression Results

From the ...

### Next Step

Now, we are going to test the profitability of the 4th-degree MRB and how to make some improvement based on the results.

#  Polynomial Moving Regression Band Stocks Trading System

## Apply Training and Test Set to the Polynominal Band

In [7]:
# save the stock data after 2017  holder based on the date index
from datetime import datetime

train_start_date = datetime.strptime('2010-01-01', '%Y-%m-%d').date()
train_end_date = datetime.strptime('2016-12-31', '%Y-%m-%d').date()

test_start_date = datetime.strptime('2017-01-01', '%Y-%m-%d').date()
test_end_date = datetime.strptime('2024-10-31', '%Y-%m-%d').date()

holder_train = {}
for ticker in tickers:
    holder_train[ticker] = holder[ticker].loc[train_start_date:train_end_date]

holder_test = {}
for ticker in tickers:
    holder_test[ticker] = holder[ticker].loc[test_start_date:test_end_date]

# Check if all dataframe is non-empty in train and test. Create a list of non-empty tickers
tickers_with_data = []
for ticker in tickers:
    if not holder_train[ticker].empty and not holder_test[ticker].empty:
        tickers_with_data.append(ticker)

In [8]:
# Based on the model prediction, we can assign the signal to the stock data. 1 for buy, -1 for sell, 0 for no action
def assign_signal(df):
    df['signal'] = 0
    df['signal'] = np.where(df['close'] > df['4th 50 MA Upper'], 1, df['signal'])
    df['signal'] = np.where(df['close'] < df['4th 50 MA Lower'], -1, df['signal'])
    return df

In [9]:
# update the position based on the signal, 1 means holding long position, 0 means holding no position
def assign_position(df):
    df['position'] = df['signal']
    df['position'] = np.where(df['position'] == 0, np.nan, df['position'])
    df['position'] = df['position'].ffill()
    df['position'] = np.where(df['position'] == -1, 0, df['position'])
    df['position'] = df['position'].shift(1)
    return df

In [10]:
# Use the trainning data to predict the test data
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

model = LinearRegression()

fourth_regression_holder_without_validate = {}

for ticker in tickers_with_data:
    Y = holder_train[ticker]['open']
    X = holder_train[ticker][['50 MA', '50 MA^2','50 MA^3','50 MA^4']]

    model.fit(X, Y)
    fourth_regression_holder_without_validate[ticker] = model

for ticker in tickers_with_data:
    holder_test[ticker]['4th 50 MA Average'] = fourth_regression_holder_without_validate[ticker].predict(holder_test[ticker][['50 MA', '50 MA^2','50 MA^3','50 MA^4']])
    holder_test[ticker]['4th 50 MA Upper'] =  holder_test[ticker]['4th 50 MA Average'] + 2*holder_test[ticker]['50 STD']
    holder_test[ticker]['4th 50 MA Lower'] =  holder_test[ticker]['4th 50 MA Average'] - 2*holder_test[ticker]['50 STD']

for ticker in tickers_with_data:
    holder_test[ticker] = assign_signal(holder_test[ticker])
    holder_test[ticker]['signal'] = holder_test[ticker]['signal'].shift(1)

for ticker in tickers_with_data:
    holder_test[ticker] = assign_position(holder_test[ticker])
    holder_test[ticker]['profit/loss'] = 1000*(1+holder_test[ticker]['position']*holder_test[ticker]['price change']).cumprod()

profit_loss_test = []
count_profit_test = 0
count_loss_test = 0

for ticker in tickers_with_data:
    profit_loss_test.append(holder_test[ticker]['profit/loss'][-1] - 1000)
    if holder_test[ticker]['profit/loss'][-1] > 1000:
        count_profit_test += 1
    else:
        count_loss_test += 1

print(f'Total profit/loss for test data: {sum(profit_loss_test)}')
print(f'Number of profitable stock for test data: {count_profit_test}')
print(f'Number of loss stock for test data: {count_loss_test}')


Total profit/loss for test data: 27215.07128395023
Number of profitable stock for test data: 147
Number of loss stock for test data: 329


From the number of profitable stocks and loss stocks we can see that the trading strategy is not profitable to most of the stocks. So we will add a validation set into the model.

## Apply the Validation Set into the Model

In [11]:
# Train, test and validate the model
train_start_date = datetime.strptime('2010-01-01', '%Y-%m-%d').date()
train_end_date = datetime.strptime('2016-12-31', '%Y-%m-%d').date()

validate_start_date = datetime.strptime('2017-01-01', '%Y-%m-%d').date()
validate_end_date = datetime.strptime('2018-12-31', '%Y-%m-%d').date()

test_start_date = datetime.strptime('2019-01-01', '%Y-%m-%d').date()
test_end_date = datetime.strptime('2024-10-31', '%Y-%m-%d').date()

holder_validate = {}
for ticker in tickers_with_data:
    holder_validate[ticker] = holder_test[ticker].loc[validate_start_date:validate_end_date]
   # print(holder_validate[ticker])

holder_test_temp = {}
for ticker in tickers_with_data:
    holder_test_temp[ticker] = holder_test[ticker].loc[test_start_date:test_end_date]

# Copy the holder_test_temp to holder_test
holder_test = holder_test_temp.copy()

# Select the stocks that generate profit in the validation period
profitable_tickers = []
for ticker in tickers_with_data:
    #print(ticker)
    if holder_validate[ticker]['profit/loss'][-1] > 1000:
        profitable_tickers.append(ticker)

print(f'Number of profitable stocks in the validation period: {len(profitable_tickers)}')

# Use the profitable stocks to predict the test data
profit_loss_test_validate = []
count_profit_test_validate = 0
count_loss_test_validate = 0
for ticker in profitable_tickers:
    profit_loss_test_validate.append(holder_test[ticker]['profit/loss'][-1] - 1000)
    if holder_test[ticker]['profit/loss'][-1] > 1000:
        count_profit_test_validate += 1
    else:
        count_loss_test_validate += 1

print(f'Total profit/loss for test data: {sum(profit_loss_test_validate)}')
print(f'Number of profitable stock for test data: {count_profit_test_validate}')
print(f'Number of loss stock for test data: {count_loss_test_validate}')


Number of profitable stocks in the validation period: 93
Total profit/loss for test data: 15489.551416399028
Number of profitable stock for test data: 74
Number of loss stock for test data: 19


# Further Improvement on the Polynomial System

Based on the paper, we can see that the author use '2' as the band for the trading system. Now, we are going to see if we can adjust the coefficient to achieve better result.

In [12]:
upper_range = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]
lower_range = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]
ls_upper_range = []
ls_lower_range = []
ls_profit_loss_test = []
ls_profit_loss_validate = []
# Based on the validation period, we can find the optimal upper and lower range of 50 std for each stocks. and put the result in a dataframe
for uppder_coeff in upper_range:
    for lower_coeff in lower_range:
        profitable_tickers = []
        for ticker in tickers_with_data:
            holder_validate[ticker]['4th 50 MA Upper'] = holder_validate[ticker]['4th 50 MA Average'] + uppder_coeff*holder_validate[ticker]['50 STD']
            holder_validate[ticker]['4th 50 MA Lower'] = holder_validate[ticker]['4th 50 MA Average'] - lower_coeff*holder_validate[ticker]['50 STD']
            holder_validate[ticker] = assign_signal(holder_validate[ticker])
            holder_validate[ticker]['signal'] = holder_validate[ticker]['signal'].shift(1)
            holder_validate[ticker] = assign_position(holder_validate[ticker])
            holder_validate[ticker]['profit/loss'] = 1000*(1+holder_validate[ticker]['position']*holder_validate[ticker]['price change']).cumprod() 
            
            holder_test[ticker]['4th 50 MA Upper'] = holder_test[ticker]['4th 50 MA Average'] + uppder_coeff*holder_test[ticker]['50 STD']
            holder_test[ticker]['4th 50 MA Lower'] = holder_test[ticker]['4th 50 MA Average'] - lower_coeff*holder_test[ticker]['50 STD']
            holder_test[ticker] = assign_signal(holder_test[ticker])
            holder_test[ticker]['signal'] = holder_test[ticker]['signal'].shift(1)
            holder_test[ticker] = assign_position(holder_test[ticker])
            holder_test[ticker]['profit/loss'] = 1000*(1+holder_test[ticker]['position']*holder_test[ticker]['price change']).cumprod()

            if holder_validate[ticker]['profit/loss'][-1] > 1000:
                profitable_tickers.append(ticker)
        profit_loss_validate = []
        profit_loss_test = []
        count_profit_test = 0
        count_loss_test = 0
        for ticker in profitable_tickers:
            profit_loss_validate.append(holder_validate[ticker]['profit/loss'][-1] - 1000)
            profit_loss_test.append(holder_test[ticker]['profit/loss'][-1] - 1000)
            if holder_test[ticker]['profit/loss'][-1] > 1000:
                count_profit_test += 1
            else:
                count_loss_test += 1
        ls_upper_range.append(uppder_coeff)
        ls_lower_range.append(lower_coeff)
        ls_profit_loss_test.append(sum(profit_loss_test))
        ls_profit_loss_validate.append(sum(profit_loss_validate))

In [13]:
# Create a dataframe to store the result
df_result = pd.DataFrame({'upper range': ls_upper_range, 'lower range': ls_lower_range, 'profit/loss_test': ls_profit_loss_test, 'profit/loss_validate': ls_profit_loss_validate})
df_result_by_test = df_result.sort_values(by='profit/loss_test', ascending=False)
df_result_by_validate = df_result.sort_values(by='profit/loss_validate', ascending=False)


In [14]:
df_result_by_test.head(10)

Unnamed: 0,upper range,lower range,profit/loss_test,profit/loss_validate
5,0.0,2.5,27381.180985,23847.678551
8,0.0,4.0,27101.74978,29871.148394
7,0.0,3.5,26239.373868,28623.040064
17,0.5,4.0,24682.106744,28263.599176
6,0.0,3.0,23898.357115,25070.31583
16,0.5,3.5,22998.187233,27495.425911
4,0.0,2.0,22471.761555,21504.718591
26,1.0,4.0,21966.607972,26378.46921
14,0.5,2.5,20049.221941,23560.029306
25,1.0,3.5,19051.447296,25704.232284


In [15]:
df_result_by_validate.head(10)

Unnamed: 0,upper range,lower range,profit/loss_test,profit/loss_validate
8,0.0,4.0,27101.74978,29871.148394
7,0.0,3.5,26239.373868,28623.040064
17,0.5,4.0,24682.106744,28263.599176
16,0.5,3.5,22998.187233,27495.425911
26,1.0,4.0,21966.607972,26378.46921
25,1.0,3.5,19051.447296,25704.232284
6,0.0,3.0,23898.357115,25070.31583
15,0.5,3.0,17127.886362,24704.674745
5,0.0,2.5,27381.180985,23847.678551
35,1.5,4.0,15304.465492,23780.243287


## Repick Stocks Annually

In [16]:
train_start_date = datetime.strptime('2010-01-01', '%Y-%m-%d').date()
ls_train_end_date = ['2016-12-31', '2017-12-31', '2018-12-31', '2019-12-31', '2020-12-31', '2021-12-31', '2022-12-31']

ls_validation_start_date = ['2017-01-01', '2018-01-01', '2019-01-01', '2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']
ls_validation_end_date = ['2017-12-31', '2018-12-31', '2019-12-31', '2020-12-31', '2021-12-31', '2022-12-31', '2023-12-31']

ls_test_start_date = ['2018-01-01', '2019-01-01', '2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01', '2024-01-01']
ls_test_end_date = ['2018-12-31', '2019-12-31', '2020-12-31', '2021-12-31', '2022-12-31', '2023-12-31', '2024-10-31']

cumulative_profit_loss_test = 0

# Assume that we use the training data to train the model and use the validation data to select the profitable stocks on an annual basis.
# To simplify the process, we assume all the trade will be closed at the end of the test period if the position is still active.
# Ignore the opmization of the upper and lower range of 50 std for each stock for now
for i in range(len(ls_train_end_date)):
    train_end_date = datetime.strptime(ls_train_end_date[i], '%Y-%m-%d').date()
    validate_start_date = datetime.strptime(ls_validation_start_date[i], '%Y-%m-%d').date()
    validate_end_date = datetime.strptime(ls_validation_end_date[i], '%Y-%m-%d').date()
    test_start_date = datetime.strptime(ls_test_start_date[i], '%Y-%m-%d').date()
    test_end_date = datetime.strptime(ls_test_end_date[i], '%Y-%m-%d').date()

    holder_train = {}
    holder_validate = {}
    holder_test = {}
    
    for ticker in tickers:
        holder_train[ticker] = holder[ticker].loc[train_start_date:train_end_date]
        holder_validate[ticker] = holder[ticker].loc[validate_start_date:validate_end_date]
        holder_test[ticker] = holder[ticker].loc[test_start_date:test_end_date]
    
    tickers_with_data = []
    for ticker in tickers:
        if not holder_train[ticker].empty and not holder_validate[ticker].empty and not holder_test[ticker].empty:
            tickers_with_data.append(ticker)
    
    # Train the model using the training data
    model = LinearRegression()
    fourth_regression_holder = {}
    for ticker in tickers_with_data:
        Y = holder_train[ticker]['open']
        X = holder_train[ticker][['50 MA', '50 MA^2','50 MA^3','50 MA^4']]
        model.fit(X, Y)
        fourth_regression_holder[ticker] = model

    # Use the model to predict the validation data
    for ticker in tickers_with_data:
        holder_validate[ticker]['4th 50 MA Average'] = fourth_regression_holder[ticker].predict(holder_validate[ticker][['50 MA', '50 MA^2','50 MA^3','50 MA^4']])
        holder_validate[ticker]['4th 50 MA Upper'] =  holder_validate[ticker]['4th 50 MA Average'] + 2*holder_validate[ticker]['50 STD']
        holder_validate[ticker]['4th 50 MA Lower'] =  holder_validate[ticker]['4th 50 MA Average'] - 2*holder_validate[ticker]['50 STD']

        holder_test[ticker]['4th 50 MA Average'] = fourth_regression_holder[ticker].predict(holder_test[ticker][['50 MA', '50 MA^2','50 MA^3','50 MA^4']])
        holder_test[ticker]['4th 50 MA Upper'] =  holder_test[ticker]['4th 50 MA Average'] + 2*holder_test[ticker]['50 STD']
        holder_test[ticker]['4th 50 MA Lower'] =  holder_test[ticker]['4th 50 MA Average'] - 2*holder_test[ticker]['50 STD']

    # Assign the signal to the validation data
    for ticker in tickers_with_data:
        holder_validate[ticker] = assign_signal(holder_validate[ticker])
        holder_validate[ticker]['signal'] = holder_validate[ticker]['signal'].shift(1)

        holder_test[ticker] = assign_signal(holder_test[ticker])
        holder_test[ticker]['signal'] = holder_test[ticker]['signal'].shift(1)


    # Assign the position to the validation data
    for ticker in tickers_with_data:
        holder_validate[ticker] = assign_position(holder_validate[ticker])
        # holder_validate[ticker]['position'] = holder_validate[ticker]['position'].shift(1)

        holder_test[ticker] = assign_position(holder_test[ticker])
        # holder_test[ticker]['position'] = holder_test[ticker]['position'].shift(1)

    # Calculate the profit/loss for the validation data
    for ticker in tickers_with_data:
        holder_validate[ticker]['profit/loss'] = 1000*(1+holder_validate[ticker]['position']*holder_validate[ticker
]['price change']).cumprod()
        holder_test[ticker]['profit/loss'] = 1000*(1+holder_test[ticker]['position']*holder_test[ticker]['price change']).cumprod()

        

    # Select the profitable stocks in the validation data
    profitable_tickers = []
    for ticker in tickers_with_data:
        if holder_validate[ticker]['profit/loss'][-1] > 1000:
            profitable_tickers.append(ticker)
    # Use the profitable stocks to apply the model to the test data
   
    # Use the profitable stocks to predict the test data
    profit_loss_test_ = []
    count_profit_test = 0
    count_loss_test_test = 0
    for ticker in profitable_tickers:
        profit_loss_test.append(holder_test[ticker]['profit/loss'][-1] - 1000)

    # check if there is any nan value in the profit/loss column
    
        if holder_test[ticker]['profit/loss'][-1] > 1000:
            count_profit_test += 1
        else:
            count_loss_test += 1
    # Print the result for each year
    print(f'Year {i+1}')
    # sum ignore the nan value
    print(f'Total profit/loss for test data: {sum(profit_loss_test)}')
    print(f'Percentage of profitable stock for test data: {sum(profit_loss_test)/len(profitable_tickers)/1000}')
    print(f'Number of profitable stock for test data: {count_profit_test}')
    print(f'Number of loss stock for test data: {count_loss_test}')
    cumulative_profit_loss_test += sum(profit_loss_test)/len(profitable_tickers)/1000



Year 1
Total profit/loss for test data: -1824.5807894956242
Percentage of profitable stock for test data: -0.018810111231913652
Number of profitable stock for test data: 14
Number of loss stock for test data: 97
Year 2
Total profit/loss for test data: 46065.20727656926
Percentage of profitable stock for test data: 0.2826086335985844
Number of profitable stock for test data: 127
Number of loss stock for test data: 133
Year 3
Total profit/loss for test data: 105349.66172236647
Percentage of profitable stock for test data: 0.3211879930559953
Number of profitable stock for test data: 226
Number of loss stock for test data: 235
Year 4
Total profit/loss for test data: 160420.63968610918
Percentage of profitable stock for test data: 0.5791358833433544
Number of profitable stock for test data: 217
Number of loss stock for test data: 295
Year 5
Total profit/loss for test data: 133569.77938258628
Percentage of profitable stock for test data: 0.4174055605705821
Number of profitable stock for test

In [18]:
cumulative_profit_loss_test

4.085026729297333