**Quant Model: Multiple Signals**

Decile Formation based on combining b2m and CashFlow2TA signals, and Long Short Portfolio Returns





---
# Program Flow
**Data**




*   Read CRSP and Compustat data
*   Merge data
*   Compute percentile ranks for selected features - b2m and CashFlow2TA here

**Fit regressions and check for significace**

*  Fit Regression each month (This code fits univaraite regressions with each
feature individaully and then a multvariate regression with both features)

* Test if the regression coefficients are statistically significant

**Return Prediction and Trading Strategy**


*   Compute regressions coefficient estimates for each month t within rolling windows (t-60 to t-1)

* Predict returns for month t
* For portfolio deciles and evaluate the trading strategy



---








---


**Data Description **

Important Dataframes

1.  "Returns" dataframe : It contains monthly returns(RET), shares  outstanding (SHROUT) values, Price (PRC), Primary Exchange Code (PRIMEXCH) and  Unique Identifiers (PERMNO). The data are downloaded from  CRSP.

Key Input data:
date:    yyyymmdd format
RET:     return for the month ending yyyymmdd
EXCHCD:  Exchange where listed
PRC:   Price as of month-end
SHROUT:  Shares outstanding as of month ending yyyymmdd


2.   "Cstat_data" dataframe : Compustat data used to construct features

  LPERMNO: CRSP identifier - relable to PERMNO to merge
  ceq: book value of common equity
  oancf: Cash flow from (oancf))/ total assets (at)

  #normalized by total assetes so that CashFlows are comparable across stocks of different sizes



3. merged_data : Dataframe obtained from Merging "Returns" & "Cstat_data" dataframe on "PERMNO" & "date". Merge with "pd.merge_asof" command to match CRSP 'date' with the lastest COMPUSTAT 'datadate' with 1 year tolerance for merging. Book to Market Ratio (b2m) is calculated using ceq and marketcap values.


---










---


# Functions defined in this code



*   **monthly_regression(Input_data, feature_list)**: *Input*: Dataframe with columns include date, 'RET', and items in feature_list.

*output*: Month Regression estimates of 'RET' against  feature_list


*   **rolling_window_prediction:** (1) Computes regressions coefficient estimates for each month t as the avereage within rolling windows (t-60 to t-1). (ii)Predicts return for each stock for month t and (iii) assigns decile ranks for each stock for each month.
* **portfolio_returns_statistics**: Uses the output the function "rolling_window_prediction" and from Computes portfolio stats: average returns for each decile portfolio and the diff portfolio and  t-statistics



---




In [None]:
# Connecting the Python Code with the google drive to access the datasets
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Importing Necessary Python Libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm
import datetime as dt
from datetime import timedelta
from pandas import DateOffset

In [None]:
#CRSP Data

# Importing CRSP price and returns datasets
Returns = pd.read_csv("/content/drive/MyDrive/MAF data/MonthlyRet_198001_202312csv.zip") #Importing Cleaned CRSP data

# Handling Missing values
Returns.PRC = abs(Returns.PRC)                                         # Converting Price Values to absolute numbers (CRSP sets PRC with a "-" symbol if it is comuted as bid-ask average when there is no actual trade)

# Market Cap Calculation
Returns['marketcap'] = Returns.SHROUT * Returns.PRC                    #  Market Capitalization as of month end
Returns['marketcap'] = Returns.groupby('PERMNO')['marketcap'].shift()  # Lagged Market Capitalization = market cap as of the end of the previous month
Returns['marketcap'] = np.where(Returns['marketcap'] < 10000, np.nan, Returns['marketcap']) # exclude marketcap < $10m

# Exchange Code Filters
exch_nyse_amex_Nasdaq = ['N', 'Q', 'A']
Returns = Returns[Returns.PRIMEXCH.isin(exch_nyse_amex_Nasdaq)].copy() #keeping only NYSE (N), AMEX(A) and Nasdaq (Q) stocks, ie. stocks listed on  US exchanges)

#Keep only ordinary common shares
ord_common_shares = [10, 11, 12]
Returns = Returns[Returns.SHRCD.isin(ord_common_shares)].copy()             #keeping only ordinary common shares - excludes unit trusts, ADRS, REITS, closed-end funds

# Minor Pre-processing
Returns.reset_index(inplace = True, drop = True)                                                # Reset Index

Returns = Returns[["PERMNO","PRIMEXCH","date","RET","PRC","SHROUT","marketcap"]].copy() # Reordering the columns for clarity
Returns.RET = pd.to_numeric(Returns.RET, errors = 'coerce')                      #RET denoted missing value with alphanumeric values. convert it to Numeric with the 'coerce' option to set nonnumeric value to nan.

Returns.dropna(inplace = True)
#CRSP Data , prepare Date-time for merging with Compustat data

Returns["date"] = pd.to_datetime(Returns["date"])                       # Convert  "date" to a DateTime object
Returns["year"] = Returns["date"].dt.year                              # Extracting year
Returns["month"] = Returns["date"].dt.month                            # Extracting month



In [None]:
#Compustat Data

# Importing Compustat Data
Cstat_data = pd.read_csv('/content/drive/MyDrive/MAF data/Cstat_20250108.zip')     # Importing monthly Compustat data

Cstat_data.rename(columns = {'LPERMNO' : 'PERMNO'}, inplace = True) # Renaming "LPERMNO" for merging Cstat data with CRSP data
Cstat_data['at'] = Cstat_data['at'].apply(lambda x: 0.5 if x < 0.5 else x) #setting at to a min value of 0.5 because 'at' can be < 0 some stocks
Cstat_data['CashFlow2TA'] = Cstat_data['oancf']/ Cstat_data['at']               # Cash flow from operations (oancf)]/Assets (AT)

#Date time for Compustat Data - When will the data be available to the market?

# Datetime Manipulations
Cstat_data["date"] = pd.to_datetime(Cstat_data["datadate"])        # Convert to  DateTime object for datetime manipulations
Cstat_data['date'] = Cstat_data['date'].apply(lambda x: x + DateOffset(months=+5))  # Adding five months (using DataOffset library) assuming it takes at most 4 months for the data to reach the market

Cstat_data = Cstat_data[['date', 'PERMNO', 'datadate', 'ceq', 'CashFlow2TA']].copy()  #retain only data needed further




Cstat_data.head()

Unnamed: 0,date,PERMNO,datadate,ceq,CashFlow2TA
0,1991-10-31,54594,1991-05-31,193.778,0.097092
1,1992-10-31,54594,1992-05-31,196.737,0.022294
2,1993-10-31,54594,1993-05-31,189.216,0.046025
3,1994-10-31,54594,1994-05-31,189.488,0.016036
4,1995-10-31,54594,1995-05-31,197.119,0.035826


Merge CRSP and Compusta data by PERMNO.
Ensure no look-ahead bias:

In [None]:
# Merged Data

Returns.sort_values(by = 'date', inplace = True)                       # Sort CRSP data by date to use merge_asof
Cstat_data.sort_values(by = 'date', inplace = True)                 # Sort Cstat data by date to use merge_asof


merged_data = pd.merge_asof(Returns, Cstat_data, by = 'PERMNO', left_on = 'date', right_on= 'date', tolerance=dt.timedelta(days = 365)) # Merging "Returns" & "Cstat" dataframe on "PERMNO" & "date" with a 1-year tolerance for date

# Calculating Book to Market Ratio
merged_data['b2m'] = merged_data.ceq / merged_data.marketcap      # Book to Market Ratio


merged_data.dropna(subset=['RET', 'CashFlow2TA', 'b2m'], how = 'any', inplace = True) #drop only if the  data items we need later are missing
merged_data.head()


Unnamed: 0,PERMNO,PRIMEXCH,date,RET,PRC,SHROUT,marketcap,year,month,datadate,ceq,CashFlow2TA,b2m
555659,65816,A,1990-11-30,0.0,2.875,158992.0,457102.0,1990,11,1990-06-30,414.9,0.023739,0.000908
555669,85965,N,1990-11-30,0.141176,12.125,12171.0,129316.875,1990,11,1990-06-30,150.782,0.172326,0.001166
555670,86765,A,1990-11-30,0.071429,1.875,6024.0,10542.0,1990,11,1990-06-30,-5.158,0.324619,-0.000489
555679,59352,N,1990-11-30,0.022727,56.25,18209.0,1001495.0,1990,11,1990-06-30,286.451,0.124932,0.000286
555681,23035,Q,1990-11-30,0.041667,18.75,4592.0,82656.0,1990,11,1990-06-30,21.2,0.157101,0.000256


In [None]:
q = merged_data[['CashFlow2TA', 'b2m']].quantile(q = [ 0,.01, .05, .95, .99, 1])             #Are there outliers among features?
q.T

Unnamed: 0,0.00,0.01,0.05,0.95,0.99,1.00
CashFlow2TA,-79.568445,-0.95983,-0.372084,0.22036,0.336218,61.49
b2m,-0.234322,-0.000527,3.2e-05,0.002033,0.004987,0.15127


Ranked variables to avoid problems with outliers.  



*  Assign ranks with  **'groupby(['year','month'])'** (why?)




In [None]:
#Ranked signals: Rank the signals each month and assign ranks from 0 for smallest to 1 for largest. A score of .90 means 90% of stocks have smaller signal in that month
merged_data['b2m_pct_rank']= merged_data.groupby(['year','month'])['b2m'].rank(pct = True)
merged_data['CashFlow2TA_pct_rank']= merged_data.groupby(['year','month'])['CashFlow2TA'].rank(pct = True)

merged_data.reset_index(inplace = True, drop = True)              # Reset Index
# Set 'date' as index
merged_data.set_index('date', inplace=True)
merged_data.sort_index(inplace = True)
merged_data.head()


Unnamed: 0_level_0,PERMNO,PRIMEXCH,RET,PRC,SHROUT,marketcap,year,month,datadate,ceq,CashFlow2TA,b2m,b2m_pct_rank,CashFlow2TA_pct_rank
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
1990-11-30,65816,A,0.0,2.875,158992.0,457102.0,1990,11,1990-06-30,414.9,0.023739,0.000908,0.599359,0.341346
1990-11-30,85965,N,0.141176,12.125,12171.0,129316.875,1990,11,1990-06-30,150.782,0.172326,0.001166,0.772436,0.871795
1990-11-30,86765,A,0.071429,1.875,6024.0,10542.0,1990,11,1990-06-30,-5.158,0.324619,-0.000489,0.012821,0.990385
1990-11-30,59352,N,0.022727,56.25,18209.0,1001495.0,1990,11,1990-06-30,286.451,0.124932,0.000286,0.192308,0.708333
1990-11-30,23035,Q,0.041667,18.75,4592.0,82656.0,1990,11,1990-06-30,21.2,0.157101,0.000256,0.166667,0.833333


In [None]:

##############################################################
# Function to compute monthly regression coefficients
##############################################################

def monthly_regression(merged_data, feature_list) :
  ## Initialization of lists and dataframes
  reg_data = merged_data.copy()
  reg_data['constant'] = 1                                                     # The Coeffcient on constant is the  intercept
  reg_factors = feature_list                                                 # Input the factors here
  reg_factors.append("constant")
  datelist = reg_data.index.get_level_values(0).unique()                       # "datelist" - list of all unique dates in the sample
  monthly_reg_df = pd.DataFrame(index=reg_factors, columns=datelist, data=np.nan)     # DataFrame with regression coeffcients for each month

  ## Monthly Regression over Regression Factors
  for month in datelist:                                                          # Iterating over all months to calculate monthly regression coefficients

      # Variables initialized

      x = reg_data.loc[month, reg_factors].copy()                              # Regression factors as independent variables
      y = reg_data.loc[month, 'RET'].copy()                                    # "ret" or monthly return as dependent variable

      # Model fit & Regression
      model = sm.OLS(y,x)                                                         # OLS Regression Specification
      results = model.fit()                                                       # regression output stored in results

      # Results Stored
      monthly_reg_df[month] = results.params                                      # results.params gets the parameters

  ## Print Output : Monthly_Reg_Df
  print("****************************************************************************")
  print("Monthly Regression Coefficients DataFrame")
  print("****************************************************************************")
  return monthly_reg_df


############################################################################################################################
## Function to compute Rolling Window Average of coefficients and associated Predicted Returns / Decile Formation ##
############################################################################################################################

def rolling_window_prediction(monthly_regression_dataframe, reg_factor_list, rolling_window = 60) :
  # Variables Initialization
  monthly_reg_df = monthly_regression_dataframe.copy()
  datelist = monthly_reg_df.T.index.unique()                                                                                              # Input the factors here
  reg_factor_list.append("constant")
  predicted_ret_df = pd.DataFrame()
  output_df = pd.DataFrame(index=reg_factor_list, columns=datelist[rolling_window-1 :], data=np.nan)                                # DataFrame with rolling coefficients for each month

  for i in range(0,len(datelist)-rolling_window + 1):                                        # Iterating over rolling data of length "rolling window"

    # Initializing Dataframes and Variables
    rolling_datelist =  datelist[i : i + rolling_window]                                     # Extracting Year-months in rolling period
    rolling_df   =  pd.DataFrame(); temp_df = pd.DataFrame()                                 # Dataframe Initialization

    # Calculating Mean Regression coefficients
    rolling_df   =  monthly_reg_df[rolling_datelist].copy()                                  # DataFrame with regression coeffcients for each month in the rolling period
    rolling_df["mean_coeff"] =    rolling_df.mean(axis = 1)                                  # Regression coeffcient for each months computes as the moving average of the estimates in the rolling window
    reg_coefficients = list( rolling_df["mean_coeff"])[:-1]                                  # Storing means in a list
    output_df[rolling_datelist[-1]] = rolling_df["mean_coeff"]                               # Storing averaged coefficients in a dataframe(for reference)

    # Predicted Returns Calculation
    temp_df = merged_data.loc[rolling_datelist[-1]].copy()                                     # Temporary Dataframe (Monthly Slice) to store  predicted each mnth
    temp_df['predicted_returns'] = temp_df[reg_factor_list[:-1]].mul(reg_coefficients).sum(1)   # Predicted Returns for each stock for month i
    predicted_ret_df = pd.concat([predicted_ret_df,temp_df] , axis=0)                       # Predicted Returns for each stock over the entire sample period


  # Decile formation & Predicted Returns Calculation
  predicted_ret_df['rank'] = predicted_ret_df.groupby(['year','month'])['predicted_returns'].transform(lambda x: pd.qcut(x, 10, duplicates='drop',labels=False))         # Calculating Decile Ranks based on the Predicted Returns
  #predicted_ret_df.rename(columns = {"ret": "RET"},inplace = True)                                                                                                       # Renaming Columns
  predicted_ret_df.reset_index(inplace =True)                                                                                                                            # Reseting Index
  predicted_ret_df = predicted_ret_df[["PERMNO","year","month","RET"] + reg_factor_list[:-1] + ["predicted_returns","rank"]]                                                 # Selecting Relevant Colums

  ## Print Output : Predicted_Ret_Df
  print("****************************************************************************")
  print(f"DataFrame with Predicted Returns, Deciles(rank) and Relevant Regression factors ")
  print("****************************************************************************")
  return predicted_ret_df, output_df

In [None]:
# Function to compute returns statistics for each decile portfolio
def portfolio_returns_statistics(predicted_ret_df):
  # Monthly Mean Portfolio Returns
  meanret = predicted_ret_df.groupby(['year','month', 'rank'])['RET'].mean().to_frame()    # Calculating average return for each decile (according to b2m ratio) for each month
  meanret = meanret.unstack(level = -1).copy()                                             # Unstacking the grouped dataframe
  meanret[('RET', 'diff')] = meanret[('RET', 9)] -  meanret[('RET', 0)]                    # Calculating the long short returns of the portfolio by substracting "rank 0" avg. return from "rank 9" avg. return

  nmon = len(meanret)                                                                      # nmon in number of months
  meanret = meanret.stack(level = -1, future_stack=True).copy()                                               # Stacking the dataframe to year-month index level

  # Overall Portfolio Returns Statistics
  global_mean = meanret.groupby('rank')['RET'].agg(["mean", "std"])                      # mean and standard deviation of regression coefficients
  global_mean['t-stat'] =np.sqrt(nmon - 1) *  global_mean['mean']/global_mean['std']       # t-statistics calculation
  return global_mean

The regression estimate each month is based on stock returns and b2m ratios that month.  The regression estimates in "monthly_reg_df" are the monthly estimates over the entire sample period.  The  regression estimate for the entire  sample period is the time-series average of monthly  estimates.





In [None]:
##############################################################
# Call Monthly Regression Function for Book-to-Market
##############################################################
regression_factors = ["b2m_pct_rank"]
monthly_reg_dataframe = monthly_regression(merged_data, regression_factors)
monthly_reg_dataframe.T

****************************************************************************
Monthly Regression Coefficients DataFrame
****************************************************************************


Unnamed: 0_level_0,b2m_pct_rank,constant
date,Unnamed: 1_level_1,Unnamed: 2_level_1
1990-11-30,0.017771,0.049506
1990-12-31,0.009203,0.010302
1991-01-31,-0.010718,0.108559
1991-02-28,-0.008463,0.114079
1991-03-28,-0.026466,0.090669
...,...,...
2023-08-31,0.001272,-0.064574
2023-09-29,0.000689,-0.067468
2023-10-31,-0.010271,-0.072290
2023-11-30,0.009011,0.081406


The regression estimate each month is based on stock returns and b2m ratios that month.  The regression estimates in "monthly_reg_df" are the monthly estimates over the entire sample period.  The  regression estimate for the entire  sample period is the time-series average of monthly  estimates.





In [None]:
##############################################################
# Time Series average of coefficients
##############################################################
Regression_stats = monthly_reg_dataframe.T.agg(["mean", "std"])                      # The regression estimates are the time-series averages of mponthly estimates
num_months = monthly_reg_dataframe.shape[1]                                            # Number of months in the sample period
Regression_stats.loc['t-stat'] = np.sqrt(num_months - 1) * Regression_stats.loc['mean']/ Regression_stats.loc['std']
Regression_stats


Unnamed: 0,b2m_pct_rank,constant
mean,0.007027,0.006567
std,0.056658,0.065625
t-stat,2.471248,1.993876


Return Prediction - Avoid look ahead bias. Use only information available up to month *t* to predict returns for month *t+1*


1.  Compute rolling regression estimates with regression estimates from the past 60 months (from *t*-59 to *t*)
2.   Predict month returns for month *t+1*



In [None]:
##############################################################
# Rolling Regression Estimates for Book-to-market
##############################################################
regression_factors = ["b2m_pct_rank"] # # Always define this before calling rolling window prediction
predicted_returns = rolling_window_prediction(monthly_reg_dataframe, regression_factors)[0] #Function to compute Rolling Window Average of coefficients and associated Predicted Returns / Decile Formation ##
predicted_returns

****************************************************************************
DataFrame with Predicted Returns, Deciles(rank) and Relevant Regression factors 
****************************************************************************


Unnamed: 0,PERMNO,year,month,RET,b2m_pct_rank,predicted_returns,rank
0,27239,1995,10,-0.037037,0.558195,0.003803,5
1,11198,1995,10,-0.073939,0.579968,0.003951,5
2,62033,1995,10,-0.027778,0.631235,0.004300,6
3,80415,1995,10,-0.038095,0.692993,0.004721,6
4,62010,1995,10,0.007246,0.845804,0.005762,8
...,...,...,...,...,...,...,...
1459911,21617,2023,12,0.115033,0.778547,0.008209,7
1459912,15395,2023,12,0.046065,0.529906,0.005587,5
1459913,91575,2023,12,0.209142,0.335393,0.003536,3
1459914,21612,2023,12,0.010292,0.343796,0.003625,3


In [None]:
##############################################################
# Compute Decile Portfolio Returns Statistics
##############################################################
predicted_returns_2000 = predicted_returns[predicted_returns['year'] >= 2000].copy() #computes stats for the post-2000 period
portfolio_returns_statistics(predicted_returns_2000)

Unnamed: 0_level_0,mean,std,t-stat
rank,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,0.003978,0.076968,0.875614
1,0.004206,0.06249,1.140377
2,0.006931,0.059204,1.983286
3,0.007386,0.058306,2.146003
4,0.008137,0.059079,2.33328
5,0.008696,0.05952,2.475052
6,0.008737,0.061318,2.413863
7,0.010138,0.064122,2.678546
8,0.012103,0.072757,2.818026
9,0.014827,0.097462,2.577212


In [None]:
##############################################################
# Call Monthly Regression Function for CashFlow2TA
##############################################################
regression_factors = ["CashFlow2TA_pct_rank"]
monthly_reg_dataframe = monthly_regression(merged_data, regression_factors)
monthly_reg_dataframe.T

****************************************************************************
Monthly Regression Coefficients DataFrame
****************************************************************************


Unnamed: 0_level_0,CashFlow2TA_pct_rank,constant
date,Unnamed: 1_level_1,Unnamed: 2_level_1
1990-11-30,0.056379,0.030141
1990-12-31,0.081304,-0.025842
1991-01-31,-0.034083,0.120268
1991-02-28,0.003531,0.108074
1991-03-28,-0.035437,0.095160
...,...,...
2023-08-31,0.107879,-0.117890
2023-09-29,0.081486,-0.107876
2023-10-31,0.093600,-0.124238
2023-11-30,-0.009047,0.090437


In [None]:
##############################################################
# Time Series average of coefficients
##############################################################
Regression_stats = monthly_reg_dataframe.T.agg(["mean", "std"])                      # The regression estimates are the time-series averages of mponthly estimates
num_months = monthly_reg_dataframe.shape[1]                                            # Number of months in the sample period
Regression_stats.loc['t-stat'] = np.sqrt(num_months - 1) * Regression_stats.loc['mean']/ Regression_stats.loc['std']
Regression_stats


Unnamed: 0,CashFlow2TA_pct_rank,constant
mean,0.011838,0.004161
std,0.065425,0.082578
t-stat,3.605138,1.003999


In [None]:
##############################################################
# Rolling Regression Estimates for CashFlow2TA
##############################################################
regression_factors = ["CashFlow2TA_pct_rank"] # # Always define this before calling rolling window prediction
predicted_returns = rolling_window_prediction(monthly_reg_dataframe, regression_factors)[0] #Function to compute Rolling Window Average of coefficients and associated Predicted Returns / Decile Formation ##
predicted_returns

****************************************************************************
DataFrame with Predicted Returns, Deciles(rank) and Relevant Regression factors 
****************************************************************************


Unnamed: 0,PERMNO,year,month,RET,CashFlow2TA_pct_rank,predicted_returns,rank
0,27239,1995,10,-0.037037,0.706453,0.008326,7
1,11198,1995,10,-0.073939,0.794735,0.009367,7
2,62033,1995,10,-0.027778,0.732185,0.008630,7
3,80415,1995,10,-0.038095,0.881631,0.010391,8
4,62010,1995,10,0.007246,0.441409,0.005203,4
...,...,...,...,...,...,...,...
1459911,21617,2023,12,0.115033,0.506179,0.007648,5
1459912,15395,2023,12,0.046065,0.800791,0.012100,8
1459913,91575,2023,12,0.209142,0.887296,0.013407,8
1459914,21612,2023,12,0.010292,0.572912,0.008656,5


In [None]:
##############################################################
# Compute Decile Portfolio Returns Statistics
##############################################################
predicted_returns_2000 = predicted_returns[predicted_returns['year'] >= 2000].copy() #computes stats for the post-2000 period
portfolio_returns_statistics(predicted_returns_2000)

Unnamed: 0_level_0,mean,std,t-stat
rank,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,0.000731,0.106178,0.116691
1,0.004542,0.087424,0.880172
2,0.005123,0.067199,1.29164
3,0.007779,0.062991,2.092133
4,0.010508,0.061815,2.879759
5,0.011638,0.05929,3.325325
6,0.011919,0.057743,3.497
7,0.011344,0.057556,3.339085
8,0.01104,0.060196,3.10693
9,0.010519,0.061367,2.903868


Multivariate regression: Regressions with multiple features

In [None]:
##############################################################
# Call Monthly Regression Coefficient Function
##############################################################
regression_factors = ["b2m_pct_rank","CashFlow2TA_pct_rank"]                    # # Always define this before calling rolling window prediction
monthly_reg_dataframe = monthly_regression(merged_data, regression_factors)
monthly_reg_dataframe.T

****************************************************************************
Monthly Regression Coefficients DataFrame
****************************************************************************


Unnamed: 0_level_0,b2m_pct_rank,CashFlow2TA_pct_rank,constant
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1990-11-30,0.011312,0.055052,0.025132
1990-12-31,0.002020,0.081125,-0.026765
1991-01-31,-0.009429,-0.033723,0.124812
1991-02-28,-0.008531,0.003688,0.112266
1991-03-28,-0.026234,-0.035265,0.108208
...,...,...,...
2023-08-31,0.015994,0.110019,-0.126959
2023-09-29,0.013773,0.083640,-0.115842
2023-10-31,0.005609,0.094542,-0.127514
2023-11-30,0.007626,-0.007670,0.085934


In [None]:
##############################################################
# Time Series average of coefficients#
##############################################################
Regression_stats = monthly_reg_dataframe.T.agg(["mean", "std"])                      # The regression estimates are the time-series averages of mponthly estimates
num_months = monthly_reg_dataframe.shape[1]                                            # Number of months in the sample period
Regression_stats.loc['t-stat'] = np.sqrt(num_months - 1) * Regression_stats.loc['mean']/ Regression_stats.loc['std']
Regression_stats

Unnamed: 0,b2m_pct_rank,CashFlow2TA_pct_rank,constant
mean,0.008173,0.012609,-0.000312
std,0.055183,0.064204,0.089222
t-stat,2.95097,3.912958,-0.069658


In [None]:
##############################################################
# Rolling Average Prediction Estimates for Book-to-Market &CashFlow2TA
##############################################################
regression_factors = ["b2m_pct_rank","CashFlow2TA_pct_rank"] # Always define this before calling rolling window prediction
predicted_returns = rolling_window_prediction(monthly_reg_dataframe, regression_factors)[0]
predicted_returns

****************************************************************************
DataFrame with Predicted Returns, Deciles(rank) and Relevant Regression factors 
****************************************************************************


Unnamed: 0,PERMNO,year,month,RET,b2m_pct_rank,CashFlow2TA_pct_rank,predicted_returns,rank
0,27239,1995,10,-0.037037,0.558195,0.706453,0.011774,7
1,11198,1995,10,-0.073939,0.579968,0.794735,0.012931,8
2,62033,1995,10,-0.027778,0.631235,0.732185,0.012547,8
3,80415,1995,10,-0.038095,0.692993,0.881631,0.014669,9
4,62010,1995,10,0.007246,0.845804,0.441409,0.010603,6
...,...,...,...,...,...,...,...,...
1459911,21617,2023,12,0.115033,0.778547,0.506179,0.018021,7
1459912,15395,2023,12,0.046065,0.529906,0.800791,0.019377,8
1459913,91575,2023,12,0.209142,0.335393,0.887296,0.018193,7
1459914,21612,2023,12,0.010292,0.343796,0.572912,0.013403,3


In [None]:
##############################################################
# Compute Decile Portfolio Returns Statistics
##############################################################
#regression_factors = ["b2m_pct_rank","CashFlow2TA_pct_rank"]
predicted_returns_2000 = predicted_returns[predicted_returns['year'] >= 2000].copy() #computes stats for the post-2000 period
portfolio_returns_statistics(predicted_returns_2000)

Unnamed: 0_level_0,mean,std,t-stat
rank,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,-0.001101,0.089339,-0.208704
1,0.003679,0.079256,0.786356
2,0.004964,0.068496,1.227663
3,0.007667,0.064012,2.029211
4,0.009301,0.061368,2.567488
5,0.01042,0.060306,2.927045
6,0.01124,0.06139,3.101652
7,0.012247,0.059737,3.473208
8,0.012397,0.06339,3.313059
9,0.014333,0.069649,3.4862
