**Quant Model: Multiple Signals**

Decile formation based on combining the b2m and CashFlow2TA signals, and the resulting long-short portfolio returns





**Data Description and code sequence**

Important Dataframes

1.  "Returns" dataframe: contains monthly returns (RET), shares outstanding (SHROUT), price (PRC), primary exchange code (PRIMEXCH) and unique identifiers (PERMNO). The data are downloaded from CRSP.

Key Input data:
date:    yyyymmdd format
RET:     return for the month ending yyyymmdd
PRIMEXCH:  Primary exchange where listed
PRC:   Price as of month-end
SHROUT:  Shares outstanding as of month ending yyyymmdd


2.   "Cstat_data" dataframe: Compustat data used to construct features

  LPERMNO: CRSP identifier - relabel to PERMNO to merge
  ceq: book value of common equity
  oancf: cash flow from operations
  at: total assets
  CashFlow2TA: cash flow from operations (oancf) / total assets (at)

  #normalized by total assets so that cash flows are comparable across stocks of different sizes



3. merged_data: dataframe obtained by merging the "Returns" and "Cstat_data" dataframes on "PERMNO" and "date". The merge uses "pd.merge_asof" to match each CRSP 'date' with the latest Compustat record whose availability date (datadate plus five months) falls on or before it, with a 1-year tolerance. The book-to-market ratio (b2m) is calculated from ceq and marketcap.

4. Signals:



*  Sort based on one signal at a time


  a. b2m: Book-to-market

  b. CashFlow2TA: oancf / total assets (at)

* Combine the signals and then sort based on the combined signal
  
  c. Combined signal = percentile rank of b2m + percentile rank of CashFlow2TA
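
As a minimal sketch of the combined signal described above, the toy example below computes within-month percentile ranks of each signal and adds them. The data values and the column names `b2m_pct`, `cf_pct` and `combined` are made up for illustration; only the `rank(pct=True)` approach mirrors what the notebook does later.

```python
import pandas as pd

# Toy cross-section for a single month; all values are made up for illustration.
toy = pd.DataFrame({
    "year": [2000] * 4,
    "month": [1] * 4,
    "PERMNO": [1, 2, 3, 4],
    "b2m": [0.2, 0.8, 0.5, 1.1],
    "CashFlow2TA": [0.10, 0.02, 0.07, 0.15],
})

# Percentile rank of each signal within each (year, month) cross-section
toy["b2m_pct"] = toy.groupby(["year", "month"])["b2m"].rank(pct=True)
toy["cf_pct"] = toy.groupby(["year", "month"])["CashFlow2TA"].rank(pct=True)

# Combined signal: a higher value means the stock ranks well on both signals
toy["combined"] = toy["b2m_pct"] + toy["cf_pct"]
print(toy[["PERMNO", "b2m_pct", "cf_pct", "combined"]])
```

Because both components are percentile ranks, each signal gets equal weight regardless of its raw scale. Only the stock that ranks highest on both signals stands out in this toy cross-section.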







In [None]:
# Connecting the Python Code with the google drive to access the datasets
from google.colab import drive
drive.mount("/content/drive")

In [None]:
# Importing Necessary Python Libraries
import pandas as pd
import numpy as np
import datetime as dt
from datetime import timedelta
from pandas import DateOffset

In [None]:
#CRSP Data

# Importing CRSP price and returns datasets
Returns = pd.read_csv("/content/drive/MyDrive/MAF data/MonthlyRet_198001_202312csv.zip") #Importing Cleaned CRSP data

# Handling negative prices
Returns.PRC = abs(Returns.PRC)                                         # Convert prices to absolute values (CRSP reports PRC with a "-" sign when it is computed as the bid-ask average because there was no actual trade)

# Market Cap Calculation
Returns['marketcap'] = Returns.SHROUT * Returns.PRC                    #  Market Capitalization as of month end
Returns['marketcap'] = Returns.groupby('PERMNO')['marketcap'].shift()  # Lagged Market Capitalization = market cap as of the end of the previous month
Returns['marketcap'] = np.where(Returns['marketcap'] < 10000, np.nan, Returns['marketcap']) # exclude marketcap < $10m

# Exchange Code Filters
exch_nyse_amex_Nasdaq = ['N', 'Q', 'A']
Returns = Returns[Returns.PRIMEXCH.isin(exch_nyse_amex_Nasdaq)].copy() # Keep only NYSE (N), AMEX (A) and Nasdaq (Q) stocks, i.e. stocks listed on US exchanges

#Keep only ordinary common shares
ord_common_shares = [10, 11, 12]
Returns = Returns[Returns.SHRCD.isin(ord_common_shares)].copy()             #keeping only ordinary common shares - excludes unit trusts, ADRS, REITS, closed-end funds

# Minor Pre-processing
Returns.reset_index(inplace = True, drop = True)                                                # Reset Index

Returns = Returns[["PERMNO","PRIMEXCH","date","RET","PRC","SHROUT","marketcap"]].copy() # Reordering the columns for clarity
Returns.RET = pd.to_numeric(Returns.RET, errors = 'coerce')                      # CRSP marks missing returns with letter codes; 'coerce' sets every non-numeric value to NaN

Returns.dropna(inplace = True)
# CRSP data: prepare date columns for merging with the Compustat data

Returns["date"] = pd.to_datetime(Returns["date"])                       # Convert  "date" to a DateTime object
Returns["year"] = Returns["date"].dt.year                              # Extracting year
Returns["month"] = Returns["date"].dt.month                            # Extracting month
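
The lagged market cap step above can be sketched on toy data as follows. The prices and share counts are made up, and the column name `marketcap_lag` is hypothetical (the notebook overwrites `marketcap` in place). Note that `groupby(...).shift()` takes the previous row within each PERMNO, so this sketch sorts by PERMNO and date first; the notebook assumes the CRSP file is already in that order.

```python
import pandas as pd

# Toy CRSP-like panel; values are made up. SHROUT is in thousands of shares.
toy = pd.DataFrame({
    "PERMNO": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31",
                            "2000-01-31", "2000-02-29"]),
    "PRC": [-10.0, 12.0, 11.0, 50.0, 55.0],   # negative = bid-ask average
    "SHROUT": [1000, 1000, 1100, 200, 200],
})

toy = toy.sort_values(["PERMNO", "date"])       # shift() relies on row order
toy["PRC"] = toy["PRC"].abs()                   # remove the bid-ask sign flag
toy["marketcap"] = toy["SHROUT"] * toy["PRC"]   # in $ thousands
toy["marketcap_lag"] = toy.groupby("PERMNO")["marketcap"].shift()
print(toy)
```

The first month of each stock has no previous observation, so its lagged value is NaN and the row is later removed by `dropna`.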



In [None]:
#Compustat Data

# Importing Compustat Data
Cstat_data = pd.read_csv('/content/drive/MyDrive/MAF data/Cstat_20250108.zip')     # Importing monthly Compustat data

Cstat_data.rename(columns = {'LPERMNO' : 'PERMNO'}, inplace = True) # Renaming "LPERMNO" for merging Cstat data with CRSP data
Cstat_data['at'] = Cstat_data['at'].clip(lower=0.5)                  # Floor 'at' at 0.5 because reported total assets can be below 0.5 (even negative) for some stocks; missing values stay missing
Cstat_data['CashFlow2TA'] = Cstat_data['oancf']/ Cstat_data['at']               # Cash flow from operations (oancf)]/Assets (AT)

#Date time for Compustat Data - When will the data be available to the market?

# Datetime Manipulations
Cstat_data["date"] = pd.to_datetime(Cstat_data["datadate"])        # Convert to a DateTime object for datetime manipulations
Cstat_data['date'] = Cstat_data['date'] + DateOffset(months=5)     # Add five months (pandas DateOffset): a conservative lag, assuming it takes at most 4 months for the data to reach the market

Cstat_data = Cstat_data[['date', 'PERMNO', 'datadate', 'ceq', 'CashFlow2TA']].copy()  #retain only data needed further




Cstat_data.head()
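
To illustrate the availability-date shift above, here is a small sketch with made-up fiscal period end dates. It shows that `DateOffset(months=5)` keeps the day of the month where possible and clips it to the last valid day otherwise, so the shifted date is not always a month-end.

```python
import pandas as pd

# Made-up fiscal period end dates (datadate)
toy = pd.DataFrame({"datadate": ["2019-12-31", "2020-02-29", "2020-09-30"]})
toy["datadate"] = pd.to_datetime(toy["datadate"])

# Date from which the data are assumed to be known to the market
toy["date"] = toy["datadate"] + pd.DateOffset(months=5)
print(toy)
```

Since the later merge matches each CRSP month-end to the most recent availability date on or before it, a shifted date such as the 29th of a month simply becomes usable from that month's end onward.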

In [None]:
Cstat_data['CashFlow2TA'].min(), Cstat_data['CashFlow2TA'].max()

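Merge CRSP and Compustat data by PERMNO.
To avoid look-ahead bias, each CRSP month is matched only to Compustat data whose lagged availability date falls on or before it: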

In [None]:
# Merged Data

Returns.sort_values(by = 'date', inplace = True)                       # Sort CRSP data by date to use merge_asof
Cstat_data.sort_values(by = 'date', inplace = True)                 # Sort Cstat data by date to use merge_asof


merged_data = pd.merge_asof(Returns, Cstat_data, by = 'PERMNO', left_on = 'date', right_on= 'date', tolerance=dt.timedelta(days = 365)) # Merging "Returns" & "Cstat" dataframe on "PERMNO" & "date" with a 1-year tolerance for date

# Calculating Book to Market Ratio
merged_data['b2m'] = merged_data.ceq / merged_data.marketcap      # Book to Market Ratio


merged_data.dropna(subset=['RET', 'CashFlow2TA', 'b2m'], how = 'any', inplace = True) #drop only if the  data items we need later are missing
merged_data.head()
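
The behaviour of `pd.merge_asof` with a tolerance can be checked on a toy example. The frames `crsp` and `cstat` and their values below are made up; the sketch shows that a return month before the availability date gets no match, a month shortly after it gets the record, and a month more than 365 days later is left unmatched.

```python
import pandas as pd

crsp = pd.DataFrame({
    "PERMNO": [1, 1, 1],
    "date": pd.to_datetime(["2020-04-30", "2020-06-30", "2021-07-30"]),
})
cstat = pd.DataFrame({
    "PERMNO": [1],
    "date": pd.to_datetime(["2020-05-31"]),   # availability date, not datadate
    "ceq": [100.0],
})

# Backward match (the default): latest cstat row with date <= crsp date,
# within the same PERMNO and no more than 365 days old
out = pd.merge_asof(crsp.sort_values("date"), cstat.sort_values("date"),
                    on="date", by="PERMNO", tolerance=pd.Timedelta(days=365))
print(out)
```

The tolerance keeps stale accounting data from being carried forward indefinitely when a firm stops reporting.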


In [None]:
# Assign decile ranks for each signal
merged_data = merged_data[merged_data.year >= 2000].copy()        # Keep observations from 2000 onward
merged_data['b2m_rank'] = merged_data.groupby(['year','month'])['b2m'].transform(lambda x: pd.qcut(x, 10, duplicates='drop',labels=False)) # Decile ranks (0 = lowest, 9 = highest) based on book-to-market each month
merged_data['CashFlow2TA_rank'] = merged_data.groupby(['year','month'])['CashFlow2TA'].transform(lambda x: pd.qcut(x, 10, duplicates='drop',labels=False)) # Decile ranks (0 = lowest, 9 = highest) based on CashFlow2TA each month

merged_data.reset_index(inplace = True, drop = True)              # Reset Index

merged_data.head()
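
As a sketch of the monthly decile assignment above, the example below applies `pd.qcut` within each month to a randomly generated signal (the data and the column names `signal` and `decile` are illustrative only). With distinct values, each month's observations are split into ten equally sized groups labeled 0 to 9.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "year": 2000,
    "month": np.repeat([1, 2], 50),       # two months, 50 stocks each
    "signal": rng.normal(size=100),       # made-up signal values
})

# Decile label (0 = lowest, 9 = highest) within each month's cross-section
toy["decile"] = toy.groupby(["year", "month"])["signal"].transform(
    lambda x: pd.qcut(x, 10, labels=False, duplicates="drop"))

counts = toy.groupby(["month", "decile"]).size()
print(counts)
```

One caveat worth keeping in mind: with `duplicates='drop'`, a month with many tied signal values can end up with fewer than ten bins, in which case the highest label is below 9 and a later lookup of rank 9 yields missing values for that month.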


In [None]:
# Monthly Mean Portfolio Returns for b2m_rank
meanret = merged_data.groupby(['year','month', 'b2m_rank'])['RET'].mean().to_frame()   # Calculating average return for each decile (according to b2m ratio) for each month
meanret.head()

In [None]:
# Compute the long-short return: top decile (rank 9) minus bottom decile (rank 0)
meanret = meanret.unstack(level = -1).copy()                                       # Unstack so each decile becomes a column
meanret[('RET', 'diff')] = meanret[('RET', 9)] -  meanret[('RET', 0)]              # Long-short return: subtract the "rank 0" avg. return from the "rank 9" avg. return

nmon = len(meanret)                                                                # nmon is the number of months
meanret = meanret.stack(level = -1, future_stack=True).copy()                      # Stack back to a (year, month, b2m_rank) index
meanret

In [None]:
# Portfolio statistics
# Move the third index level (b2m_rank) into a column
meanret = meanret.reset_index(level=2)
# Time-series mean and standard deviation of monthly returns for each b2m_rank (and for 'diff')
global_mean_b2m = meanret.groupby('b2m_rank')['RET'].agg(['mean', 'std'])
global_mean_b2m['t-stat'] = np.sqrt(nmon - 1) * global_mean_b2m['mean'] / global_mean_b2m['std']  # t-statistic of the mean monthly return

global_mean_b2m
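
The t-statistic above tests whether an average monthly return differs from zero. The sketch below works through the arithmetic on a made-up series of five monthly long-short returns and compares the `sqrt(nmon - 1)` scaling used in this notebook with the conventional `sqrt(n)` scaling that pairs with the sample standard deviation (pandas `std` uses `ddof=1` by default). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

spread = pd.Series([0.02, -0.01, 0.03, 0.00, 0.01])   # made-up monthly long-short returns
n = len(spread)
mean, std = spread.mean(), spread.std()                # std with ddof=1 (pandas default)

t_notebook = np.sqrt(n - 1) * mean / std               # scaling used in this notebook
t_standard = np.sqrt(n) * mean / std                   # conventional one-sample t-statistic
print(t_notebook, t_standard)
```

The two versions differ by a factor of `sqrt(n / (n - 1))`, which is close to 1 for a sample of a few hundred months, so the notebook's figures are slightly conservative rather than materially different.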

In [None]:
# Monthly Mean Portfolio Returns for CashFlow2TA_rank
meanret = merged_data.groupby(['year','month', 'CashFlow2TA_rank'])['RET'].mean().to_frame()   # Calculating average monthly return for each decile (formed based on CashFlow2TA_rank)
meanret

In [None]:
# Compute the long-short return for CashFlow2TA_rank: top decile (rank 9) minus bottom decile (rank 0)
meanret = meanret.unstack(level = -1).copy()                                       # Unstack so each decile becomes a column
meanret[('RET', 'diff')] = meanret[('RET', 9)] -  meanret[('RET', 0)]              # Long-short return: subtract the "rank 0" avg. return from the "rank 9" avg. return

nmon = len(meanret)                                                                # nmon is the number of months
meanret = meanret.stack(level = -1, future_stack=True).copy()                      # Stack back to a (year, month, CashFlow2TA_rank) index

# Overall Portfolio Returns Statistics
global_mean_CashFlow2TA = meanret.groupby('CashFlow2TA_rank')['RET'].agg(['mean', 'std'])                # Time-series mean and standard deviation of monthly portfolio returns
global_mean_CashFlow2TA['t-stat'] =np.sqrt(nmon - 1) *  global_mean_CashFlow2TA['mean']/global_mean_CashFlow2TA['std'] # t-statistic of the mean monthly return
global_mean_CashFlow2TA



What does the evidence that stocks with higher cash flow (relative to total assets) earn higher returns suggest?

In [None]:
cf2ta = global_mean_CashFlow2TA['mean'].to_frame().copy()
b2m = global_mean_b2m['mean'].to_frame().copy()
cf2ta.reset_index(inplace=True)
b2m.reset_index(inplace=True)
b2m.rename(columns = {'mean' : 'b2m_mean', 'b2m_rank' : 'rank'}, inplace = True)
cf2ta.rename(columns = {'mean' : 'cf2ta_mean', 'CashFlow2TA_rank' : 'rank'}, inplace = True)
print(cf2ta)

In [None]:
# Merge data
df = pd.merge(b2m, cf2ta, on = 'rank')
# Filter out the 'diff' row before converting rank to int
df = df[df['rank'] != 'diff'].copy()  # .copy() avoids assigning to a slice of the merged frame
df['rank']=df['rank'].astype(int)
df

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D

# Merge data
df = pd.merge(b2m, cf2ta, on = 'rank')
# Filter out the 'diff' row before converting rank to int
df = df[df['rank'] != 'diff'].copy()
df['rank']=df['rank'].astype(int)

# Create mesh grid; indexing='ij' makes X follow the b2m rank and Y the cf2ta rank,
# matching the loop order of bar_heights below and the axis labels
X, Y = np.meshgrid(df['rank'], df['rank'], indexing='ij')
X = X.flatten()  # Convert to 1D array
Y = Y.flatten()
Z = np.zeros_like(X)  # Bars start at zero

# Heights of bars: sum of the b2m decile mean and the cf2ta decile mean for each rank pair
bar_heights = np.array([b2m_val + cf2ta_val for b2m_val in df['b2m_mean'] for cf2ta_val in df['cf2ta_mean']])

# Bar sizes
width = depth = 0.4

# Create 3D figure
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')

# Plot 100 bars
ax.bar3d(X, Y, Z, width, depth, bar_heights, shade=True)

# Labels and title
ax.set_xlabel('b2m Rank')
ax.set_ylabel('cf2ta Rank')
ax.set_zlabel('Mean Return')
plt.title('Sum of B2M and CF2TA Decile Mean Returns by Rank')

plt.show()


In [None]:
# Combine scores - ensure that a bigger score implies a better signal
merged_data['b2m_pct_rank']= merged_data.groupby(['year','month'])['b2m'].rank(pct = True)
merged_data['CashFlow2TA_pct_rank']= merged_data.groupby(['year','month'])['CashFlow2TA'].rank(pct = True)  # separate column so the decile column CashFlow2TA_rank is not overwritten
merged_data['combined_pct_rank'] = merged_data['b2m_pct_rank'] + merged_data['CashFlow2TA_pct_rank']
merged_data.dropna(subset = ['RET', 'combined_pct_rank'], how = 'any', inplace = True)   #Drop only observations where relevant variables are nan
merged_data = merged_data.loc[merged_data.year >= 2000].copy()
merged_data.head()

In [None]:
n_cut = 10
merged_data['combined_rank'] = merged_data.groupby(['year','month'])['combined_pct_rank'].transform(lambda x: pd.qcut(x, n_cut, duplicates='drop',labels=False)) # Calculating Ranks based on combined_pct_rank
meanret = merged_data.groupby(['year','month', 'combined_rank'])['RET'].mean().to_frame()   # Calculate average monthly return for each combined_rank decile


In [None]:
# Compute the long-short return for combined_rank: top group (rank n_cut - 1) minus bottom group (rank 0)
meanret = meanret.unstack(level = -1).copy()                                       # Unstack so each group becomes a column
meanret[('RET', 'diff')] = meanret[('RET', n_cut - 1)] -  meanret[('RET', 0)]              # Long-short return: subtract the "rank 0" avg. return from the top-rank avg. return

nmon = len(meanret)                                                                # nmon is the number of months
meanret = meanret.stack(level = -1, future_stack=True).copy()                      # Stack back to a (year, month, combined_rank) index

# Overall Portfolio Returns Statistics
global_mean_combined = meanret.groupby('combined_rank')['RET'].agg(['mean', 'std'])                # Time-series mean and standard deviation of monthly portfolio returns
global_mean_combined['t-stat'] =np.sqrt(nmon - 1) *  global_mean_combined['mean']/global_mean_combined['std'] # t-statistic of the mean monthly return
global_mean_combined
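
The same sequence (sort into groups, average returns, take the top-minus-bottom spread, compute statistics) is repeated for each signal above. One way to package it is sketched below. The function `long_short_summary` and its arguments are hypothetical and not part of the notebook, and the toy data are randomly generated, so the resulting numbers carry no economic meaning.

```python
import numpy as np
import pandas as pd

def long_short_summary(df, signal, n_cut=10, ret="RET"):
    """Hypothetical helper: quantile-sort on `signal` each month, then summarize returns."""
    df = df.dropna(subset=[signal, ret]).copy()
    # Group label (0 = lowest signal) within each month
    df["q"] = df.groupby(["year", "month"])[signal].transform(
        lambda x: pd.qcut(x, n_cut, labels=False, duplicates="drop"))
    # One row per month, one column per group, equal-weighted mean returns
    wide = df.groupby(["year", "month", "q"])[ret].mean().unstack("q")
    wide["diff"] = wide[n_cut - 1] - wide[0]           # long top group, short bottom group
    nmon = len(wide)
    out = wide.agg(["mean", "std"]).T
    out["t-stat"] = np.sqrt(nmon - 1) * out["mean"] / out["std"]   # same scaling as the notebook
    return out

# Toy panel: 12 months, 40 stocks per month, random signal and returns
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "year": 2000,
    "month": np.repeat(np.arange(1, 13), 40),
    "signal": rng.normal(size=480),
    "RET": rng.normal(0.01, 0.05, size=480),
})
summary = long_short_summary(toy, "signal", n_cut=5)
print(summary)
```

Called as `long_short_summary(merged_data, 'combined_pct_rank')`, a helper along these lines would be expected to reproduce a table of the same shape as `global_mean_combined`, assuming `merged_data` has the `year`, `month` and `RET` columns used above.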



In [None]:
#Plot average returns for each decile
import matplotlib.pyplot as plt


# Plot bar chart using DataFrame
plt.figure(figsize=(10, 5))

# Filter out the 'diff' row before plotting
global_mean_combined_filtered = global_mean_combined[global_mean_combined.index != 'diff']

# Use the decile labels from the index (cast to int) as x positions
deciles = global_mean_combined_filtered.index.astype(int)
plt.bar(deciles, global_mean_combined_filtered['mean'], color='skyblue', edgecolor='black')


# Labels and title
plt.xlabel("Combined Rank Deciles")
plt.ylabel("Mean Return")
plt.title("Mean Return by Combined Rank Decile")
# Show one integer tick per decile
plt.xticks(deciles)
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Show plot
plt.show()