# Panel Regression - Firm Characteristics

Panel regression suits our data because we observe quarterly values for many companies over a long period. A panel model uses both within-entity variation (changes within one company over time) and between-entity variation (differences across companies), so it handles data that has both a time-series and a cross-sectional dimension.  
With panel data, we can control for time-invariant company or industry characteristics that may affect forecast accuracy by including fixed effects for each company or industry. We can also account for period-specific factors that affect all companies by including time fixed effects or time-varying covariates.
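
To make the distinction between within-entity and between-entity variation concrete, the following minimal sketch splits a variable into the two components. The toy panel and its values are made up for illustration and are not taken from our dataset; only the index names `Instrument` and `Date` mirror the real data.

```python
import pandas as pd

# Hypothetical toy panel: 2 firms x 3 quarters (values are illustrative only)
panel = pd.DataFrame(
    {
        "Instrument": ["A", "A", "A", "B", "B", "B"],
        "Date": pd.to_datetime(["2013-01-01", "2013-04-01", "2013-07-01"] * 2),
        "x": [1.0, 2.0, 3.0, 11.0, 12.0, 13.0],
    }
).set_index(["Instrument", "Date"])

# Between component: each firm's own average, repeated for every quarter
between = panel.groupby(level="Instrument")["x"].transform("mean")
# Within component: deviation of each observation from its firm average
within = panel["x"] - between

print(pd.DataFrame({"x": panel["x"], "between": between, "within": within}))
```

In this toy example the two firms differ a lot in level (between variation) but move identically over time (within variation), which is the kind of split the between and within R-squared values in the regression summaries refer to.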

### Random Effects Panel Regression

In [97]:
import pandas as pd
import numpy as np
import datetime as dt
import sklearn
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from linearmodels.panel import PanelOLS
from linearmodels.panel import RandomEffects


Data

In [98]:
df = pd.read_csv("Dataframes/economic_story_regression.csv")
df["Date"] = pd.to_datetime(df["Date"])
df

,Instrument,Date,Earnings Per Share - Actual Surprise AbsVals,GICS Industry Group Name,Earnings Per Share – Coefficient of Variation,Market Capitalization,Revenue - Actual,Number of Analysts,Recommendation - Mean (1-5).1,Price Target - Standard Deviation,3 Month Total Return,loss firm status,CBOE Crude Oil ETF Volatility Index,90-Day AA Financial Commercial Paper Interest Rate,Inflation Risk Premium,"University of Michigan: Consumer Sentiment, Index 1966:Q1=100",Unemployment Rate
0,AVY.N,2013-01-01,2.499631,Materials,-0.170865,-0.392700,-0.378055,-1.454924,3.000,-0.372481,0.469207,-1,-1.004420,-0.795818,-0.380688,-0.747049,1.389797
1,AVY.N,2013-04-01,1.247607,Materials,-0.235084,-0.385490,-0.380857,-1.454924,3.000,-0.314283,1.410685,-1,-0.944876,-0.817719,-0.367655,-0.338211,1.278991
2,AVY.N,2013-07-01,0.726582,Materials,-0.255193,-0.385908,-0.376364,-1.317440,2.875,-0.299006,-0.275801,-1,-0.904454,-0.835560,0.931854,-0.346387,1.112782
3,AVY.N,2013-10-01,2.207725,Materials,-0.205410,-0.385784,-0.380352,-1.454924,2.875,-0.332867,-0.100077,-1,-1.183607,-0.835381,0.655696,-0.725244,0.946572
4,AVY.N,2014-01-01,0.904623,Materials,-0.187991,-0.380857,-0.373706,-1.317440,2.625,-0.333923,0.843142,-1,-1.194955,-0.834965,0.952897,-0.398173,0.798831
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18604,POOL.OQ,2021-10-01,2.901092,Retailing,-0.202958,-0.278490,-0.388214,-1.317440,2.300,1.525227,-0.629509,-1,0.308492,-0.820667,-0.089760,-1.300343,-0.567779
18605,POOL.OQ,2022-01-01,3.720063,Retailing,-0.185330,-0.235483,-0.419838,-1.317440,2.300,1.163772,1.852531,-1,0.927196,-0.499303,-0.111148,-1.853637,-0.789391
18606,POOL.OQ,2022-04-01,3.565072,Retailing,-0.048679,-0.282188,-0.388113,-1.179957,2.300,1.011830,-2.025245,-1,0.783834,0.413058,1.332205,-2.284280,-0.900197
18607,POOL.OQ,2022-07-01,0.917490,Retailing,-0.170880,-0.305930,-0.334002,-1.179957,2.000,1.805131,-1.781424,-1,0.745658,1.881451,0.418886,-2.428736,-0.918665


#### Fixed Effects

In a fixed effects panel regression, each entity gets its own intercept, which is treated as a fixed parameter that does not vary over time. The coefficients of the independent variables are therefore estimated from the within-entity variation in the data only, which removes the influence of time-invariant unobserved heterogeneity.

Fixed effects models are useful when time-invariant unobserved variables affect the dependent variable and are correlated with the regressors, but are not included in the model. Because the entity intercepts absorb this heterogeneity, the coefficient estimates remain consistent. The price is efficiency: the estimator discards between-entity variation, so it is generally less efficient than a random effects estimator when the random effects assumptions hold.

One limitation of fixed effects models is that they cannot estimate the effect of time-invariant variables on the dependent variable, because such variables are perfectly collinear with the entity effects. In addition, a large number of fixed effects consumes degrees of freedom, and in nonlinear or dynamic panel models with few time periods it can cause the incidental parameters problem, which biases the coefficient estimates.
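
The mechanics described above can be sketched with simulated data. This is an illustration under stated assumptions, not the estimation used in this notebook: the data are synthetic, and names such as `toy`, `alpha`, `z`, and `true_beta` are hypothetical. The regressor `x` is deliberately correlated with an unobserved firm effect, so pooled OLS is biased, while the within (demeaned) estimator recovers the true slope. The time-invariant variable `z` becomes identically zero after demeaning, which is why its coefficient cannot be estimated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_firms, n_quarters, true_beta = 50, 20, 0.5

firm = np.repeat(np.arange(n_firms), n_quarters)
alpha = rng.normal(size=n_firms)[firm]             # unobserved, time-invariant firm effect
x = alpha + rng.normal(size=firm.size)             # regressor correlated with the firm effect
z = (np.arange(n_firms) % 2)[firm].astype(float)   # time-invariant firm characteristic
y = true_beta * x + 2.0 * alpha + rng.normal(scale=0.1, size=firm.size)

toy = pd.DataFrame({"firm": firm, "y": y, "x": x, "z": z})

# Pooled OLS slope of y on x: ignores the firm effect and is biased upward here
beta_pooled = np.cov(toy["x"], toy["y"])[0, 1] / toy["x"].var()

# Within transformation: subtract each firm's mean from its own observations
cols = ["y", "x", "z"]
demeaned = toy[cols] - toy.groupby("firm")[cols].transform("mean")

# Fixed effects (within) slope: OLS on the demeaned data
beta_fe = (demeaned["x"] * demeaned["y"]).sum() / (demeaned["x"] ** 2).sum()

print(f"pooled OLS slope:    {beta_pooled:.3f}")
print(f"fixed effects slope: {beta_fe:.3f}  (true value {true_beta})")
print("z is all zero after demeaning:", np.allclose(demeaned["z"], 0.0))
```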

# Fixed Effects Model

In [99]:
df = df.dropna()
df.set_index(['Instrument', 'Date'], inplace=True)
df

Instrument,Date,Earnings Per Share - Actual Surprise AbsVals,GICS Industry Group Name,Earnings Per Share – Coefficient of Variation,Market Capitalization,Revenue - Actual,Number of Analysts,Recommendation - Mean (1-5).1,Price Target - Standard Deviation,3 Month Total Return,loss firm status,CBOE Crude Oil ETF Volatility Index,90-Day AA Financial Commercial Paper Interest Rate,Inflation Risk Premium,"University of Michigan: Consumer Sentiment, Index 1966:Q1=100",Unemployment Rate
AVY.N,2013-01-01,2.499631,Materials,-0.170865,-0.392700,-0.378055,-1.454924,3.000,-0.372481,0.469207,-1,-1.004420,-0.795818,-0.380688,-0.747049,1.389797
AVY.N,2013-04-01,1.247607,Materials,-0.235084,-0.385490,-0.380857,-1.454924,3.000,-0.314283,1.410685,-1,-0.944876,-0.817719,-0.367655,-0.338211,1.278991
AVY.N,2013-07-01,0.726582,Materials,-0.255193,-0.385908,-0.376364,-1.317440,2.875,-0.299006,-0.275801,-1,-0.904454,-0.835560,0.931854,-0.346387,1.112782
AVY.N,2013-10-01,2.207725,Materials,-0.205410,-0.385784,-0.380352,-1.454924,2.875,-0.332867,-0.100077,-1,-1.183607,-0.835381,0.655696,-0.725244,0.946572
AVY.N,2014-01-01,0.904623,Materials,-0.187991,-0.380857,-0.373706,-1.317440,2.625,-0.333923,0.843142,-1,-1.194955,-0.834965,0.952897,-0.398173,0.798831
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
POOL.OQ,2021-10-01,2.901092,Retailing,-0.202958,-0.278490,-0.388214,-1.317440,2.300,1.525227,-0.629509,-1,0.308492,-0.820667,-0.089760,-1.300343,-0.567779
POOL.OQ,2022-01-01,3.720063,Retailing,-0.185330,-0.235483,-0.419838,-1.317440,2.300,1.163772,1.852531,-1,0.927196,-0.499303,-0.111148,-1.853637,-0.789391
POOL.OQ,2022-04-01,3.565072,Retailing,-0.048679,-0.282188,-0.388113,-1.179957,2.300,1.011830,-2.025245,-1,0.783834,0.413058,1.332205,-2.284280,-0.900197
POOL.OQ,2022-07-01,0.917490,Retailing,-0.170880,-0.305930,-0.334002,-1.179957,2.000,1.805131,-1.781424,-1,0.745658,1.881451,0.418886,-2.428736,-0.918665


In [100]:
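# dependent variable: absolute EPS surprise
y = df["Earnings Per Share - Actual Surprise AbsVals"]
# regressors: every column from the EPS coefficient of variation onward
# (firm characteristics followed by the macro variables)
X = df.loc[:, "Earnings Per Share – Coefficient of Variation":]
# two-way fixed effects panel regression (entity and time effects);
# drop_absorbed=True removes regressors that the effects fully absorb,
# check_rank=False skips the rank check on X
fixed_effects_model = PanelOLS(y, X, entity_effects=True, time_effects=True, drop_absorbed=True, check_rank=False)

# fit the model and display the summary statistics
fixed_effects_results = fixed_effects_model.fit()
fixed_effects_results.summary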

Variables have been fully absorbed and have removed from the regression:

CBOE Crude Oil ETF Volatility Index, 90-Day AA Financial Commercial Paper Interest Rate, Inflation Risk Premium, University of Michigan: Consumer Sentiment, Index 1966:Q1=100, Unemployment Rate

  fixed_effects_results = fixed_effects_model.fit()


Dep. Variable:,Earnings Per Share - Actual Surprise AbsVals,R-squared:,0.0554
Estimator:,PanelOLS,R-squared (Between):,0.4593
No. Observations:,17595,R-squared (Within):,0.0619
Date:,"Wed, Mar 08 2023",R-squared (Overall):,0.4062
Time:,20:20:09,Log-likelihood,-2.218e+04
Cov. Estimator:,Unadjusted,,
,,F-statistic:,125.01
Entities:,500,P-value,0.0000
Avg Obs:,35.190,Distribution:,"F(8,17048)"
Min Obs:,1.0000,,

,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
Earnings Per Share – Coefficient of Variation,0.1952,0.0073,26.771,0.0000,0.1809,0.2095
Market Capitalization,-0.0067,0.0156,-0.4312,0.6663,-0.0373,0.0239
Revenue - Actual,-0.0414,0.0277,-1.4954,0.1348,-0.0956,0.0129
Number of Analysts,-0.0955,0.0194,-4.9285,0.0000,-0.1334,-0.0575
Recommendation - Mean (1-5).1,0.2893,0.0274,10.550,0.0000,0.2355,0.3430
Price Target - Standard Deviation,-0.0102,0.0105,-0.9736,0.3303,-0.0308,0.0103
3 Month Total Return,0.0408,0.0079,5.1518,0.0000,0.0253,0.0563
loss firm status,0.0794,0.0163,4.8662,0.0000,0.0474,0.1114
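
The warning above reports that the five macro variables were absorbed, and the coefficient table accordingly lists only the eight firm-level regressors. A minimal sketch of why this happens, assuming a made-up toy panel (the numbers are illustrative, not taken from our data): a macro series has a single value per quarter shared by all firms, so subtracting the per-quarter cross-sectional mean, which is what time effects amount to, leaves nothing to estimate from.

```python
import pandas as pd

# Hypothetical toy panel: the macro series has one value per quarter, shared by all firms
toy = pd.DataFrame(
    {
        "Instrument": ["A", "A", "B", "B"],
        "Date": pd.to_datetime(["2013-01-01", "2013-04-01"] * 2),
        "Unemployment Rate": [1.39, 1.28, 1.39, 1.28],
        "Number of Analysts": [5.0, 6.0, 9.0, 12.0],
    }
)

cols = ["Unemployment Rate", "Number of Analysts"]
# Time effects subtract the cross-sectional mean of each quarter from every observation
time_demeaned = toy[cols] - toy.groupby("Date")[cols].transform("mean")

# The macro column collapses to zeros; the firm-level column keeps its variation
print(time_demeaned)
```

A specification intended to estimate coefficients on the macro variables would therefore need to leave out the time effects, for example by passing `time_effects=False` to `PanelOLS`.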


In [101]:
# Robustness checks: alternative regressor sets (uncomment one X at a time)

# Panel A: firm characteristics
## Model 1
#X = df[["Earnings Per Share – Coefficient of Variation"]]
## Model 2
#X = df[["Market Capitalization"]]
## Model 3
#X = df[["Revenue - Actual"]]
## Model 4
#X = df[["Number of Analysts"]]
## Model 5
#X = df[["Recommendation - Mean (1-5).1"]]
## Model 6
#X = df[["Price Target - Standard Deviation"]]
## Model 7
#X = df[["3 Month Total Return"]]
## Model 8
#X = df[["loss firm status"]]
## Model 9
#X = df[["Earnings Per Share – Coefficient of Variation", "Market Capitalization", "Revenue - Actual", "Number of Analysts", "Recommendation - Mean (1-5).1", "Price Target - Standard Deviation", "3 Month Total Return", "loss firm status"]]
## Model 10
#X = df[["Market Capitalization", "Revenue - Actual", "Recommendation - Mean (1-5).1"]]
## Model 11
#X = df[["Earnings Per Share – Coefficient of Variation", "Revenue - Actual", "Number of Analysts", "Recommendation - Mean (1-5).1", "3 Month Total Return", "loss firm status"]]


# Panel B: macroeconomic variables
## Model 1
#X = df[["CBOE Crude Oil ETF Volatility Index"]]
## Model 2
#X = df[["90-Day AA Financial Commercial Paper Interest Rate"]]
## Model 3
#X = df[["Inflation Risk Premium"]]
## Model 4
#X = df[["University of Michigan: Consumer Sentiment, Index 1966:Q1=100"]]
## Model 5
#X = df[["Unemployment Rate"]]
## Model 6
#X = df[["CBOE Crude Oil ETF Volatility Index", "90-Day AA Financial Commercial Paper Interest Rate", "Inflation Risk Premium","University of Michigan: Consumer Sentiment, Index 1966:Q1=100", "Unemployment Rate"]]


# Without COVID

In [102]:
df = pd.read_csv("Dataframes/economic_story_regression.csv")
df["Date"] = pd.to_datetime(df["Date"])

In [103]:
covid_start = pd.to_datetime("2020-01-01")

df_nocovid = df[df["Date"] < covid_start]
df_nocovid

,Instrument,Date,Earnings Per Share - Actual Surprise AbsVals,GICS Industry Group Name,Earnings Per Share – Coefficient of Variation,Market Capitalization,Revenue - Actual,Number of Analysts,Recommendation - Mean (1-5).1,Price Target - Standard Deviation,3 Month Total Return,loss firm status,CBOE Crude Oil ETF Volatility Index,90-Day AA Financial Commercial Paper Interest Rate,Inflation Risk Premium,"University of Michigan: Consumer Sentiment, Index 1966:Q1=100",Unemployment Rate
0,AVY.N,2013-01-01,2.499631,Materials,-0.170865,-0.392700,-0.378055,-1.454924,3.00000,-0.372481,0.469207,-1,-1.004420,-0.795818,-0.380688,-0.747049,1.389797
1,AVY.N,2013-04-01,1.247607,Materials,-0.235084,-0.385490,-0.380857,-1.454924,3.00000,-0.314283,1.410685,-1,-0.944876,-0.817719,-0.367655,-0.338211,1.278991
2,AVY.N,2013-07-01,0.726582,Materials,-0.255193,-0.385908,-0.376364,-1.317440,2.87500,-0.299006,-0.275801,-1,-0.904454,-0.835560,0.931854,-0.346387,1.112782
3,AVY.N,2013-10-01,2.207725,Materials,-0.205410,-0.385784,-0.380352,-1.454924,2.87500,-0.332867,-0.100077,-1,-1.183607,-0.835381,0.655696,-0.725244,0.946572
4,AVY.N,2014-01-01,0.904623,Materials,-0.187991,-0.380857,-0.373706,-1.317440,2.62500,-0.333923,0.843142,-1,-1.194955,-0.834965,0.952897,-0.398173,0.798831
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18592,POOL.OQ,2018-10-01,1.754231,Retailing,-0.238465,-0.365672,-0.438705,-1.454924,2.33333,0.190893,0.457038,-1,0.133228,1.530977,0.769665,1.008230,-0.770923
18593,POOL.OQ,2019-01-01,2.089639,Retailing,-0.043962,-0.371814,-0.461271,-1.317440,2.42857,0.050885,-1.014751,-1,-0.267943,1.522915,-1.892097,0.708415,-0.752456
18594,POOL.OQ,2019-04-01,3.181257,Retailing,-0.030662,-0.367346,-0.456696,-1.454924,2.50000,-0.056342,0.514687,-1,-0.321476,1.406931,-3.116375,1.035485,-0.881730
18595,POOL.OQ,2019-07-01,0.177309,Retailing,-0.226331,-0.358699,-0.412622,-1.317440,2.14286,0.180828,0.851839,-1,-0.208906,1.111593,-0.887605,0.653903,-0.881730


In [104]:
# work on an explicit copy so the column assignments below do not
# trigger pandas' SettingWithCopyWarning on the filtered slice
df_nocovid = df_nocovid.copy()
df_nocovid['Instrument'] = df_nocovid['Instrument'].astype('category')
df_nocovid['Date'] = pd.to_datetime(df_nocovid['Date'])

# set the index to the cross-sectional variable (Instrument) and the time variable (Date)
df_nocovid.set_index(['Instrument', 'Date'], inplace=True)


In [105]:
# same specification as the full-sample model, estimated on the pre-2020 subsample
y = df_nocovid["Earnings Per Share - Actual Surprise AbsVals"]
X = df_nocovid.loc[:, "Earnings Per Share – Coefficient of Variation":]
# two-way fixed effects panel regression; drop_absorbed=True removes regressors
# that the entity or time effects fully absorb, check_rank=False skips the rank check
fixed_effects_model = PanelOLS(y, X, entity_effects=True, time_effects=True, drop_absorbed=True, check_rank=False)

# fit the model and display the summary statistics
fixed_effects_results = fixed_effects_model.fit()
fixed_effects_results.summary


Inputs contain missing values. Dropping rows with missing observations.
  super().__init__(dependent, exog, weights=weights, check_rank=check_rank)
Variables have been fully absorbed and have removed from the regression:

CBOE Crude Oil ETF Volatility Index, 90-Day AA Financial Commercial Paper Interest Rate, Inflation Risk Premium, University of Michigan: Consumer Sentiment, Index 1966:Q1=100, Unemployment Rate

  fixed_effects_results = fixed_effects_model.fit()


Dep. Variable:,Earnings Per Share - Actual Surprise AbsVals,R-squared:,0.0491
Estimator:,PanelOLS,R-squared (Between):,0.4876
No. Observations:,12394,R-squared (Within):,0.0504
Date:,"Wed, Mar 08 2023",R-squared (Overall):,0.4204
Time:,20:20:10,Log-likelihood,-1.501e+04
Cov. Estimator:,Unadjusted,,
,,F-statistic:,76.727
Entities:,497,P-value,0.0000
Avg Obs:,24.938,Distribution:,"F(8,11899)"
Min Obs:,0.0000,,

,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
Earnings Per Share – Coefficient of Variation,0.2423,0.0114,21.242,0.0000,0.2199,0.2647
Market Capitalization,0.0739,0.0411,1.7976,0.0723,-0.0067,0.1544
Revenue - Actual,-0.1730,0.0443,-3.9049,0.0001,-0.2598,-0.0862
Number of Analysts,-0.0295,0.0256,-1.1517,0.2495,-0.0797,0.0207
Recommendation - Mean (1-5).1,0.2851,0.0341,8.3693,0.0000,0.2183,0.3519
Price Target - Standard Deviation,0.0383,0.0265,1.4443,0.1487,-0.0137,0.0902
3 Month Total Return,0.0326,0.0100,3.2666,0.0011,0.0130,0.0522
loss firm status,0.0608,0.0193,3.1567,0.0016,0.0230,0.0985
