# Panel Regression - Firm Characteristics

A panel regression suits our data, since we have quarterly observations for multiple companies over a long period of time. A panel regression model uses both within-entity and between-entity variation, which makes it a useful tool for data with both a time-series and a cross-sectional dimension.  
With panel data, we can control for firm-level characteristics that may affect forecast accuracy by including fixed effects for each company or industry. We can also account for time-specific factors that may affect forecast accuracy by including time fixed effects or time-varying covariates.
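
The specification described above, with both company and time fixed effects, can be written compactly. The notation is introduced here for illustration and does not come from the notebook itself:

```latex
y_{it} = \alpha_i + \lambda_t + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + \varepsilon_{it}
```

Here $y_{it}$ is the forecast accuracy measure for company $i$ in quarter $t$, $\alpha_i$ is the company fixed effect, $\lambda_t$ is the time fixed effect, $\mathbf{x}_{it}$ collects the firm characteristics, and $\varepsilon_{it}$ is the error term.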

### Random Effects Panel Regression

In [1]:
import pandas as pd
import numpy as np
import datetime as dt
import sklearn
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from linearmodels.panel import PanelOLS
from linearmodels.panel import RandomEffects


Data

In [2]:
df = pd.read_csv("Dataframes/characteristics_regression.csv")
df["Date"] = pd.to_datetime(df["Date"])
df

,Instrument,Date,GICS Industry Group Name,Earnings Per Share - Actual Surprise AbsVals,Revenue - Actual,Market Capitalization,Earnings Per Share – Coefficient of Variation,loss firm status
0,AVY.N,2013-01-01,Materials,2.499631,-0.378055,-0.392700,-0.170865,-1
1,AVY.N,2013-04-01,Materials,1.247607,-0.380857,-0.385490,-0.235084,-1
2,AVY.N,2013-07-01,Materials,0.726582,-0.376364,-0.385908,-0.255193,-1
3,AVY.N,2013-10-01,Materials,2.207725,-0.380352,-0.385784,-0.205410,-1
4,AVY.N,2014-01-01,Materials,0.904623,-0.373706,-0.380857,-0.187991,-1
...,...,...,...,...,...,...,...,...
18604,POOL.OQ,2021-10-01,Retailing,2.901092,-0.388214,-0.278490,-0.202958,-1
18605,POOL.OQ,2022-01-01,Retailing,3.720063,-0.419838,-0.235483,-0.185330,-1
18606,POOL.OQ,2022-04-01,Retailing,3.565072,-0.388113,-0.282188,-0.048679,-1
18607,POOL.OQ,2022-07-01,Retailing,0.917490,-0.334002,-0.305930,-0.170880,-1


Standardising and log transformation

#### Fixed Effects Panel Regression

In a fixed effects panel regression, the individual-specific effects are modeled as fixed variables that do not vary across time. This means that the coefficients of the independent variables are estimated based on the within-entity variation in the data, which eliminates the effect of time-invariant unobserved heterogeneity.

Fixed effects models are useful when time-invariant unobserved variables that are not included in the model may affect the dependent variable. By modeling the individual-specific effects as fixed, the model controls for this unobserved heterogeneity and estimates the coefficients of the independent variables from within-entity variation alone. The estimates remain consistent even when the unobserved effects are correlated with the regressors, although they are generally less efficient than random effects estimates when the random effects assumptions hold.
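
The within transformation can be illustrated on simulated data. The sketch below does not use the notebook's data: the panel dimensions, the true slope `beta_true`, and the helper `ols_slope` are all assumptions made for illustration. It builds a regressor that is correlated with an unobserved firm effect, then compares pooled OLS with OLS on firm-demeaned data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_periods = 200, 20   # hypothetical panel dimensions
beta_true = 0.5                # hypothetical true slope

# Unobserved firm effect, constant over time
alpha = rng.normal(size=n_firms)

# Regressor that is correlated with the firm effect
x = alpha[:, None] + rng.normal(size=(n_firms, n_periods))

# Outcome depends on the firm effect, the regressor, and noise
noise = rng.normal(scale=0.5, size=(n_firms, n_periods))
y = 2.0 * alpha[:, None] + beta_true * x + noise


def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    xc = x.ravel() - x.mean()
    yc = y.ravel() - y.mean()
    return float(xc @ yc / (xc @ xc))


# Pooled OLS ignores the firm effect, so the slope absorbs part of it
pooled = ols_slope(x, y)

# Within transformation: subtract each firm's own time average
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
within = ols_slope(x_within, y_within)

print(f"pooled slope: {pooled:.3f}")
print(f"within slope: {within:.3f}")
```

In this simulation the pooled slope is pushed away from `beta_true` by the omitted firm effect, while the within slope stays close to it, because demeaning removes anything that is constant within a firm.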

One limitation of fixed effects models is that they cannot estimate the effect of time-invariant variables on the dependent variable, because such variables are perfectly collinear with the entity effects. In addition, fixed effects models can suffer from the incidental parameters problem: with many entity effects and few time periods per entity, the effects themselves are estimated imprecisely, and in nonlinear or dynamic models this can bias the coefficients of the independent variables.
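
A small sketch shows why a time-invariant variable drops out. The `sector` values are hypothetical stand-ins for any characteristic that stays constant within a firm, such as an industry group, assuming it does not change over the sample. After subtracting each firm's own mean, nothing is left to identify a coefficient.

```python
import numpy as np

n_firms, n_periods = 4, 6

# Hypothetical time-invariant characteristic, one value per firm
sector = np.array([1.0, 2.0, 2.0, 3.0])

# Repeat the value for every period to form the panel column
z = np.repeat(sector[:, None], n_periods, axis=1)

# Within transformation: subtract each firm's time average
z_within = z - z.mean(axis=1, keepdims=True)

# True when the demeaned variable has no variation left
no_within_variation = bool(np.allclose(z_within, 0.0))
print(no_within_variation)
```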

In [3]:
# Reformat the DataFrame index for panel regression
df['Instrument'] = df['Instrument'].astype('category')
df['Date'] = pd.to_datetime(df['Date'])

# Set an (entity, time) MultiIndex: the cross-sectional variable comes first
# and the time variable second
df.set_index(['Instrument', 'Date'], inplace=True)

In [4]:
df

Instrument,Date,GICS Industry Group Name,Earnings Per Share - Actual Surprise AbsVals,Revenue - Actual,Market Capitalization,Earnings Per Share – Coefficient of Variation,loss firm status
AVY.N,2013-01-01,Materials,2.499631,-0.378055,-0.392700,-0.170865,-1
AVY.N,2013-04-01,Materials,1.247607,-0.380857,-0.385490,-0.235084,-1
AVY.N,2013-07-01,Materials,0.726582,-0.376364,-0.385908,-0.255193,-1
AVY.N,2013-10-01,Materials,2.207725,-0.380352,-0.385784,-0.205410,-1
AVY.N,2014-01-01,Materials,0.904623,-0.373706,-0.380857,-0.187991,-1
...,...,...,...,...,...,...,...
POOL.OQ,2021-10-01,Retailing,2.901092,-0.388214,-0.278490,-0.202958,-1
POOL.OQ,2022-01-01,Retailing,3.720063,-0.419838,-0.235483,-0.185330,-1
POOL.OQ,2022-04-01,Retailing,3.565072,-0.388113,-0.282188,-0.048679,-1
POOL.OQ,2022-07-01,Retailing,0.917490,-0.334002,-0.305930,-0.170880,-1


Regression with **Absolute** Surprise Values

In [5]:
y = df.loc[:, "Earnings Per Share - Actual Surprise AbsVals"]
X = df.loc[:, "Revenue - Actual":]

# perform the fixed effects panel regression
fixed_effects_model = PanelOLS(y, X, entity_effects=True, time_effects=True)

# fit the model and print the summary statistics
fixed_effects_results = fixed_effects_model.fit()
fixed_effects_results.summary


Inputs contain missing values. Dropping rows with missing observations.
  super().__init__(dependent, exog, weights=weights, check_rank=check_rank)


Dep. Variable:,Earnings Per Share - Actual Surprise AbsVals,R-squared:,0.0479
Estimator:,PanelOLS,R-squared (Between):,-0.0456
No. Observations:,18541,R-squared (Within):,0.0539
Date:,"Wed, Mar 08 2023",R-squared (Overall):,-0.0407
Time:,16:38:11,Log-likelihood,-2.349e+04
Cov. Estimator:,Unadjusted,,
,,F-statistic:,226.57
Entities:,502,P-value,0.0000
Avg Obs:,36.934,Distribution:,"F(4,17996)"
Min Obs:,2.0000,,

,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
Revenue - Actual,-0.0422,0.0276,-1.5307,0.1259,-0.0962,0.0118
Market Capitalization,-0.0070,0.0152,-0.4622,0.6439,-0.0369,0.0228
Earnings Per Share – Coefficient of Variation,0.2016,0.0071,28.220,0.0000,0.1876,0.2156
loss firm status,0.0874,0.0157,5.5635,0.0000,0.0566,0.1182


#### By Industry: Example Tech Industry

With Absolute Surprise Values

In [14]:
# df_panel is assumed to be an un-indexed panel DataFrame prepared in an earlier cell.
# Take an explicit copy so the column assignments below modify an independent
# DataFrame and do not trigger pandas' SettingWithCopyWarning
group = df_panel[df_panel['GICS Industry Group Name'] == 'Technology Hardware & Equipment'].copy()

group['Instrument'] = group['Instrument'].astype('category')
group['Date'] = pd.to_datetime(group['Date'])
group.set_index(['Instrument', 'Date'], inplace=True)

y = group.loc[:, "Earnings Per Share - Actual Surprise AbsVals"]
X = group.loc[:, "Revenue - Actual":]

# Fit the two-way fixed effects model; displaying the result shows its summary
model = PanelOLS(y, X, entity_effects=True, time_effects=True).fit()
model

Inputs contain missing values. Dropping rows with missing observations.
  super().__init__(dependent, exog, weights=weights, check_rank=check_rank)


Dep. Variable:,Earnings Per Share - Actual Surprise AbsVals,R-squared:,0.0281
Estimator:,PanelOLS,R-squared (Between):,-0.1188
No. Observations:,728,R-squared (Within):,0.0361
Date:,"Tue, Mar 07 2023",R-squared (Overall):,-0.0972
Time:,20:19:03,Log-likelihood,-842.14
Cov. Estimator:,Unadjusted,,
,,F-statistic:,3.1984
Entities:,19,P-value,0.0042
Avg Obs:,38.316,Distribution:,"F(6,664)"
Min Obs:,27.000,,

,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
Revenue - Actual,-0.1008,0.1082,-0.9318,0.3518,-0.3132,0.1116
Market Capitalization,0.0494,0.0340,1.4518,0.1470,-0.0174,0.1163
Enterprise Value To Sales (Daily Time Series Ratio),-0.3090,0.4395,-0.7031,0.4823,-1.1719,0.5540
3 Month Total Return,0.1077,0.0395,2.7279,0.0065,0.0302,0.1852
Volume,0.0069,0.0169,0.4054,0.6853,-0.0263,0.0400
loss firm status,0.2138,0.0702,3.0449,0.0024,0.0759,0.3517
