# 📘 Notebook 4: Fixed Effects Models with Python (Entity Effects Only)

## 💡 Preamble: Understanding Panel Estimators
This notebook demonstrates five key approaches to estimating models with panel data:
- **Pooled OLS**
- **First Differences (FD)**
- **LSDV (Fixed Effects via Dummies)**
- **Within Estimator (Manual Demeaning)**
- **Between Estimator (Firm-Level Averages)**

Each section pairs the model implementation with an interpretation of its output, using synthetic Indian firm data.
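For reference, the five estimators can all be read as transformations of one underlying model. The notation below ($y_{it}$, $x_{it}$, $\beta$, $\varepsilon_{it}$, $u_{it}$) is an assumption introduced here for illustration; the notebook itself only names αᵢ. LSDV estimates the first line directly, with each $\alpha_i$ captured by a firm dummy.

```latex
\begin{aligned}
\text{Model (LSDV):}\quad & y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it} \\
\text{Pooled OLS:}\quad & y_{it} = \alpha + x_{it}'\beta + u_{it}, \qquad u_{it} = (\alpha_i - \alpha) + \varepsilon_{it} \\
\text{First differences:}\quad & y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}) \\
\text{Within:}\quad & y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i) \\
\text{Between:}\quad & \bar{y}_i = \alpha_i + \bar{x}_i'\beta + \bar{\varepsilon}_i
\end{aligned}
```

In the first-difference and within lines $\alpha_i$ cancels. In the pooled and between lines it stays in the error term.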

In [1]:
import pandas as pd
import statsmodels.api as sm

# Load the synthetic panel dataset
df_panel = pd.read_csv("indian_firms_financials_synthetic.csv")
df_panel = df_panel.set_index(['FirmID', 'Year'])

Y = df_panel['Profitability']
X_vars = ['RND_Expenses', 'Advertising_Spends', 'Debt_Equity_Ratio', 'Firm_Size']
X = df_panel[X_vars]
X = sm.add_constant(X)

## 🔹 Part 1: Naive Pooled OLS
**Concept**: Ignores the panel structure. Treats all firm-year observations as one flat dataset.

**Use Case**: The simplest baseline, but its coefficients are biased whenever unobserved firm-level effects are correlated with the regressors.

**Interpretation of Output**:
- R&D and Advertising are positively associated with profitability.
- Firm Size also has a significant positive coefficient.
- However, with no firm-specific controls, omitted variable bias is likely.
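The omitted variable bias mentioned above can be illustrated with a small simulation. This is a minimal sketch on made-up data, not the notebook's dataset: the data-generating process, the true slope of 0.5, and the helper `ols_slope` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_years, beta = 200, 5, 0.5  # hypothetical panel dimensions and true slope

# Assumed data-generating process: the firm effect alpha_i is correlated
# with the regressor x_it, which is the situation that biases pooled OLS.
alpha = rng.normal(size=n_firms)
x = alpha[:, None] + rng.normal(size=(n_firms, n_years))
y = alpha[:, None] + beta * x + rng.normal(scale=0.5, size=(n_firms, n_years))


def ols_slope(x_flat, y_flat, intercept=True):
    """Return the OLS slope on x, optionally including an intercept."""
    cols = [np.ones_like(x_flat), x_flat] if intercept else [x_flat]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y_flat, rcond=None)
    return coef[-1]


# Pooled OLS ignores alpha_i, so its slope absorbs part of the firm effect.
pooled_slope = ols_slope(x.ravel(), y.ravel())

# Demeaning by firm removes alpha_i before the regression is run.
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
within_slope = ols_slope(x_within.ravel(), y_within.ravel(), intercept=False)

print(f"true slope:   {beta:.3f}")
print(f"pooled slope: {pooled_slope:.3f}")
print(f"within slope: {within_slope:.3f}")
```

Under these assumptions the pooled slope overshoots the true value, while the demeaned regression stays close to it.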

In [2]:
print("\n--- Pooled OLS (Baseline) ---")
model_pooled = sm.OLS(Y, X).fit()
print(model_pooled.summary())


--- Pooled OLS (Baseline) ---
                            OLS Regression Results                            
Dep. Variable:          Profitability   R-squared:                       0.768
Model:                            OLS   Adj. R-squared:                  0.748
Method:                 Least Squares   F-statistic:                     37.32
Date:                Wed, 28 May 2025   Prob (F-statistic):           9.34e-14
Time:                        23:13:11   Log-Likelihood:                -61.131
No. Observations:                  50   AIC:                             132.3
Df Residuals:                      45   BIC:                             141.8
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const

## 🔹 Part 2: First Differences (FD)
**Concept**: Subtracts each variable's previous-year value from its current value, which removes time-invariant firm effects.

**Use Case**: Well suited to short panels (small T). Captures year-to-year changes within each firm.

**Interpretation of Output**:
- R&D and Ad Spend remain significant, so their estimated effect is not explained by time-invariant firm traits.
- Firm Size drops out: it does not change within a firm, so its first difference is zero.
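The two points above, that differencing removes the firm effect and that a time-invariant regressor differences to zero, can be checked on a toy panel. This is a sketch on hypothetical data. The variable names (`rnd`, `size`) only echo the notebook's columns, and the coefficients 0.5 and 0.2 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years = 4, 5  # hypothetical balanced panel

alpha = rng.normal(size=(n_firms, 1))           # time-invariant firm effect
size = rng.uniform(1, 10, size=(n_firms, 1))    # time-invariant regressor
rnd = rng.normal(size=(n_firms, n_years))       # time-varying regressor
noise = rng.normal(scale=0.1, size=(n_firms, n_years))
y = alpha + 0.5 * rnd + 0.2 * size + noise

# Expand the time-invariant regressor to the full (firm, year) shape.
size_panel = np.repeat(size, n_years, axis=1)

# First differences within each firm (one row per firm, so diff along axis 1).
d_y = np.diff(y, axis=1)
d_rnd = np.diff(rnd, axis=1)
d_size = np.diff(size_panel, axis=1)

# alpha and size cancel: what is left of d_y is 0.5 * d_rnd plus differenced noise.
leftover = d_y - 0.5 * d_rnd

# The differenced design matrix has a column of zeros, so it is rank deficient.
X_fd = np.column_stack([d_rnd.ravel(), d_size.ravel()])
print("differenced size is all zero:", np.allclose(d_size, 0))
print("rank of differenced design:", np.linalg.matrix_rank(X_fd))
```

Each firm loses its first year to differencing, so a panel of N firms and T years leaves N(T-1) rows.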

In [3]:
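# Sort by firm and year so that diff() follows calendar order within each firm
df_fd = df_panel.reset_index().sort_values(['FirmID', 'Year'])
# diff() leaves NaN in each firm's first year; dropna() removes those rows
df_fd_diff = df_fd.groupby('FirmID')[['Profitability'] + X_vars].diff().dropna()
Y_fd = df_fd_diff['Profitability']
X_fd = df_fd_diff[X_vars]
# No constant: differencing removes the intercept along with the firm effects
model_fd = sm.OLS(Y_fd, X_fd).fit()
print(model_fd.summary())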

                                 OLS Regression Results                                
Dep. Variable:          Profitability   R-squared (uncentered):                   0.614
Model:                            OLS   Adj. R-squared (uncentered):              0.582
Method:                 Least Squares   F-statistic:                              19.58
Date:                Wed, 28 May 2025   Prob (F-statistic):                    9.04e-08
Time:                        23:13:21   Log-Likelihood:                         -62.013
No. Observations:                  40   AIC:                                      130.0
Df Residuals:                      37   BIC:                                      135.1
Df Model:                           3                                                  
Covariance Type:            nonrobust                                                  
                         coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------


## 🔹 Part 3: LSDV (Fixed Effects using Dummies)
**Concept**: Adds dummy variables for each firm to absorb unobserved firm-specific characteristics.

**Use Case**: Useful when you want to explicitly estimate firm-level fixed effects.

**Interpretation of Output**:
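- The β coefficients are identical to those from the Within estimator.
- Significant positive effects for R&D and Ad Spend.
- Because one firm dummy is dropped, the constant is the baseline firm's intercept and each dummy coefficient is that firm's intercept relative to the baseline.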
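The claim that LSDV and the Within estimator give the same β can be verified numerically. The sketch below uses simulated data and plain NumPy rather than the notebook's statsmodels calls. Unlike the notebook, it includes a dummy for every firm and no constant, so the dummy coefficients are the firm intercepts themselves. The helper `demean_by_firm` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, n_years, k = 10, 5, 2  # hypothetical panel with two regressors
firm = np.repeat(np.arange(n_firms), n_years)

alpha = rng.normal(size=n_firms)
X = rng.normal(size=(n_firms * n_years, k)) + alpha[firm, None]
y = alpha[firm] + X @ np.array([0.5, -0.2]) + rng.normal(scale=0.3, size=n_firms * n_years)

# LSDV: regress y on the regressors plus one dummy per firm (no constant).
dummies = (firm[:, None] == np.arange(n_firms)).astype(float)
lsdv_coef, *_ = np.linalg.lstsq(np.column_stack([X, dummies]), y, rcond=None)
beta_lsdv, firm_intercepts = lsdv_coef[:k], lsdv_coef[k:]


def demean_by_firm(a):
    """Subtract each firm's mean from its rows (balanced panel assumed)."""
    sums = np.zeros((n_firms,) + a.shape[1:])
    np.add.at(sums, firm, a)
    return a - (sums / n_years)[firm]


# Within: regress demeaned y on demeaned X, with no dummies and no constant.
beta_within, *_ = np.linalg.lstsq(demean_by_firm(X), demean_by_firm(y), rcond=None)

print("LSDV slopes:  ", beta_lsdv)
print("Within slopes:", beta_within)
print("identical:", np.allclose(beta_lsdv, beta_within))
```

The equality is an algebraic identity rather than a feature of this particular sample, so it holds for any seed.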

In [4]:
df_lsdv = df_panel.reset_index()
Y_lsdv = df_lsdv['Profitability']
X_lsdv_original = df_lsdv[X_vars]
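# One dummy per firm; drop_first=True omits a baseline firm to avoid collinearity with the constant
entity_dummies = pd.get_dummies(df_lsdv['FirmID'], drop_first=True, prefix='Firm').astype(int)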
X_lsdv_entity = pd.concat([X_lsdv_original, entity_dummies], axis=1)
X_lsdv_entity = sm.add_constant(X_lsdv_entity)
model_lsdv_entity = sm.OLS(Y_lsdv, X_lsdv_entity).fit()
print(model_lsdv_entity.summary())

                            OLS Regression Results                            
Dep. Variable:          Profitability   R-squared:                       0.804
Model:                            OLS   Adj. R-squared:                  0.741
Method:                 Least Squares   F-statistic:                     12.65
Date:                Wed, 28 May 2025   Prob (F-statistic):           1.09e-09
Time:                        23:13:35   Log-Likelihood:                -56.945
No. Observations:                  50   AIC:                             139.9
Df Residuals:                      37   BIC:                             164.7
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                  0.3714      0

## 🔹 Part 4: Within Estimator (Manual Demeaning)
**Concept**: Subtracts firm-level means from each variable, removing all time-invariant firm effects.

**Use Case**: Gives the same slope estimates as LSDV without building a dummy column for every firm, so it uses less memory as the number of firms grows.

**Interpretation of Output**:
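- Coefficients match LSDV.
- Estimates are driven entirely by within-firm variation over time.
- The reported standard errors are slightly too small: OLS on the demeaned data counts 46 residual degrees of freedom, while LSDV counts 37, because the estimated firm means are not deducted.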
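One caveat when comparing the Within and LSDV outputs: the slopes agree, but the residual degrees of freedom do not (46 in the Within output versus 37 for LSDV in this notebook). OLS on demeaned data does not know that firm means were estimated first. The sketch below, on simulated data with a hypothetical helper `ols_with_se`, shows that rescaling the residual variance to NT - N - K reproduces the LSDV standard errors.

```python
import numpy as np

rng = np.random.default_rng(3)
n_firms, n_years, k = 10, 5, 2  # hypothetical panel dimensions
n_obs = n_firms * n_years
firm = np.repeat(np.arange(n_firms), n_years)

X = rng.normal(size=(n_obs, k))
y = rng.normal(size=n_firms)[firm] + X @ np.array([0.5, -0.2]) + rng.normal(scale=0.3, size=n_obs)


def ols_with_se(design, target, df_resid):
    """OLS coefficients and classical standard errors for a given residual df."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    sigma2 = resid @ resid / df_resid
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, np.sqrt(np.diag(cov))


# Within transformation: subtract firm means from X and y.
X_w = X - np.stack([X[firm == i].mean(axis=0) for i in range(n_firms)])[firm]
y_w = y - np.array([y[firm == i].mean() for i in range(n_firms)])[firm]

# Naive: what plain OLS on demeaned data reports (df = NT - K).
_, se_naive = ols_with_se(X_w, y_w, df_resid=n_obs - k)
# Corrected: also deduct the N estimated firm means (df = NT - N - K).
_, se_corrected = ols_with_se(X_w, y_w, df_resid=n_obs - n_firms - k)

# LSDV with a full set of firm dummies, for comparison.
dummies = (firm[:, None] == np.arange(n_firms)).astype(float)
_, se_lsdv = ols_with_se(np.column_stack([X, dummies]), y, df_resid=n_obs - n_firms - k)

print("naive within SE:    ", se_naive)
print("corrected within SE:", se_corrected)
print("LSDV SE (slopes):   ", se_lsdv[:k])
```

The correction is a single scale factor, the square root of (NT - K) / (NT - N - K), applied to every naive standard error.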

In [5]:
# Subtract each firm's mean from every variable (groupby on the 'FirmID' index level)
X_demeaned = X.groupby('FirmID').transform(lambda x: x - x.mean())
Y_demeaned = Y.groupby('FirmID').transform(lambda y: y - y.mean())
# The constant demeans to zero, so drop it before fitting
X_demeaned_ols = X_demeaned.drop(columns=['const'], errors='ignore')
model_manual_within = sm.OLS(Y_demeaned, X_demeaned_ols).fit()
print(model_manual_within.summary())

                                 OLS Regression Results                                
Dep. Variable:          Profitability   R-squared (uncentered):                   0.596
Model:                            OLS   Adj. R-squared (uncentered):              0.561
Method:                 Least Squares   F-statistic:                              16.99
Date:                Wed, 28 May 2025   Prob (F-statistic):                    1.28e-08
Time:                        23:13:44   Log-Likelihood:                         -56.945
No. Observations:                  50   AIC:                                      121.9
Df Residuals:                      46   BIC:                                      129.5
Df Model:                           4                                                  
Covariance Type:            nonrobust                                                  
                         coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------

## 🔹 Part 5: Between Estimator
**Concept**: Uses only firm-level average values over time to estimate relationships.

**Use Case**: Describes long-run cross-sectional relationships. Does NOT remove αᵢ.

**Interpretation of Output**:
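- R² is high because averaging over years smooths out year-to-year noise.
- Firm Size is significant (it explains cross-firm variation).
- Not suited to causal inference: αᵢ stays in the error term and may be correlated with the regressors.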
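The cell that fits the between model is not shown in this notebook, so the following is only a sketch of the general technique on simulated data, not the author's code. It collapses a balanced panel to one row of averages per firm and runs OLS with a constant on those rows. All names and coefficients are hypothetical. With the notebook's DataFrame, the collapse step would presumably be a `groupby('FirmID')` followed by `.mean()`.

```python
import numpy as np

rng = np.random.default_rng(4)
n_firms, n_years = 50, 5  # hypothetical balanced panel

firm_size = rng.uniform(1, 10, size=(n_firms, 1))   # time-invariant regressor
rnd = rng.normal(size=(n_firms, n_years))           # time-varying regressor
noise = rng.normal(scale=0.2, size=(n_firms, n_years))
y = 1.0 + 0.5 * rnd + 0.3 * firm_size + noise

# Collapse to firm-level averages: one observation per firm.
y_bar = y.mean(axis=1)
rnd_bar = rnd.mean(axis=1)
size_bar = firm_size[:, 0]  # already constant within each firm

# OLS on the averages, with an intercept.
design = np.column_stack([np.ones(n_firms), rnd_bar, size_bar])
between_coef, *_ = np.linalg.lstsq(design, y_bar, rcond=None)

print("observations used:", design.shape[0])
print("between coefficients (const, rnd, size):", between_coef)
```

Unlike First Differences and Within, this regression can estimate a coefficient for a time-invariant regressor such as firm size, because it relies only on variation across firms. The simulated data contain no firm effect correlated with the regressors, so the sketch says nothing about the bias discussed above.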

## 🔚 Comparison of Estimators
| Estimator         | Controls for αᵢ? | Best For                          | Key Takeaway                         |
|------------------|------------------|-----------------------------------|--------------------------------------|
| Pooled OLS       | ❌               | Basic regression (biased)         | R&D and Ad Spend look strong, but likely biased |
| First Differences| ✅               | Year-over-year firm-level changes | R&D and Ads still significant        |
| LSDV             | ✅               | Interpretable fixed effects       | Matches Within; adds dummy terms    |
| Within           | ✅               | Efficient FE estimation           | Same as LSDV, less memory-intensive  |
| Between          | ❌               | Comparing firm averages           | High R² but poor for causal inference |

In [7]:
print("\n--- Comparison of Estimators ---")
print("\nManual Within Estimator Coefficients:")
print(model_manual_within.params)
print("\nLSDV Coefficients:")
print(model_lsdv_entity.summary())
print("\nFirst Differences Coefficients:")
print(model_fd.params)
print("\nBetween Estimator Coefficients (Manual):") 
print(model_between.params)


--- Comparison of Estimators ---

Manual Within Estimator Coefficients:
RND_Expenses          0.470352
Advertising_Spends    0.285660
Debt_Equity_Ratio    -0.196385
Firm_Size             0.019531
dtype: float64

LSDV Coefficients:
                            OLS Regression Results                            
Dep. Variable:          Profitability   R-squared:                       0.804
Model:                            OLS   Adj. R-squared:                  0.741
Method:                 Least Squares   F-statistic:                     12.65
Date:                Wed, 28 May 2025   Prob (F-statistic):           1.09e-09
Time:                        23:14:01   Log-Likelihood:                -56.945
No. Observations:                  50   AIC:                             139.9
Df Residuals:                      37   BIC:                             164.7
Df Model:                          12                                         
Covariance Type:            nonrobust                    