# Policy Analysis with Pooled Cross Sections (Lab1)

### Intro and objectives


### In this lab you will learn:
1. Examples of policy analysis using pooled cross sections
2. How to fit pooled cross-sectional models in Python


## What I hope you'll get out of this lab
* The feeling that you'll "know where to start" when you need to evaluate the effect of policies or changes in a population
* Worked Examples
* How to interpret the results obtained

In [None]:
!pip install wooldridge
import wooldridge as woo
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np



# Example. Effect of Worker Compensation Laws on Weeks out of Work

#### Meyer, Viscusi, and Durbin (1995) (hereafter, MVD) studied the length of time (in weeks) that an injured worker receives workers’ compensation. On July 15, 1980, Kentucky raised the cap on weekly earnings that were covered by workers’ compensation. An increase in the cap has no effect on the benefit for low-income workers, but it makes it less costly for a high-income worker to stay on workers’ compensation.

#### Therefore, the control group is low-income workers, and the treatment group is high-income workers; high-income workers are defined as those who were subject to the pre-policy change cap.

#### Using random samples both before and after the policy change, MVD were able to test whether more generous workers’ compensation causes people to stay out of work longer (everything else fixed).












#### We start by fitting a pooled cross-section model of the form:


$ \log(durat)=\beta_0+\delta_0 \cdot afchnge+\beta_1 \cdot highearn+\delta_1 \cdot afchnge \cdot highearn + u $

#### The dependent variable is log(durat). afchnge is a dummy variable equal to one for observations after the policy change, and highearn is a dummy variable equal to one for high earners.

#### The parameter of interest is the coefficient on the interaction term afchnge·highearn: $\delta_1$ measures the impact of the policy change (the raise in the cap on weekly earnings) on the average length of time on workers’ compensation for high earners.
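
#### To see why $\delta_1$ is a difference-in-differences, it may help to write out the expected value of log(durat) in each of the four groups implied by the model. This is only a sketch, and it assumes the error term has zero mean conditional on the two dummies:

$$
\begin{aligned}
E[\log(durat)\mid afchnge=0,\ highearn=0] &= \beta_0 \\
E[\log(durat)\mid afchnge=1,\ highearn=0] &= \beta_0+\delta_0 \\
E[\log(durat)\mid afchnge=0,\ highearn=1] &= \beta_0+\beta_1 \\
E[\log(durat)\mid afchnge=1,\ highearn=1] &= \beta_0+\delta_0+\beta_1+\delta_1
\end{aligned}
$$

#### The change over time for high earners is $\delta_0+\delta_1$, the change over time for low earners is $\delta_0$, and subtracting the second from the first leaves $\delta_1$:

$$
\big[(\beta_0+\delta_0+\beta_1+\delta_1)-(\beta_0+\beta_1)\big]-\big[(\beta_0+\delta_0)-\beta_0\big]=\delta_1
$$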


In [None]:
injury = woo.dataWoo('injury')

In [None]:
injury.head()

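| | durat | afchnge | highearn | male | married | hosp | indust | injtype | age | prewage | ... | head | neck | upextr | trunk | lowback | lowextr | occdis | manuf | construc | highlpre |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1 | 1 | 1.0 | 0.0 | 1 | 3.0 | 1 | 26.0 | 404.950012 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 6.003764 |
| 1 | 1.0 | 1 | 1 | 1.0 | 1.0 | 0 | 3.0 | 1 | 31.0 | 643.825012 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 6.467427 |
| 2 | 84.0 | 1 | 1 | 1.0 | 1.0 | 1 | 3.0 | 1 | 37.0 | 398.125 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 5.986766 |
| 3 | 4.0 | 1 | 1 | 1.0 | 1.0 | 1 | 3.0 | 1 | 31.0 | 527.799988 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 6.268717 |
| 4 | 1.0 | 1 | 1 | 1.0 | 1.0 | 0 | 3.0 | 1 | 23.0 | 528.9375 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 6.27087 |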


In [None]:
type(injury)

pandas.core.frame.DataFrame

In [None]:
injury.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7150 entries, 0 to 7149
Data columns (total 30 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   durat     7150 non-null   float64
 1   afchnge   7150 non-null   int64  
 2   highearn  7150 non-null   int64  
 3   male      7134 non-null   float64
 4   married   6853 non-null   float64
 5   hosp      7150 non-null   int64  
 6   indust    7125 non-null   float64
 7   injtype   7150 non-null   int64  
 8   age       7146 non-null   float64
 9   prewage   7150 non-null   float64
 10  totmed    7150 non-null   float64
 11  injdes    7150 non-null   int64  
 12  benefit   7150 non-null   float64
 13  ky        7150 non-null   int64  
 14  mi        7150 non-null   int64  
 15  ldurat    7150 non-null   float64
 16  afhigh    7150 non-null   int64  
 17  lprewage  7150 non-null   float64
 18  lage      7146 non-null   float64
 19  ltotmed   7150 non-null   float64
 20  head      7150 non-null   int6
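
#### The column listing above includes the 0/1 indicators `ky` and `mi`. Since the policy change described earlier took place in Kentucky, one may want to check how many rows belong to each indicator and, if appropriate, restrict the sample before fitting. The sketch below shows one way to do that on a small made-up DataFrame; the helper `restrict_to_state` and the toy values are hypothetical and are not part of the `wooldridge` package.

```python
import pandas as pd


def restrict_to_state(df, flag):
    """Return a copy of df with only the rows whose 0/1 column `flag` equals 1."""
    return df.loc[df[flag] == 1].copy()


# Toy stand-in for `injury`; the numbers are invented for illustration only.
toy = pd.DataFrame({
    "ky": [1, 1, 0, 1, 0],
    "mi": [0, 0, 1, 0, 1],
    "ldurat": [0.0, 1.4, 2.1, 0.7, 1.1],
})

# How many rows fall under each combination of the two indicators?
counts = toy.groupby(["ky", "mi"]).size()
print(counts)

# Keep only the rows flagged ky == 1.
toy_ky = restrict_to_state(toy, "ky")
print(len(toy_ky))
```

#### With the real data, `restrict_to_state(injury, 'ky')` could be passed as `data=` to `smf.ols`. The estimates would then differ from the tables in this lab, which are computed on all 7150 rows.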

In [None]:
# difference-in-differences regression: in a formula, afchnge*highearn
# expands to afchnge + highearn + afchnge:highearn (the interaction term)
reg_1 = smf.ols(formula='ldurat ~ afchnge*highearn', data=injury)
results_1 = reg_1.fit()

In [None]:
results_1.rsquared

0.015841687920047365

In [None]:
table_1 = pd.DataFrame({'b': round(results_1.params, 4),
                            'se': round(results_1.bse, 4),
                            't': round(results_1.tvalues, 4),
                            'pval': round(results_1.pvalues, 4)})
print(f'table_1: \n{table_1}\n')
print(f'R-Squared: {results_1.rsquared}')

table_1: 
                       b      se        t    pval
Intercept         1.1993  0.0271  44.2411  0.0000
afchnge           0.0236  0.0397   0.5953  0.5516
highearn          0.2152  0.0434   4.9629  0.0000
afchnge:highearn  0.1883  0.0628   2.9995  0.0027

R-Squared: 0.015841687920047365


## Based on the output above, the fitted model is:




$ \widehat{\log(durat)}=1.199+0.024 \cdot afchnge+0.215 \cdot highearn+0.188 \cdot afchnge \cdot highearn $
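
#### As a quick check on how the fitted coefficients combine, the sketch below plugs the four values from table_1 into the model and computes the predicted log duration for each of the four groups. The function name `predicted_ldurat` is made up for this illustration.

```python
# Coefficients copied from table_1 (intercept, afchnge, highearn, interaction).
b0, d0, b1, d1 = 1.1993, 0.0236, 0.2152, 0.1883


def predicted_ldurat(afchnge, highearn):
    """Predicted log(durat) for a given pair of 0/1 dummies."""
    return b0 + d0 * afchnge + b1 * highearn + d1 * afchnge * highearn


# Predicted value in each (afchnge, highearn) cell.
cells = {(a, h): predicted_ldurat(a, h) for a in (0, 1) for h in (0, 1)}

# Change over time within each earnings group.
change_high = cells[(1, 1)] - cells[(0, 1)]
change_low = cells[(1, 0)] - cells[(0, 0)]

# Difference-in-differences: recovers the interaction coefficient.
did = change_high - change_low

for (a, h), value in cells.items():
    print(f"afchnge={a}, highearn={h}: {value:.4f}")
print(f"difference-in-differences: {did:.4f}")
```

#### The change for high earners minus the change for low earners equals the interaction coefficient, which is why that coefficient is read as the policy effect.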

## How do we interpret the equation?

#### Based on the fitted model, we conclude:


#### 1. The interaction coefficient, $\delta_1=0.1883$, is large and statistically significant (p-value: 0.0027). This implies that the average length of time on workers’ compensation for high earners increased by roughly 19% due to the increased earnings cap (approximating the percentage change by $100 \cdot \delta_1$).
#### 2. The coefficient on afchnge is small (0.0236) and statistically insignificant (p-value: 0.5516), so there is no evidence that the duration for low earners changed after the policy.
#### 3. The coefficient on highearn is large (0.2152) and statistically significant (p-value: 0.0000). This means that, before the policy change, high earners tended to remain out of work longer when injured than their low-earner counterparts.
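
#### Because the dependent variable is in logs, reading a coefficient directly as a percentage is an approximation that gets worse as the coefficient grows. A minimal sketch of the exact conversion, $100 \cdot (e^{b}-1)$, applied to the interaction coefficient from table_1 (the function name is hypothetical):

```python
import math


def pct_change_from_log_coef(b):
    """Exact percentage change in y implied by a coefficient b on a dummy in a log(y) model."""
    return 100.0 * (math.exp(b) - 1.0)


b_interaction = 0.1883  # copied from table_1

approx = 100.0 * b_interaction                    # log approximation
exact = pct_change_from_log_coef(b_interaction)   # exact conversion

print(f"approximate: {approx:.1f}%   exact: {exact:.1f}%")
```

#### The exact figure is somewhat larger than the approximation, but the qualitative conclusion is unchanged.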

## Therefore, we find evidence that the change in compensation policy had an impact.



#### The R-squared of the previous model shows that it explains only about 1.6% of the variation in log(durat). This makes sense: there are clearly many factors, including severity of the injury, that affect how long someone receives workers’ compensation.

#### We can add more factors to try to improve the model.

In [None]:
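# same regression with controls for gender, marital status and injury location
# note: afhigh duplicates the afchnge:highearn term already generated by afchnge*highearn
reg_2 = smf.ols(formula='ldurat ~ afchnge+highearn+afchnge*highearn+afhigh+male+married+head+neck+upextr+trunk+lowback+lowextr', data=injury)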
results_2 = reg_2.fit()

In [None]:
table_2 = pd.DataFrame({'b': round(results_2.params, 4),
                            'se': round(results_2.bse, 4),
                            't': round(results_2.tvalues, 4),
                            'pval': round(results_2.pvalues, 4)})
print(f'table_2: \n{table_2}\n')
print(f'R-Squared: {results_2.rsquared}')

table_2: 
                       b      se        t    pval
Intercept         1.4291  0.0839  17.0294  0.0000
afchnge           0.0286  0.0399   0.7156  0.4743
highearn          0.1863  0.0466   3.9972  0.0001
afchnge:highearn  0.0989  0.0318   3.1079  0.0019
afhigh            0.0989  0.0318   3.1079  0.0019
male             -0.0901  0.0402  -2.2441  0.0249
married           0.1076  0.0352   3.0625  0.0022
head             -0.6835  0.1102  -6.2038  0.0000
neck              0.0612  0.1417   0.4321  0.6657
upextr           -0.3017  0.0800  -3.7711  0.0002
trunk             0.0263  0.0877   0.2999  0.7643
lowback          -0.2088  0.0806  -2.5910  0.0096
lowextr          -0.2657  0.0814  -3.2619  0.0011

R-Squared: 0.030954827310505917


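#### The second model has a larger R-squared than the first (3.1% vs. 1.6%). On its own this does not make it preferable, because R-squared cannot fall when regressors are added to a model estimated on the same sample. The stronger argument for the extra controls is that several of them are statistically significant.

#### The rows for afchnge:highearn and afhigh are identical, which indicates that afhigh is the same variable as the interaction term. The two are perfectly collinear, so the estimated effect is split equally between them (0.0989 each). Their sum, about 0.198, is the interaction effect, and it is similar to the 0.1883 obtained in the first model.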
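
#### The sketch below illustrates, on simulated data, what happens when the same regressor enters a least-squares fit twice. It assumes the fit is computed with a pseudoinverse, which is the default method of statsmodels' `OLS.fit` as far as I know. Here NumPy's `np.linalg.pinv` is used directly, and all variable names and the simulated coefficients are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated 0/1 dummies and their product (the interaction).
after = rng.integers(0, 2, n)
high = rng.integers(0, 2, n)
inter = after * high

# Simulated outcome with a known interaction effect of 0.3.
y = 1.0 + 0.1 * after + 0.2 * high + 0.3 * inter + rng.normal(0, 0.5, n)

# Design matrix with the interaction once, and with it duplicated.
X_clean = np.column_stack([np.ones(n), after, high, inter])
X_dup = np.column_stack([X_clean, inter])

# Least-squares coefficients via the pseudoinverse (minimum-norm solution).
b_clean = np.linalg.pinv(X_clean) @ y
b_dup = np.linalg.pinv(X_dup) @ y

print("interaction, entered once: ", b_clean[3])
print("interaction, entered twice:", b_dup[3], b_dup[4])
```

#### In this simulation the duplicated columns receive equal coefficients that add up to the single-column estimate, while the other coefficients are unchanged. Dropping `afhigh` from the formula of the second model would therefore be expected to report the whole interaction effect in one row.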