# Fixed Effects Estimation (Lab 4)

### Intro and objectives


### In this lab you will learn:
1. examples of fixed effects estimation
2. how to fit fixed effects models in Python


## What I hope you'll get out of this lab
* The feeling that you'll "know where to start" when you need to fit fixed effects models
* Worked Examples
* How to interpret the results obtained

In [1]:
!pip install wooldridge
!pip install linearmodels
import wooldridge as woo
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import linearmodels as plm


# Example: How can we reduce criminal activity?


#### Cornwell and Trumbull (1994) used data on 90 counties in North Carolina, for the years 1981 through 1987, to estimate an unobserved effects model of crime; the data are contained in CRIME4.

#### Various factors, including geographical location, attitudes toward crime, historical records, and reporting conventions, might be contained in the unobserved county effect $a_i$. The crime rate (crmrte) is the number of crimes per person, prbarr is the estimated probability of arrest, prbconv is the estimated probability of conviction (given an arrest), prbpris is the probability of serving time in prison (given a conviction), avgsen is the average sentence length served, and polpc is the number of police officers per capita. As is standard in criminometric studies, we use the logs of all variables to estimate elasticities. We also include a full set of year dummies to control for state trends in crime rates.




#### The objective is to determine which factors mitigate criminal activity
#### Variables:


county: county identifier

year: 81 to 87

crmrte: crimes committed per person

prbarr: 'probability' of arrest

prbconv: 'probability' of conviction

prbpris: 'probability' of prison sentence

avgsen: avg. sentence, days

polpc: police per capita

density: people per sq. mile

taxpc: tax revenue per capita

west: =1 if in western N.C.

central: =1 if in central N.C.

urban: =1 if in SMSA

pctmin80: perc. minority, 1980

wcon: weekly wage, construction

wtuc: wkly wge, trns, util, commun

wtrd: wkly wge, whlesle, retail trade

wfir: wkly wge, fin, ins, real est

wser: wkly wge, service industry

wmfg: wkly wge, manufacturing

wfed: wkly wge, fed employees

wsta: wkly wge, state employees

wloc: wkly wge, local gov emps

mix: offense mix: face-to-face/other

pctymle: percent young male

d82: =1 if year == 82

d83: =1 if year == 83

d84: =1 if year == 84

d85: =1 if year == 85

d86: =1 if year == 86

d87: =1 if year == 87

lcrmrte: log(crmrte)

lprbarr: log(prbarr)

lprbconv: log(prbconv)

lprbpris: log(prbpris)

lavgsen: log(avgsen)

lpolpc: log(polpc)

ldensity: log(density)

ltaxpc: log(taxpc)

lwcon: log(wcon)

lwtuc: log(wtuc)

lwtrd: log(wtrd)

lwfir: log(wfir)

lwser: log(wser)

lwmfg: log(wmfg)

lwfed: log(wfed)

lwsta: log(wsta)

lwloc: log(wloc)

lmix: log(mix)

lpctymle: log(pctymle)

lpctmin: log(pctmin)

clcrmrte: lcrmrte - lcrmrte[t-1]

clprbarr: lprbarr - lprbarr[t-1]

clprbcon: lprbconv - lprbconv[t-1]

clprbpri: lprbpris - lprbpris[t-1]

clavgsen: lavgsen - lavgsen[t-1]

clpolpc: lpolpc - lpolpc[t-1]

cltaxpc: ltaxpc - ltaxpc[t-1]

clmix: lmix - lmix[t-1]
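
The variables prefixed with `cl` are first differences of the logged series within each county, which is why they are missing for 1981, the first year of each county. As a minimal sketch of that definition, the block below recomputes clcrmrte for the first three years of counties 1 and 3 using only the standard library. The crmrte values are copied from the data preview in this lab; the helper name `log_first_difference` is an assumption made for illustration. On the full DataFrame, a grouped difference such as `groupby('county').diff()` in pandas would presumably be the more natural tool.

```python
import math

# Crime rates (crmrte) for 1981-1983, copied from the data preview in this lab.
crmrte = {
    1: [0.039885, 0.038345, 0.030305],
    3: [0.016392, 0.019065, 0.015149],
}


def log_first_difference(series):
    """Return log(x_t) - log(x_{t-1}) for each period.

    The first period has no lagged value, so its difference is None.
    """
    logs = [math.log(value) for value in series]
    return [None] + [cur - prev for prev, cur in zip(logs, logs[1:])]


# Differences are taken within each county, never across counties.
clcrmrte = {county: log_first_difference(values) for county, values in crmrte.items()}

for county, diffs in clcrmrte.items():
    print(county, diffs)
```

The non-missing values should agree with the clcrmrte column of the dataset up to the rounding of the displayed crime rates.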





In [2]:
Crime = woo.dataWoo('crime4')


In [3]:
Crime.head()

Unnamed: 0,county,year,crmrte,prbarr,prbconv,prbpris,avgsen,polpc,density,taxpc,...,lpctymle,lpctmin,clcrmrte,clprbarr,clprbcon,clprbpri,clavgsen,clpolpc,cltaxpc,clmix
0,1,81,0.039885,0.289696,0.402062,0.472222,5.61,0.001787,2.307159,25.69763,...,-2.43387,3.006608,,,,,,,,
1,1,82,0.038345,0.338111,0.433005,0.506993,5.59,0.001767,2.330254,24.874252,...,-2.449038,3.006608,-0.039376,0.154542,0.074143,0.071048,-0.003571,-0.011364,-0.032565,0.030857
2,1,83,0.030305,0.330449,0.525703,0.479705,5.8,0.001836,2.341801,26.451443,...,-2.464036,3.006608,-0.235316,-0.022922,0.193987,-0.055326,0.036879,0.038413,0.061477,-0.244732
3,1,84,0.034726,0.362525,0.604706,0.520104,6.89,0.001886,2.34642,26.842348,...,-2.478925,3.006608,0.13618,0.092641,0.140006,0.080857,0.172213,0.02693,0.01467,-0.027331
4,1,85,0.036573,0.325395,0.578723,0.497059,6.55,0.001924,2.364896,28.140337,...,-2.497306,3.006608,0.051825,-0.108054,-0.043918,-0.04532,-0.050606,0.020199,0.047223,0.172125


In [4]:
Crime.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 630 entries, 0 to 629
Data columns (total 59 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   county    630 non-null    int64  
 1   year      630 non-null    int64  
 2   crmrte    630 non-null    float64
 3   prbarr    630 non-null    float64
 4   prbconv   630 non-null    float64
 5   prbpris   630 non-null    float64
 6   avgsen    630 non-null    float64
 7   polpc     630 non-null    float64
 8   density   630 non-null    float64
 9   taxpc     630 non-null    float64
 10  west      630 non-null    int64  
 11  central   630 non-null    int64  
 12  urban     630 non-null    int64  
 13  pctmin80  630 non-null    float64
 14  wcon      630 non-null    float64
 15  wtuc      630 non-null    float64
 16  wtrd      630 non-null    float64
 17  wfir      630 non-null    float64
 18  wser      630 non-null    float64
 19  wmfg      630 non-null    float64
 20  wfed      630 non-null    float64

In [5]:
CrimeMultiIndex = Crime.set_index(['county', 'year'], drop=False)


In [6]:
CrimeMultiIndex.head(10)

county (index),year (index),county,year,crmrte,prbarr,prbconv,prbpris,avgsen,polpc,density,taxpc,...,lpctymle,lpctmin,clcrmrte,clprbarr,clprbcon,clprbpri,clavgsen,clpolpc,cltaxpc,clmix
1,81,1,81,0.039885,0.289696,0.402062,0.472222,5.61,0.001787,2.307159,25.69763,...,-2.43387,3.006608,,,,,,,,
1,82,1,82,0.038345,0.338111,0.433005,0.506993,5.59,0.001767,2.330254,24.874252,...,-2.449038,3.006608,-0.039376,0.154542,0.074143,0.071048,-0.003571,-0.011364,-0.032565,0.030857
1,83,1,83,0.030305,0.330449,0.525703,0.479705,5.8,0.001836,2.341801,26.451443,...,-2.464036,3.006608,-0.235316,-0.022922,0.193987,-0.055326,0.036879,0.038413,0.061477,-0.244732
1,84,1,84,0.034726,0.362525,0.604706,0.520104,6.89,0.001886,2.34642,26.842348,...,-2.478925,3.006608,0.13618,0.092641,0.140006,0.080857,0.172213,0.02693,0.01467,-0.027331
1,85,1,85,0.036573,0.325395,0.578723,0.497059,6.55,0.001924,2.364896,28.140337,...,-2.497306,3.006608,0.051825,-0.108054,-0.043918,-0.04532,-0.050606,0.020199,0.047223,0.172125
1,86,1,86,0.034752,0.326062,0.512324,0.439863,6.9,0.001895,2.385681,29.74098,...,-2.524721,3.006608,-0.051062,0.002048,-0.121867,-0.122245,0.052056,-0.015258,0.055322,0.042765
1,87,1,87,0.035604,0.29827,0.527596,0.43617,6.71,0.001828,2.422633,30.993681,...,-2.552702,3.006608,0.024198,-0.089089,0.029374,-0.008431,-0.027923,-0.036189,0.041257,-0.193899
3,81,3,81,0.016392,0.202899,0.869048,0.465753,8.45,0.000594,0.976834,14.560878,...,-2.441794,2.068926,,,,,,,,
3,82,3,82,0.019065,0.162218,0.772152,0.377049,5.71,0.000705,0.992278,35.640728,...,-2.447933,2.068926,0.15106,-0.223767,-0.118217,-0.21128,-0.391948,0.170985,0.895151,-0.170775
3,83,3,83,0.015149,0.181586,1.02817,0.438356,8.69,0.000659,1.003861,19.261877,...,-2.454076,2.068926,-0.229912,0.112788,0.286354,0.150656,0.419954,-0.067522,-0.615361,0.231241


In [7]:
# FIRST FE model estimation:
reg1 = plm.PanelOLS.from_formula(
    formula="lcrmrte ~ d82 + d83 + d84 + d85 + d86 + d87 + lprbarr + lprbconv + lprbpris + lavgsen + lpolpc + lwcon + lwtuc + lwtrd + lwfir + lwser + lwmfg + lwfed + lwsta + lwloc + EntityEffects",
    data=CrimeMultiIndex, drop_absorbed=True)


results1 = reg1.fit()

In [8]:
results1

0,1,2,3
Dep. Variable:,lcrmrte,R-squared:,0.4575
Estimator:,PanelOLS,R-squared (Between):,0.9232
No. Observations:,630,R-squared (Within):,0.4575
Date:,"Tue, Aug 13 2024",R-squared (Overall):,0.9222
Time:,08:07:44,Log-likelihood,418.79
Cov. Estimator:,Unadjusted,,
,,F-statistic:,21.923
Entities:,90,P-value,0.0000
Avg Obs:,7.0000,Distribution:,"F(20,520)"
Min Obs:,7.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
d82,0.0189,0.0251,0.7519,0.4524,-0.0305,0.0682
d83,-0.0553,0.0330,-1.6739,0.0948,-0.1202,0.0096
d84,-0.0615,0.0411,-1.4975,0.1349,-0.1422,0.0192
d85,-0.0397,0.0562,-0.7071,0.4798,-0.1500,0.0706
d86,-0.0001,0.0680,-0.0017,0.9987,-0.1337,0.1335
d87,0.0537,0.0799,0.6722,0.5018,-0.1033,0.2107
lprbarr,-0.3564,0.0322,-11.081,0.0000,-0.4195,-0.2932
lprbconv,-0.2860,0.0211,-13.584,0.0000,-0.3273,-0.2446
lprbpris,-0.1751,0.0323,-5.4154,0.0000,-0.2387,-0.1116


#### The previous results show that several regressors are not statistically significant. Let's fit a second model that keeps only the statistically significant variables.

In [9]:
# SECOND FE model estimation:
reg2 = plm.PanelOLS.from_formula(
    formula="lcrmrte ~ lprbconv + lprbpris + lpolpc + lprbarr + EntityEffects",
    data=CrimeMultiIndex, drop_absorbed=True)


results2 = reg2.fit()

In [10]:
results2

0,1,2,3
Dep. Variable:,lcrmrte,R-squared:,0.3568
Estimator:,PanelOLS,R-squared (Between):,0.7418
No. Observations:,630,R-squared (Within):,0.3568
Date:,"Tue, Aug 13 2024",R-squared (Overall):,0.7410
Time:,08:07:45,Log-likelihood,365.16
Cov. Estimator:,Unadjusted,,
,,F-statistic:,74.320
Entities:,90,P-value,0.0000
Avg Obs:,7.0000,Distribution:,"F(4,536)"
Min Obs:,7.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
lprbconv,-0.3070,0.0219,-14.043,0.0000,-0.3500,-0.2641
lprbpris,-0.1942,0.0334,-5.8185,0.0000,-0.2598,-0.1286
lpolpc,0.4126,0.0275,15.016,0.0000,0.3586,0.4666
lprbarr,-0.3855,0.0335,-11.520,0.0000,-0.4512,-0.3198


### Model interpretation

#### We have fit a model:

$\log(crmrte_{it}) = \beta_1 \, lpolpc_{it} + \beta_2 \, lprbarr_{it} + \beta_3 \, lprbconv_{it} + \beta_4 \, lprbpris_{it} + a_i + u_{it}$
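
To make the role of the unobserved county effect $a_i$ concrete, here is a minimal sketch of the within (entity-demeaning) transformation that fixed effects estimation relies on. It uses a hypothetical toy panel, not CRIME4, with a single regressor, and the helper name `within_slope` is an assumption made for illustration. It ignores standard errors and everything else that PanelOLS handles.

```python
from collections import defaultdict


def within_slope(entities, x, y):
    """OLS slope of y on x after subtracting entity-specific means.

    Demeaning removes any time-constant entity effect a_i, so the slope
    is identified from variation within each entity only.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for e, xi, yi in zip(entities, x, y):
        totals[e][0] += xi
        totals[e][1] += yi
        totals[e][2] += 1
    means = {e: (sx / n, sy / n) for e, (sx, sy, n) in totals.items()}

    num = den = 0.0
    for e, xi, yi in zip(entities, x, y):
        mean_x, mean_y = means[e]
        num += (xi - mean_x) * (yi - mean_y)
        den += (xi - mean_x) ** 2
    return num / den


# Hypothetical toy panel: 3 entities, 3 periods, true slope -0.3.
# The entity effects a_i are larger for entities with larger x.
toy_entities = [1, 1, 1, 2, 2, 2, 3, 3, 3]
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
toy_a = {1: 0.0, 2: 2.0, 3: 4.0}
toy_y = [toy_a[e] - 0.3 * xi for e, xi in zip(toy_entities, toy_x)]

# Fixed effects: demean within each entity.
beta_fe = within_slope(toy_entities, toy_x, toy_y)

# Pooled OLS: treat all rows as one group, so only the grand mean is removed.
beta_pooled = within_slope([0] * len(toy_x), toy_x, toy_y)

print("within (FE) slope:", beta_fe)
print("pooled OLS slope: ", beta_pooled)
```

In this toy panel the pooled slope has the wrong sign because the entity effects are correlated with the regressor, while the within slope recovers the value used to build the data. That is the motivation for including EntityEffects in the formulas used in this lab.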


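#### 1. Based on the F-statistic (74.32 with an F(4,536) distribution, p-value: 0.000), the regressors are jointly statistically significant.
#### 2. Based on the within R-squared, the model explains 35.68% of the within-county variability in the log crime rate.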

#### 3. A 1% increase in the probability of conviction is predicted to lower the crime rate by about 0.31% (bear in mind the log-log relationship).

#### 4. A 1% increase in the probability of arrest is predicted to lower the crime rate by about 0.39% (bear in mind the log-log relationship).

#### 5. A 1% increase in the probability of going to prison if convicted is predicted to lower the crime rate by about 0.19% (bear in mind the log-log relationship).

#### 6. The coefficient on the police per capita variable is somewhat surprising, yet it is a feature of most studies that seek to explain crime rates. Interpreted causally, it says that a 1% increase in police per capita increases crime rates by about 0.41%.
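
Reading a log-log coefficient as "a 1% change in x gives a beta% change in y" is an approximation that works well for small changes. The sketch below compares that approximation with the exact implied change for a 10% increase in each regressor, using the coefficients reported for the second model. The function name `pct_change_in_y` is an assumption made for illustration.

```python
def pct_change_in_y(beta, pct_change_in_x):
    """Exact % change in y implied by a log-log coefficient beta.

    From log(y) = beta * log(x) + const, multiplying x by (1 + p/100)
    multiplies y by (1 + p/100) ** beta.
    """
    return ((1 + pct_change_in_x / 100) ** beta - 1) * 100


# Point estimates copied from the second fixed effects model in this lab.
coefs = {
    "lprbconv": -0.3070,
    "lprbpris": -0.1942,
    "lpolpc": 0.4126,
    "lprbarr": -0.3855,
}

for name, beta in coefs.items():
    approx = beta * 10                  # rule-of-thumb reading of the elasticity
    exact = pct_change_in_y(beta, 10)   # exact change implied by the log-log form
    print(f"{name}: approx {approx:+.2f}%, exact {exact:+.2f}%")
```

For a 1% change the two readings are nearly identical. For larger changes the exact figure is somewhat smaller in magnitude than the rule of thumb.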


### It is hard to believe that having more police officers causes more crime. What is going on here?

#### There are at least two possibilities. First, the crime rate variable is calculated from reported crimes. It might be that, when there are additional police, more crimes are reported. Second, the police variable might be endogenous in the equation for other reasons: counties may enlarge the police force when they expect crime rates to increase.

### We would need to investigate further how the data were collected to clarify the unexpected sign on lpolpc. We could also explore how variations in police per capita across counties relate to crime rates.
