# Policy Analysis with Pooled Cross Sections

### Intro and objectives


### In this lab you will learn:
1. Examples of policy analysis using pooled cross sections
2. How to fit pooled cross-sectional models in Python


## What I hope you'll get out of this lab
* A sense of where to start when you need to evaluate the effect of a policy or a change in a population
* Worked examples
* How to interpret the results

In [1]:
!pip install wooldridge
import wooldridge as woo
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np



# Example. Effect of a Garbage Incinerator’s Location on Housing Prices


#### The rumor that a new incinerator would be built in North Andover began after 1978, and construction began in 1981. The incinerator was expected to be in operation soon after the start of construction; the incinerator actually began operating in 1985.
####  The hypothesis is that the price of houses located near the incinerator would fall relative to the price of more distant houses.

#### We define a house to be near the incinerator if it is within three miles. Let rprice denote the house price in real terms.
#### We will use data on prices of houses that sold in 1978 and another sample on those that sold in 1981.

#### In this case we pool the two samples and regress house price on a dummy for proximity to the incinerator (nearinc), a dummy for houses sold in 1981 ($year_{81}$), and the interaction of the two.


$ rprice=\beta_0+\delta_0 \cdot year_{81}+\beta_1 \cdot nearinc+\delta_1 \cdot year_{81} \cdot nearinc+u $


#### The intercept, $\beta_0$, is the average price of a home not near the incinerator in 1978. The parameter $\delta_0$ captures the change in housing values from 1978 to 1981 that is common to all houses in North Andover. The coefficient on nearinc, $\beta_1$, measures the location effect that is not due to the presence of the incinerator.

#### The parameter of interest is the coefficient on the interaction term $year_{81} \cdot nearinc$: $\delta_1$ measures the decline in housing values due to the new incinerator, provided we assume that houses both near and far from the site did not appreciate at different rates for other reasons.
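
#### To see why $\delta_1$ is a difference-in-differences, it can help to write out the four group means the model implies. This is a sketch that assumes the error $u$ has mean zero in every group, with $\delta_0$ the coefficient on $year_{81}$, $\beta_1$ the coefficient on nearinc, and $\delta_1$ the coefficient on the interaction:

$$
\begin{aligned}
E(rprice \mid year_{81}=0,\ nearinc=0) &= \beta_0 \\
E(rprice \mid year_{81}=1,\ nearinc=0) &= \beta_0+\delta_0 \\
E(rprice \mid year_{81}=0,\ nearinc=1) &= \beta_0+\beta_1 \\
E(rprice \mid year_{81}=1,\ nearinc=1) &= \beta_0+\delta_0+\beta_1+\delta_1
\end{aligned}
$$

#### Subtracting the near-minus-far gap in 1978 from the same gap in 1981 leaves only the interaction coefficient:

$$
\left[(\beta_0+\delta_0+\beta_1+\delta_1)-(\beta_0+\delta_0)\right]-\left[(\beta_0+\beta_1)-\beta_0\right]=\delta_1
$$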



In [2]:
kielmc = woo.dataWoo('kielmc')

In [3]:
kielmc.head()

Unnamed: 0,year,age,agesq,nbh,cbd,intst,lintst,price,rooms,area,...,lprice,y81,larea,lland,y81ldist,lintstsq,nearinc,y81nrinc,rprice,lrprice
0,1978,48,2304.0,4,3000.0,1000.0,6.9078,60000.0,7,1660,...,11.0021,0,7.414573,8.429017,0.0,47.717705,1,0,60000.0,11.0021
1,1978,83,6889.0,4,4000.0,1000.0,6.9078,40000.0,6,2612,...,10.596635,0,7.867871,9.032409,0.0,47.717705,1,0,40000.0,10.596635
2,1978,58,3364.0,4,4000.0,1000.0,6.9078,34000.0,6,1144,...,10.434115,0,7.042286,8.517193,0.0,47.717705,1,0,34000.0,10.434115
3,1978,11,121.0,4,4000.0,1000.0,6.9078,63900.0,5,1136,...,11.065075,0,7.035269,9.21034,0.0,47.717705,1,0,63900.0,11.065075
4,1978,48,2304.0,4,4000.0,2000.0,7.6009,44000.0,5,1868,...,10.691945,0,7.532624,9.21034,0.0,57.773682,1,0,44000.0,10.691945


In [4]:
type(kielmc)

In [5]:
kielmc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 321 entries, 0 to 320
Data columns (total 25 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   year      321 non-null    int64  
 1   age       321 non-null    int64  
 2   agesq     321 non-null    float64
 3   nbh       321 non-null    int64  
 4   cbd       321 non-null    float64
 5   intst     321 non-null    float64
 6   lintst    321 non-null    float64
 7   price     321 non-null    float64
 8   rooms     321 non-null    int64  
 9   area      321 non-null    int64  
 10  land      321 non-null    float64
 11  baths     321 non-null    int64  
 12  dist      321 non-null    float64
 13  ldist     321 non-null    float64
 14  wind      321 non-null    int64  
 15  lprice    321 non-null    float64
 16  y81       321 non-null    int64  
 17  larea     321 non-null    float64
 18  lland     321 non-null    float64
 19  y81ldist  321 non-null    float64
 20  lintstsq  321 non-null    float64

In [6]:
# joint regression including an interaction term:
reg_joint = smf.ols(formula='rprice ~ nearinc * C(year)', data=kielmc)
results_joint = reg_joint.fit()

In [7]:
table_joint = pd.DataFrame({'b': round(results_joint.params, 4),
                            'se': round(results_joint.bse, 4),
                            't': round(results_joint.tvalues, 4),
                            'pval': round(results_joint.pvalues, 4)})
print(f'table_joint: \n{table_joint}\n')

table_joint: 
                                  b         se        t    pval
Intercept                82517.2276  2726.9101  30.2603  0.0000
C(year)[T.1981]          18790.2860  4050.0650   4.6395  0.0000
nearinc                 -18824.3705  4875.3221  -3.8612  0.0001
nearinc:C(year)[T.1981] -11863.9033  7456.6462  -1.5911  0.1126



## Based on the output above, the fitted model is:

$ \widehat{rprice}=82517.23+18790.29 \cdot year_{81}-18824.37 \cdot nearinc-11863.90 \cdot year_{81} \cdot nearinc $

## How do we interpret the equation?

#### Based on the fitted model, we conclude:

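#### 1. Real prices of houses not near the incinerator increased by about 18,790 dollars from 1978 to 1981

#### 2. In 1978, before the incinerator was rumored, houses near the site were already about 18,824 dollars cheaper on average (perhaps due to lower quality, smaller size, fewer services, etc.)

#### 3. The estimated effect of the incinerator on nearby houses is -11,863.90 dollars, but it is not statistically significant even at the 10% level (p-value = 0.1126)

### The price difference between houses close to the site and those farther away is large and statistically significant, but it already existed in 1978, so it cannot be attributed to the incinerator.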
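
#### As a quick check on this reading, the four group averages implied by the fitted equation can be rebuilt from the coefficients. The sketch below hard-codes the values printed in the regression table (so they carry its rounding); the variable names are only illustrative.

```python
# Coefficients copied from the printed regression table
b0 = 82517.2276    # Intercept: not near the site, sold in 1978
d0 = 18790.2860    # C(year)[T.1981]
b1 = -18824.3705   # nearinc
d1 = -11863.9033   # nearinc:C(year)[T.1981]

# Predicted average real price in each of the four groups
far_78 = b0
far_81 = b0 + d0
near_78 = b0 + b1
near_81 = b0 + d0 + b1 + d1

# Near-minus-far price gap in each year
gap_78 = near_78 - far_78
gap_81 = near_81 - far_81

# Difference-in-differences: how much the gap changed between 1978 and 1981
did = gap_81 - gap_78

print(f"gap in 1978: {gap_78:,.2f}")
print(f"gap in 1981: {gap_81:,.2f}")
print(f"difference-in-differences: {did:,.2f}")
```

#### The 1978 gap reproduces the nearinc coefficient and the change in the gap reproduces the interaction coefficient, which is why only the latter is read as the effect of the incinerator.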

### Let's explore how house characteristics differ across locations

In [8]:
kielmc.groupby('nearinc').mean()

nearinc,year,age,agesq,nbh,cbd,intst,lintst,price,rooms,area,...,wind,lprice,y81,larea,lland,y81ldist,lintstsq,y81nrinc,rprice,lrprice
0,1979.36,10.822222,969.524444,2.031111,20244.444444,20675.555556,9.870139,104905.164444,6.8,2200.124444,...,7.746667,11.492595,0.453333,7.655328,10.621965,4.557294,97.565032,0.0,91035.490608,11.37296
1,1979.25,34.854167,2347.291667,2.625,5458.333333,6520.833333,8.567328,75465.104167,6.083333,1887.833333,...,5.177083,11.109812,0.416667,7.461071,9.551612,3.839339,73.881949,0.416667,66578.849935,10.999853


In [9]:
kielmc[['nearinc','baths','area','land','age','rooms']].groupby('nearinc').mean()

nearinc,baths,area,land,age,rooms
0,2.551111,2200.124444,46984.902222,10.822222,6.8
1,1.84375,1887.833333,22391.583333,34.854167,6.083333


#### We observe that houses near the incinerator (nearinc==1) have, on average, fewer baths, less floor area, smaller lots and fewer rooms, and they are considerably older (about 35 years versus 11)

## Let's add some variables to the model
### We fit an enhanced model that controls for house characteristics and uses the log of the real price as the dependent variable


In [10]:
reg_joint2 = smf.ols(formula='np.log(rprice) ~ nearinc*C(year) + age +'
                           'I(age**2) + np.log(intst) + np.log(land) +'
                           'np.log(area) + rooms + baths',
                   data=kielmc)

In [11]:
results2 = reg_joint2.fit()

# print regression table:
table_didC = pd.DataFrame({'b': round(results2.params, 4),
                           'se': round(results2.bse, 4),
                           't': round(results2.tvalues, 4),
                           'pval': round(results2.pvalues, 4)})
print(f'table_didC: \n{table_didC}\n')

table_didC: 
                              b      se        t    pval
Intercept                7.6517  0.4159  18.3986  0.0000
C(year)[T.1981]          0.1621  0.0285   5.6868  0.0000
nearinc                  0.0322  0.0475   0.6789  0.4977
nearinc:C(year)[T.1981] -0.1315  0.0520  -2.5305  0.0119
age                     -0.0084  0.0014  -5.9236  0.0000
I(age ** 2)              0.0000  0.0000   4.3415  0.0000
np.log(intst)           -0.0614  0.0315  -1.9500  0.0521
np.log(land)             0.0998  0.0245   4.0766  0.0001
np.log(area)             0.3508  0.0515   6.8129  0.0000
rooms                    0.0473  0.0173   2.7317  0.0067
baths                    0.0943  0.0277   3.4003  0.0008



## How do we interpret the equation?

#### Based on the fitted model, we conclude:

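#### 1. Holding house characteristics fixed, house prices increased by roughly 16.2% from 1978 to 1981

#### 2. The estimated effect of having the incinerator in the vicinity is a reduction in house prices of roughly 13.15%, now statistically significant at the 5% level (p-value = 0.0119)

#### 3. Age, land, area, rooms and baths are statistically significant predictors of house price; once they are controlled for, the coefficient on nearinc is small and not significant (p-value = 0.4977)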
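
#### Coefficients in a log-price model are changes in log points, so reading them directly as percentages is an approximation that loosens as the coefficient grows. A minimal sketch of the exact conversion, $100 \cdot (e^{b}-1)$, applied to the two coefficients from the table above (the helper name `log_points_to_percent` is made up for this example):

```python
import math

def log_points_to_percent(b):
    """Exact percentage change implied by a coefficient b on a dummy in a log-price model."""
    return 100 * (math.exp(b) - 1)

# Coefficients copied from the printed regression table
year_effect = log_points_to_percent(0.1621)          # C(year)[T.1981]
incinerator_effect = log_points_to_percent(-0.1315)  # nearinc:C(year)[T.1981]

print(f"1978 to 1981 change: {year_effect:.1f}%")
print(f"incinerator effect:  {incinerator_effect:.1f}%")
```

#### The exact figures come out near 17.6% and -12.3%, close to the approximate readings of the coefficients but not identical.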