-
Notifications
You must be signed in to change notification settings - Fork 9
/
IV2.Rmd
499 lines (388 loc) · 16.7 KB
/
IV2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
---
title: "Instrumental Variable Estimation 2: Implementation in R"
author: "Instructor: Yuta Toyama"
date: "Last updated: `r Sys.Date()`"
fig_width: 6
fig_height: 4
output:
xaringan::moon_reader:
css: xaringan-themer.css
nature:
highlightStyle: github
highlightLines: yes
countIncrementalSlides: no
ratio: '16:9'
knit: pagedown::chrome_print
---
class: title-slide-section, center, middle
name: logistics
# Introduction
---
## Introduction
- I cover three examples of instrumental variable regressions.
1. Wage regression
2. Demand curve
3. Effects of Voter Turnout (Hansford and Gomez)
---
class: title-slide-section, center, middle
name: logistics
# Wage regression
---
## Example 1: Wage regression
- Use dataset "Mroz", cross-sectional labor force participation data that accompany "Introductory Econometrics" by Wooldridge.
- Original data from *"The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions"* by Thomas Mroz published in *Econometrica* in 1987.
- Detailed description of data: https://www.rdocumentation.org/packages/npsf/versions/0.4.2/topics/mroz
```{r, message=FALSE}
library("foreign")
# You do not have to worry about a message "cannot read factor labels from Stata 5 files".
data <- read.dta("data/MROZ.DTA")
```
---
.pull-left-left[
- Describe data
```{r, message=FALSE, eval=FALSE}
library(stargazer)
stargazer(data,
type = "text")
```
]
.pull-right-right[
```{r,message=FALSE, echo=FALSE}
library(stargazer)
stargazer(data,type = "text")
```
]
---
<br/><br/>
- Consider the wage regression
$$\log(w_i) = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + \epsilon_i$$
- We assume that $exper_i$ is exogenous but $educ_i$ is endogenous.
- As an instrument for $educ_i$, we use the years of schooling for his or her father and mother, which we call $fathereduc_i$ and $mothereduc_i$.
- Discussion on these IVs will be later.
---
```{r,eval=FALSE}
library("AER")
library("dplyr")
library("texreg")
library("estimatr")
# data cleaning
data %>%
select(lwage, educ, exper, expersq, motheduc, fatheduc) %>%
filter( is.na(lwage) == 0 ) -> data
result_OLS <- lm_robust( lwage ~ educ + exper + expersq, data = data, se_type = "HC1")
# IV regression using fathereduc and mothereduc
result_IV <- iv_robust(lwage ~ educ + exper + expersq |
fatheduc + motheduc + exper + expersq,
data = data, se_type = "HC1")
# Show result
screenreg(l = list(result_OLS, result_IV),digits = 3,
# caption = 'title',
# custom.model.names = c("(I)", "(II)", "(III)", "(IV)", "(V)"),
custom.coef.names = NULL, # add a class, if you want to change the names of variables.
include.ci = F,include.rsquared = FALSE, include.adjrs = TRUE, include.nobs = TRUE,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.header = list("lwage" = 1:2), # you can add header especially to indicate dependent variable
stars = numeric(0))
```
---
```{r,echo=FALSE,message=FALSE,warning=FALSE}
library("AER")
library("dplyr")
library("texreg")
library("estimatr")
# data cleaning
data %>%
select(lwage, educ, exper, expersq, motheduc, fatheduc) %>%
filter( is.na(lwage) == 0 ) -> data
result_OLS <- lm_robust( lwage ~ educ + exper + expersq, data = data, se_type = "HC1")
# IV regression using fathereduc and mothereduc
result_IV <- iv_robust(lwage ~ educ + exper + expersq | fatheduc + motheduc + exper + expersq, data = data, se_type = "HC1")
# Show result
screenreg(l = list(result_OLS, result_IV),
digits = 3,
# caption = 'title',
# custom.model.names = c("(I)", "(II)", "(III)", "(IV)", "(V)"),
custom.coef.names = NULL, # add a class, if you want to change the names of variables.
include.ci = F,
include.rsquared = FALSE, include.adjrs = TRUE, include.nobs = TRUE,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.header = list("lwage" = 1:2), # you can add header especially to indicate dependent variable
stars = numeric(0)
)
```
---
- How about the first stage? You should always check this!!
```{r}
# First stage regression
result_1st <- lm(educ ~ motheduc + fatheduc + exper + expersq, data = data)
# F test
linearHypothesis(result_1st,
c("fatheduc = 0", "motheduc = 0" ),
vcov = vcovHC, type = "HC1")
```
---
## Discussion on IV
- Labor economists have used family background variables as IVs for education.
- **Relevance**: OK from the first stage regression.
- **Independence**: A bit suspicious. Parents' education would be correlated with child's ability through quality of nurturing at an early age.
- Still, we can see that these IVs can mitigate (though may not eliminate completely) the omitted variable bias.
- Discussion on the validity of instruments is crucial in empirical research.
---
class: title-slide-section, center, middle
name: logistics
# Demand curve
---
## Example 2: Estimation of the Demand for Cigaretts
- Demand model is a building block in many branches of Economics.
- For example, health economics is concerned with the study of how health-affecting behavior of individuals is influenced by the health-care system and regulation policy.
- Smoking is a prominent example as it is related to many illnesses and negative externalities.
- It is plausible that cigarette consumption can be reduced by taxing cigarettes more heavily.
- Question: how much taxes must be increased to reach a certain reduction in cigarette consumption? -> Need to know **price elasticity of demand** for cigaretts.
---
<br/><br/>
- Use `CigarrettesSW` in the package `AER`.
- a panel data set that contains observations on cigarette consumption and several economic indicators for all 48 continental federal states of the U.S. from 1985 to 1995.
- What is **panel data**? The data involves both time series and cross-sectional information.
- The variable is denoted as $y_{it}$, which indexed by individual $i$ and time $t$.
- Cross section data $y_i$: information for a particular individual $i$ (e.g., income for person $i$).
- Time series data $y_t$: information for a particular time period (e.g., GDP in year $y$)
- Panel data $y_{it}$: income of person $i$ in year $t$.
- We will see more on panel data later in this course. For now, we use the panel data as just cross-sectional data (**pooled cross-sections**)
---
```{r}
# load the data set and get an overview
data("CigarettesSW")
summary(CigarettesSW)
```
---
<br/>
- Consider the following model $$\log (Q_{it}) = \beta_0 + \beta_1 \log (P_{it}) + \beta_2 \log(income_{it}) + u_{it}$$ where
- $Q_{it}$ is the number of packs per capita in state $i$ in year $t$,
- $P_{it}$ is the after-tax average real price per pack of cigarettes, and
- $income_{it}$ is the real income per capita. This is demand shifter.
- As an IV for the price, we use the followings:
- $SalesTax_{it}$: the proportion of taxes on cigarettes arising from the general sales tax.
- Relevant as it is included in the after-tax price
- Exogenous(indepndent) since the sales tax does not influence demand directly, but indirectly through the price.
- $CigTax_{it}$: the cigarett-specific taxes
---
```{r}
CigarettesSW %>%
mutate( rincome = (income / population) / cpi) %>%
mutate( rprice = price / cpi ) %>%
mutate( salestax = (taxs - tax) / cpi ) %>%
mutate( cigtax = tax/cpi ) -> Cigdata
```
- Let's run the regressions
```{r,eval=FALSE}
cig_ols <- lm_robust(log(packs) ~ log(rprice) + log(rincome) , data = Cigdata,se_type = "HC1")
#coeftest(cig_ols, vcov = vcovHC, type = "HC1")
cig_ivreg <- iv_robust(log(packs) ~ log(rprice) + log(rincome) |
log(rincome) + salestax + cigtax, data = Cigdata, se_type = "HC1")
#coeftest(cig_ivreg, vcov = vcovHC, type = "HC1")
# Show result
screenreg(l = list(cig_ols, cig_ivreg),digits = 3,
# caption = 'title',
custom.model.names = c("OLS", "IV"),custom.coef.names = NULL, # add a class, if you want to change the names of variables.
include.ci = F,include.rsquared = FALSE, include.adjrs = TRUE, include.nobs = TRUE,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.header = list("log(packs)" = 1:2), # you can add header especially to indicate dependent variable
stars = numeric(0)
)
```
---
```{r,echo=FALSE}
cig_ols <- lm_robust(log(packs) ~ log(rprice) + log(rincome) , data = Cigdata,se_type = "HC1")
#coeftest(cig_ols, vcov = vcovHC, type = "HC1")
cig_ivreg <- iv_robust(log(packs) ~ log(rprice) + log(rincome) |
log(rincome) + salestax + cigtax, data = Cigdata, se_type = "HC1")
#coeftest(cig_ivreg, vcov = vcovHC, type = "HC1")
# Show result
screenreg(l = list(cig_ols, cig_ivreg),
digits = 3,
# caption = 'title',
custom.model.names = c("OLS", "IV"),
custom.coef.names = NULL, # add a class, if you want to change the names of variables.
include.ci = F,
include.rsquared = FALSE, include.adjrs = TRUE, include.nobs = TRUE,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.header = list("log(packs)" = 1:2), # you can add header especially to indicate dependent variable
stars = numeric(0)
)
```
---
<br/>
- The first stage regression
```{r,eval=FALSE}
# First stage regression
result_1st <- lm(log(rprice) ~ log(rincome) + log(rincome) + salestax + cigtax , data= Cigdata)
# F test
linearHypothesis(result_1st,
c("salestax = 0", "cigtax = 0" ),
vcov = vcovHC, type = "HC1")
```
---
<br/><br/>
```{r, echo=FALSE}
# First stage regression
result_1st <- lm(log(rprice) ~ log(rincome) + log(rincome) + salestax + cigtax , data= Cigdata)
# F test
linearHypothesis(result_1st,
c("salestax = 0", "cigtax = 0" ),
vcov = vcovHC, type = "HC1")
```
---
class: title-slide-section, center, middle
name: logistics
# Voting
---
## Example 3: Effects of Turnout on Partisan Voting
- THOMAS G. HANSFORD and BRAD T. GOMEZ "Estimating the Electoral Effects of Voter Turnout" The American Political Science Review Vol. 104, No. 2 (May 2010), pp. 268-288
- Link: https://www.cambridge.org/core/journals/american-political-science-review/article/estimating-the-electoral-effects-of-voter-turnout/8A880C28E79BE770A5CA1A9BB6CF933C
- Here, we will see a simplified version of their analysis.
- The dataset is [here](HansfordGomez_Data.csv)
---
```{r, eval=FALSE}
library(readr)
HGdata <- read_csv("data/HansfordGomez_Data.csv")
stargazer::stargazer(as.data.frame(HGdata) %>% select(-starts_with("Yr")),type="text")
```
---
```{r, echo=FALSE,message=FALSE,warning=FALSE}
library(readr)
HGdata <- read_csv("data/HansfordGomez_Data.csv")
stargazer::stargazer(as.data.frame(HGdata) %>% select(-starts_with("Yr"),-(dph_StateVAP:State_DVS_lag2)),type="text")
```
---
```{r, echo=FALSE,message=FALSE,warning=FALSE}
stargazer::stargazer(as.data.frame(HGdata) %>% select(dph_StateVAP:State_DVS_lag2),type="text")
```
---
- Data description:
| Name | Description|
|---|-------------------------------------------------------------------|
| Year | Election Year|
| FIPS_County | FIPS County Code|
| Turnout | Turnout as Pcnt VAP|
| Closing2 | Days between registration closing date and election|
| Literacy | Literacy Test|
| PollTax | Poll Tax|
| Motor | Motor Voter|
| GubElection | Gubernatorial Election in State|
| SenElection | U.S. Senate Election in State|
| GOP_Inc | Republican Incumbent|
---
| Name | Description|
|---|-------------------------------------------------------------------|
| Yr52 | 1952 Dummy|
| Yr56 | 1956 Dummy|
| Yr60 | 1960 Dummy|
| Yr64 | 1964 Dummy|
| Yr68 | 1968 Dummy|
| Yr72 | 1972 Dummy|
| Yr76 | 1976 Dummy|
| Yr80 | 1980 Dummy|
| Yr84 | 1984 Dummy|
| Yr88 | 1988 Dummy|
| Yr92 | 1992 Dummy|
| Yr96 | 1996 Dummy|
| Yr2000 | 2000 Dummy|
---
<br/><br/>
| Name | Description|
|---|-------------------------------------------------------------------|
| DNormPrcp_KRIG | Election day rainfall - differenced from normal rain for the day|
| GOPIT | Turnout x Republican Incumbent|
| DemVoteShare2_3MA | Partisan composition measure = 3 election moving avg. of Dem Vote Share|
| DemVoteShare2 | Democratic Pres Candidate's Vote Share|
| RainGOPI | Rainfall measure x Republican Incumbent|
| TO_DVS23MA | Turnout x Partisan Composition measure|
| Rain_DVS23MA | Rainfall measure x Partisan composition measure|
| dph | =1 if home state of Dem pres candidate|
| dvph | =1 if home state of Dem vice pres candidate|
---
<br/><br/>
| Name | Description|
|---|-------------------------------------------------------------------|
| rph | =1 if home state of Rep pres candidate|
| rvph | =1 if home state of Rep vice pres candidate|
| state_del | avg common space score for the House delegation|
| dph_StateVAP | = dph*State voting age population|
| dvph_StateVAP | = dvph*State voting age population|
| rph_StateVAP | = rph*State voting age population|
| rvph_StateVAP | = rvph*State voting age population|
| State_DVS_lag | State-wide Dem vote share, lagged one election|
| State_DVS_lag2 | State_DVS_lag squared|
---
- Consider the following regression $$DemoShare_{it} = \beta_0 + \beta_1 Turnout_{it} + u_t + u_{it}$$ where
- $DemoShare_{it}$ : Two-party vote share for Democrat candidate in county $i$ in the presidential election in year $t$
- $Turnout_{it}$ : Turnout rate in county $i$ in the presidential election in year $t$
- $u_t$ : **Year fixed effects**. Time dummies for each presidential election year
- As an IV, we use the rainfall measure denoted by `DNormPrcp_KRIG`
---
```{r, cache=TRUE,eval=FALSE}
# You can do this, but it is tedious.
hg_ols <- lm_robust( DemVoteShare2 ~ Turnout + Yr52 + Yr56 + Yr60 + Yr64 + Yr68 + Yr72 + Yr76 + Yr80
+ Yr84 + Yr88 + Yr92 + Yr96 + Yr2000, data = HGdata, se_type="HC1")
#coeftest(hg_ols, vcov = vcovHC, type = "HC1")
# By using "factor(Year)" as an explanatory variable, the regression automatically incorporates the dummies for each value.
hg_ols <- lm_robust( DemVoteShare2 ~ Turnout + factor(Year) , data = HGdata, se_type="HC1")
#coeftest(hg_ols, vcov = vcovHC, type = "HC1")
# Iv regression
hg_ivreg <- iv_robust( DemVoteShare2 ~ Turnout + factor(Year) |
factor(Year) + DNormPrcp_KRIG, data = HGdata, se_type="HC1")
#coeftest(hg_ivreg, vcov = vcovHC, type = "HC1")
# Show result
screenreg(l = list(hg_ols, hg_ivreg),
digits = 3,
# caption = 'title',
custom.model.names = c("OLS", "IV"),
custom.coef.names = NULL, # add a class, if you want to change the names of variables.
include.ci = F,
include.rsquared = FALSE, include.adjrs = TRUE, include.nobs = TRUE,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.header = list("DemVoteShare2" = 1:2), # you can add header especially to indicate dependent variable
stars = numeric(0)
)
```
---
```{r, cache=TRUE,echo=FALSE}
# By using "factor(Year)" as an explanatory variable, the regression automatically incorporates the dummies for each value.
hg_ols <- lm_robust( DemVoteShare2 ~ Turnout + factor(Year) , data = HGdata, se_type="HC1")
#coeftest(hg_ols, vcov = vcovHC, type = "HC1")
# Iv regression
hg_ivreg <- iv_robust( DemVoteShare2 ~ Turnout + factor(Year) |
factor(Year) + DNormPrcp_KRIG, data = HGdata, se_type="HC1")
#coeftest(hg_ivreg, vcov = vcovHC, type = "HC1")
# Show result
screenreg(l = list(hg_ols, hg_ivreg),
digits = 3,
# caption = 'title',
custom.model.names = c("OLS", "IV"),
custom.coef.names = NULL, # add a class, if you want to change the names of variables.
include.ci = F,
include.rsquared = FALSE, include.adjrs = TRUE, include.nobs = TRUE,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.header = list("DemVoteShare2" = 1:2), # you can add header especially to indicate dependent variable
stars = numeric(0),
omit.coef = "factor"
)
```
---
```{r, cache=TRUE,eval=FALSE}
# First stage regression
hg_1st <- lm(Turnout ~ factor(Year) + DNormPrcp_KRIG, data= HGdata)
# F test
linearHypothesis(hg_1st,
c("DNormPrcp_KRIG = 0" ),
vcov = vcovHC, type = "HC1")
```
---
```{r, cache=TRUE,echo=FALSE}
# First stage regression
hg_1st <- lm(Turnout ~ factor(Year) + DNormPrcp_KRIG, data= HGdata)
# F test
linearHypothesis(hg_1st,
c("DNormPrcp_KRIG = 0" ),
vcov = vcovHC, type = "HC1")
```