---
title: "Regression Discontinuity 2: Implementation in R"
author: "Instructor: Yuta Toyama"
date: "Last updated: `r Sys.Date()`"
fig_width: 6
fig_height: 4
output:
  xaringan::moon_reader:
    css: xaringan-themer.css
    nature:
      highlightStyle: github
      highlightLines: yes
      countIncrementalSlides: no
      ratio: '16:9'
knit: pagedown::chrome_print
---
class: title-slide-section, center, middle
name: logistics
# Close election design
---
## Question and Estimation strategy
- Lee, D.S., Moretti, E., and M. Butler, 2004, Do Voters Affect or Elect Policies? Evidence from the U.S. House, Quarterly Journal of Economics 119, 807-859.
- Do voters affect policy itself, or do they just select politicians?
- The roll-call voting record $RC_t$ of the district's representative following election t can be written as $$RC_t = (1-D_t)y_t + D_t x_t,$$
- $D_t$: indicator variable for whether the Democrat won election t
- $x_t \; (y_t)$: the policy implemented by the Democrat (the Republican) at t
---
- Under some conditions, it can be expressed as
\begin{eqnarray}
RC_t &=& constant + \pi_0 P_t^* + \pi_1 D_t +\epsilon_t \quad (1)
\\
RC_{t+1} &=& constant +\pi_0 P_{t+1}^* + \pi_1 D_{t+1} +\epsilon_{t+1} \quad (2)
\end{eqnarray}
- $P_{t}^*$: the underlying popularity (electoral strength) of the Democrat among voters. It is defined as the probability that party D will win if parties D and R are expected to choose their bliss points rather than moderated positions.
---
- **What we want to know** is whether $\pi_0 = 0$, $\pi_1 = 0$, or neither; that is, what drives the representative's roll-call voting (the politician's decisions).
- If $\pi_1 = 0$, the roll-call voting of the district's representative does not vary regardless of who wins (called **Complete Convergence**). That is, both parties choose exactly the same policy, and the policy position is determined only by the underlying popularity $P_t^*$.
- If $\pi_0 = 0$, the roll-call voting of the district's representative is not affected by the underlying popularity $P_t^*$ (called **Complete Divergence**). This can be interpreted as voters being unable to affect policy; they merely elect politicians with fixed policies.
- Otherwise, the parties choose different policies, but voters can still affect policy (called **Partial Convergence**).
---
- The problem is that we cannot estimate equations (1) and (2) directly, because $P_t^*$ is unobserved.
- This raises **two issues** that must be resolved in order to identify $\pi_0$ and $\pi_1$.
1. A simple comparison of $RC_t$ between $D_t=1$ and $D_t=0$ without controlling for $P_t^*$ suffers from endogeneity bias, since $P_t^*$ tends to be higher when $D_t=1$. <br> $\Rightarrow$ We need to somehow control for $P_t^*$ $\Rightarrow$ **RDD**
- By focusing on close elections (where the two parties' vote shares are nearly equal), we can compare the cases $D_t=1$ and $D_t=0$ while holding $P_t^*$ constant. $\Rightarrow$ This identifies $\pi_1$.
2. Because $P_t^*$ is not directly observable, we need some source of variation in $P_t^*$ to identify $\pi_0$. <br> $\Rightarrow$ **Incumbency advantage**
- The random assignment of who wins the first election generates random assignment of which candidate has greater electoral strength in the next election. <br> $\Rightarrow$ This requires a **two-period analysis.**
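---
- The first issue can be illustrated with a small simulation. This is only a sketch on made-up data, not the paper's data: the names (`p_star`, `v`, `d`, `rc`) and all parameter values are assumptions chosen for illustration. It compares the naive difference in means with the difference among close elections.

```r
set.seed(2004)
n      <- 20000
pi0    <- 40                            # assumed effect of underlying popularity
pi1    <- 30                            # assumed effect of a Democrat win
p_star <- runif(n, 0.2, 0.8)            # unobserved popularity P*
v      <- p_star + rnorm(n, sd = 0.05)  # vote share = popularity + noise
d      <- as.numeric(v > 0.5)           # Democrat wins if the vote share exceeds 0.5
rc     <- 10 + pi0 * p_star + pi1 * d + rnorm(n, sd = 5)

# Naive comparison over all elections: contaminated by differences in P*
naive_gap <- mean(rc[d == 1]) - mean(rc[d == 0])

# Comparison restricted to close elections (within 1 point of the threshold)
close     <- abs(v - 0.5) < 0.01
close_gap <- mean(rc[close & d == 1]) - mean(rc[close & d == 0])

c(naive = naive_gap, close = close_gap, truth = pi1)
```

- In this setup the naive gap overstates $\pi_1$ because winners have a higher $P_t^*$ on average, while the close-election gap should land near the assumed $\pi_1$.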
---
### Identification
- The conditional expectation of equation (2) is:
\begin{eqnarray}
\lim_{v \downarrow 0.5} E[RC_{t+1}|V_t = v] &= constant +\pi_0 E[P_{t+1}^*|D_t = 1, V_t =0.5] \\ &+ \pi_1 E[D_{t+1}|D_t = 1 , V_t =0.5] \\
&= constant +\pi_0 P_{t+1}^{*D} + \pi_1 P^D_{t+1}
\end{eqnarray}
\begin{eqnarray}
\lim_{v \uparrow 0.5} E[RC_{t+1}|V_t = v] &= constant +\pi_0 E[P_{t+1}^*|D_t = 0, V_t =0.5] \\ &+ \pi_1 E[D_{t+1}|D_t = 0 , V_t =0.5] \\
&= constant +\pi_0 P_{t+1}^{*R} + \pi_1 P^R_{t+1}
\end{eqnarray}
- $V_t$ is the vote share of the Democrat in election t, and the threshold is 0.5.
- $P_{t+1}^{*D} \equiv E[P_{t+1}^*|D_t = 1, V_t =0.5],$ $P_{t+1}^{*R} \equiv E[P_{t+1}^*|D_t = 0, V_t = 0.5]$
- $P^D_{t+1}$ ($P^R_{t+1}$) is the equilibrium probability that the Democrat wins election t+1 when the Democrat (Republican) won election t.
---
## Estimation
- When $D_t$ is as good as randomly assigned, which we approximate by restricting the data to observations close to the threshold,
\begin{eqnarray}
\underbrace{E[RC_{t+1}|D_t = 1] - E[RC_{t+1}|D_t = 0]}_{\text{Observable}} &=& \pi_0( P_{t+1}^{*D} - P_{t+1}^{*R}) + \pi_1 (P^D_{t+1} - P^R_{t+1}) \\
&\equiv& \underbrace{\gamma}_{\text{Total effect of initial win on future roll call votes}} \quad (3) \\
\underbrace{ E[RC_{t}|D_t = 1] - E[RC_{t}|D_t = 0]}_{\text{Observable}} &=& \pi_1 \quad (4)\\
\underbrace{ E[D_{t+1}|D_t = 1] - E[D_{t+1}|D_t = 0]}_{\text{Observable}} &=& P^D_{t+1} - P^R_{t+1} \quad (5)\\
\end{eqnarray}
- Therefore, $\pi_0( P_{t+1}^{*D} - P_{t+1}^{*R})$ can be estimated by $\gamma - \pi_1 (P_{t+1}^D - P_{t+1}^R)$
---
## Data
- There are two main data sets in this project.
- The first is a measure of how liberally a representative voted, taken from the **ADA score** for 1946–1995. The ADA score ranges from 0 to 100 for each representative. Higher scores correspond to a more “liberal” voting record.
- The running variable in this study is the vote share, that is, the share of all votes in a Congressional district that went to the Democrat.
- U. S. House elections are held every two years.
---
- **Panel data** (1946–1995 $\times$ all districts in the U.S.)
- Main variables
- `score`: ADA score in Congressional session t of the representative elected in district k $(RC_{t})$
- `democrat`: indicator whether the Democrat wins in election t $(D_{t})$
- `lagdemocrat`: indicator whether the Democrat wins in election t-1 $(D_{t-1})$
- `demvoteshare`: voteshare at district k in election t $(V_t)$
- `lagdemvoteshare`: voteshare at district k in the previous election, t-1 $(V_{t-1})$
- For example, one specific row of the dataset has the voteshares and the results of the November 1992 election (period t) and the November 1990 election (period t-1) at district k, and the ADA score of 1993–1994 Congressional session (period t).
---
class: title-slide-section, center, middle
name: logistics
# Graphical Analysis
---
## Graphical Analysis
- The results of the analysis are shown later. Here, we first learn how to implement the graphical analysis.
- **RD analyses hinge on their graphical analyses.**
- Always start with a visual inspection to check which model (e.g. linear or nonlinear) is plausible.
---
### Outcomes by the running variables
- First, we try to reproduce this figure from the article.
.middle[
.center[
<img src="figure_RD2/fig2b.png" width="500">
]
]
- The dependent variable is the probability of a Democrat victory in election t+1, and the independent variable is the Democrat vote share in election t.
- Then, we will see what happens when we change the bandwidth and the functional form.
---
```{r, message=FALSE,warning=FALSE}
library(tidyverse)
library(haven)
library(estimatr)
library(texreg)
library(latex2exp)
# Download data
read_data <- function(df)
{
full_path <- paste("https://raw.github.com/scunning1975/mixtape/master/",
df, sep = "")
df <- read_dta(full_path)
return(df)
}
lmb_data <- read_data("lmb-data.dta")
```
---
- First, there are more than 10,000 data points, so we aggregate them into bins for the scatter plot.
```{r, message=FALSE,warning=FALSE}
# Aggregate the data:
# compute the mean of `democrat` within 100 equal-width bins of `lagdemvoteshare`
# (each bin is roughly 0.01 wide)
demmeans <- split(lmb_data$democrat, cut(lmb_data$lagdemvoteshare, 100)) %>%
  lapply(mean) %>%
  unlist()
# Create a new data frame for plotting
agg_lmb_data <- data.frame(democrat = demmeans, lagdemvoteshare = seq(0.01, 1, by = 0.01))
```
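---
- `cut(x, 100)` divides the observed range of `x` into 100 equal-width intervals, so the bins are 0.01 wide only when the data span roughly [0, 1]. The sketch below is an alternative on toy data (the data frame `toy` and its columns are made up for illustration): it fixes the breaks explicitly and plots each bin at its midpoint.

```r
set.seed(1)
# Toy data: x plays the role of the lagged vote share, y of the win indicator
toy   <- data.frame(x = runif(5000))
toy$y <- as.numeric(toy$x + rnorm(5000, sd = 0.1) > 0.5)

# Explicit breaks guarantee bins of width 0.01 regardless of the data range
breaks  <- seq(0, 1, by = 0.01)
toy$bin <- cut(toy$x, breaks = breaks, include.lowest = TRUE)

# Mean outcome per bin, paired with the bin midpoint for plotting
bin_means <- tapply(toy$y, toy$bin, mean)
bin_mid   <- head(breaks, -1) + 0.005
agg_toy   <- data.frame(x = bin_mid, y = as.numeric(bin_means))
head(agg_toy)
```

- Fixing the breaks also ensures that no bin straddles the threshold at 0.5.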
---
## Quadratic fitting in all data
```{r,eval = F,warning=FALSE, fig.height=5, fig.width=7.5}
# Group observations above or below the threshold
lmb_data <- lmb_data %>%
  mutate(gg_group = if_else(lagdemvoteshare > 0.5, 1, 0))
# Base plot with the binned means; fitted curves are added on top of it
gg_srd <- ggplot(data = lmb_data, aes(lagdemvoteshare, democrat)) +
  geom_point(aes(x = lagdemvoteshare, y = democrat), data = agg_lmb_data) +
  xlim(0, 1) + ylim(-0.1, 1.1) +
  geom_vline(xintercept = 0.5) +
  xlab("Democrat Vote Share, time t") +
  ylab("Probability of Democrat Win, time t+1") +
  scale_y_continuous(breaks = seq(0, 1, 0.2)) +
  ggtitle(TeX("Effect of Initial Win on Winning Next Election: $P^D_{t+1} - P^R_{t+1}$"))
# Separate quadratic fits on each side of the threshold
gg_srd + stat_smooth(aes(lagdemvoteshare, democrat, group = gg_group),
                     method = "lm", formula = y ~ x + I(x^2))
```
---
.middle[
.center[
<img src="figure_RD2/q_all.png" width="800">
]
]
---
## Quadratic fitting; limited to +/- 0.05
```{r,eval = F,warning=FALSE, fig.height=5, fig.width=7.5}
gg_srd + stat_smooth(data=lmb_data %>% filter(lagdemvoteshare>.45 & lagdemvoteshare<.55),
aes(lagdemvoteshare, democrat, group = gg_group),
method = "lm", formula = y ~ x + I(x^2))
```
.middle[
.center[
<img src="figure_RD2/q_l.png" width="500">
]
]
- Notice that the confidence interval widens, but the fitted lines track the binned points more closely.
---
## Linear fit with different slopes
```{r,eval = F,warning=FALSE, fig.height=5, fig.width=7.5}
gg_srd + stat_smooth(aes(lagdemvoteshare, democrat, group = gg_group), method = "lm")
```
.middle[
.center[
<img src="figure_RD2/l_d.png" width="600">
]
]
---
## Linear fit with a common slope
```{r,eval = F,warning=FALSE, fig.height=5, fig.width=7.5}
gg_srd + stat_smooth(data=lmb_data, aes(lagdemvoteshare, democrat),
method = "lm", formula = y ~ x + I(x > 0.5))
```
- Alternatively, the following code avoids drawing the fitted line across the threshold.
```{r,eval = F,warning=FALSE, fig.height=5, fig.width=7.5}
lm_tmp <- lm(democrat ~ lagdemvoteshare + I(lagdemvoteshare>0.5), data = lmb_data)
lm_fun <- function(x) predict(lm_tmp, data.frame(lagdemvoteshare = x)) #output is predicted democrat
gg_srd +
stat_function(
data = data.frame(x = c(0, 1),y = c(0, 1)),aes(x = x,y=y),
fun = lm_fun,xlim = c(0,0.499),
col="blue",size = 1.5) +
stat_function(
data = data.frame(x = c(0, 1),y = c(0, 1)),aes(x = x,y=y),
fun = lm_fun,xlim = c(0.501,1),
col="blue", size = 1.5 )
```
---
.pull-left[
.middle[
.center[
<img src="figure_RD2/l_c1.png" width="500">
]
]
]
.pull-right[
.middle[
.center[
<img src="figure_RD2/l_c2.png" width="500">
]
]
]
---
## Loess fitting
```{r,eval = F,warning=FALSE, fig.height=5, fig.width=7.5}
gg_srd + stat_smooth(aes(lagdemvoteshare, democrat, group = gg_group), method = "loess")
```
.middle[
.center[
<img src="figure_RD2/loess.png" width="500">
]
]
- Compared to the quadratic case, the variance gets larger, but the fitted curve follows the points more closely.
---
## Kernel-weighted local polynomial regressions
```{r,eval = F,warning=FALSE, fig.height=5, fig.width=7.5}
library(stats)
smooth_dem0 <- lmb_data %>%
filter(lagdemvoteshare < 0.5) %>%
dplyr::select(democrat, lagdemvoteshare) %>%
na.omit()
smooth_dem0 <- as_tibble(ksmooth(smooth_dem0$lagdemvoteshare, smooth_dem0$democrat,
kernel = "box", bandwidth = 0.1))
smooth_dem1 <- lmb_data %>%
filter(lagdemvoteshare >= 0.5) %>%
dplyr::select(democrat, lagdemvoteshare) %>%
na.omit()
smooth_dem1 <- as_tibble(ksmooth(smooth_dem1$lagdemvoteshare, smooth_dem1$democrat,
kernel = "box", bandwidth = 0.1))
gg_srd +
geom_smooth(aes(x, y), data = smooth_dem0) +
geom_smooth(aes(x, y), data = smooth_dem1)
```
---
.middle[
.center[
<img src="figure_RD2/kernel.png" width="600">
]
]
---
## Model and Bandwidth selection - bias-variance tradeoff
- How should we pick the “right” model and bandwidth?
- There’s always a **trade-off between bias and variance** when choosing the bandwidth and the polynomial order.
- Bias: the distance between the expected value of your estimate and the true value
- Variance: how much your estimate fluctuates from sample to sample
- The narrower the window and the more flexible the model (e.g. higher-order polynomials), the lower the bias; but with less data (or more parameters to estimate), the variance of your estimate increases.
- It is always important to show robustness to these choices.
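---
- The trade-off can be seen in a small simulation. This is a sketch on made-up data (the true jump of 0.3, the cubic curvature, and the list of bandwidths are all assumptions for illustration): it fits a local linear regression with separate slopes within each window and records the estimate and its standard error.

```r
set.seed(42)
n <- 5000
x <- runif(n)                                        # running variable
d <- as.numeric(x > 0.5)                             # treatment indicator
y <- 0.3 * d + 2 * (x - 0.5)^3 + rnorm(n, sd = 0.2)  # true jump is 0.3

# Local linear fit with separate slopes, using observations within h of the cutoff
fit_bw <- function(h) {
  keep <- abs(x - 0.5) < h
  dat  <- data.frame(y = y[keep], d = d[keep], xc = x[keep] - 0.5)
  fit  <- lm(y ~ d * xc, data = dat)
  est  <- coef(summary(fit))["d", c("Estimate", "Std. Error")]
  c(h = h, n = sum(keep), est)
}

bw_table <- t(sapply(c(0.02, 0.05, 0.1, 0.25, 0.5), fit_bw))
bw_table
```

- Narrow windows use few observations, so the standard error is large; wide windows shrink the standard error, but the linear fit can no longer follow the curvature, which pulls the estimate away from the true jump.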
---
- Model selection
- Higher-order polynomials can lead to overfitting (Gelman and Imbens 2019). They recommend using local linear or local quadratic regressions only.
- Local linear regression with a kernel smoother is a popular choice
- Bandwidth selection:
- Optimal bandwidth selection: Imbens and Kalyanaraman (2011), Calonico, Cattaneo, and Titiunik (2014); **the implementation is shown on a later slide**
- Cross validation: Imbens and Lemieux (2008)
---
class: title-slide-section, center, middle
name: logistics
# Quantitative analysis
---
## Quantitative analysis
- Our next goal is to replicate the quantitative results of Lee, Moretti, and Butler (2004) in the table below.
| | $\gamma$ | $\pi_1$ | $P_{t+1}^D - P_{t+1}^R$ | $\pi_1(P_{t+1}^D - P_{t+1}^R)$ | $\pi_0(P_{t+1}^{*D} - P_{t+1}^{*R})$ |
|----------------------|---------------|--------------|-------------|--------------|-------------|
| Variable | $ADA_{t+1}$ | $ADA_{t}$ | $DEM_{t+1}$ | | |
| Estimated gap | 21.2 (1.9) | 47.6 (1.3)| 0.48 (0.02)| | |
| | | | | 22.84 (2.2)| -1.64 (2.0) |
- The analysis restricts the sample to observations where the Democrat vote share is between 48 percent and 52 percent, which leaves 915 observations.
- From the second column ($\pi_1$), complete convergence is rejected.
- The estimate in the last column is statistically insignificant, which suggests that voters primarily elect policies rather than affect them.
- **Complete divergence** is supported by this analysis.
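---
- The last two columns follow from the first three by the decomposition $\gamma = \pi_1 (P_{t+1}^D - P_{t+1}^R) + \pi_0( P_{t+1}^{*D} - P_{t+1}^{*R})$. As a quick check, the arithmetic can be redone from the rounded point estimates in the table (the variable names are made up for this sketch):

```r
# Rounded point estimates taken from the table
gamma_hat <- 21.2   # gap in ADA_{t+1}: total effect
pi1_hat   <- 47.6   # gap in ADA_t
p_gap_hat <- 0.48   # gap in the probability of a Democrat win at t+1

# "Elect" component: pi_1 * (P^D - P^R)
elect_component  <- pi1_hat * p_gap_hat
# "Affect" component: the remainder, pi_0 * (P^{*D} - P^{*R})
affect_component <- gamma_hat - elect_component

round(c(elect = elect_component, affect = affect_component), 2)
```

- The results are about 22.8 and -1.6; small differences from the table in the second decimal come from using rounded inputs.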
---
```{r, message=FALSE,warning=FALSE, eval=F}
# Restrict the data to observations with a Democrat vote share between 48 and 52 percent
# `lagdemvoteshare` is the Democrat vote share in period t-1
lmb_subset <- lmb_data %>%
filter(lagdemvoteshare>.48 & lagdemvoteshare<.52)
# E[ADA_{t+1}|D_t] = \gamma
lm_1 <- lm_robust(score ~ lagdemocrat, data = lmb_subset, se_type = "HC1")
# E[ADA_{t}|D_t] = \pi_1
lm_2 <- lm_robust(score ~ democrat, data = lmb_subset, se_type = "HC1")
# E[D_{t+1}|D_t] = P_{t+1}^D - P_{t+1}^R
lm_3 <- lm_robust(democrat ~ lagdemocrat, data = lmb_subset, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
---
```{r, message=FALSE,warning=FALSE, echo=F}
# Restrict the data to observations with a Democrat vote share between 48 and 52 percent
# `lagdemvoteshare` is the Democrat vote share in period t-1
lmb_subset <- lmb_data %>%
filter(lagdemvoteshare>.48 & lagdemvoteshare<.52)
# E[ADA_{t+1}|D_t] = \gamma
lm_1 <- lm_robust(score ~ lagdemocrat, data = lmb_subset, se_type = "HC1")
# E[ADA_{t}|D_t] = \pi_1
lm_2 <- lm_robust(score ~ democrat, data = lmb_subset, se_type = "HC1")
# E[D_{t+1}|D_t] = P_{t+1}^D - P_{t+1}^R
lm_3 <- lm_robust(democrat ~ lagdemocrat, data = lmb_subset, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
- The results differ slightly from the published table; we set that aside for now.
- From now on, we will see how the results depend on the **bandwidth** and the **functional form**.
---
## Same specification in all the data
```{r, message=FALSE, eval=F}
#using all data (note data used is lmb_data, not lmb_subset)
lm_1 <- lm_robust(score ~ lagdemocrat, data = lmb_data, se_type = "HC1")
lm_2 <- lm_robust(score ~ democrat, data = lmb_data, se_type = "HC1")
lm_3 <- lm_robust(democrat ~ lagdemocrat, data = lmb_data, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
---
```{r, message=FALSE, echo=F}
#using all data (note data used is lmb_data, not lmb_subset)
lm_1 <- lm_robust(score ~ lagdemocrat, data = lmb_data, se_type = "HC1")
lm_2 <- lm_robust(score ~ democrat, data = lmb_data, se_type = "HC1")
lm_3 <- lm_robust(democrat ~ lagdemocrat, data = lmb_data, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
- Here we see that simply running the regression yields different estimates when we include data far from the cutoff itself.
---
### Controls for the running variable & Recentering of the running variable
- We recenter the running variable by simply subtracting 0.5, so that the cutoff is at zero.
```{r, message=FALSE, eval=FALSE}
# Recentering
lmb_data <- lmb_data %>%
mutate(demvoteshare_c = demvoteshare - 0.5)
lm_1 <- lm_robust(score ~ lagdemocrat + demvoteshare_c, data = lmb_data, se_type = "HC1")
lm_2 <- lm_robust(score ~ democrat + demvoteshare_c, data = lmb_data, se_type = "HC1")
lm_3 <- lm_robust(democrat ~ lagdemocrat + demvoteshare_c, data = lmb_data, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
---
```{r, message=FALSE, echo=FALSE}
# Recentering
lmb_data <- lmb_data %>%
mutate(demvoteshare_c = demvoteshare - 0.5)
lm_1 <- lm_robust(score ~ lagdemocrat + demvoteshare_c, data = lmb_data, se_type = "HC1")
lm_2 <- lm_robust(score ~ democrat + demvoteshare_c, data = lmb_data, se_type = "HC1")
lm_3 <- lm_robust(democrat ~ lagdemocrat + demvoteshare_c, data = lmb_data, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
---
## Different slopes on either side of the discontinuity
- How do we fit a separate regression line on each side of the discontinuity, so that the slope can differ to the left and right of the cutoff? $\Rightarrow$ **Interaction**
```{r, message=FALSE, eval=FALSE}
lm_1 <- lm_robust(score ~ lagdemocrat*demvoteshare_c,
data = lmb_data, se_type = "HC1")
lm_2 <- lm_robust(score ~ democrat*demvoteshare_c,
data = lmb_data, se_type = "HC1")
lm_3 <- lm_robust(democrat ~ lagdemocrat*demvoteshare_c,
data = lmb_data, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
---
```{r, message=FALSE, echo=FALSE}
lm_1 <- lm_robust(score ~ lagdemocrat*demvoteshare_c,
data = lmb_data, se_type = "HC1")
lm_2 <- lm_robust(score ~ democrat*demvoteshare_c,
data = lmb_data, se_type = "HC1")
lm_3 <- lm_robust(democrat ~ lagdemocrat*demvoteshare_c,
data = lmb_data, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
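---
- Recentering matters once the interaction is included: with the model $y = \alpha + \tau D + \beta (V - 0.5) + \delta D (V - 0.5) + \epsilon$, the coefficient $\tau$ is the gap exactly at the cutoff. Without recentering, the coefficient on $D$ is the gap extrapolated to $V = 0$. A sketch on simulated data (all names and parameter values are assumptions for illustration):

```r
set.seed(7)
n <- 2000
v <- runif(n)                 # running variable
d <- as.numeric(v > 0.5)      # treatment indicator
# True jump at the cutoff is 0.5; the slope is steeper on the right by 2
y <- 1 + 0.5 * d + 1.0 * (v - 0.5) + 2.0 * d * (v - 0.5) + rnorm(n, sd = 0.1)

vc <- v - 0.5                 # recentered running variable

# Centered: the coefficient on d estimates the jump at v = 0.5
centered_jump   <- unname(coef(lm(y ~ d * vc))["d"])
# Uncentered: the coefficient on d estimates the gap at v = 0, i.e. 0.5 - 2 * 0.5
uncentered_coef <- unname(coef(lm(y ~ d * v))["d"])

c(centered = centered_jump, uncentered = uncentered_coef)
```

- Both regressions fit the same lines; only the point at which the coefficient on the treatment indicator is evaluated differs.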
---
## Different quadratic regressions in all data
```{r, message=FALSE, eval=FALSE}
lmb_data <- lmb_data %>%
mutate(demvoteshare_sq = demvoteshare_c^2)
lm_1 <- lm_robust(score ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq,
data = lmb_data, se_type = "HC1")
lm_2 <- lm_robust(score ~ democrat*demvoteshare_c + democrat*demvoteshare_sq,
data = lmb_data, se_type = "HC1")
lm_3 <- lm_robust(democrat ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq,
data = lmb_data, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
---
```{r, message=FALSE, echo=FALSE}
lmb_data <- lmb_data %>%
mutate(demvoteshare_sq = demvoteshare_c^2)
lm_1 <- lm_robust(score ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq,
data = lmb_data, se_type = "HC1")
lm_2 <- lm_robust(score ~ democrat*demvoteshare_c + democrat*demvoteshare_sq,
data = lmb_data, se_type = "HC1")
lm_3 <- lm_robust(democrat ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq,
data = lmb_data, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
- The standard errors are larger because of the higher-order polynomial terms.
---
## Different quadratic regression; limited to +/- 0.05
```{r, message=FALSE, eval=FALSE}
lmb_subset <- lmb_data %>%
filter(demvoteshare > .45 & demvoteshare < .55) %>%
mutate(demvoteshare_sq = demvoteshare_c^2)
lm_1 <- lm_robust(score ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq,
data = lmb_subset, se_type = "HC1")
lm_2 <- lm_robust(score ~ democrat*demvoteshare_c + democrat*demvoteshare_sq,
data = lmb_subset, se_type = "HC1")
lm_3 <- lm_robust(democrat ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq,
data = lmb_subset, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
---
```{r, message=FALSE, echo=FALSE}
lmb_subset <- lmb_data %>%
filter(demvoteshare > .45 & demvoteshare < .55) %>%
mutate(demvoteshare_sq = demvoteshare_c^2)
lm_1 <- lm_robust(score ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq,
data = lmb_subset, se_type = "HC1")
lm_2 <- lm_robust(score ~ democrat*demvoteshare_c + democrat*demvoteshare_sq,
data = lmb_subset, se_type = "HC1")
lm_3 <- lm_robust(democrat ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq,
data = lmb_subset, se_type = "HC1")
screenreg(l = list(lm_1, lm_2,lm_3),
digits = 2,
# caption = 'title',
custom.model.names = c("ADA_t+1", "ADA_t", "DEM_t+1"),
include.ci = F,
include.rsquared = FALSE, include.adjrs = FALSE, include.nobs = T,
include.pvalues = FALSE, include.df = FALSE, include.rmse = FALSE,
custom.coef.map = list("lagdemocrat"="lagdemocrat","democrat"="democrat"),
# select coefficients to report
stars = numeric(0))
```
---
## Optimal bandwidth by `rdrobust`
- The optimal bandwidth selection method (Calonico, Cattaneo, and Titiunik 2014) can be implemented with the `rdrobust` package.
- These methods ultimately choose optimal bandwidths that may differ left and right of the cutoff based on some bias-variance trade-off.
```{r, message=FALSE,eval=FALSE}
# install.packages("rdrobust")
library(rdrobust)
rdr <- rdrobust(y = lmb_data$score,
x = lmb_data$demvoteshare, c = 0.5)
summary(rdr)
```
---
```{r, message=FALSE,echo=FALSE,warning=FALSE}
# install.packages("rdrobust")
library(rdrobust)
rdr <- rdrobust(y = lmb_data$score,
x = lmb_data$demvoteshare, c = 0.5)
summary(rdr)
```
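---
- By default, `rdrobust` fits a local linear regression with a triangular kernel. The sketch below mimics only that point estimate in base R on simulated data. The bandwidth `h = 0.1` is hand-picked for illustration, not the data-driven choice, and the sketch does no bias correction or robust inference.

```r
set.seed(123)
n <- 4000
x <- runif(n)                                   # running variable
d <- as.numeric(x >= 0.5)                       # treatment indicator
y <- 20 + 45 * d + 30 * (x - 0.5) + rnorm(n, sd = 10)   # assumed true jump: 45

h  <- 0.1                                       # hypothetical bandwidth
xc <- x - 0.5                                   # recentered running variable
w  <- pmax(0, 1 - abs(xc) / h)                  # triangular kernel weights

# Weighted least squares with separate slopes on each side of the cutoff
fit  <- lm(y ~ d * xc, weights = w, subset = w > 0)
jump <- unname(coef(fit)["d"])
jump
```

- Observations closer to the cutoff get more weight, and observations farther than `h` from the cutoff are dropped.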
---
class: title-slide-section, center, middle
name: logistics
# Covariate test
---
## Covariates by the running variables
- We use income (`realincome`) as the covariate.
- We limit the window of the vote share to between 0.25 and 0.75.
```{r,eval=FALSE}
#aggregating the data
lmb_subset = lmb_data %>%
dplyr::select(realincome,demvoteshare) %>%
filter(demvoteshare>.25 & demvoteshare<.75) %>%
na.omit()
# Compute the mean of income within 50 equal-width bins of the vote share (about 0.01 each)
demmeans <- split(lmb_subset$realincome, cut(lmb_subset$demvoteshare, 50)) %>%
  lapply(mean) %>%
  unlist()
# Create a new data frame for plotting
agg_lmb_data <- data.frame(income = demmeans, demvoteshare = seq(0.26, 0.75, by = 0.01))
```
---
## Covariate test for income
```{r,eval=FALSE}
#grouping above or below threshold
lmb_subset <- lmb_subset %>%
mutate(gg_group = if_else(demvoteshare > 0.5, 1,0))
#plotting
ggplot(data=lmb_subset, aes(demvoteshare, realincome)) +
geom_point(aes(x = demvoteshare, y = income), data = agg_lmb_data) +
geom_vline(xintercept = 0.5) +
stat_smooth( aes(demvoteshare, realincome, group = gg_group), method = "lm", formula = y ~ x + I(x^2)) + ggtitle("voteshare and income")
```
---
.middle[
.center[
<img src="figure_RD2/c_test_income.png" width="600">
]
]
---
.middle[
.center[
<img src="figure_RD2/c_test.png" width="600">
]
]
- The authors also did covariate tests with other variables such as percentage with high-school degree (`pcthighschl`), percentage black (`pctblack`), percentage eligible to vote (`votingpop/totpop`).
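---
- The graphical covariate test has a regression counterpart: use the predetermined covariate as the outcome and check that the estimated jump at the cutoff is close to zero. A sketch on simulated data where the covariate is smooth in the vote share by construction (names and values are assumptions; base `lm` is used here to keep the example self-contained):

```r
set.seed(11)
n   <- 3000
v   <- runif(n, 0.25, 0.75)                 # vote share within the window
d   <- as.numeric(v > 0.5)                  # Democrat win indicator
inc <- 50 + 20 * v + rnorm(n, sd = 5)       # covariate with no jump at 0.5
vc  <- v - 0.5                              # recentered running variable

# Same interacted specification as for the outcomes, with the covariate on the left
fit_cov  <- lm(inc ~ d * vc)
cov_jump <- coef(summary(fit_cov))["d", c("Estimate", "Std. Error")]
cov_jump
```

- An estimate that is large relative to its standard error would cast doubt on the comparability of districts just above and just below the cutoff.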
---
.middle[
.center[
<img src="figure_RD2/placebo.png" width="600">
]
]
- The outcome in period t-1 is also often used as a placebo.
---
### Coding of placebo
```{r,eval = F,warning=FALSE, fig.height=5, fig.width=7.5,message=FALSE}
#aggregating the data
# calculate mean value for every 0.01 voteshare
demmeans <- split(lmb_data$lagdemvoteshare, cut(lmb_data$demvoteshare, 100)) %>%
lapply(mean) %>%
unlist()
# Create a new data frame for plotting
agg_lmb_data <- data.frame(lagdemvoteshare=demmeans, demvoteshare = seq(0.01,1, by = 0.01))
#grouping above or below threshold
lmb_data <- lmb_data %>%
mutate(gg_group = if_else(demvoteshare > 0.5, 1,0))
#plotting
ggplot(data=lmb_data, aes(demvoteshare, lagdemvoteshare)) +
geom_point(aes(x = demvoteshare, y = lagdemvoteshare), data = agg_lmb_data) +
xlim(0,1) + ylim(-0.1,1.1) +
geom_vline(xintercept = 0.5) +
xlab("Democrat Vote Share, time t") +
ylab("Democrat Vote Share, time t-1") +
scale_y_continuous(breaks=seq(0,1,0.2)) +
ggtitle(TeX("Democratic Party Vote Share in Election t-1, by Democratic Party Vote Share in Election t")) +
stat_smooth(data = lmb_data, aes(x = demvoteshare, y = lagdemvoteshare, group = gg_group),
method = "lm", formula = y ~ x + I(x^2))
```
---
.middle[
.center[
<img src="figure_RD2/c_test_code.png" width="700">
]
]
---
class: title-slide-section, center, middle
name: logistics
# Density test
---
## Density of the running variables
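- McCrary density test: is the density of the running variable continuous at the cutoff?
- We implement McCrary's test with `DCdensity` from the `rdd` package, and a test based on local polynomial density estimation (Cattaneo, Jansson, and Ma 2019) with the `rddensity` package.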
```{r,eval = F,warning=FALSE, fig.height=6, fig.width=9}
# install.packages("rddensity")
# install.packages("rdd")
library(rddensity)
library(rdd)
DCdensity(lmb_data$demvoteshare, cutpoint = 0.5)
density <- rddensity(lmb_data$demvoteshare, c = 0.5)
rdplotdensity(density, lmb_data$demvoteshare)
```
---
.pull-left[
.middle[
.center[
<img src="figure_RD2/d_test1.png" width="500">
]
]
]
.pull-right[
.middle[
.center[
<img src="figure_RD2/d_test2.png" width="500">
]
]
]
- There is no sign of manipulation of the running variable at the cutoff.
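---
- A cruder check in the same spirit is to count observations in a narrow window on each side of the cutoff and test whether the split is 50/50. This is a simplification assumed for this sketch, not the McCrary or the local polynomial density procedure; the data are simulated and the window width is arbitrary.

```r
set.seed(99)
v_toy  <- runif(5000)      # toy running variable, no manipulation by construction
window <- 0.02             # hypothetical half-width of the window

# Counts just below and just above the cutoff
n_left  <- sum(v_toy >= 0.5 - window & v_toy < 0.5)
n_right <- sum(v_toy >= 0.5 & v_toy < 0.5 + window)

# Exact binomial test of an even split between the two sides
bt <- binom.test(n_right, n_left + n_right, p = 0.5)
c(left = n_left, right = n_right, p_value = bt$p.value)
```

- A very small p-value would point to bunching on one side of the cutoff; the formal density tests above remain the better tool because they account for the shape of the density near the cutoff.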