# Chapter 15. Instrumental Variables Estimation and Two Stage Least Squares

In this chapter, we further study the problem of endogenous explanatory variables in multiple regression models. In Chapter 3, we derived the bias in the OLS estimators when an important variable is omitted; in Chapter 5, we showed that OLS is generally inconsistent under omitted variables. Chapter 9 demonstrated that omitted variables bias can be eliminated (or at least mitigated) when a suitable proxy variable is available for an unobserved explanatory variable. Unfortunately, suitable proxy variables are not always available.

In the previous two chapters, we explained how fixed effects estimation or first differencing can be used with panel data to estimate the effects of time-varying independent variables in the presence of time-constant omitted variables. Although such methods are very useful, we do not always have access to panel data. Even if we can obtain panel data, it does us little good if we are interested in the effect of a variable that does not change over time: first differencing or fixed effects estimation eliminates time-constant explanatory variables. In addition, the panel data methods that we have studied so far do not solve the problem of time-varying omitted variables that are correlated with the explanatory variables.

In this chapter, we take a different approach to the endogeneity problem. You will see how the method of instrumental variables (IV) can be used to solve the problem of endogeneity of one or more explanatory variables. The method of two stage least squares (2SLS or TSLS) is second in popularity only to ordinary least squares for estimating linear equations in applied econometrics.

We begin by showing how IV methods can be used to obtain consistent estimators in the presence of omitted variables. IV can also be used to solve the errors-in-variables problem, at least under certain assumptions. The next chapter will demonstrate how to estimate simultaneous equations models using IV methods.

Our treatment of instrumental variables estimation closely follows our development of ordinary least squares in Part 1, where we assumed that we had a random sample from an underlying population. This is a desirable starting point because, in addition to simplifying the notation, it emphasizes that the important assumptions for IV estimation are stated in terms of the underlying population (just as with OLS). As we showed in Part 2, OLS can be applied to time series data, and the same is true of instrumental variables methods. Section 15-7 discusses some special issues that arise when IV methods are applied to time series data. In Section 15-8, we cover applications to pooled cross sections and panel data.

## 15-1 Motivation: Omitted Variables in a Simple Regression Model

When faced with the prospect of omitted variables bias (or unobserved heterogeneity), we have so far discussed three options: (1) we can ignore the problem and suffer the consequences of biased and inconsistent estimators; (2) we can try to find and use a suitable proxy variable for the unobserved variable; or (3) we can assume that the omitted variable does not change over time and use the fixed effects or first-differencing methods from Chapters 13 and 14. The first response can be satisfactory if the estimates are coupled with the direction of the biases for the key parameters. For example, if we can say that the estimator of a positive parameter, say, the effect of job training on subsequent wages, is biased toward zero and we have found a statistically significant positive estimate, we have still learned something: job training has a positive effect on wages, and it is likely that we have underestimated the effect. Unfortunately, the opposite case, where our estimates may be too large in magnitude, often occurs, which makes it very difficult for us to draw any useful conclusions.

The proxy variable solution discussed in Section 9-2 can also produce satisfying results, but it is not always possible to find a good proxy. This approach attempts to solve the omitted variable problem by replacing the unobservable with one or more proxy variables.

Another approach leaves the unobserved variable in the error term, but rather than estimating the model by OLS, it uses an estimation method that recognizes the presence of the omitted variable. This is what the method of instrumental variables does.

### Example 1.

For illustration, consider the problem of unobserved ability in a wage equation for working adults. A simple model is

\begin{equation}
log(wage)=\beta_0+\beta_1*educ+\beta_2*abil+e
\end{equation}

where e is the error term. In Chapter 9, we showed how, under certain assumptions, a proxy variable such as IQ can be substituted for ability, and then a consistent estimator of $\beta_1$ is available from the regression of

\begin{equation}
log(wage) \quad \text{on} \quad educ,\ IQ
\end{equation}

Suppose, however, that a proxy variable is not available (or does not have the properties needed to produce a consistent estimator of $\beta_1$). Then, we put abil into the error term, and we are left with the simple regression model

\begin{equation}
log(wage)=\beta_0+\beta_1*educ+u \tag{15.1}
\end{equation}

where u contains abil. Of course, if equation (15.1) is estimated by OLS, a biased and inconsistent estimator of $\beta_1$ results if educ and abil are correlated. It turns out that we can still use equation (15.1) as the basis for estimation, provided we can find an instrumental variable for educ. To describe this approach, the simple regression model is written as

\begin{equation}
y=\beta_0+\beta_1*x+u \tag{15.2}
\end{equation}

where we think that x and u are correlated (have nonzero covariance):

\begin{equation}
Cov(x,u) \neq 0 \tag{15.3}
\end{equation}

The method of instrumental variables works whether or not x and u are correlated, but, for reasons we will see later, OLS should be used if x is uncorrelated with u.

In order to obtain consistent estimators of $\beta_0$ and $\beta_1$ when x and u are correlated, we need some additional information. The information comes by way of a new variable that satisfies certain properties. Suppose that we have an observable variable z that satisfies these two assumptions: (1) z is uncorrelated with u, that is,

\begin{equation}
Cov(z,u)=0 \tag{15.4}
\end{equation}

(2) z is correlated with x, that is,

\begin{equation}
Cov(z,x) \neq 0 \tag{15.5}
\end{equation}

Then, we call z an instrumental variable for x, or sometimes simply an instrument for x.

Condition (15.4) is often called instrument exogeneity, and condition (15.5) instrument relevance. In the context of omitted variables, instrument exogeneity means that z should have no partial effect on y (after x and the omitted variables have been controlled for) and that z should be uncorrelated with the omitted variables. Instrument relevance means that z must be related, either positively or negatively, to the endogenous explanatory variable x.
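Together, (15.4) and (15.5) identify $\beta_1$. The following is a standard method-of-moments sketch, written out here rather than reproduced from elsewhere in the text: take the covariance of both sides of (15.2) with z, use (15.4) to drop $Cov(z,u)$, and use (15.5) to divide by $Cov(z,x)$. Replacing population moments with sample moments gives the IV estimator.

\begin{equation}
\begin{aligned}
Cov(z,y) &= \beta_1 Cov(z,x) + Cov(z,u) = \beta_1 Cov(z,x) \\
\beta_1 &= \frac{Cov(z,y)}{Cov(z,x)} \\
\hat\beta_1 &= \frac{\sum_{i=1}^n (z_i-\bar z)(y_i-\bar y)}{\sum_{i=1}^n (z_i-\bar z)(x_i-\bar x)}, \qquad \hat\beta_0=\bar y-\hat\beta_1\bar x
\end{aligned}
\end{equation}

Setting z = x in the last line gives back the OLS slope, which is one way to see OLS as a special case of IV.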

There is a very important difference between the two requirements for an instrumental variable. Because (15.4) involves the covariance between z and the unobserved error u, we cannot generally hope to test this assumption: in the vast majority of cases, we must maintain $Cov(z,u)=0$ by appealing to economic behavior or introspection. (In unusual cases, we might have an observable proxy variable for some factor contained in u, in which case we can check to see if z and the proxy variable are roughly uncorrelated. Of course, if we have a good proxy for an important element of u, we might just add the proxy as an explanatory variable and estimate the expanded equation by ordinary least squares. See Section 9-2.)

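By contrast, the condition that z is correlated with x (in the population) can be tested, given a random sample from the population. The easiest way to do this is to estimate a simple regression of x on z. In the population, $x=\pi_0+\pi_1 z+v$, and because $\pi_1=Cov(z,x)/Var(z)$, assumption (15.5) holds if, and only if, $\pi_1 \neq 0$. We can therefore test $H_0: \pi_1=0$ against the two-sided alternative with a t test at a sufficiently small significance level.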
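The relevance check can be sketched in R on simulated data. This is only an illustration: the names z, x, v and the value $\pi_1 = 0.5$ are hypothetical choices, not taken from any data set. We regress x on z and read off the t statistic and p-value for the slope.

```r
# Minimal sketch on simulated (hypothetical) data: checking instrument
# relevance by regressing x on z and testing H0: pi_1 = 0.
set.seed(123)
n <- 500
z <- rnorm(n)                       # candidate instrument
v <- rnorm(n)                       # first-stage error
x <- 1 + 0.5 * z + v                # x = pi_0 + pi_1 * z + v, with pi_1 = 0.5

first_stage <- lm(x ~ z)
coefs <- summary(first_stage)$coefficients
t_pi1 <- coefs["z", "t value"]      # t statistic for H0: pi_1 = 0
p_pi1 <- coefs["z", "Pr(>|t|)"]     # two-sided p-value
print(round(c(t = t_pi1, p = p_pi1), 4))
```

With a real candidate instrument, the same two lines of output are what the examples below report when they regress educ on fatheduc, sibs, or similar variables.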

For the log(wage) equation in (15.1), an instrumental variable z for educ must be (1) uncorrelated with ability (and any other unobserved factors affecting wage) and (2) correlated with education. Something such as the last digit of an individual's Social Security Number almost certainly satisfies the first requirement: it is uncorrelated with ability because it is determined randomly. However, it is precisely because of the randomness of the last digit of the SSN that it is not correlated with education, either; therefore it makes a poor instrumental variable for educ because it violates the instrument relevance requirement in equation (15.5).

What we have called a proxy variable for the omitted variable makes a poor IV for the opposite reason. For example, in the log(wage) example with omitted ability, a proxy variable for abil should be as highly correlated as possible with abil. An instrumental variable must be uncorrelated with abil. Therefore, while IQ is a good candidate as a proxy variable for abil, it is not a good instrumental variable for educ because it violates the instrument exogeneity requirement in equation (15.4).

Whether other possible instrumental variable candidates satisfy the exogeneity requirement in (15.4) is less clear-cut. In wage equations, labor economists have used family background variables as IVs for education. For example, mother's education (motheduc) is positively correlated with child's education, as can be seen by collecting a sample of data on working people and running a simple regression of educ on motheduc. Therefore, motheduc satisfies equation (15.5). The problem is that mother's education might also be correlated with child's ability (through mother's ability and perhaps quality of nurturing at an early age), in which case (15.4) fails.


### Example 2. 

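Another IV choice for educ in (15.1) is the number of siblings while growing up (sibs). Typically, having more siblings is associated with lower average levels of education. Thus, if the number of siblings is uncorrelated with ability, it can act as an instrumental variable for educ.

As a second example, consider the problem of estimating the causal effect of skipping classes on final exam score. In a simple regression framework, we have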

\begin{equation}
score=\beta_0+\beta_1*skipped+u \tag{15.8}
\end{equation}

where score is the final exam score and skipped is the total number of lectures missed during the semester. We certainly might be worried that skipped is correlated with other factors in u: more able, highly motivated students might miss fewer classes. Thus, a simple regression of score on skipped may not give us a good estimate of the causal effect of missing classes.

What might be a good IV for skipped? We need something that has no direct effect on score and is not correlated with student ability and motivation. At the same time, the IV must be correlated with skipped. One option is to use distance between living quarters and campus. Some students at a large university will commute to campus, which may increase the likelihood of missing lectures (due to bad weather, oversleeping, and so on). Thus, skipped may be positively correlated with distance; this can be checked by regressing skipped on distance and doing a t test, as described earlier.

Is distance uncorrelated with u? In the simple regression model (15.8), some factors in u may be correlated with distance. For example, students from low-income families may live off campus; if income affects student performance, this could cause distance to be correlated with u. Section 15-2 shows how to use IV in the context of multiple regression, so that other factors affecting score can be included directly in the model. Then, distance might be a good IV for skipped. An IV approach may not be necessary at all if a good proxy exists for student ability, such as cumulative GPA prior to the semester.

Arguments for why a variable z makes a good IV candidate for an endogenous explanatory variable x should include a discussion about the nature of the relationship between x and z. For example, due to genetics and background influences, it makes sense that child's education (x) and mother's education (z) are positively correlated. If in your sample of data you find that they are actually negatively correlated, then your use of mother's education as an IV for child's education is likely to be unconvincing. [And this has nothing to do with whether condition (15.4) is likely to hold.] In the example of measuring whether skipping classes has an effect on test performance, one should find a positive, statistically significant relationship between skipped and distance in order to justify using distance as an IV for skipped: a negative relationship would be difficult to justify [and would suggest that there are important omitted variables driving a negative correlation, variables that might themselves have to be included in the model (15.8)].
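To see the omitted-ability problem and the IV remedy side by side, here is a minimal simulation sketch in base R. All variable names, coefficients, and the data-generating process are hypothetical choices for illustration. Ability raises both educ and log(wage), so it sits in u and makes educ endogenous; z shifts educ but is independent of ability, so it satisfies both (15.4) and (15.5). The IV slope is computed as the sample analogue of $Cov(z,y)/Cov(z,x)$.

```r
# Minimal sketch on simulated (hypothetical) data: omitted ability biases OLS,
# while the simple IV estimator Cov(z, y) / Cov(z, x) stays close to the truth.
set.seed(42)
n <- 100000
abil  <- rnorm(n)                            # unobserved ability (ends up in u)
z     <- rnorm(n)                            # instrument: independent of abil
educ  <- 12 + z + abil + rnorm(n)            # educ depends on both z and abil
beta1 <- 0.08                                # true return to education
lwage <- 1 + beta1 * educ + 0.1 * abil + rnorm(n, sd = 0.3)

b_ols <- cov(educ, lwage) / var(educ)        # OLS slope (inconsistent here)
b_iv  <- cov(z, lwage) / cov(z, educ)        # IV slope
print(round(c(true = beta1, ols = b_ols, iv = b_iv), 4))
```

In this design Cov(educ, u) = 0.1 and Var(educ) = 3, so the OLS slope should settle near 0.08 + 0.1/3 ≈ 0.113 in large samples, while the IV slope should be close to 0.08.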

## 15-1a Statistical Inference with the IV Estimator

Given the similar structure of the IV and OLS estimators, it is not surprising that the IV estimator has an approximate normal distribution in large sample sizes. To perform inference on $\beta_1$, we need a standard error that can be used to compute t statistics and confidence intervals. The usual approach is to impose a homoskedasticity assumption, just as in the case of OLS. Now, however, the homoskedasticity assumption is stated conditional on the instrumental variable, z, not on the endogenous explanatory variable, x. Along with the previous assumptions on u, x, and z, we add

\begin{equation}
E(u^2|z)=\sigma^2=Var(u) \tag{15.11}
\end{equation}

It can be shown that under (15.4), (15.5) and (15.11), the asymptotic variance of $\hat \beta_1$ is

\begin{equation}
\dfrac{\sigma^2}{n\sigma^2_x\rho^2_{x,z}} \tag{15.12}
\end{equation}

where $\sigma^2_x$ is the population variance of x, $\sigma^2$ is the population variance of u, and $\rho^2_{x,z}$ is the square of the population correlation between x and z, which measures how highly correlated x and z are in the population. As with the OLS estimator, the asymptotic variance of the IV estimator decreases to zero at the rate of 1/n, where n is the sample size.

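Equation (15.12) is interesting for two reasons. First, it provides a way to obtain a standard error for the IV estimator. All quantities in (15.12) can be consistently estimated given a random sample: $\sigma^2_x$ by the sample variance of x, $\rho^2_{x,z}$ by the R-squared from regressing x on z, and $\sigma^2$ by the average of the squared IV residuals (see Wooldridge 2016, Section 15-1a).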
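A minimal sketch, on simulated data with hypothetical parameter values, of turning (15.12) into a standard error: $se(\hat\beta_1)=\sqrt{\hat\sigma^2/(SST_x R^2_{x,z})}$, where $SST_x$ is the total sum of squares of x and $\hat\sigma^2$ is computed here with an n − 2 degrees-of-freedom adjustment (an assumption matching the usual regression convention). As a cross-check, the same number is recomputed from the matrix expression $\hat\sigma^2 (Z'X)^{-1}Z'Z(X'Z)^{-1}$.

```r
# Minimal sketch on simulated (hypothetical) data: the homoskedastic IV
# standard error implied by (15.12), se = sqrt(sigma2_hat / (SST_x * R2_xz)).
set.seed(7)
n <- 1000
z <- rnorm(n)
w <- rnorm(n)                                 # common shock making x endogenous
x <- 0.6 * z + w + rnorm(n)
y <- 2 + 0.5 * x + w + rnorm(n)               # u = w + noise, correlated with x

b1 <- cov(z, y) / cov(z, x)
b0 <- mean(y) - b1 * mean(x)
u_hat  <- y - b0 - b1 * x                     # IV residuals use x itself
sigma2 <- sum(u_hat^2) / (n - 2)              # degrees-of-freedom adjusted
sst_x  <- sum((x - mean(x))^2)
r2_xz  <- cor(x, z)^2                         # R-squared from regressing x on z
se_formula <- sqrt(sigma2 / (sst_x * r2_xz))

# Same quantity from the matrix form sigma2 * (Z'X)^{-1} Z'Z (X'Z)^{-1}
X <- cbind(1, x)
Z <- cbind(1, z)
V <- sigma2 * solve(t(Z) %*% X) %*% crossprod(Z) %*% solve(t(X) %*% Z)
se_matrix <- sqrt(V[2, 2])
print(c(se_formula = se_formula, se_matrix = se_matrix))
```

Note that the residuals are computed with the original x, not with a fitted value of x; using fitted values would give a different (and incorrect) $\hat\sigma^2$.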

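A second reason (15.12) is interesting is that it allows us to compare the asymptotic variances of the IV and the OLS estimators (when x and u are uncorrelated). Under those conditions, the asymptotic variance of the OLS estimator is $\sigma^2/(n\sigma^2_x)$, so the IV variance in (15.12) is larger by the factor $1/\rho^2_{x,z}$. Because $\rho^2_{x,z}<1$ unless x and z are perfectly correlated, the IV variance is always larger than the OLS variance when OLS is valid, and it can be much larger when z is only weakly correlated with x. This is why OLS should be used when x is exogenous.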

### Wooldridge Example 15.1. Estimating the Return to Education for Married Women

We use the data on married working women in MROZ to estimate the return to education in the simple regression model

\begin{equation}
log(wage)=\beta_0+\beta_1*educ+u \tag{15.14}
\end{equation}

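```r
# Install AER (ivreg) and stargazer (tables) only if they are missing, then load them
for (pkg in c("AER", "stargazer")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
library(foreign); library(AER); library(stargazer)
```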


```r
# MROZ: married women; wage is observed only for women in the labor force
mroz <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/mroz.dta?raw=true")
# Restrict to non-missing wage observations (the working women)
oursample <- subset(mroz, !is.na(wage))

# OLS regression of log(wage) on educ
reg.ols <- lm(log(wage) ~ educ, data = oursample)
summary(reg.ols)
```



```
Call:
lm(formula = log(wage) ~ educ, data = oursample)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.10256 -0.31473  0.06434  0.40081  2.10029 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.1852     0.1852  -1.000    0.318    
educ          0.1086     0.0144   7.545 2.76e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.68 on 426 degrees of freedom
Multiple R-squared:  0.1179,	Adjusted R-squared:  0.1158 
F-statistic: 56.93 on 1 and 426 DF,  p-value: 2.761e-13
```


The OLS-based estimate for $\beta_1$ implies an almost 11% return for another year of education.

Next, we use father's education (fatheduc) as an instrumental variable for educ. We have to maintain that fatheduc is uncorrelated with u. The second requirement is that educ and fatheduc are correlated. We can check this very easily using a simple regression of educ on fatheduc (using only the working women in the sample):

```r
# Relevance check: simple regression of educ on fatheduc
reg2.ols <- lm(educ ~ fatheduc, data = oursample)
summary(reg2.ols)
```


```
Call:
lm(formula = educ ~ fatheduc, data = oursample)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.4704 -1.1231 -0.1231  0.9546  5.9546 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.23705    0.27594  37.099   <2e-16 ***
fatheduc     0.26944    0.02859   9.426   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.081 on 426 degrees of freedom
Multiple R-squared:  0.1726,	Adjusted R-squared:  0.1706 
F-statistic: 88.84 on 1 and 426 DF,  p-value: < 2.2e-16
```


The t statistic on fatheduc is 9.426, which indicates that educ and fatheduc have a statistically significant positive correlation. (In fact, fatheduc explains about 17.3% of the variation in educ in the sample.) Using fatheduc as an IV for educ gives

```r
# IV estimation: fatheduc (after "|") is the instrument for educ; ivreg() is from AER
reg.iv <- ivreg(log(wage) ~ educ | fatheduc, data = oursample)
summary(reg.iv)
```


```
Call:
ivreg(formula = log(wage) ~ educ | fatheduc, data = oursample)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0870 -0.3393  0.0525  0.4042  2.0677 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.44110    0.44610   0.989   0.3233  
educ         0.05917    0.03514   1.684   0.0929 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6894 on 426 degrees of freedom
Multiple R-Squared: 0.09344,	Adjusted R-squared: 0.09131 
Wald test: 2.835 on 1 and 426 DF,  p-value: 0.09294 
```


The IV estimate of the return to education is 5.9%, which is barely more than one-half of the OLS estimate. This suggests that the OLS estimate is too high and is consistent with omitted ability bias. Still, we can never know whether .109 is above the true return to education or whether .059 is closer to the true return. Further, the standard error of the IV estimate is about two and one-half times as large as the OLS standard error.
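The "two and one-half times" figure can be traced back to (15.12). The OLS standard error is $\hat\sigma/\sqrt{SST_x}$ and the IV standard error is $\hat\sigma_{IV}/\sqrt{SST_x R^2_{x,z}}$, where $R^2_{x,z}$ is the R-squared from regressing educ on fatheduc; $SST_x$ cancels in the ratio. The short computation below needs no data: it only plugs in the rounded values printed in the three regressions above.

```r
# Worked arithmetic using the numbers printed in the regressions above.
se_ols    <- 0.0144;  sigma_ols <- 0.68    # OLS: se(educ), residual standard error
se_iv     <- 0.03514; sigma_iv  <- 0.6894  # IV:  se(educ), residual standard error
r2_xz     <- 0.1726                        # R-squared from regressing educ on fatheduc

reported_ratio  <- se_iv / se_ols
predicted_ratio <- (sigma_iv / sigma_ols) / sqrt(r2_xz)
print(round(c(reported = reported_ratio, predicted = predicted_ratio), 3))
```

Both ratios come out at about 2.44, so almost all of the loss in precision is explained by the modest correlation between educ and fatheduc.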

As the following table shows, the 95% confidence interval for $\beta_1$ using OLS is much tighter than the one using IV; in fact, the IV interval, (-0.010, 0.128), contains the OLS estimate of 0.109. Therefore, although the difference between the two estimates is practically large, we cannot say whether it is statistically significant. We will show how to test this in Section 15-5.

```r
# Side-by-side table reporting 95% confidence intervals instead of standard errors
stargazer(reg.ols, reg.iv, type = "text", ci = TRUE)
```


```
                                         Dependent variable:          
                               ---------------------------------------
                                              log(wage)               
                                         OLS            instrumental  
                                                          variable    
                                         (1)                 (2)      
----------------------------------------------------------------------
educ                                  0.109***             0.059*     
                                   (0.080, 0.137)      (-0.010, 0.128)
                                                                      
Constant                               -0.185               0.441     
                                   (-0.548, 0.178)     (-0.433, 1.315)
                                                                      
----------------------------------------------------------------------
Obser
```

### Wooldridge Example 15.2. Estimating the Return to Education for Men 

We now use WAGE2 to estimate the return to education for men. We use the variable sibs (number of siblings) as an instrument for educ. These are negatively correlated, as we can verify from a simple regression:

```r
# WAGE2: sample of working men
wage2 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/wage2.dta?raw=true")

# Relevance check: simple regression of educ on sibs
reg.ols <- lm(educ ~ sibs, data = wage2)
summary(reg.ols)
```



```
Call:
lm(formula = educ ~ sibs, data = wage2)

Residuals:
   Min     1Q Median     3Q    Max 
-5.139 -1.683 -0.683  1.931  6.140 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 14.13879    0.11314 124.969  < 2e-16 ***
sibs        -0.22792    0.03028  -7.528 1.22e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.134 on 933 degrees of freedom
Multiple R-squared:  0.05726,	Adjusted R-squared:  0.05625 
F-statistic: 56.67 on 1 and 933 DF,  p-value: 1.215e-13
```


This equation implies that every sibling is associated with, on average, about .23 less of a year of education. If we assume that sibs is uncorrelated with the error term in (15.14), then the IV estimator is consistent. Estimating equation (15.14) using sibs as an IV for educ gives

```r
# IV estimation with sibs as the instrument for educ
reg.iv <- ivreg(log(wage) ~ educ | sibs, data = wage2)
summary(reg.iv)
```



```
Call:
ivreg(formula = log(wage) ~ educ | sibs, data = wage2)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.85429 -0.26950  0.04223  0.29276  1.31039 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.13003    0.35517  14.444  < 2e-16 ***
educ         0.12243    0.02635   4.646 3.86e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4233 on 933 degrees of freedom
Multiple R-Squared: -0.009174,	Adjusted R-squared: -0.01026 
Wald test: 21.59 on 1 and 933 DF,  p-value: 3.865e-06 
```

The IV estimate implies a return to education of about 12.2% per year. Notice also that the reported R-squared is negative (-0.009), which can happen with IV estimation, as explained in Section 15-1c.


### Wooldridge Example 15.3. Estimating the effect of Smoking on Birth Weight 

In Chapter 6, we estimated the effect of cigarette smoking on child birth weight. Without other explanatory variables, the model is

\begin{equation}
log(bwght)=\beta_0+\beta_1*packs+u \tag{15.21}
\end{equation}

where packs is the number of packs smoked by the mother per day. We might worry that packs is correlated with other health factors or the availability of good prenatal care, so that packs and u might be correlated. A possible instrumental variable for packs is the average price of cigarettes in the state of residence, cigprice. We will assume that cigprice and u are uncorrelated (even though state support for health care could be correlated with cigarette taxes).

If cigarettes are a typical consumption good, basic economic theory suggests that packs and cigprice are negatively correlated, so that cigprice can be used as an IV for packs. To check this, we regress packs on cigprice, using the data in BWGHT:

```r
# BWGHT: birth weight data
bwght <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/bwght.dta?raw=true")

# Relevance check: simple regression of packs on cigprice
reg.ols <- lm(packs ~ cigprice, data = bwght)
summary(reg.ols)
```


```
Call:
lm(formula = packs ~ cigprice, data = bwght)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.1106 -0.1061 -0.1032 -0.1015  2.4016 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0674257  0.1025384   0.658    0.511
cigprice    0.0002829  0.0007830   0.361    0.718

Residual standard error: 0.2987 on 1386 degrees of freedom
Multiple R-squared:  9.417e-05,	Adjusted R-squared:  -0.0006273 
F-statistic: 0.1305 on 1 and 1386 DF,  p-value: 0.7179
```


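The coefficient on cigprice is tiny, has the wrong sign, and is statistically insignificant (t = 0.36), and the R-squared is essentially zero. This indicates no relationship between smoking during pregnancy and cigarette prices, which is perhaps not too surprising given the addictive nature of cigarette smoking.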

Because packs and cigprice are not correlated, we should not use cigprice as an IV for packs in model (15.21).

The previous example illustrates a failure of the instrument relevance condition, $Corr(z,x) \neq 0$; IV estimation with such an instrument can produce strange results. Of greater practical interest is the so-called problem of weak instruments, which is loosely defined as the problem of "low" (but not zero) correlation between z and x.
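A small Monte Carlo sketch in base R makes the weak-instrument problem concrete. The design is hypothetical: x and u share a common factor w, so x is endogenous, and only the first-stage coefficient on z changes between the two runs (1 versus 0.05, the latter giving a population correlation between z and x of roughly 0.035). Medians and interquartile ranges are reported because the weak-instrument draws have very heavy tails.

```r
# Minimal Monte Carlo sketch (hypothetical design): how the spread of the
# simple IV estimator grows when Corr(z, x) is small but not zero.
set.seed(2024)
iv_draws <- function(pi1, reps = 500, n = 200, beta1 = 1) {
  replicate(reps, {
    z <- rnorm(n)
    w <- rnorm(n)                          # common factor: makes x endogenous
    x <- pi1 * z + w + rnorm(n)
    y <- beta1 * x + w + rnorm(n)          # u = w + noise
    cov(z, y) / cov(z, x)                  # IV estimate of beta1
  })
}
strong <- iv_draws(pi1 = 1)
weak   <- iv_draws(pi1 = 0.05)

summary_tab <- rbind(strong = c(median = median(strong), IQR = IQR(strong)),
                     weak   = c(median = median(weak),   IQR = IQR(weak)))
print(round(summary_tab, 3))
```

With the strong instrument the estimates cluster tightly around the true value of 1; with the weak one their spread is many times larger, even though z is still (slightly) correlated with x and uncorrelated with u.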

## 15-1c Computing R-Squared after IV Estimation

Unlike in the case of OLS, the R-squared from IV estimation can be negative because the SSR for IV can actually be larger than SST. Although it does not really hurt to report the R-squared for IV estimation, it is not very useful either, because it has no natural interpretation. If our goal were to produce the largest R-squared, we would always use OLS. IV methods are intended to provide better estimates of the ceteris paribus effect of x on y when x and u are correlated; goodness-of-fit is not a factor. A high R-squared resulting from OLS is of little comfort if we cannot consistently estimate $\beta_1$.
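A minimal sketch of a negative IV R-squared, on simulated data with a hypothetical design. Here x is strongly negatively correlated with u, so $y-\beta_1 x = \beta_0 + u$ varies more than y itself. The IV fit, which gets $\beta_1$ about right, then has SSR > SST, whereas OLS, which minimizes SSR, cannot do worse than the intercept-only fit.

```r
# Minimal sketch on simulated (hypothetical) data: an IV R-squared below zero.
# Cov(x, u) = -1.5 in the population, so SSR from the IV fit exceeds SST.
set.seed(99)
n <- 1000
z <- rnorm(n)
w <- rnorm(n)
x <- z + w
u <- -1.5 * w + rnorm(n)
y <- 1 + 1 * x + u                         # true beta1 = 1

b1_iv  <- cov(z, y) / cov(z, x)
b0_iv  <- mean(y) - b1_iv * mean(x)
ssr_iv <- sum((y - b0_iv - b1_iv * x)^2)   # residuals computed with x itself
sst    <- sum((y - mean(y))^2)
r2_iv  <- 1 - ssr_iv / sst
r2_ols <- summary(lm(y ~ x))$r.squared     # OLS R-squared is never negative
print(round(c(r2_iv = r2_iv, r2_ols = r2_ols), 3))
```

In this design Var(u) = 3.25 and Var(y) = 2.25, so the IV R-squared should be near 1 − 3.25/2.25 ≈ −0.44, even though the IV slope is the consistent one.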

## 15-2 IV Estimation of the Multiple Regression Model

The IV estimator for the simple regression model is easily extended to the multiple regression case. We begin with the case where only one of the explanatory variables is correlated with the error. In fact, consider a standard linear model with two explanatory variables:

\begin{equation}
y_1=\beta_0+\beta_1*y_2+\beta_2*z_1+u_1 \tag{15.22}
\end{equation}

We call this a structural equation to emphasize that we are interested in the $\beta_j$, which simply means that the equation is supposed to measure a causal relationship. We use a new notation here to distinguish endogenous from exogenous variables. The dependent variable $y_1$ is clearly endogenous, as it is correlated with $u_1$. The variables $y_2$ and $z_1$ are the explanatory variables, and $u_1$ is the error. As usual, we assume that the expected value of $u_1$ is zero: $E(u_1)=0$. We use $z_1$ to indicate that this variable is exogenous in (15.22) ($z_1$ is uncorrelated with $u_1$). We use $y_2$ to indicate that this variable is suspected of being correlated with $u_1$. We do not specify why $y_2$ and $u_1$ are correlated, but for now it is best to think of $u_1$ as containing an omitted variable correlated with $y_2$. The notation in equation (15.22) originates in simultaneous equations models (which we cover in Chapter 16), but we use it more generally to easily distinguish exogenous from endogenous explanatory variables in a multiple regression model.

An example of (15.22) is:

\begin{equation}
log(wage)=\beta_0+\beta_1*educ+\beta_2*exper+u_1 \tag{15.23}
\end{equation}

where $y_1=log(wage)$, $y_2=educ$, and $z_1=exper$. In other words, we assume that exper is exogenous in (15.23), but we allow that educ, for the usual reasons, is correlated with $u_1$.

We know that if (15.22) is estimated by OLS, all of the estimators will be biased and inconsistent. Thus, we follow the strategy suggested in the previous section and seek an instrumental variable for $y_2$. We need another exogenous variable, call it $z_2$, that does not appear in (15.22). Therefore, key assumptions are that $z_1$ and $z_2$ are uncorrelated with $u_1$; we also assume that $u_1$ has zero expected value, which is without loss of generality when the equation contains an intercept.
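The requirement that $z_2$ actually helps predict $y_2$ is usually stated through the reduced form for $y_2$, which writes it as a linear function of all exogenous variables; relevance now means that $z_2$ is partially correlated with $y_2$ once $z_1$ is controlled for. The standard statement is sketched below, followed by the sample moment conditions (the analogues of $E(u_1)=0$, $Cov(z_1,u_1)=0$, and $Cov(z_2,u_1)=0$) that the IV estimates $\hat\beta_0$, $\hat\beta_1$, $\hat\beta_2$ solve.

\begin{equation}
\begin{aligned}
&y_2=\pi_0+\pi_1*z_1+\pi_2*z_2+v_2, \qquad \text{relevance: } \pi_2 \neq 0 \\
&\sum_{i=1}^n \left(y_{i1}-\hat\beta_0-\hat\beta_1*y_{i2}-\hat\beta_2*z_{i1}\right)=0 \\
&\sum_{i=1}^n z_{i1}\left(y_{i1}-\hat\beta_0-\hat\beta_1*y_{i2}-\hat\beta_2*z_{i1}\right)=0 \\
&\sum_{i=1}^n z_{i2}\left(y_{i1}-\hat\beta_0-\hat\beta_1*y_{i2}-\hat\beta_2*z_{i1}\right)=0
\end{aligned}
\end{equation}

These are three linear equations in three unknowns; the condition $\pi_2 \neq 0$ is what guarantees that they have a unique solution in large samples.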

### Wooldridge Example 15.4. Using College Proximity as an IV for Education

Card (1995) used wage and education data for a sample of men in 1976 to estimate the return to education. He used a dummy variable for whether someone grew up near a four-year college (nearc4) as an instrumental variable for education. In a log(wage) equation, he included other standard controls: experience, a black dummy variable, dummy variables for living in an SMSA and living in the South, and a full set of regional dummy variables and an SMSA dummy for where the man was living in 1966.

In order for nearc4 to be a valid instrument, it must be uncorrelated with the error term in the wage equation (we assume this), and it must be partially correlated with educ. To check the latter requirement, we regress educ on nearc4 and all of the exogenous variables appearing in the equation. (That is, we estimate the reduced form for educ.) Using the data in CARD, we obtain, in condensed form,

```r
library(foreign); library(AER); library(stargazer)

# CARD: Card (1995) data on men in 1976
card <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/card.dta?raw=true")

# Checking for relevance: reduced form for educ on nearc4 and all exogenous variables
redf <- lm(educ ~ nearc4 + exper + I(exper^2) + black + smsa + south + smsa66 +
             reg662 + reg663 + reg664 + reg665 + reg666 + reg667 + reg668 + reg669,
           data = card)
summary(redf)
```


```
Call:
lm(formula = educ ~ nearc4 + exper + I(exper^2) + black + smsa + 
    south + smsa66 + reg662 + reg663 + reg664 + reg665 + reg666 + 
    reg667 + reg668 + reg669, data = card)

Residuals:
   Min     1Q Median     3Q    Max 
-7.545 -1.370 -0.091  1.278  6.239 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 16.6382529  0.2406297  69.145  < 2e-16 ***
nearc4       0.3198989  0.0878638   3.641 0.000276 ***
exper       -0.4125334  0.0336996 -12.241  < 2e-16 ***
I(exper^2)   0.0008686  0.0016504   0.526 0.598728    
black       -0.9355287  0.0937348  -9.981  < 2e-16 ***
smsa         0.4021825  0.1048112   3.837 0.000127 ***
south       -0.0516126  0.1354284  -0.381 0.703152    
smsa66       0.0254805  0.1057692   0.241 0.809644    
reg662      -0.0786363  0.1871154  -0.420 0.674329    
reg663      -0.0279390  0.1833745  -0.152 0.878913    
reg664       0.1171820  0.2172531   0.539 0.589665    
reg665      -0.2726165  0.2184204  -1.248 0.212082    
reg
```

The coefficient on nearc4 is about 0.320 (t = 3.64), so, controlling for the other exogenous variables, nearc4 is partially and positively correlated with educ. It can therefore be used as an instrumental variable for educ, assuming that nearc4 is uncorrelated with the error term. Below we provide the OLS and IV estimates:

```r
# OLS
ols <- lm(log(wage) ~ educ + exper + I(exper^2) + black + smsa + south + smsa66 +
            reg662 + reg663 + reg664 + reg665 + reg666 + reg667 + reg668 + reg669,
          data = card)
# IV estimation: in the instrument list after "|", nearc4 replaces educ
iv <- ivreg(log(wage) ~ educ + exper + I(exper^2) + black + smsa + south + smsa66 +
              reg662 + reg663 + reg664 + reg665 + reg666 + reg667 + reg668 + reg669
            | nearc4 + exper + I(exper^2) + black + smsa + south + smsa66 +
              reg662 + reg663 + reg664 + reg665 + reg666 + reg667 + reg668 + reg669,
            data = card)

# Pretty regression table of selected coefficients
stargazer(ols, iv, type = "text",
          keep = c("ed", "near", "exp", "bl", "smsa", "south"),
          keep.stat = c("n", "rsq"), ci = TRUE)
```


```
                    Dependent variable:       
             ---------------------------------
                         log(wage)            
                   OLS          instrumental  
                                  variable    
                   (1)              (2)       
----------------------------------------------
educ             0.075***         0.132**     
              (0.068, 0.082)   (0.024, 0.239) 
                                              
exper            0.085***         0.108***    
              (0.072, 0.098)   (0.062, 0.155) 
                                              
I(exper2)       -0.002***        -0.002***    
             (-0.003, -0.002) (-0.003, -0.002)
                                              
black           -0.199***        -0.147***    
             (-0.235, -0.163) (-0.252, -0.041)
                                              
smsa             0.136***         0.112***    
              (0.097, 0.176)   (0.050, 0.174) 
```

Interestingly, the IV estimate of the return to education (0.132) is almost twice as large as the OLS estimate (0.075), but it is far less precise: judging from the widths of the two 95% confidence intervals, the IV standard error is roughly 15 times the OLS standard error. The 95% confidence interval for the IV estimate runs from .024 to .239, which is a very wide range. Wider confidence intervals are the price we must pay to get a consistent estimator of the return to education when we think educ is endogenous.
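To see why OLS and IV can differ this much, here is a minimal sketch on simulated data (not the card data). It assumes an unobserved `ability` term that raises both educ and wages, a binary instrument `z` playing the role of nearc4, and made-up coefficients (a true return of 0.10). It applies the simple IV formula $\hat\beta_{IV} = \widehat{Cov}(z,y)/\widehat{Cov}(z,x)$ without controls.

```r
# Hypothetical simulation: ability is omitted, so educ is endogenous.
set.seed(123)
n <- 100000
ability <- rnorm(n)                          # unobserved; ends up in the error
z <- rbinom(n, 1, 0.5)                       # binary instrument, like nearc4
educ <- 12 + 0.8 * z + ability + rnorm(n)    # z shifts educ (relevance)
logwage <- 1 + 0.10 * educ + 0.3 * ability + rnorm(n, sd = 0.5)

# OLS slope absorbs part of the ability effect (upward bias here)
fit_ols <- lm(logwage ~ educ)
b_ols <- coef(fit_ols)[["educ"]]

# Simple IV estimator: Cov(z, y) / Cov(z, x)
b_iv <- cov(z, logwage) / cov(z, educ)

# Approximate standard errors, to show the precision cost of IV
u_iv  <- logwage - (mean(logwage) - b_iv * mean(educ)) - b_iv * educ
se_iv <- sqrt(var(u_iv) * var(z) / (n * cov(z, educ)^2))
se_ols <- summary(fit_ols)$coefficients["educ", "Std. Error"]

round(c(ols = b_ols, iv = b_iv, se_ols = se_ols, se_iv = se_iv), 4)
```

In this setup OLS overstates the return, IV recovers it, and the IV standard error is several times larger because `z` explains only a small share of the variation in educ.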

As discussed earlier, we should not make anything of the smaller R-squared in the IV estimation: the OLS R-squared is always at least as large, because OLS minimizes the sum of squared residuals.
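A minimal sketch of this point on simulated data (all names and coefficients are assumptions): compute each R-squared as $1 - SSR/SST$ using the residuals $y - X\hat\beta$, once with the OLS coefficients and once with the just-identified IV coefficients. Because OLS picks $\hat\beta$ to minimize SSR over all possible coefficient vectors, the OLS value can never be smaller.

```r
# Hypothetical data with an endogenous regressor x and instrument z
set.seed(42)
n <- 5000
ability <- rnorm(n)
z <- rnorm(n)
x <- 0.5 * z + ability + rnorm(n)
y <- 1 + 0.2 * x + ability + rnorm(n)

X <- cbind(1, x)                                   # regressors (with intercept)
Z <- cbind(1, z)                                   # instruments (with intercept)
b_ols <- solve(crossprod(X), crossprod(X, y))      # (X'X)^{-1} X'y
b_iv  <- solve(crossprod(Z, X), crossprod(Z, y))   # (Z'X)^{-1} Z'y

sst <- sum((y - mean(y))^2)
r2  <- function(b) 1 - sum((y - X %*% b)^2) / sst  # R-squared from residuals y - Xb
c(ols = r2(b_ols), iv = r2(b_iv))
```

The IV R-squared computed this way can even be negative; it says nothing about whether IV is the right estimator.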

## 15-3 Two Stage Least Squares

Please refer to Wooldridge 2016 (section 15-3) for more information.

### Wooldridge Example 15.5. Return to Education for Working Women 

Assume that the wage equation is

\begin{equation}
log(wage)=\beta_0+\beta_1*educ+\beta_2*exper+\beta_3*exper^2+u_1 \tag{15.40}
\end{equation}

where $u_1$ is uncorrelated with both exper and $exper^2$. Suppose that we also think mother's and father's education are uncorrelated with $u_1$. Then, we can use both of these as IVs for educ. The reduced form equation for educ is

\begin{equation}
educ=\pi_0+\pi_1*exper+\pi_2*exper^2+\pi_3*motheduc+\pi_4*fatheduc+v_2 \tag{15.41}
\end{equation}

and identification requires that $\pi_3 \neq 0$ or $\pi_4 \neq 0$ (or both). We test for this in the following code:

In [33]:
library(foreign); library(AER)   # read.dta(); AER also attaches car (linearHypothesis)
mroz <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/mroz.dta?raw=true")
# restrict to non-missing wage observations
oursample <- subset(mroz, !is.na(wage))

# 1st stage: reduced form
stage1 <- lm(educ~exper+I(exper^2)+motheduc+fatheduc, data=oursample)
summary(stage1)



Call:
lm(formula = educ ~ exper + I(exper^2) + motheduc + fatheduc, 
    data = oursample)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.8057 -1.0520 -0.0371  1.0258  6.3787 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.102640   0.426561  21.340  < 2e-16 ***
exper        0.045225   0.040251   1.124    0.262    
I(exper^2)  -0.001009   0.001203  -0.839    0.402    
motheduc     0.157597   0.035894   4.391 1.43e-05 ***
fatheduc     0.189548   0.033756   5.615 3.56e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.039 on 423 degrees of freedom
Multiple R-squared:  0.2115,	Adjusted R-squared:  0.204 
F-statistic: 28.36 on 4 and 423 DF,  p-value: < 2.2e-16


In [34]:
# Partial F test of H0: pi_3 = pi_4 = 0 (motheduc and fatheduc jointly irrelevant)
myH0 <- c("motheduc", "fatheduc")
linearHypothesis(stage1, myH0)

  Res.Df      RSS Df Sum of Sq       F       Pr(>F)
1    425 2219.216
2    423 1758.575  2  460.6411 55.4003 4.268909e-22


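The partial F test (F = 55.4, p-value ≈ 4.3e-22) strongly rejects $H_0: \pi_3 = \pi_4 = 0$, so motheduc and fatheduc are jointly relevant instruments for educ. Their individual t statistics (4.39 and 5.62) show that each is also significantly correlated with educ on its own.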
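As a check on the arithmetic, the partial F statistic can be rebuilt from the two residual sums of squares printed by `linearHypothesis`, using $F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/df_{ur}}$ with $q = 2$ exclusion restrictions and $df_{ur} = 423$. This is a small sketch that only reuses numbers from the output above.

```r
# RSS values copied from the linearHypothesis output above
ssr_r  <- 2219.216   # restricted model: motheduc and fatheduc dropped (Res.Df = 425)
ssr_ur <- 1758.575   # unrestricted first stage (Res.Df = 423)
q      <- 2          # number of instruments being tested
df_ur  <- 423

F_stat <- ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
p_val  <- pf(F_stat, q, df_ur, lower.tail = FALSE)
c(F = F_stat, p = p_val)
```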

When we estimate (15.40) by 2SLS, we obtain,

In [40]:
library(foreign);library(AER);library(stargazer)
mroz <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta")

# restrict to non-missing wage observations
oursample <- subset(mroz, !is.na(wage))


# Automatic 2SLS estimation
aut.2SLS<-ivreg(log(wage)~educ+exper+I(exper^2) 
             | motheduc+fatheduc+exper+I(exper^2) , data=oursample)

# Summary of the 2SLS estimates
summary(aut.2SLS)



Call:
ivreg(formula = log(wage) ~ educ + exper + I(exper^2) | motheduc + 
    fatheduc + exper + I(exper^2), data = oursample)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0986 -0.3196  0.0551  0.3689  2.3493 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.0481003  0.4003281   0.120  0.90442   
educ         0.0613966  0.0314367   1.953  0.05147 . 
exper        0.0441704  0.0134325   3.288  0.00109 **
I(exper^2)  -0.0008990  0.0004017  -2.238  0.02574 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6747 on 424 degrees of freedom
Multiple R-Squared: 0.1357,	Adjusted R-squared: 0.1296 
Wald test: 8.141 on 3 and 424 DF,  p-value: 2.787e-05 


The estimated return to education is about 6.1%. Because of its large standard error (.031), the 2SLS estimate is only marginally significant: $t = 1.95$, with a two-sided p-value of .051, just above the 5% level.
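What `ivreg` does can be sketched by hand. The block below uses simulated data (not mroz; `w` stands in for an exogenous regressor such as exper, `z1` and `z2` for the two excluded instruments, and all coefficients are assumptions). It computes 2SLS twice: by running the two stages with `lm`, and with the matrix formula $\hat\beta = (X'P_ZX)^{-1}X'P_Zy$. The point estimates coincide, but the second-stage `lm` standard errors are not the correct 2SLS ones, because its residuals use $\hat x$ instead of $x$.

```r
set.seed(1)
n  <- 5000
w  <- rnorm(n)                       # exogenous regressor (like exper)
z1 <- rnorm(n); z2 <- rnorm(n)       # two excluded instruments
v  <- rnorm(n)
u  <- 0.6 * v + rnorm(n)             # u correlated with v, so x is endogenous
x  <- 1 + 0.5 * w + 0.7 * z1 + 0.4 * z2 + v
y  <- 2 + 0.3 * x - 0.2 * w + u

# (1) Two explicit stages
xhat   <- fitted(lm(x ~ w + z1 + z2))        # first stage: x on all exogenous vars
stage2 <- lm(y ~ xhat + w)                   # second stage
b_two  <- unname(coef(stage2))

# (2) Matrix formula: Xh = P_Z X, b = (Xh'X)^{-1} Xh'y
X  <- cbind(1, x, w)
Z  <- cbind(1, w, z1, z2)
Xh <- Z %*% solve(crossprod(Z), crossprod(Z, X))
b_mat <- unname(drop(solve(crossprod(Xh, X), crossprod(Xh, y))))

# Correct 2SLS standard errors: residuals built with x, not xhat
res     <- y - X %*% b_mat
sigma2  <- sum(res^2) / (n - ncol(X))
se_2sls <- unname(sqrt(diag(sigma2 * solve(crossprod(Xh)))))
naive   <- unname(summary(stage2)$coefficients[, "Std. Error"])
cbind(coef = b_mat, se_2sls = se_2sls, se_naive_stage2 = naive)
```

This is why the notebook relies on `ivreg` rather than a manual second-stage `lm` for inference.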

## 15-5 Testing for Endogeneity and Testing Overidentifying Restrictions

### 15-5a Testing for Endogeneity

The 2SLS estimator is less efficient than OLS when the explanatory variables are exogenous; as we have seen, the 2SLS estimates can have very large standard errors. Therefore, it is useful to have a test for endogeneity of an explanatory variable that shows whether 2SLS is even necessary. Obtaining such a test is rather simple.

To illustrate, suppose we have a single suspected endogenous variable $y_2$,

\begin{equation}
y_1=\beta_0+\beta_1*y_2+\beta_2*z_1+\beta_3*z_2+u_1 \tag{15.49}
\end{equation}

where $z_1$ and $z_2$ are exogenous. We have two additional exogenous variables, $z_3$ and $z_4$, which do not appear in (15.49). If $y_2$ is uncorrelated with $u_1$, we should estimate (15.49) by OLS. How can we test this? Hausman (1978) suggested directly comparing the OLS and 2SLS estimates and determining whether the differences are statistically significant. After all, both OLS and 2SLS are consistent if all variables are exogenous. If 2SLS and OLS differ significantly, we conclude that $y_2$ must be endogenous (maintaining that the $z_j$ are exogenous).

It is a good idea to compute OLS and 2SLS to see if the estimates are practically different. To determine whether the differences are statistically significant, it is easier to use a regression test. This is based on estimating the reduced form for $y_2$, which in this case is

\begin{equation}
y_2=\pi_0+\pi_1*z_1+\pi_2*z_2+\pi_3*z_3+\pi_4*z_4+v_2 \tag{15.50}
\end{equation}

#### Testing for Endogeneity of a Single Explanatory Variable:

(i) Estimate the reduced form for $y_2$ by regressing it on all exogenous variables (including those in the structural equation and the additional IVs). Obtain the residuals, $\hat v_2$.

(ii) Add $\hat v_2$ to the structural equation (which includes $y_2$) and test for significance of $\hat v_2$ using an OLS regression. If the coefficient on $\hat v_2$ is statistically different from zero, we conclude that $y_2$ is indeed endogenous. We might want to use a heteroskedasticity-robust test.
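The two steps above can be sketched on simulated data that follow (15.49) and (15.50). The variable names mirror the equations, and the coefficients, including the assumed correlation between $u_1$ and $v_2$, are made up. The sketch also shows a useful side fact: the coefficient on $y_2$ in the augmented regression equals the 2SLS estimate.

```r
set.seed(7)
n  <- 3000
z1 <- rnorm(n); z2 <- rnorm(n)            # exogenous, in the structural equation
z3 <- rnorm(n); z4 <- rnorm(n)            # additional exogenous variables (IVs)
v2 <- rnorm(n)
u1 <- 0.5 * v2 + rnorm(n)                 # Cov(u1, v2) != 0, so y2 is endogenous
y2 <- 0.3 * z1 + 0.2 * z2 + 0.8 * z3 + 0.6 * z4 + v2
y1 <- 1 + 0.5 * y2 + 0.3 * z1 - 0.2 * z2 + u1

# (i) Reduced form for y2 on all exogenous variables; keep the residuals
first <- lm(y2 ~ z1 + z2 + z3 + z4)
v2hat <- resid(first)

# (ii) Add v2hat to the structural equation and t-test its coefficient
cf <- summary(lm(y1 ~ y2 + z1 + z2 + v2hat))$coefficients
cf["v2hat", ]

# Side fact: the coefficient on y2 here equals the 2SLS estimate
y2hat  <- fitted(first)
b_2sls <- coef(lm(y1 ~ y2hat + z1 + z2))[["y2hat"]]
c(control_function = cf["y2", "Estimate"], two_sls = b_2sls)
```

For a heteroskedasticity-robust version of step (ii), the same regression can be passed to a robust coefficient test (for instance `coeftest` with a sandwich covariance matrix), which this base-R sketch does not do.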

### Wooldridge Example 15.7. Return to Education for Working Women

We can test for endogeneity of educ in (15.40) by obtaining the residuals $\hat v_2$ from the reduced form (15.41), estimated using only working women, and including them in (15.40).

In [46]:
library(foreign);library(AER);library(stargazer)
mroz <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta")

# restrict to non-missing wage observations
oursample <- subset(mroz, !is.na(wage))

# 1st stage: reduced form
stage1<-lm(educ~exper+I(exper^2)+motheduc+fatheduc, data=oursample)

# 2nd stage
stage2<-lm(log(wage)~educ+exper+I(exper^2)+resid(stage1),data=oursample)

# results including t tests
coeftest(stage2)



t test of coefficients:

                 Estimate  Std. Error t value  Pr(>|t|)    
(Intercept)    0.04810030  0.39457526  0.1219 0.9030329    
educ           0.06139663  0.03098494  1.9815 0.0481824 *  
exper          0.04417039  0.01323945  3.3363 0.0009241 ***
I(exper^2)    -0.00089897  0.00039591 -2.2706 0.0236719 *  
resid(stage1)  0.05816661  0.03480728  1.6711 0.0954406 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


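The coefficient on $\hat v_2$ is $\hat \rho_1 = .058$ with $t = 1.67$ (two-sided p-value ≈ .095). This is moderate evidence of positive correlation between $u_1$ and $v_2$: we reject exogeneity of educ at the 10% level, but not at the 5% level. Note also that the coefficients on educ, exper and $exper^2$ in this regression are identical to the 2SLS estimates obtained earlier; only the standard errors differ.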


We can also test for endogeneity of multiple explanatory variables. For each suspected endogenous variable, we obtain the reduced form residuals, as in part (i). Then, we test for joint significance of these residuals in the structural equation, using an F test. Joint significance indicates that at least one suspected explanatory variable is endogenous. The number of exclusion restrictions tested is the number of suspected endogenous explanatory variables.
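A minimal sketch of the joint version, on simulated data with two suspected endogenous regressors (`y2`, `y3`, hypothetical names) and two excluded instruments (`z2`, `z3`). By construction only `y2` is actually endogenous. The F test compares the structural equation with and without both reduced-form residuals, so it has two numerator degrees of freedom.

```r
set.seed(11)
n  <- 3000
z1 <- rnorm(n); z2 <- rnorm(n); z3 <- rnorm(n)
e  <- rnorm(n)                                     # structural error
y2 <- 0.7 * z2 + 0.3 * z3 + rnorm(n) + 0.5 * e     # endogenous (shares e)
y3 <- 0.2 * z2 + 0.8 * z3 + rnorm(n)               # exogenous in this setup
y1 <- 1 + 0.5 * y2 - 0.4 * y3 + 0.3 * z1 + e

# Reduced-form residuals for each suspected endogenous variable
v2hat <- resid(lm(y2 ~ z1 + z2 + z3))
v3hat <- resid(lm(y3 ~ z1 + z2 + z3))

# Joint F test of the two residuals in the structural equation
restricted   <- lm(y1 ~ y2 + y3 + z1)
unrestricted <- lm(y1 ~ y2 + y3 + z1 + v2hat + v3hat)
joint <- anova(restricted, unrestricted)           # Df = 2 restrictions
joint
```

Rejection says that at least one of `y2` and `y3` is endogenous; it does not say which one.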

### 15-5b Testing Overidentifying Restrictions

When we introduced the simple instrumental variables estimator in Section 15-1, we emphasized that the instrument must satisfy two requirements: it must be uncorrelated with the error (exogeneity) and correlated with the endogenous explanatory variable (relevance). We have now seen that, even in models with additional explanatory variables, the second requirement can be tested using a t test (with just one instrument) or an F test (when there are multiple instruments). In the context of the simple IV estimator, we noted that the exogeneity requirement cannot be tested. However, if we have more instruments than we need, we can effectively test whether some of them are uncorrelated with the structural error.

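Comparing different IV estimates of the same parameter (for example, the estimates of $\beta_1$ in (15.40) obtained using only motheduc or only fatheduc as the instrument for educ) is an example of testing overidentifying restrictions. The general idea is that we have more instruments than we need to estimate the parameters consistently: if all of them are exogenous, the different IV estimates should differ only by sampling error.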

#### Testing Overidentifying Restrictions:

(i) Estimate the structural equation by 2SLS and obtain the 2SLS residuals, $\hat u_1$.

(ii) Regress $\hat u_1$ on all exogenous variables. Obtain the R-squared, say, $R_1^2$.

(iii) Under the null hypothesis that all IVs are uncorrelated with $u_1$, $nR_1^2 \stackrel{a}{\sim} \chi^2_q$, where $q$ is the number of instrumental variables from outside the model minus the total number of endogenous explanatory variables. If $nR_1^2$ exceeds (say) the 5% critical value in the $\chi^2_q$ distribution, we reject $H_0$ and conclude that at least some of the IVs are not exogenous.
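Steps (i) to (iii) can be sketched as a small function, here called `sargan` (a hypothetical helper, not a library function), for one endogenous regressor `x` and two instruments, so $q = 1$. It is applied to simulated data twice: once where both instruments are exogenous, and once where `z2` also enters the outcome directly and is therefore invalid. All coefficients are assumptions.

```r
set.seed(3)
n  <- 2000
z1 <- rnorm(n); z2 <- rnorm(n)          # two instruments, one endogenous x
v  <- rnorm(n)
u  <- 0.5 * v + rnorm(n)                # x is endogenous through v
x  <- 0.6 * z1 + 0.6 * z2 + v

sargan <- function(y, x, z1, z2) {
  # (i) 2SLS by hand; residuals use x itself, not the fitted xhat
  xhat <- fitted(lm(x ~ z1 + z2))
  b    <- coef(lm(y ~ xhat))
  uhat <- y - b[[1]] - b[[2]] * x
  # (ii) regress the 2SLS residuals on all exogenous variables
  r2 <- summary(lm(uhat ~ z1 + z2))$r.squared
  # (iii) n * R^2 is asymptotically chi-squared with q = 2 - 1 = 1 df
  stat <- length(y) * r2
  c(stat = stat, p = pchisq(stat, df = 1, lower.tail = FALSE))
}

y_valid   <- 1 + 0.5 * x + u              # both instruments exogenous
y_invalid <- 1 + 0.5 * x + 0.4 * z2 + u   # z2 enters the error: invalid IV
rbind(valid = sargan(y_valid, x, z1, z2),
      invalid = sargan(y_invalid, x, z1, z2))
```

Note that the test only has power because there are more instruments than needed; with a single instrument for `x`, $q = 0$ and nothing can be tested.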

### Wooldridge Example 15.8. Return to Education for Working Women

When we use motheduc and fatheduc as IVs for educ in (15.40), we have a single overidentifying restriction ($q = 2 - 1 = 1$). We regress the 2SLS residuals $\hat u_1$ on exper, $exper^2$, motheduc, and fatheduc and compute $nR_1^2$:

In [61]:
library(foreign);library(AER);library(stargazer)
mroz <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta")

# restrict to non-missing wage observations
oursample <- subset(mroz, !is.na(wage))

# IV regression
res.2sls <- ivreg(log(wage) ~ educ+exper+I(exper^2)
                | exper+I(exper^2)+motheduc+fatheduc,data=oursample) 

# Auxiliary regression
res.aux <-  lm(resid(res.2sls) ~ exper+I(exper^2)+motheduc+fatheduc
                       , data=oursample) 

# Calculations for test
sprintf('R-squared:  %s', r2 <- summary(res.aux)$r.squared )
sprintf( 'N: %s',n <- nobs(res.aux) )
sprintf( 'N*R-squared: %s',teststat <- n*r2 )
sprintf('Chi squared distribution p-value: %s' ,pval <- 1-pchisq(teststat,1) )


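The auxiliary regression gives $nR_1^2 \approx 0.378$ with $n = 428$, and a $\chi^2_1$ p-value of about .54 (the same value appears as the Sargan statistic in the ivreg diagnostics in Section 15-6). Therefore, the parents' education variables pass the overidentification test.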

## 15-6 2SLS with Heteroskedasticity

Heteroskedasticity in the context of 2SLS raises essentially the same issues as with OLS. Most importantly, it is possible to obtain standard errors and test statistics that are (asymptotically) robust to heteroskedasticity of arbitrary and unknown form.

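Returning to Example 15.8 (motheduc and fatheduc as instruments for educ), the `diagnostics=TRUE` option in the following cell reports three tests, none of which is a test for heteroskedasticity. The Wu-Hausman statistic, $F_{1,423}=2.793$ with p-value $= 0.095$, is the regression-based endogeneity test of Example 15.7 (it equals the squared t statistic on $\hat v_2$, $1.6711^2 \approx 2.793$), so exogeneity of educ is rejected at the 10% level but not at the 5% level. The Sargan statistic (0.378, p-value $= 0.539$) is the overidentification test of Example 15.8, and the weak-instruments statistic (55.4) is the first-stage F test of Example 15.5. As computed here, all three are the usual non-robust versions.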

In [65]:
library(foreign);library(AER);library(stargazer)
mroz <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta")

# restrict to non-missing wage observations
oursample <- subset(mroz, !is.na(wage))

# IV regression
summary( res.2sls <- ivreg(log(wage) ~ educ+exper+I(exper^2)
                | exper+I(exper^2)+motheduc+fatheduc,data=oursample) ,diagnostics=TRUE)


Call:
ivreg(formula = log(wage) ~ educ + exper + I(exper^2) | exper + 
    I(exper^2) + motheduc + fatheduc, data = oursample)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0986 -0.3196  0.0551  0.3689  2.3493 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.0481003  0.4003281   0.120  0.90442   
educ         0.0613966  0.0314367   1.953  0.05147 . 
exper        0.0441704  0.0134325   3.288  0.00109 **
I(exper^2)  -0.0008990  0.0004017  -2.238  0.02574 * 

Diagnostic tests:
                 df1 df2 statistic p-value    
Weak instruments   2 423    55.400  <2e-16 ***
Wu-Hausman         1 423     2.793  0.0954 .  
Sargan             1  NA     0.378  0.5386    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6747 on 424 degrees of freedom
Multiple R-Squared: 0.1357,	Adjusted R-squared: 0.1296 
Wald test: 8.141 on 3 and 424 DF,  p-value: 2.787e-05 
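The standard errors above are the usual (non-robust) ones. AER's `summary` method for `ivreg` objects accepts a `vcov.` argument, so something like `summary(res.2sls, vcov. = sandwich)` (with `library(sandwich)`) should give HC0-type robust standard errors; check the AER documentation before relying on it. The base-R sketch below, on simulated heteroskedastic data (all coefficients assumed), shows what the robust "sandwich" formula for 2SLS computes: $(\hat X'\hat X)^{-1}\left(\sum_i \hat u_i^2 \hat x_i \hat x_i'\right)(\hat X'\hat X)^{-1}$, where $\hat X = P_Z X$ and $\hat u = y - X\hat\beta$.

```r
set.seed(5)
n  <- 5000
z1 <- rnorm(n); z2 <- rnorm(n); w <- rnorm(n)
v  <- rnorm(n)
u  <- (0.5 * v + rnorm(n)) * exp(0.5 * w)     # error variance depends on w
x  <- 0.6 * z1 + 0.6 * z2 + 0.3 * w + v       # endogenous regressor
y  <- 1 + 0.5 * x + 0.2 * w + u

X  <- cbind(1, x, w)
Z  <- cbind(1, w, z1, z2)
Xh <- Z %*% solve(crossprod(Z), crossprod(Z, X))   # first-stage fitted values P_Z X
b  <- solve(crossprod(Xh, X), crossprod(Xh, y))    # 2SLS coefficients
uhat <- drop(y - X %*% b)                          # 2SLS residuals (use x, not xhat)

A     <- solve(crossprod(Xh))                      # (Xh'Xh)^{-1}
meat  <- crossprod(Xh * uhat)                      # sum of uhat_i^2 * xh_i xh_i'
V_hc0 <- A %*% meat %*% A                          # heteroskedasticity-robust (HC0)
V_ols <- sum(uhat^2) / (n - ncol(X)) * A           # usual homoskedastic version
cbind(usual = sqrt(diag(V_ols)), robust = sqrt(diag(V_hc0)))
```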


## 15-8 Applying 2SLS to Pooled Cross Sections and Panel Data

Applying instrumental variables methods to independently pooled cross sections raises no new difficulties. As with models estimated by OLS, we should often include time period dummy variables to allow for aggregate time effects. These dummy variables are exogenous (because the passage of time is exogenous), so they act as their own instruments.

Instrumental variables estimation can be combined with panel data methods, particularly first differencing, to estimate parameters consistently in the presence of unobserved effects and endogeneity in one or more time-varying explanatory variables. The following simple example illustrates this combination of methods.

### Wooldridge Example 15.10. Job Training and Worker Productivity

Suppose we want to estimate the effect of another hour of job training on worker productivity. For the two years 1987 and 1988, consider the simple panel data model:

\begin{equation}
log(scrap_{it})=\beta_0+\delta_0*d88_t+\beta_1*hrsemp_{it}+a_i+u_{it}, t=1,2,
\end{equation}

where $scrap_{it}$ is firm $i$'s scrap rate in year $t$ and $hrsemp_{it}$ is hours of job training per employee. As usual, we allow different year intercepts and a constant, unobserved firm effect, $a_i$.

For the reasons discussed in Section 13-2, we might be concerned that $hrsemp_{it}$ is correlated with $a_i$, the latter of which contains unmeasured worker ability. As before, we difference to remove $a_i$:

\begin{equation}
\Delta log(scrap_{i})=\delta_0+\beta_1*\Delta hrsemp_{i}+\Delta u_{i} \tag{15.57}
\end{equation}

Normally, we would estimate this equation by OLS. But what if $\Delta u_i$ is correlated with $\Delta hrsemp_i$? For example, a firm might hire more skilled workers while at the same time reducing the level of job training. In this case, we need an instrumental variable for $\Delta hrsemp_i$. Generally, such an IV would be hard to find, but we can exploit the fact that some firms received job training grants in 1988. If we assume that grant designation is uncorrelated with $\Delta u_i$ (which is reasonable, because the grants were given at the beginning of 1988), then $\Delta grant_i$ is valid as an IV, provided $\Delta hrsemp$ and $\Delta grant$ are correlated. Using the data in JTRAIN differenced between 1987 and 1988, the first stage regression is:

In [77]:
if (!requireNamespace("plm", quietly = TRUE)) install.packages("plm")  # install only if missing
library(foreign); library(plm)
jtrain <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/jtrain.dta")

# Define panel data (for 1987 and 1988 only)
jtrain.87.88 <- subset(jtrain,year<=1988)
jtrain.p<-pdata.frame(jtrain.87.88, index=c("fcode","year"))


summary( plm(hrsemp~grant, model="fd",data=jtrain.p))




"NaNs produced"

Oneway (individual) effect First-Difference Model

Call:
plm(formula = hrsemp ~ grant, data = jtrain.p, model = "fd")

Unbalanced Panel: n = 131, T = 1-2, N = 256
Observations used in estimation: 125

Residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-47.800  -1.580   0.000   0.383   1.300  78.600 

Coefficients:
      Estimate Std. Error t-value  Pr(>|t|)    
grant  28.3873     2.7038  10.499 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    46195
Residual Sum of Squares: 28102
R-Squared:      0.3922
Adj. R-Squared: 0.3922
F-statistic: Inf on 0 and 124 DF, p-value: NA

The previous result confirms that the change in job training per employee (hrsemp) is strongly positively correlated with receiving a job training grant in 1988: receiving a grant increased per-employee training by about 28.4 hours, and grant designation accounts for 39.2% of the variation in $\Delta hrsemp$. Note that the plm output reports no intercept, although the differenced equation (15.57) includes $\delta_0$. Two stage least squares estimation of (15.57) gives:

In [78]:

# IV FD regression
summary( plm(log(scrap)~hrsemp|grant, model="fd",data=jtrain.p))


"NaNs produced"

Oneway (individual) effect First-Difference Model
Instrumental variable estimation
   (Balestra-Varadharajan-Krishnakumar's transformation)

Call:
plm(formula = log(scrap) ~ hrsemp | grant, data = jtrain.p, model = "fd")

Unbalanced Panel: n = 47, T = 1-2, N = 92
Observations used in estimation: 45

Residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.3300 -0.2430 -0.0416 -0.0178  0.2370  2.4000 

Coefficients:
        Estimate Std. Error t-value Pr(>|t|)  
hrsemp -0.015526   0.005844 -2.6567  0.01095 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    17.29
Residual Sum of Squares: 17.398
R-Squared:      0.061927
Adj. R-Squared: 0.061927
F-statistic: -Inf on 0 and 44 DF, p-value: NA

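The estimate implies that 10 more hours of job training per worker reduce the scrap rate by about 14%: $10 \times (-0.0155) = -0.155$ in log points, and $100\,[\exp(-0.155) - 1] \approx -14.4\%$. The estimate is statistically significant at the 5% level ($t = -2.66$). For the firms in the sample, the average amount of job training in 1988 was about 17 hours per worker, with a minimum of zero and a maximum of 88.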
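The logic of first differencing plus IV can be sketched on a simulated two-year firm panel (not JTRAIN; every number below is an assumption). A firm effect `a` is correlated with training hours, which differencing removes. The shock `e` raises training and also enters the scrap-rate error, so $\Delta u_i$ is correlated with $\Delta hrsemp_i$ and FD-OLS is biased. The 1988 grant serves as the instrument, and the slope of (15.57) is estimated with the just-identified IV formula $\widehat{Cov}(\Delta grant, \Delta y)/\widehat{Cov}(\Delta grant, \Delta hrsemp)$.

```r
set.seed(9)
nf <- 2000                                   # firms observed in 1987 and 1988
a  <- rnorm(nf)                              # unobserved firm effect a_i
grant87 <- rep(0, nf)                        # no grants in 1987
grant88 <- rbinom(nf, 1, 0.3)                # some firms receive a grant in 1988
e87 <- rnorm(nf, sd = 5); e88 <- rnorm(nf, sd = 5)
# Toy hours equation (not truncated at zero in this simulation)
hrs87 <- 10 + 3 * a + 25 * grant87 + e87
hrs88 <- 10 + 3 * a + 25 * grant88 + e88
# u_it shares the e shock, so Delta u is correlated with Delta hrsemp
u87 <- 0.05 * e87 + rnorm(nf, sd = 0.5)
u88 <- 0.05 * e88 + rnorm(nf, sd = 0.5)
lscrap87 <- 1 - 0.015 * hrs87 + a + u87
lscrap88 <- 1 - 0.2 - 0.015 * hrs88 + a + u88   # delta_0 = -0.2

# First differences remove a_i
dy     <- lscrap88 - lscrap87
dhrs   <- hrs88 - hrs87
dgrant <- grant88 - grant87

b_fd_ols <- coef(lm(dy ~ dhrs))[["dhrs"]]             # biased by Cov(dhrs, du)
b_fd_iv  <- cov(dgrant, dy) / cov(dgrant, dhrs)       # FD-IV slope (intercept = delta_0)
pct10    <- 100 * (exp(10 * b_fd_iv) - 1)             # % change for 10 more hours
c(fd_ols = b_fd_ols, fd_iv = b_fd_iv, pct_10_hours = pct10)
```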