# Chapter 13. Pooling Cross Sections across Time: Simple Panel Data Methods

Until now, we have covered multiple regression analysis using pure cross-sectional or pure time series data. Although these two cases arise often in applications, data sets that have both cross-sectional and time series dimensions are being used more and more often in empirical research. Multiple regression methods can still be used on such data sets. In fact, data with cross-sectional and time series aspects can often shed light on important policy questions. We will see several examples in this chapter.

We will analyze two kinds of data sets in this chapter. An independently pooled cross section is obtained by sampling randomly from a large population at different points in time (usually, but not necessarily, different years). For instance, in each year, we can draw a random sample on hourly wages, education, experience, and so on, from the population of working people in the United States. Or, in every other year, we draw a random sample on the selling price, square footage, number of bathrooms, and so on, of houses sold in a particular metropolitan area. From a statistical standpoint, these data sets have an important feature: they consist of independently sampled observations. This was also a key aspect in our analysis of cross-sectional data: among other things, it rules out correlation in the error terms across different observations. 

An independently pooled cross section differs from a single random sample in that sampling from the population at different points in time likely leads to observations that are not identically distributed. For example, distributions of wages and education have changed over time in most countries.

As we will see, this is easy to deal with in practice by allowing the intercept in a multiple regression model, and in some cases the slopes, to change over time. We cover such models in Section 13-1. In Section 13-2, we discuss how pooling cross sections over time can be used to evaluate policy changes.

A panel data set, while having both a cross-sectional and a time series dimension, differs in some important respects from an independently pooled cross section. To collect panel data (sometimes called longitudinal data), we follow (or attempt to follow) the same individuals, families, firms, cities, states, or whatever, across time. For example, a panel data set on individual wages, hours, education, and other factors is collected by randomly selecting people from a population at a given point in time. Then, these same people are reinterviewed at several subsequent points in time. This gives us data on wages, hours, education, and so on, for the same group of people in different years.

Panel data sets are fairly easy to collect for school districts, cities, counties, states, and countries, and policy analysis is greatly enhanced by using panel data sets; we will see some examples in the following discussion. For the econometric analysis of panel data, we cannot assume that the observations are independently distributed across time. For example, unobserved factors (such as ability) that affect someone's wage in 1990 will also affect that person's wage in 1991; unobserved factors that affect a city's crime rate in 1985 will also affect that city's crime rate in 1990. For this reason, special models and methods have been developed to analyze panel data. In Sections 13-3, 13-4, and 13-5, we describe the straightforward method of differencing to remove time-constant, unobserved attributes of the units being studied. Because panel data methods are somewhat more advanced, we will rely mostly on intuition in describing the statistical properties of the estimation procedures, leaving detailed assumptions to the chapter appendix. We follow the same strategy in Chapter 14, which covers more complicated panel data methods.

## 13-1 Pooling Independent Cross Sections across Time

Many surveys of individuals, families, and firms are repeated at regular intervals, often each year. An example is the Current Population Survey (or CPS), which randomly samples households each year. (See, for example, CPS78_85, which contains data from the 1978 and 1985 CPS.) If a random sample is drawn at each time period, pooling the resulting random samples gives us an independently pooled cross section.

One reason for using independently pooled cross sections is to increase the sample size. By pooling random samples drawn from the same population, but at different points in time, we can get more precise estimators and test statistics with more power. Pooling is helpful in this regard only insofar as the relationship between the dependent variable and at least some of the independent variables remains constant over time.

As mentioned in the introduction, using pooled cross sections raises only minor statistical complications. Typically, to reflect the fact that the population may have different distributions in different time periods, we allow the intercept to differ across periods, usually years. This is easily accomplished by including dummy variables for all but one year, where the earliest year in the sample is usually chosen as the base year. It is also possible that the error variance changes over time, something we discuss later.
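As a minimal sketch of the year-dummy approach, the following R code simulates three independent cross sections and pools them. The data, the variable names (`lwage`, `educ`), and all parameter values are hypothetical and chosen only for illustration; they are not taken from any data set used in this chapter.

```r
# Hypothetical pooled cross section: three independent samples, one per year
set.seed(123)
n_per_year <- 200
years <- c(1978, 1981, 1984)

pooled <- do.call(rbind, lapply(years, function(yr) {
  educ <- rnorm(n_per_year, mean = 12, sd = 2)
  # The intercept shifts by year; the slope on educ is the same in all years
  shift <- unname(c("1978" = 0, "1981" = 0.3, "1984" = 0.6)[as.character(yr)])
  lwage <- 1 + shift + 0.08 * educ + rnorm(n_per_year, sd = 0.4)
  data.frame(year = yr, educ = educ, lwage = lwage)
}))

# factor(year) creates dummies for all but the earliest (base) year
pooledreg <- lm(lwage ~ educ + factor(year), data = pooled)
coef(pooledreg)
```

In this sketch, the coefficients on `factor(year)1981` and `factor(year)1984` estimate how far the intercept has moved relative to the base year, 1978, while the slope on `educ` is estimated from all three samples together.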

Sometimes, the pattern of coefficients on the year dummy variables is itself of interest. For example, a demographer may be interested in the following question: After controlling for education, has the pattern of fertility among women over age 35 changed between 1972 and 1984? The following example illustrates how this question is simply answered by using multiple regression analysis with year dummy variables.

### Wooldridge. Example 13.1 Women's Fertility over Time

The data set in FERTIL1, which is similar to that used by Sander (1992), comes from the National Opinion Research Center's General Social Survey for the even years from 1972 to 1984, inclusively. We use these data to estimate a model explaining the total number of kids born to a woman (kids).

One question of interest is: After controlling for other observable factors, what has happened to fertility rates over time? The factors we control for are years of education, age, race, region of the country where living at age 16, and living environment at age 16. The estimates are computed as follows:

In [17]:
library(foreign)
fertil1 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/fertil1.dta?raw=true")


kidsreg<- lm(kids ~ educ+age+I(age^2)+black+east+northcen+west+farm+othrural+town+smcity+y74+y76+y78+y80+y82+y84 ,data=fertil1)
summary(kidsreg)


Call:
lm(formula = kids ~ educ + age + I(age^2) + black + east + northcen + 
    west + farm + othrural + town + smcity + y74 + y76 + y78 + 
    y80 + y82 + y84, data = fertil1)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9878 -1.0086 -0.0767  0.9331  4.6548 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -7.742457   3.051767  -2.537 0.011315 *  
educ        -0.128427   0.018349  -6.999 4.44e-12 ***
age          0.532135   0.138386   3.845 0.000127 ***
I(age^2)    -0.005804   0.001564  -3.710 0.000217 ***
black        1.075658   0.173536   6.198 8.02e-10 ***
east         0.217324   0.132788   1.637 0.101992    
northcen     0.363114   0.120897   3.004 0.002729 ** 
west         0.197603   0.166913   1.184 0.236719    
farm        -0.052557   0.147190  -0.357 0.721105    
othrural    -0.162854   0.175442  -0.928 0.353481    
town         0.084353   0.124531   0.677 0.498314    
smcity       0.211879   0.160296   1.322 0.186507    
y74       

The base year is 1972. The coefficients on the year dummy variables show a sharp drop in fertility in the early 1980s. For example, the coefficient on y82 implies that, holding education, age, and other factors fixed, a woman had on average .52 fewer children, or about one-half a child, in 1982 than in 1972. This is a very large drop: holding educ, age, and the other factors fixed, 100 women in 1982 are predicted to have about 52 fewer children than 100 comparable women in 1972. Because we are controlling for education, this drop is separate from the decline in fertility that is due to the increase in average education levels. (The average years of education are 12.2 for 1972 and 13.3 for 1984.) The coefficients on y82 and y84 represent drops in fertility for reasons that are not captured in the explanatory variables.

Women with more education have fewer children, and the estimate is very statistically significant. Other things being equal, 100 women with a college education will have about 51 fewer children on average than 100 women with only a high school education: .128(4)=.512. Age has a diminishing effect on fertility. (The turning point in the quadratic is at about age=46, by which time most women have finished having children.)
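The turning point quoted above can be checked directly from the reported coefficients on age and age squared. Setting the derivative of the estimated quadratic with respect to age equal to zero gives

\begin{equation}
age^{*}=\frac{\hat\beta_{age}}{2\,|\hat\beta_{age^2}|}=\frac{0.532135}{2(0.005804)}\approx 45.8
\end{equation}

which rounds to the age of about 46 mentioned in the text. The symbols $\hat\beta_{age}$ and $\hat\beta_{age^2}$ are labels introduced here for the two estimates in the regression output.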

## 13-2 Policy Analysis with Pooled Cross Sections

Pooled cross sections can be very useful for evaluating the impact of a certain event or policy. The following example of an event study shows how two cross-sectional data sets, collected before and after the occurrence of an event, can be used to determine the effect on economic outcomes.

### Wooldridge. Example 13.3 Effect of a Garbage Incinerator's Location on Housing Prices

Kiel and McClain (1995) studied the effect that a new garbage incinerator had on housing values in North Andover, Massachusetts. They used many years of data and a fairly complicated econometric analysis. We will use two years of data and some simplified models, but our analysis is similar.

The rumor that a new incinerator would be built in North Andover began after 1978, and construction began in 1981. The incinerator was expected to be in operation soon after the start of construction; the incinerator actually began operating in 1985. We will use data on prices of houses that sold in 1978 and another sample on those that sold in 1981. The hypothesis is that the price of houses located near the incinerator would fall relative to the price of more distant houses.

For illustration, we define a house to be near the incinerator if it is within three miles. We will start by looking at the dollar effect on housing prices. This requires us to measure price in constant dollars. We measure all housing prices in 1978 dollars, using the Boston housing price index. Let rprice denote the house price in real terms.

A naive analyst would use only the 1981 data and estimate a very simple model:

\begin{equation}
rprice = \gamma_0+\gamma_1*nearinc+u
\end{equation}

where nearinc is a binary variable equal to one if the house is near the incinerator, and zero otherwise. Estimating this equation using the data in KIELMC gives

In [21]:
library(foreign)
kielmc <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/kielmc.dta?raw=true")
rpricereg<-lm(rprice~nearinc, data=kielmc,subset=(year==1981))
summary(rpricereg)


Call:
lm(formula = rprice ~ nearinc, data = kielmc, subset = (year == 
    1981))

Residuals:
   Min     1Q Median     3Q    Max 
-60678 -19832  -2997  21139 136754 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   101308       3093  32.754  < 2e-16 ***
nearinc       -30688       5828  -5.266 5.14e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 31240 on 140 degrees of freedom
Multiple R-squared:  0.1653,	Adjusted R-squared:  0.1594 
F-statistic: 27.73 on 1 and 140 DF,  p-value: 5.139e-07


Since this is a simple regression on a single dummy variable, the intercept is the average selling price for homes not near the incinerator, and the coefficient on nearinc is the difference in the average selling price between homes near the incinerator and those that are not. The estimate shows that the average selling price for the former group was \$30,688.27 less than for the latter group. The t statistic is greater than five in absolute value, so we can strongly reject the hypothesis that the average values for homes near and far from the incinerator are the same.

Unfortunately, the estimated equation for 1981 does not imply that the siting of the incinerator is causing the lower housing values. In fact, if we run the same regression for 1978 (before the incinerator was even rumored), we obtain

In [22]:
library(foreign)
kielmc <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/kielmc.dta?raw=true")
rpricereg<-lm(rprice~nearinc, data=kielmc,subset=(year==1978))
summary(rpricereg)


Call:
lm(formula = rprice ~ nearinc, data = kielmc, subset = (year == 
    1978))

Residuals:
   Min     1Q Median     3Q    Max 
-56517 -16605  -3193   8683 236307 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    82517       2654  31.094  < 2e-16 ***
nearinc       -18824       4745  -3.968 0.000105 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 29430 on 177 degrees of freedom
Multiple R-squared:  0.08167,	Adjusted R-squared:  0.07648 
F-statistic: 15.74 on 1 and 177 DF,  p-value: 0.0001054


Therefore, even before there was any talk of an incinerator, the average value of a home near the site was \$18,824.37 less than the average value of a home not near the site (\$82,517.23); the difference is statistically significant, as well. This is consistent with the view that the incinerator was built in an area with lower housing values.

How, then, can we tell whether building a new incinerator depresses housing values? The key is to look at how the coefficient on nearinc changed between 1978 and 1981. The difference in average housing value was much larger in 1981 than in 1978 (\$30,688.27 versus \$18,824.37), even as a percentage of the average value of homes not near the incinerator site. The difference in the two coefficients on nearinc is

\begin{equation}
\hat\delta_1=-30688-(-18824)=-11864
\end{equation}

This is our estimate of the effect of the incinerator on values of homes near the incinerator site. In empirical economics, $\hat\delta_1$ has become known as the difference-in-differences estimator. In our case $\hat\delta_1$ is the difference over time in the average difference of housing prices in the two locations.
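The equivalence between the difference in group means and a regression with an interaction term can be illustrated with a small simulation. This is only a sketch: the data are simulated, and the variable names `after` and `treated` as well as all parameter values are hypothetical stand-ins for the year and location dummies.

```r
# Hypothetical two-period, two-group data with a true interaction effect of -10
set.seed(42)
n <- 2000
sim <- data.frame(
  after   = rbinom(n, 1, 0.5),
  treated = rbinom(n, 1, 0.5)
)
sim$y <- 80 + 20 * sim$after - 15 * sim$treated -
  10 * sim$after * sim$treated + rnorm(n, sd = 5)

# Difference-in-differences computed from the four group means
# (rows of m index treated, columns index after)
m <- tapply(sim$y, list(sim$treated, sim$after), mean)
did_means <- (m["1", "1"] - m["0", "1"]) - (m["1", "0"] - m["0", "0"])

# The same quantity is the OLS coefficient on the interaction term
didreg <- lm(y ~ after * treated, data = sim)
did_ols <- coef(didreg)["after:treated"]

c(means = did_means, ols = unname(did_ols))
```

Because the regression contains only the two dummies and their interaction, the two numbers agree up to rounding error; the regression has the advantage of also delivering a standard error.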

To test whether $\hat\delta_1$ is statistically different from zero, we need to find its standard error by using a regression analysis. In fact, $\hat\delta_1$ can be obtained by estimating

\begin{equation}
rprice = \beta_0+\delta_0*y81+\beta_1*nearinc+\delta_1*y81*nearinc+u
\end{equation}

using the data pooled over both years. The intercept, $\beta_0$, is the average price of a home not near the incinerator in 1978. The parameter $\delta_0$ captures changes in all housing values in North Andover from 1978 to 1981. The coefficient on nearinc, $\beta_1$, measures the location effect that is not due to the presence of the incinerator: as we saw previously, even in 1978, homes near the incinerator site sold for less than homes farther away from the site.

The parameter of interest is on the interaction term y81*nearinc: $\delta_1$ measures the decline in housing values due to the new incinerator, provided we assume that houses both near and far from the site did not appreciate at different rates for other reasons.

The estimates are computed as follows:

In [25]:
library(foreign)
kielmc <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/kielmc.dta?raw=true")
rpricereg<-lm(rprice~y81+nearinc+y81*nearinc, data=kielmc)
summary(rpricereg)


Call:
lm(formula = rprice ~ y81 + nearinc + y81 * nearinc, data = kielmc)

Residuals:
   Min     1Q Median     3Q    Max 
-60678 -17693  -3031  12483 236307 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    82517       2727  30.260  < 2e-16 ***
y81            18790       4050   4.640 5.12e-06 ***
nearinc       -18824       4875  -3.861 0.000137 ***
y81:nearinc   -11864       7457  -1.591 0.112595    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 30240 on 317 degrees of freedom
Multiple R-squared:  0.1739,	Adjusted R-squared:  0.1661 
F-statistic: 22.25 on 3 and 317 DF,  p-value: 4.224e-13


The t statistic on $\hat\delta_1$ is about -1.59, which is only marginally significant against a one-sided alternative: the two-sided p-value reported by R is about .11, so the one-sided p-value is about .056.
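The relationship between the two-sided p-value printed by `summary()` and the one-sided p-value can be verified with the t distribution function `pt()`. The sketch below simply plugs in the rounded t statistic and the residual degrees of freedom from the output above; the object names are illustrative.

```r
# t statistic and residual degrees of freedom copied from the regression output
t_stat   <- -1.591
df_resid <- 317

# Two-sided p-value, as reported by summary()
p_two_sided <- 2 * pt(-abs(t_stat), df = df_resid)

# One-sided p-value for the alternative that the interaction coefficient is negative
p_one_sided <- pt(t_stat, df = df_resid)

c(two_sided = p_two_sided, one_sided = p_one_sided)
```

For a symmetric distribution such as the t distribution, the one-sided p-value is half the two-sided one whenever the estimate has the sign predicted by the alternative.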

Kiel and McClain (1995) included various housing characteristics in their analysis of the incinerator siting. There are two good reasons for doing this. First, the kinds of homes selling near the incinerator in 1981 might have been systematically different from those selling near the incinerator in 1978; if so, it can be important to control for such characteristics. Second, even if the relevant house characteristics did not change, including them can greatly reduce the error variance, which can then shrink the standard error of $\hat\delta_1$. Controlling for the age of the houses, using a quadratic, we obtain:

In [28]:
library(foreign)
kielmc <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/kielmc.dta?raw=true")
rpricereg<-lm(rprice~y81+nearinc+y81*nearinc+age+I(age^2), data=kielmc)
summary(rpricereg)


Call:
lm(formula = rprice ~ y81 + nearinc + y81 * nearinc + age + I(age^2), 
    data = kielmc)

Residuals:
   Min     1Q Median     3Q    Max 
-79349 -14431  -1711  10069 201486 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.912e+04  2.406e+03  37.039  < 2e-16 ***
y81          2.132e+04  3.444e+03   6.191 1.86e-09 ***
nearinc      9.398e+03  4.812e+03   1.953 0.051713 .  
age         -1.494e+03  1.319e+02 -11.333  < 2e-16 ***
I(age^2)     8.691e+00  8.481e-01  10.248  < 2e-16 ***
y81:nearinc -2.192e+04  6.360e+03  -3.447 0.000644 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25540 on 315 degrees of freedom
Multiple R-squared:  0.4144,	Adjusted R-squared:  0.4052 
F-statistic: 44.59 on 5 and 315 DF,  p-value: < 2.2e-16


This substantially increases the R-squared (by reducing the residual variance). The coefficient on y81*nearinc is now much larger in magnitude, and its standard error is lower.

In addition to the age variables, it is possible to control for distance to the interstate in feet (intst), land area in square feet (land), house area in square feet (area), number of rooms (rooms), and number of baths (baths).

In [30]:
library(foreign)
kielmc <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/kielmc.dta?raw=true")
rpricereg<-lm(rprice~y81+nearinc+y81*nearinc+age+I(age^2)+intst+
                            land+area+rooms+baths, data=kielmc)
summary(rpricereg)


Call:
lm(formula = rprice ~ y81 + nearinc + y81 * nearinc + age + I(age^2) + 
    intst + land + area + rooms + baths, data = kielmc)

Residuals:
   Min     1Q Median     3Q    Max 
-76721  -8885   -252   8433 136649 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.381e+04  1.117e+04   1.237  0.21720    
y81          1.393e+04  2.799e+03   4.977 1.07e-06 ***
nearinc      3.780e+03  4.453e+03   0.849  0.39661    
age         -7.395e+02  1.311e+02  -5.639 3.85e-08 ***
I(age^2)     3.453e+00  8.128e-01   4.248 2.86e-05 ***
intst       -5.386e-01  1.963e-01  -2.743  0.00643 ** 
land         1.414e-01  3.108e-02   4.551 7.69e-06 ***
area         1.809e+01  2.306e+00   7.843 7.16e-14 ***
rooms        3.304e+03  1.661e+03   1.989  0.04758 *  
baths        6.977e+03  2.581e+03   2.703  0.00725 ** 
y81:nearinc -1.418e+04  4.987e+03  -2.843  0.00477 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19620 on 310 

This produces an estimate on y81*nearinc closer to that without any controls, but it yields a much smaller standard error: the t statistic for $\delta_1$ is about -2.8. Therefore, we find a much more significant effect in this third model than in the first one. The third model estimates are preferred because they control for the most factors and have the smallest standard errors (except in the constant, which is not important here). The fact that nearinc has a much smaller coefficient and is insignificant in the third model indicates that the characteristics included in it largely capture the housing characteristics that are most important for determining housing prices.

It makes more sense to use log(price) [or log(rprice)] in the analysis in order to get an approximate percentage effect. The model then becomes:

\begin{equation}
\log(price) = \beta_0+\delta_0*y81+\beta_1*nearinc+\delta_1*y81*nearinc+others+u
\end{equation}

Now, $100*\delta_1$ is the approximate percentage reduction in housing value due to the incinerator. Using log(price) versus log(rprice) only affects the coefficient on y81. Using the same 321 pooled observations, this time with intst, land, and area in logarithmic form, we obtain:

In [32]:
library(foreign)
kielmc <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/kielmc.dta?raw=true")
logpricereg<-lm(log(price)~y81+nearinc+y81*nearinc+age+I(age^2)+log(intst)+
                            log(land)+log(area)+rooms+baths, data=kielmc)
summary(logpricereg)



Call:
lm(formula = log(price) ~ y81 + nearinc + y81 * nearinc + age + 
    I(age^2) + log(intst) + log(land) + log(area) + rooms + baths, 
    data = kielmc)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.18440 -0.09946  0.01478  0.10984  0.74873 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.652e+00  4.159e-01  18.399  < 2e-16 ***
y81          4.260e-01  2.850e-02  14.947  < 2e-16 ***
nearinc      3.224e-02  4.749e-02   0.679 0.497712    
age         -8.359e-03  1.411e-03  -5.924 8.37e-09 ***
I(age^2)     3.763e-05  8.668e-06   4.342 1.92e-05 ***
log(intst)  -6.144e-02  3.151e-02  -1.950 0.052081 .  
log(land)    9.984e-02  2.449e-02   4.077 5.81e-05 ***
log(area)    3.508e-01  5.149e-02   6.813 4.98e-11 ***
rooms        4.733e-02  1.733e-02   2.732 0.006662 ** 
baths        9.428e-02  2.773e-02   3.400 0.000761 ***
y81:nearinc -1.315e-01  5.197e-02  -2.531 0.011884 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '

The coefficient on the interaction term implies that, because of the new incinerator, houses near the incinerator lost about 13.15% in value.
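The 13.15% figure uses the usual approximation that 100 times a coefficient on a dummy variable in a log model is a percentage change. For a coefficient of this size, the exact percentage change implied by the estimate is somewhat smaller in magnitude:

\begin{equation}
100\left[\exp(\hat\delta_1)-1\right]=100\left[\exp(-0.1315)-1\right]\approx -12.3
\end{equation}

so the estimated loss in value is about 12.3% when computed exactly, compared with 13.15% under the approximation.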

The methodology used in the previous example has numerous applications, especially when the data arise from a natural experiment (or a quasi-experiment). A natural experiment occurs when some exogenous event (often a change in government policy) changes the environment in which individuals, families, firms, or cities operate. A natural experiment always has a control group, which is not affected by the policy change, and a treatment group, which is thought to be affected by the policy change. Unlike a true experiment, in which treatment and control groups are randomly and explicitly chosen, the control and treatment groups in natural experiments arise from the particular policy change. To control for systematic differences between the control and treatment groups, we need two years of data, one before the policy change and one after the change. Thus, our sample is usefully broken down into four groups: the control group before the change, the control group after the change, the treatment group before the change, and the treatment group after the change.

Call C the control group and T the treatment group, letting dT equal unity for those in the treatment group T, and zero otherwise. Then, letting d2 denote a dummy variable for the second (post-policy change) time period, the equation of interest is

\begin{equation}
y=\beta_0+\delta_0*d2+\beta_1*dT+\delta_1*d2*dT+others
\end{equation}

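where y is the outcome variable of interest. The parameter $\delta_1$ measures the effect of the policy. Without other factors in the regression, $\hat\delta_1$ will be the difference-in-differences estimator:

\begin{equation}
\hat\delta_1=(\bar{y}_{2,T}-\bar{y}_{2,C})-(\bar{y}_{1,T}-\bar{y}_{1,C})
\end{equation}

where the bar denotes a sample average, the first subscript denotes the time period, and the second subscript denotes the group.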

### Wooldridge. Example 13.4

Meyer, Viscusi, and Durbin (1995) (hereafter, MVD) studied the length of time (in weeks) that an injured worker receives workers' compensation. On July 15, 1980, Kentucky raised the cap on weekly earnings that were covered by workers' compensation. An increase in the cap has no effect on the benefit for low-income workers, but it makes it less costly for a high-income worker to stay on workers' compensation. Therefore, the control group is low-income workers, and the treatment group is high-income workers; high-income workers are defined as those who were subject to the pre-policy change cap. Using random samples both before and after the policy change, MVD were able to test whether more generous workers' compensation causes people to stay out of work longer (everything else fixed). They started with a difference-in-differences analysis, using log(durat) as the dependent variable. Let afchnge be the dummy variable for observations after the policy change and highearn the dummy variable for high earners. Using the data in INJURY, the estimates are computed as follows:

In [39]:
library(foreign)
injury <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/injury.dta?raw=true")
injuryres <- lm(log(durat) ~ afchnge+highearn+afchnge*highearn, data=injury)
summary(injuryres)


Call:
lm(formula = log(durat) ~ afchnge + highearn + afchnge * highearn, 
    data = injury)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0128 -0.7214 -0.0171  0.7714  4.0047 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       1.19934    0.02711  44.241  < 2e-16 ***
afchnge           0.02364    0.03970   0.595  0.55164    
highearn          0.21520    0.04336   4.963 7.11e-07 ***
afchnge:highearn  0.18835    0.06279   2.999  0.00271 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.298 on 7146 degrees of freedom
Multiple R-squared:  0.01584,	Adjusted R-squared:  0.01543 
F-statistic: 38.34 on 3 and 7146 DF,  p-value: < 2.2e-16


Therefore, $\hat\delta_1=.188$ ($t=3.00$), which implies that the average length of time on workers' compensation for high earners increased by about 18.8% due to the increased earnings cap. The coefficient on afchnge is small and statistically insignificant: as expected, the increase in the earnings cap has no effect on duration for low-income workers. Note that the regression as run uses all 7,150 observations in INJURY and is not restricted to a Kentucky-only subsample.

This is a good example of how we can get a fairly precise estimate of the effect of a policy change even though we cannot explain much of the variation in the dependent variable. The dummy variables explain only about 1.6% of the variation in log(durat). This makes sense: there are clearly many factors, including severity of the injury, that affect how long someone receives workers' compensation. Fortunately, we have a very large sample size, and this allows us to get a significant t statistic.

## 13-3 Two-Period Panel Data Analysis

We now turn to the analysis of the simplest kind of panel data: for a cross section of individuals, schools, firms, cities, or whatever, we have two years of data; call these t=1 and t=2. These years need not be adjacent, but t=1 corresponds to the earlier year. For example, the file CRIME2 contains data on (among other things) crime and unemployment rates for 46 cities for 1982 and 1987. Therefore, t=1 corresponds to 1982, and t=2 corresponds to 1987.

In [None]:
install.packages("plm")

In [6]:
library(foreign);library(plm)
crime2 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/crime2.dta?raw=true")

# Define panel data frame
crime2.p <- pdata.frame(crime2, index=46 )

# Panel dimensions:
pdim(crime2.p)

# Observations 1-10: new "id" and "time" and some other variables:
crime2.p[1:10,c("id","time","year","pop","crimes","crmrte","unem")]


Balanced Panel: n = 46, T = 2, N = 92

    id time year    pop crimes    crmrte unem
1-1  1    1   82 229528  17136  74.65756  8.2
1-2  1    2   87 246815  17306  70.11729  3.7
2-1  2    1   82 814054  75654  92.93487  8.1
2-2  2    2   87 933177  83960  89.97221  5.4
3-1  3    1   82 374974  31352  83.61113  9.0
3-2  3    2   87 406297  31364  77.19476  5.9
4-1  4    1   82 176496  15698  88.94253 12.6
4-2  4    2   87 201723  16953  84.04099  5.7
5-1  5    1   82 288446  31202  108.1728 12.6
5-2  5    2   87 331728  34355  103.5638  7.4


An alternative way to use panel data is to view the unobserved factors affecting the dependent variable as consisting of two types: those that are constant and those that vary over time. Letting i denote the cross-sectional unit and t the time period, we can write a model with a single observed explanatory variable as

\begin{equation}
y_{it}=\beta_0+\delta_0*d2_t+\beta_1x_{it}+a_i+u_{it}, t=1,2 \tag{13.13}
\end{equation}

In the notation $y_{it}$, i denotes the person, firm, city, and so on, and t denotes the time period. The variable $d2_t$ is a dummy variable that equals zero when t=1 and one when t=2; it does not change across i, which is why it has no i subscript. Therefore, the intercept for t=1 is $\beta_0$, and the intercept for t=2 is $\beta_0+\delta_0$. Just as in using independently pooled cross sections, allowing the intercept to change over time is important in most applications. In the crime example, secular trends in the United States will cause crime rates in all U.S. cities to change, perhaps markedly, over a five-year period.

The variable $a_i$ captures all unobserved, time-constant factors that affect $y_{it}$. (The fact that $a_i$ has no t subscript tells us that it does not change over time.) Generically, $a_i$ is called an unobserved effect. It is also common in applied work to find $a_i$ referred to as a fixed effect, which helps us to remember that $a_i$ is fixed over time. The model in (13.13) is called an unobserved effects model or a fixed effects model. In applications, you might see $a_i$ referred to as unobserved heterogeneity as well (or individual heterogeneity, firm heterogeneity, city heterogeneity, and so on). The error $u_{it}$ is often called the idiosyncratic error or time-varying error, because it represents unobserved factors that change over time and affect $y_{it}$. These are very much like the errors in a straight time series regression equation.

A simple unobserved effects model for city crime rates for 1982 and 1987 is

\begin{equation}
crmrte_{it}=\beta_0+\delta_0*d87_t+\beta_1unem_{it}+a_i+u_{it}, t=1,2 \tag{13.14}
\end{equation}

where d87 is a dummy variable for 1987. Since i denotes different cities, we call $a_i$ an unobserved city effect or a city fixed effect: it represents all factors affecting city crime rates that do not change over time. Geographical features, such as the city's location in the United States, are included in $a_i$ . Many other factors may not be exactly constant, but they might be roughly constant over a five-year period. These might include certain demographic features of the population (age, race, and education). Different cities may have their own methods for reporting crimes, and the people living in the cities might have different attitudes toward crime; these are typically slow to change. For historical reasons, cities can have very different crime rates, and historical factors are effectively captured by the unobserved effect $a_i$ .

METHODOLOGICAL NOTE: We cannot simply pool the two years and use OLS: if the unobserved effect $a_i$ is correlated with $x_{it}$, pooled OLS is biased and inconsistent (refer to Wooldridge 206, p.413).

In most applications, the main reason for collecting panel data is to allow for the unobserved effect, $a_i$, to be correlated with the explanatory variables. For example, in the crime equation, we want to allow the unmeasured city factors in $a_i$ that affect the crime rate also to be correlated with the unemployment rate. It turns out that this is simple to allow: because $a_i$ is constant over time, we can difference the data across the two years. More precisely, for a cross-sectional observation i , write the two years as

\begin{equation}
y_{i2}=(\beta_0+\delta_0)+\beta_1x_{i2}+a_i+u_{i2}, t=2
\end{equation}
\begin{equation}
y_{i1}=\beta_0+\beta_1x_{i1}+a_i+u_{i1}, t=1
\end{equation}

If we subtract the second equation from the first , we obtain

\begin{equation}
(y_{i2}-y_{i1})=\delta_0+\beta_1(x_{i2}-x_{i1})+(u_{i2}-u_{i1})
\end{equation}

or

\begin{equation}
\Delta y_i=\delta_0+\beta_1 \Delta x_i + \Delta u_i \tag{13.17}
\end{equation}

where $\Delta$ denotes the change from t=1 to t=2. The unobserved effect, $a_i$ , does not appear in (13.17): it has been "differenced away". Also, the intercept in (13.17) is actually the change in the intercept from t=1 to t=2.

Equation (13.17), which we call the first-differenced equation, is very simple. It is just a single cross-sectional equation, but each variable is differenced over time. We can analyze (13.17) using the methods we developed in Part 1, provided the key assumptions are satisfied. The most important of these is that $\Delta u_i$ is uncorrelated with $\Delta x_i$. This assumption holds if the idiosyncratic error at each time t, $u_{it}$, is uncorrelated with the explanatory variable in both time periods.

We allow $x_{it}$ to be correlated with unobservables that are constant over time. When we obtain the OLS estimator of $\beta_1$ from (13.17), we call the resulting estimator the first-differenced estimator.
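The contrast between pooled OLS and first differencing can be checked on simulated data. The following is a minimal sketch in base R, not part of the crime example: the sample size, the true parameter values, and the 0.8 loading that makes $x_{it}$ correlated with $a_i$ are all hypothetical choices made for illustration.

```r
# Simulated two-period panel; every number below is a hypothetical choice.
set.seed(123)
n      <- 5000   # number of cross-sectional units
beta1  <- 2      # true slope
delta0 <- 1      # true change in the intercept from t=1 to t=2

a  <- rnorm(n)               # unobserved effect a_i
x1 <- 0.8 * a + rnorm(n)     # x_i1, correlated with a_i by construction
x2 <- 0.8 * a + rnorm(n)     # x_i2, correlated with a_i by construction
y1 <- 0.5 + beta1 * x1 + a + rnorm(n)
y2 <- 0.5 + delta0 + beta1 * x2 + a + rnorm(n)

# Pooled OLS on the stacked data leaves a_i in the error term
pooled   <- data.frame(y = c(y1, y2), x = c(x1, x2),
                       d2 = rep(c(0, 1), each = n))
b_pooled <- unname(coef(lm(y ~ d2 + x, data = pooled))["x"])

# First differencing removes a_i before OLS is applied
dy     <- y2 - y1
dx     <- x2 - x1
fd_fit <- lm(dy ~ dx)
b_fd   <- unname(coef(fd_fit)["dx"])
a0_fd  <- unname(coef(fd_fit)["(Intercept)"])

print(c(pooled_slope = b_pooled, fd_slope = b_fd, fd_intercept = a0_fd))
```

In this design the pooled slope should drift above the true value of 2 because $a_i$ is positively correlated with $x_{it}$, while the FD slope should stay close to 2 and the FD intercept close to the change in the intercept, $\delta_0$.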

In the crime example, assuming that $\Delta u_i$ and $\Delta unem_i$ are uncorrelated may be reasonable, but it can also fail. For example, suppose that law enforcement effort (which is in the idiosyncratic error) increases more in cities where the unemployment rate decreases. This can cause negative correlation between $\Delta u_i$ and $\Delta unem_i$ , which would then lead to bias in the OLS estimator. Naturally, this problem can be overcome to some extent by including more factors in the equation, something we will cover later. As usual, it is always possible that we have not accounted for enough time-varying factors.

Another crucial condition is that $\Delta x_i$ must have some variation across i . 
This qualification fails if the explanatory variable does not change over time for any cross-sectional observation, 
or if it changes by the same amount for every observation. 
This is not an issue in the crime rate example because the unemployment rate changes across time for almost all cities. 
But, if i denotes an individual and $x_{it}$ is a dummy variable for gender, $\Delta x_i=0$ for all i ; 
we clearly cannot estimate (13.17) by OLS in this case.
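A short sketch of what happens in practice when $\Delta x_i$ has no variation, using made-up data in base R. The variables `female` and `age` are hypothetical; the differenced outcome is just random noise.

```r
# Hypothetical data: regressors whose change is identical for everyone
set.seed(1)
n       <- 100
dy      <- rnorm(n)               # some differenced outcome
female  <- rbinom(n, 1, 0.5)      # time-constant dummy
dfemale <- female - female        # change over time: 0 for every i
dage    <- rep(5, n)              # age rises by 5 years for every i

fit_const <- lm(dy ~ dfemale)     # regressor is identically zero
fit_same  <- lm(dy ~ dage)        # regressor is collinear with the intercept

coef(fit_const)   # slope on dfemale cannot be computed
coef(fit_same)    # slope on dage cannot be computed
```

In both cases `lm` reports the slope as `NA`, because the differenced regressor cannot be distinguished from the intercept.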

The only other assumption we need in order to apply the usual OLS statistics is that (13.17) satisfies the homoskedasticity assumption. This is reasonable in many cases, and, if it does not hold, we know how to test and correct for heteroskedasticity using the methods in Chapter 8. It is sometimes fair to assume that (13.17) fulfills all of the classical linear model assumptions. The OLS estimators are unbiased and all statistical inference is exact in such cases.

When we estimate (13.17) for the crime rate example, we get

In [16]:
library(foreign);library(plm); library(lmtest)
crime2 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/crime2.dta?raw=true")

crime2.p <- pdata.frame(crime2, index=46 )

# manually calculate first differences:
crime2.p$dcrmrte <- diff(crime2.p$crmrte)
crime2.p$dunem   <- diff(crime2.p$unem)

# Display selected variables for observations 1-6:
crime2.p[1:6,c("id","time","year","crmrte","dcrmrte","unem","dunem")]

# Estimate FD model with lm on differenced data:
summary( lm(dcrmrte~dunem, data=crime2.p) )

Unnamed: 0,id,time,year,crmrte,dcrmrte,unem,dunem
1-1,1,1,82,74.65756,,8.2,
1-2,1,2,87,70.11729,-4.540276,3.7,-4.5
2-1,2,1,82,92.93487,,8.1,
2-2,2,2,87,89.97221,-2.962654,5.4,-2.7
3-1,3,1,82,83.61113,,9.0,
3-2,3,2,87,77.19476,-6.416367,5.9,-3.1



Call:
lm(formula = dcrmrte ~ dunem, data = crime2.p)

Residuals:
    Min      1Q  Median      3Q     Max 
-36.912 -13.369  -5.507  12.446  52.915 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  15.4022     4.7021   3.276  0.00206 **
dunem         2.2180     0.8779   2.527  0.01519 * 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.05 on 44 degrees of freedom
  (46 observations deleted due to missingness)
Multiple R-squared:  0.1267,	Adjusted R-squared:  0.1069 
F-statistic: 6.384 on 1 and 44 DF,  p-value: 0.01519


which now gives a positive, statistically significant relationship between the crime and unemployment rates. Thus, differencing to eliminate time-constant effects makes a big difference in this example. The intercept in the previous model also reveals something interesting. Even if $\Delta unem =0$, we predict an increase in the crime rate (crimes per 1,000 people) of 15.40. This reflects a secular increase in crime rates throughout the United States from 1982 to 1987.

Although differencing two years of panel data is a powerful way to control for unobserved effects, it is not without cost. First, panel data sets are harder to collect than a single cross section, especially for individuals. We must use a survey and keep track of the individual for a follow-up survey. It is often difficult to locate some people for a second survey. For units such as firms, some will go bankrupt or merge with other firms. Panel data are much easier to obtain for schools, cities, counties, states, and countries.

Even if we have collected a panel data set, the differencing used to eliminate $a_i$ can greatly reduce the variation in the explanatory variables. 
While $x_{it}$ frequently has substantial variation in the cross section for each t , $\Delta x_i$ may not have much variation. 
We know from Chapter 3 that little variation in $\Delta x_i$ can lead to a large standard error for $\hat\beta_1$ when estimating (13.17) by OLS. We can combat this by using a large cross section, but this is not always possible. Also, using longer differences over time is sometimes better than using year-to-year changes.
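To make the role of variation in $\Delta x_i$ explicit, recall the simple-regression variance formula applied to (13.17). The symbols $\sigma^2_{\Delta u}$ (the variance of $\Delta u_i$) and $\overline{\Delta x}$ (the sample average of $\Delta x_i$) are notation introduced here for illustration, and the formula assumes homoskedasticity:

\begin{equation}
Var(\hat\beta_1 \mid \Delta x)=\frac{\sigma^2_{\Delta u}}{\sum_{i=1}^{N}(\Delta x_i-\overline{\Delta x})^2}
\end{equation}

The denominator is the total sample variation in $\Delta x_i$. It shrinks when the explanatory variable changes little over time and grows with the number of cross-sectional units, which is why a larger cross section or longer differences can help.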

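NOTE: An alternative is to let the plm package do the differencing by calling plm with model="fd" on the original data. In principle this estimates the same first-differenced equation, but the output recorded below has no intercept and comes with a "NaNs produced" warning, so it does not reproduce the lm estimates above and should be read with caution: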

In [33]:
library(foreign);library(plm); library(lmtest)
crime2 <- read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/crime2.dta?raw=true")

crime2.p <- pdata.frame(crime2, index=46 )



# Estimate FD model with plm on original data:
summary( plm(crmrte~unem, data=crime2.p, model="fd") )



"NaNs produced"

Oneway (individual) effect First-Difference Model

Call:
plm(formula = crmrte ~ unem, data = crime2.p, model = "fd")

Balanced Panel: n = 46, T = 2, N = 92
Observations used in estimation: 46

Residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -28.50   -8.15    1.33    6.09   17.90   65.20 

Coefficients:
      Estimate Std. Error t-value Pr(>|t|)
unem -0.018096   0.608684 -0.0297   0.9764

Total Sum of Squares:    20256
Residual Sum of Squares: 22003
R-Squared:      0.1267
Adj. R-Squared: 0.1267
F-statistic: -Inf on 0 and 45 DF, p-value: NA

## 13-5 Differencing with More than Two Time Periods

We can also use differencing with more than two time periods. For illustration, suppose we have N individuals and T=3 time periods for each individual. A general fixed effects model is

\begin{equation}
y_{it}=\delta_1+\delta_2*d2_t+\delta_3*d3_t+\beta_1x_{it1}+\ldots+\beta_kx_{itk}+a_i+u_{it}, t=1,2,3 \tag{13.28}
\end{equation}

The total number of observations is therefore 3N. It is a good idea to allow a separate intercept for each time period, especially when we have a small number of them. The base period, as always, is t=1. The intercept for the second time period is $\delta_1+\delta_2$, and so on. We are primarily interested in $\beta_1,\ldots,\beta_k$. If the unobserved effect $a_i$ is correlated with any of the explanatory variables, then using pooled OLS on the three years of data results in biased and inconsistent estimates.

The key assumption is that the idiosyncratic errors are uncorrelated with the explanatory variables in each time period; that is, the explanatory variables are strictly exogenous after we take out the unobserved effect, $a_i$. If we have omitted an important time-varying variable, then $u_{it}$ is generally correlated with one or more of the $x_{itj}$, and this assumption fails. Chapters 15 and 16 will discuss what can be done in such cases.

If $a_i$ is correlated with $x_{itj}$, then $x_{itj}$ will be correlated with the composite error, $v_{it}=a_i+u_{it}$. 
We can eliminate $a_i$ by differencing adjacent periods. In the T=3 case, we subtract time period one from time period two 
and time period two from time period three. This gives

\begin{equation}
\Delta y_{it}=\delta_2*\Delta d2_t+\delta_3*\Delta d3_t+\beta_1\Delta x_{it1}+\ldots+\beta_k\Delta x_{itk}+\Delta u_{it},  \tag{13.30}
\end{equation}

for t=2 and 3. We do not have a differenced equation for t=1 because there is nothing to subtract from the t=1 equation. Equation (13.30) therefore represents two time periods for each individual in the sample. If this equation satisfies the classical linear model assumptions, then pooled OLS gives unbiased estimators and the usual t and F statistics are valid for hypothesis testing. The important requirement for OLS to be consistent is that $\Delta u_{it}$ is uncorrelated with $\Delta x_{itj}$ for all j and t=2,3.

Unless the time intercepts in the original model are of direct interest, it is better to estimate the first-differenced equation with an intercept and a single time-period dummy, usually for the third period. In other words, the equation becomes

\begin{equation}
\Delta y_{it}=\alpha_0+\alpha_3*d3_t+\beta_1\Delta x_{it1}+\ldots+\beta_k\Delta x_{itk}+\Delta u_{it},  
\end{equation}

If we have the same T time periods for each of N cross-sectional units, we say that the data set is a balanced panel: we have the same time periods for all individuals, firms, cities and so on. When T is small relative to N, we should include a dummy variable for each time period to account for secular changes that are not being modelled. Therefore, after first differencing, the equation looks like

\begin{equation}
\Delta y_{it}=\alpha_0+\alpha_3*d3_t+\alpha_4d4_t+\ldots+\alpha_TdT_t+\beta_1\Delta x_{it1}+\ldots+\beta_k\Delta x_{itk}+\Delta u_{it},  \tag{13.31}
\end{equation}

where we have $T-1$ time periods on each unit i for the first-differenced equation. The total number of observations is $N(T-1)$. Equation (13.31) can be estimated by pooled OLS. For the usual standard errors and test statistics to be valid, $\Delta u_{it}$ must be uncorrelated over time.
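The construction of (13.31) can be sketched on a simulated balanced panel in base R. Everything in this block is hypothetical: the panel dimensions, the period intercepts, the true slope of 1.5, and the helper `fdiff`, which takes within-unit first differences on data sorted by unit and then by period.

```r
# Simulated balanced panel with T = 4; all values are hypothetical.
set.seed(42)
N  <- 2000                            # cross-sectional units
TT <- 4                               # time periods per unit
id <- rep(1:N, each = TT)             # data sorted by unit, then period
t  <- rep(1:TT, times = N)
a  <- rep(rnorm(N), each = TT)        # unobserved effect a_i
x  <- 0.7 * a + rnorm(N * TT)         # regressor correlated with a_i
period_int <- c(0, 0.3, 0.5, 0.9)[t]  # a separate intercept for each period
y  <- period_int + 1.5 * x + a + rnorm(N * TT)

# Within-unit first difference; the first period of each unit becomes NA
fdiff <- function(v) ave(v, id, FUN = function(z) c(NA, diff(z)))

fd <- data.frame(id = id, t = t, dy = fdiff(y), dx = fdiff(x),
                 d3 = as.numeric(t == 3), d4 = as.numeric(t == 4))
fd <- fd[!is.na(fd$dy), ]             # keep t = 2, ..., T

# Pooled OLS on the differences: intercept plus dummies for periods 3..T
fit_fd <- lm(dy ~ d3 + d4 + dx, data = fd)
nrow(fd)        # N * (T - 1) observations
coef(fit_fd)
```

The intercept estimates the change in the period intercept between t=1 and t=2, and the coefficients on `d3` and `d4` measure how later changes differ from that first change. The reported standard errors are the usual OLS ones and ignore any serial correlation in $\Delta u_{it}$.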

If $u_{it}$ follows a stable AR(1) model, then $\Delta u_{it}$ will be serially correlated. Only when $u_{it}$ follows a random walk will $\Delta u_{it}$ be serially uncorrelated. It is easy to test for serial correlation in the first-differenced equation; Example 13.9 below reports such a test.
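The two polar cases can be illustrated with a small simulation in base R. This is a sketch under stated assumptions: the errors are standard normal, there are three periods, and the sample size is a hypothetical choice. When $u_{it}$ is serially uncorrelated (an AR(1) model with a coefficient of zero), adjacent differences share the term $u_{i2}$ with opposite signs, so their correlation is $-0.5$ in theory. When $u_{it}$ is a random walk, the differences are just the innovations.

```r
set.seed(7)
n <- 20000                                 # hypothetical number of units

# Case 1: u_it serially uncorrelated across t
u   <- matrix(rnorm(n * 3), ncol = 3)
du2 <- u[, 2] - u[, 1]
du3 <- u[, 3] - u[, 2]
cor_iid <- cor(du2, du3)                   # theoretical value: -0.5

# Case 2: u_it is a random walk (cumulated innovations)
e   <- matrix(rnorm(n * 3), ncol = 3)
rw  <- t(apply(e, 1, cumsum))
dr2 <- rw[, 2] - rw[, 1]                   # equals the innovation e[, 2]
dr3 <- rw[, 3] - rw[, 2]                   # equals the innovation e[, 3]
cor_rw <- cor(dr2, dr3)                    # theoretical value: 0

print(c(uncorrelated_u = cor_iid, random_walk_u = cor_rw))
```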

We can correct for the presence of AR(1) serial correlation by using feasible GLS. Essentially, within each cross-sectional observation, we would use the Prais-Winsten transformation. If there is no serial correlation in the errors, the usual methods for dealing with heteroskedasticity are valid (see Chapter 8).

### Wooldridge. Example 13.9

Cornwell and Trumbull (1994) used data on 90 counties in North Carolina, for the years 1981 through 1987, to estimate an unobserved effects model of crime; the data are contained in CRIME4. Here, we estimate a simpler version of their model, and we difference the equation over time to eliminate $a_i$, the unobserved effect. (Cornwell and Trumbull use a different transformation, which we will cover in Chapter 14.) Various factors including geographical location, attitudes toward crime, historical records, and reporting conventions might be contained in $a_i$. The crime rate is the number of crimes per person, prbarr is the estimated probability of arrest, prbconv is the estimated probability of conviction (given an arrest), prbpris is the probability of serving time in prison (given a conviction), avgsen is the average sentence length served, and polpc is the number of police officers per capita. As is standard in criminometric studies, we use the logs of all variables to estimate elasticities. We also include a full set of year dummies to control for state trends in crime rates. We can use the years 1982 through 1987 to estimate the differenced equation.

In [9]:
install.packages("plm")
library(foreign);library(plm);library(lmtest)



In [27]:

crime4<-read.dta("https://github.com/thousandoaks/Wooldridge/blob/master/crime4.dta?raw=true")

crime4.p <- pdata.frame(crime4, index=c("county","year") )
pdim(crime4.p)

head(crime4.p, n=10)

Balanced Panel: n = 90, T = 7, N = 630

Unnamed: 0,county,year,crmrte,prbarr,prbconv,prbpris,avgsen,polpc,density,taxpc,...,lpctymle,lpctmin,clcrmrte,clprbarr,clprbcon,clprbpri,clavgsen,clpolpc,cltaxpc,clmix
1-81,1,81,0.0398849,0.289696,0.402062,0.472222,5.61,0.0017868,2.307159,25.69763,...,-2.43387,3.006608,,,,,,,,
1-82,1,82,0.0383449,0.338111,0.433005,0.506993,5.59,0.0017666,2.330254,24.87425,...,-2.449038,3.006608,-0.0393763,0.1545422,0.074143,0.071048,-0.0035714,-0.011364,-0.0325654,0.0308573
1-83,1,83,0.0303048,0.330449,0.525703,0.479705,5.8,0.0018358,2.341801,26.45144,...,-2.464036,3.006608,-0.2353156,-0.022922,0.1939871,-0.0553258,0.0368786,0.038413,0.0614774,-0.2447317
1-84,1,84,0.0347259,0.362525,0.604706,0.520104,6.89,0.0018859,2.34642,26.84235,...,-2.478925,3.006608,0.1361797,0.0926411,0.1400059,0.0808574,0.1722132,0.0269303,0.0146701,-0.0273306
1-85,1,85,0.036573,0.325395,0.578723,0.497059,6.55,0.0019244,2.364896,28.14034,...,-2.497306,3.006608,0.0518246,-0.1080536,-0.0439184,-0.04532,-0.050606,0.0201988,0.0472231,0.1721251
1-86,1,86,0.0347524,0.326062,0.512324,0.439863,6.9,0.0018952,2.385681,29.74098,...,-2.524721,3.006608,-0.0510616,0.0020478,-0.1218668,-0.1222454,0.0520563,-0.0152583,0.0553219,0.0427649
1-87,1,87,0.0356036,0.29827,0.527596,0.43617,6.71,0.0018279,2.422633,30.99368,...,-2.552702,3.006608,0.0241981,-0.0890886,0.0293736,-0.0084312,-0.0279225,-0.0361891,0.0412574,-0.1938994
3-81,3,81,0.0163921,0.202899,0.869048,0.465753,8.45,0.0005939,0.976834,14.56088,...,-2.441794,2.068926,,,,,,,,
3-82,3,82,0.0190651,0.162218,0.772152,0.377049,5.71,0.0007047,0.992278,35.64073,...,-2.447933,2.068926,0.1510599,-0.2237672,-0.1182169,-0.2112803,-0.3919475,0.1709847,0.8951507,-0.1707754
3-83,3,83,0.0151492,0.181586,1.02817,0.438356,8.69,0.0006587,1.003861,19.26188,...,-2.454076,2.068926,-0.2299116,0.1127882,0.2863544,0.1506562,0.4199538,-0.067522,-0.615361,0.2312407


In [32]:
# Estimate FD model:
reg<-(plm(log(crmrte)~d83+d84+d85+d86+d87+lprbarr+lprbconv+lprbpris+lavgsen+lpolpc,data=crime4.p, model="fd"))
coeftest(reg)


t test of coefficients:

          Estimate Std. Error  t value  Pr(>|t|)    
d83      -0.092014   0.016405  -5.6089 3.282e-08 ***
d84      -0.132355   0.023155  -5.7161 1.821e-08 ***
d85      -0.129328   0.028332  -4.5647 6.223e-06 ***
d86      -0.093869   0.032647  -2.8753  0.004199 ** 
d87      -0.044943   0.036685  -1.2251  0.221072    
lprbarr  -0.326327   0.029846 -10.9336 < 2.2e-16 ***
lprbconv -0.237521   0.018174 -13.0690 < 2.2e-16 ***
lprbpris -0.164511   0.025923  -6.3462 4.740e-10 ***
lavgsen  -0.024691   0.021103  -1.1700  0.242526    
lpolpc    0.397686   0.026812  14.8324 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


The three probability variables (of arrest, conviction, and serving prison time) all have the expected sign and all are statistically significant. For example, a 1% increase in the probability of arrest is predicted to lower the crime rate by about 0.33%. The average sentence variable shows a modest deterrent effect, but it is not statistically significant.

The coefficient on the police per capita variable is somewhat surprising and is a feature of most studies that seek to explain crime rates. Interpreted causally, it says that a 1% increase in police per capita increases crime rates by about .4%. (The usual t statistic is very large, almost 15.) It is hard to believe that having more police officers causes more crime. What is going on here? There are at least two possibilities. First, the crime rate variable is calculated from reported crimes. It might be that, when there are additional police, more crimes are reported. Second, the police variable might be endogenous in the equation for other reasons: counties may enlarge the police force when they expect crime rates to increase. In this case, the model cannot be interpreted in a causal fashion. In Chapters 15 and 16, we will cover models and estimation methods that can account for this additional form of endogeneity.

Applying the White test for heteroskedasticity (Section 8-3) gives F=75.48 and p-value=.0000, so there is strong evidence of heteroskedasticity. Testing for AR(1) serial correlation yields $\hat\rho=-.233, t=-4.77$, so negative serial correlation exists.

In the case of error terms which are serially correlated and/or heteroskedastic there are formulas for the variance-covariance matrix for panel data that are robust with respect to heteroskedasticity and arbitrary correlation of the error term within a cross sectional unit (or "cluster").

These clustered standard errors can be computed with the command vcovHC from the package plm, as illustrated below:

In [31]:
coeftest(reg,vcovHC)


t test of coefficients:

          Estimate Std. Error t value  Pr(>|t|)    
d83      -0.092014   0.014432 -6.3758 3.964e-10 ***
d84      -0.132355   0.017726 -7.4668 3.401e-13 ***
d85      -0.129328   0.022948 -5.6357 2.836e-08 ***
d86      -0.093869   0.020665 -4.5425 6.890e-06 ***
d87      -0.044943   0.023324 -1.9269 0.0545289 .  
lprbarr  -0.326327   0.055400 -5.8904 6.849e-09 ***
lprbconv -0.237521   0.038712 -6.1355 1.663e-09 ***
lprbpris -0.164511   0.045073 -3.6498 0.0002884 ***
lavgsen  -0.024691   0.024924 -0.9906 0.3223138    
lpolpc    0.397686   0.101062  3.9351 9.426e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


This time the standard errors adjust for serial correlation and heteroskedasticity. No variable loses statistical significance, but the t statistics on the significant deterrent variables and on lpolpc get notably smaller.
Note that, for these variables, the confidence intervals based on the robust standard errors are much wider than those based on the usual OLS standard errors.
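To show what a cluster-robust variance matrix does, here is a hand-rolled sketch in base R on simulated data. It is not claimed to reproduce the defaults of vcovHC: it applies the basic sandwich formula with each cross-sectional unit as a cluster and no small-sample correction. The panel dimensions, the true slope, and the helper `fdiff` are hypothetical.

```r
# Simulated T = 3 panel, first-differenced; all values are hypothetical.
set.seed(99)
N  <- 1000
TT <- 3
id <- rep(1:N, each = TT)                  # data sorted by unit, then period
a  <- rep(rnorm(N), each = TT)             # unobserved effect a_i
x  <- 0.5 * a + rnorm(N * TT)
y  <- 1 + 0.8 * x + a + rnorm(N * TT)      # serially uncorrelated u_it

fdiff <- function(v) ave(v, id, FUN = function(z) c(NA, diff(z)))
fd <- data.frame(id = id, dy = fdiff(y), dx = fdiff(x))
fd <- fd[!is.na(fd$dy), ]
fit <- lm(dy ~ dx, data = fd)

# Sandwich estimator: bread %*% meat %*% bread
X      <- model.matrix(fit)
e      <- resid(fit)
bread  <- solve(crossprod(X))              # (X'X)^{-1}
scores <- rowsum(X * e, group = fd$id)     # sum of x_it * e_it within each unit
meat   <- crossprod(scores)                # sum over units of s_i s_i'
V_cl   <- bread %*% meat %*% bread

se_usual   <- sqrt(diag(vcov(fit)))
se_cluster <- sqrt(diag(V_cl))
print(cbind(usual = se_usual, cluster = se_cluster))
```

Summing the scores within a unit before forming the outer product is what allows arbitrary correlation of the errors across time periods for the same unit, while still treating different units as independent.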
    

## 13-5a Potential Pitfalls in First Differencing Panel Data

In this and previous sections, we have argued that differencing panel data over time, in order to eliminate a time-constant unobserved effect, is a valuable method for obtaining causal effects. Nevertheless, differencing is not free of difficulties. We have already discussed potential problems with the method when the key explanatory variables do not vary much over time (and the method is useless for explanatory variables that never vary over time). Unfortunately, even when we do have sufficient time variation in the $x_{itj}$, first-differenced (FD) estimation can be subject to serious biases. We have already mentioned that strict exogeneity of the regressors is a critical assumption. Unfortunately, as discussed in Wooldridge (2010, Section 11-1), having more time periods generally does not reduce the inconsistency in the FD estimator when the regressors are not strictly exogenous.

Another important drawback to the FD estimator is that it can be worse than pooled OLS if one or more of the explanatory variables is subject to measurement error, especially the classical errors-in-variables model discussed in Section 9.3. Differencing a poorly measured regressor reduces its variation relative to its correlation with the differenced error caused by classical measurement error, resulting in a potentially sizable bias. Solving such problems can be very difficult. See Section 15-8 and Wooldridge (2010, Chapter 11).