# Extra: Frisch-Waugh-Lovell Theorem

**For the purposes of ECON-398, you do not need to know this. This is for those who would like a slightly deeper understanding of how linear regression works.**

Recall from the notes that a multiple regression model can in general be decomposed into a series of simpler regression models. This is the result of the Frisch-Waugh-Lovell (FWL) Theorem, which shows exactly what kinds of variation in the data go into estimating each parameter. In practical terms, you can verify that this works as described by running the models that the FWL Theorem suggests.

## Load Data

Just like the last lesson on summary statistics, I am going to use simulated data in this lesson. In particular, certain concepts are easier to demonstrate when the data is synthetic, so that the true Data Generating Process (DGP) is known.

In [1]:
library(tidyverse)
library(stargazer)

### Generating Correlated Random Variables

Synthetic data starts with random variables generated to have a particular desired covariance structure. The simplest case uses a multivariate Normal distribution, whose joint and marginal distributions are easy to work with. There are tools available to generate correlated random variables with other distributions, but those are beyond the scope of this course.

For this lesson, it is not necessary for variables to have specific meanings. I will assign the variables names corresponding to letters we use when teaching the theory in the course.

In [2]:
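set.seed(998)
# 5000 x 5 matrix of independent standard Normal draws
sim.data <- matrix(rnorm(25000, 0, 1), nrow = 5000, ncol = 5, byrow = FALSE)
# Multiplying by the Cholesky factor of the target matrix
# imposes the desired covariance structure on the columns
sim.data <- sim.data %*% chol(matrix(
    c(1, 0, .5, .25, 0,
     0, 1, .6, .1, 0,
     .5, .6, 1, .4, 0,
     .25, .1, .4, 1, 0,
     0, 0, 0, 0, 1),
    nrow = 5, ncol = 5, byrow = TRUE))
colnames(sim.data) <- c('D', 'X', 'W', 'Z', 'e')
sim.data <- as_tibble(sim.data)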
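
As a rough check on the Cholesky approach described above, the following base-R sketch compares the sample correlations of freshly simulated draws with the target matrix. The seed and the names `target`, `raw`, `draws`, and `max_gap` are chosen for this illustration only and are not part of the lesson's code.

```r
# Sketch: does multiplying by chol(target) give the intended correlations?
set.seed(1)  # arbitrary seed for this illustration, not the lesson's seed

# Target correlation matrix (same values as in the lesson)
target <- matrix(
    c(1,   0,  .5, .25, 0,
      0,   1,  .6, .1,  0,
      .5,  .6, 1,  .4,  0,
      .25, .1, .4, 1,   0,
      0,   0,  0,  0,   1),
    nrow = 5, ncol = 5, byrow = TRUE)

# Independent standard Normal draws, one column per variable
raw <- matrix(rnorm(5000 * 5), nrow = 5000, ncol = 5)

# chol() returns an upper-triangular R with t(R) %*% R equal to target,
# so the columns of raw %*% R have covariance close to target
draws <- raw %*% chol(target)
colnames(draws) <- c('D', 'X', 'W', 'Z', 'e')

# Largest absolute gap between sample and target correlations
max_gap <- max(abs(cor(draws) - target))
print(round(cor(draws), 2))
print(max_gap)
```

With 5000 observations the sample correlations should sit close to the targets, though they will not match exactly because of sampling noise.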

### Modifying the Random Variables to Desired Types

For our purposes here, we require one binary variable, one categorical variable with more than $2$ categories, and at least one continuous variable. To adhere as closely to course terminology as possible, `D` will be the binary variable, `W` the categorical variable, and `X` and `Z` the continuous variables.

In [3]:
sim.data <- sim.data %>%
    mutate(
        D = if_else(D > 0, 1, 0),
        W = as_factor(ntile(W, 5))
    )
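
To see what these two transformations do, here is a small base-R sketch on a single simulated vector. The names `latent`, `d_binary`, and `w_bins` are hypothetical, and the rank-based binning is only an approximation of `ntile`: it should agree when there are no ties and the sample size is divisible by 5, which is an assumption of this sketch.

```r
# Sketch: turning a continuous draw into a binary and a 5-category variable
set.seed(2)  # arbitrary seed for this illustration
latent <- rnorm(5000)

# Binary variable: 1 when the latent draw is positive, 0 otherwise
d_binary <- as.integer(latent > 0)

# Quintile bins: rank the values, then split the ranks into 5 equal-sized groups
w_bins <- factor(ceiling(rank(latent) * 5 / length(latent)))

print(table(d_binary))
print(table(w_bins))
```

Each of the five bins contains exactly 1000 observations, which is why the summary of `W` later in this notebook shows equal group sizes.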

### Generating the Outcome Variable

I manually define the outcome variable so that I know the true parameters of the conditional expectation function (CEF) of the outcome variable. Keep this in mind as we look at regression outputs later. Notice that despite generating $4$ covariates (`D`, `X`, `W`, and `Z`), I use only $3$ of them to define the CEF. This is not a mistake, and the purpose of `Z` will be made clear in a future lesson.

In [4]:
sim.data <- sim.data %>%
    mutate(
        Y = .25 + 3 * D + .5 * X + .4 * (W == 2) + .7 * (W == 3) + 1.3 * (W == 4) + 1.9 * (W == 5) + e
    )
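
Because the parameters are set by hand, a regression that includes every variable in the CEF should recover them up to sampling noise. The sketch below illustrates this on a simplified, hypothetical data set (`toy`) in which the regressors are drawn independently, unlike the correlated draws used in this lesson. Only the coefficient values are taken from the definition of `Y` above.

```r
# Sketch: a correctly specified regression recovers hand-set parameters
set.seed(3)  # arbitrary seed for this illustration
n <- 5000

# Simplified regressors (independent draws, an assumption of this sketch)
toy <- data.frame(
    D = rbinom(n, 1, 0.5),
    X = rnorm(n),
    W = factor(sample(1:5, n, replace = TRUE)),
    e = rnorm(n))

# Same functional form and coefficients as the lesson's outcome variable
toy$Y <- with(toy,
    .25 + 3 * D + .5 * X +
    .4 * (W == 2) + .7 * (W == 3) + 1.3 * (W == 4) + 1.9 * (W == 5) + e)

truth <- c(`(Intercept)` = .25, D = 3, X = .5,
           W2 = .4, W3 = .7, W4 = 1.3, W5 = 1.9)

fit <- lm(Y ~ D + X + W, data = toy)
estimate <- coef(fit)[names(truth)]

# Side-by-side comparison of true and estimated parameters
print(round(cbind(truth, estimate), 3))
```

The estimates will not equal the true values exactly, but with 5000 observations they should be close.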

### Summary Statistics

I report a summary of the data here. I could also use `summarize` or `stargazer`, but this is sufficient for my purposes since this notebook is about teaching how linear regression works in practice and not how to execute a full data analysis pipeline.

In [5]:
summary(sim.data)

       D                X             W              Z            
 Min.   :0.0000   Min.   :-3.688439   1:1000   Min.   :-3.703643  
 1st Qu.:0.0000   1st Qu.:-0.716236   2:1000   1st Qu.:-0.686461  
 Median :1.0000   Median : 0.009939   3:1000   Median : 0.009768  
 Mean   :0.5054   Mean   :-0.007287   4:1000   Mean   :-0.003324  
 3rd Qu.:1.0000   3rd Qu.: 0.685444   5:1000   3rd Qu.: 0.681739  
 Max.   :1.0000   Max.   : 3.675470            Max.   : 3.542278  
       e                   Y          
 Min.   :-3.517587   Min.   :-3.4588  
 1st Qu.:-0.690418   1st Qu.: 0.7849  
 Median : 0.006281   Median : 2.5836  
 Mean   :-0.012897   Mean   : 2.6097  
 3rd Qu.: 0.671192   3rd Qu.: 4.4280  
 Max.   : 3.381951   Max.   : 8.6636  

## The Theorem in Practice

Suppose I want to estimate

$$
    Y_{i} = \alpha + \delta D_{i} + \beta X_{i} + e_{i}.
$$

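If I am interested only in $\beta$, the FWL Theorem says that I can decompose this procedure into three steps:

1. First, regress $Y_{i}$ on $D_{i}$ and keep the residuals $U_{i}$:
$$
    Y_{i} = \zeta_{0} + \zeta_{1} D_{i} + U_{i}.
$$
2. Second, regress $X_{i}$ on $D_{i}$ and keep the residuals $V_{i}$:
$$
    X_{i} = \eta_{0} + \eta_{1} D_{i} + V_{i}.
$$
3. Third, regress $U_{i}$ on $V_{i}$; the slope is the same $\beta$ as in the original regression:
$$
    U_{i} = \gamma + \beta V_{i} + \xi_{i}.
$$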

In [35]:
sim.data <- sim.data %>%
    mutate(
        U = resid(lm(Y ~ D, data = .)),  # step 1: part of Y not explained by D
        V = resid(lm(X ~ D, data = .))   # step 2: part of X not explained by D
    )

In [36]:
stargazer(
    sim.data %>%
        lm(Y ~ D + X, data = .),
    sim.data %>%
        lm(U ~ V, data = .),
    type = 'text', df = FALSE)


                        Dependent variable:     
                    ----------------------------
                          Y              U      
                         (1)            (2)     
------------------------------------------------
D                      3.544***                 
                       (0.032)                  
                                                
X                      0.854***                 
                       (0.015)                  
                                                
V                                    0.854***   
                                      (0.015)   
                                                
Constant               0.825***       -0.000    
                       (0.023)        (0.016)   
                                                
------------------------------------------------
Observations            5,000          5,000    
R2                      0.755          0.382    
Adjusted R2        
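
The equality of the two coefficients can also be checked numerically rather than by reading a rounded table. The following base-R sketch does this on a small hypothetical data set, where `X` is correlated with `D` by construction and the seed, sample, and coefficient on `D` inside `X` are assumptions of the illustration. It also shows a related fact: residualizing only `X` gives the same slope, because `V` is orthogonal to `D`.

```r
# Sketch: numerical check of the FWL decomposition
set.seed(4)  # arbitrary seed for this illustration
n <- 5000
D <- rbinom(n, 1, 0.5)
X <- 0.8 * D + rnorm(n)                   # X is correlated with D by construction
Y <- 0.25 + 3 * D + 0.5 * X + rnorm(n)

# Coefficient on X from the full regression
beta_full <- coef(lm(Y ~ D + X))[['X']]

# Steps 1 and 2: partial D out of Y and out of X
U <- resid(lm(Y ~ D))
V <- resid(lm(X ~ D))

# Step 3: regress residual on residual
beta_fwl <- coef(lm(U ~ V))[['V']]

# Variant: regress the raw outcome on the residualized regressor
beta_half <- coef(lm(Y ~ V))[['V']]

print(c(full = beta_full, fwl = beta_fwl, half = beta_half))
print(all.equal(beta_full, beta_fwl))
```

The three slopes agree up to floating-point error. The standard errors from the shortcut regressions are not guaranteed to match those of the full regression exactly, since the degrees of freedom differ.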

### What does FWL tell us?

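The regression output confirms the theorem: the coefficient on `V` in column (2) is 0.854, identical to the coefficient on `X` in column (1). Both differ from the true value of 0.5 because this regression omits `W`, which is correlated with `X` and enters the definition of `Y`.

The key lesson of FWL is that in a linear regression, the coefficient on each variable is determined by how the part of that variable left unexplained by the other regressors covaries with the part of the outcome left unexplained by those same regressors. This is what is known in statistics as identifying variation: which features of your data actually lead to the results that you see?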
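
One way to make this precise, as a sketch in terms of the residuals $U_{i}$ and $V_{i}$ from the three steps above: both residual series have sample mean zero because each first-stage regression includes a constant, so the OLS slope in the third step reduces to

$$
    \hat{\beta} = \frac{\sum_{i} V_{i} U_{i}}{\sum_{i} V_{i}^{2}} = \frac{\widehat{\mathrm{Cov}}(U_{i}, V_{i})}{\widehat{\mathrm{Var}}(V_{i})}.
$$

Only the variation in $X_{i}$ that is not predicted by $D_{i}$ appears in this expression, which is the sense in which $V_{i}$ carries the identifying variation for $\beta$.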