# OLS Regression

### Zhentao Shi

We demonstrate the OLS estimator and its algebraic properties.

```julia
# Generating data

using Distributions

n = 20  # sample size
K = 4   # number of parameters

b0 = [0.5; 1.0; -1.0; 1]  # the true coefficient

X = hcat(ones(n, 1), rand(Normal(), n, K - 1))  # the regressor matrix (column 1 is the intercept)
e = rand(Normal(), n, 1)  # the error term

Y = X * b0 + e  # generate the dependent variable
```

```
20×1 Array{Float64,2}:
  4.4009
 -0.574694
  0.323081
  0.340809
  0.543161
  2.88334
  3.33813
  1.23503
 -3.79042
  0.0961641
  1.39752
  0.0240427
  1.37341
  0.315935
 -1.5131
  0.122445
 -1.24801
  3.78353
  1.12203
  0.471744
```

After generating the data, we have an $n\times 1$ vector $Y$ and an $n\times K$ matrix $X$ whose first column is a column of ones (the intercept). Because the seed of the random number generator is not set, the draws, and therefore all numbers shown below, differ every time the code is run.
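To make the results reproducible, one can fix the seed. The sketch below is an illustration, not part of the original notebook: it draws from an explicitly seeded `MersenneTwister` generator and uses Base's `randn` in place of `rand(Normal(), ...)`, which gives standard normal draws without the Distributions package, so its numbers will not match the output above. The helper `simulate` and the seed 2024 are hypothetical choices.

```julia
using Random

# Hypothetical helper: simulate the same design as above from a seeded generator.
function simulate(seed; n = 20, K = 4, b0 = [0.5, 1.0, -1.0, 1.0])
    rng = MersenneTwister(seed)                # a fresh generator with a fixed seed
    X = hcat(ones(n), randn(rng, n, K - 1))    # intercept plus K-1 standard normal regressors
    e = randn(rng, n)                          # standard normal errors
    Y = X * b0 + e
    return X, Y
end

X_a, Y_a = simulate(2024)   # 2024 is an arbitrary seed
X_b, Y_b = simulate(2024)
println(X_a == X_b && Y_a == Y_b)   # the same seed reproduces the same data
```

Passing the generator explicitly, rather than reseeding the global one, keeps the simulation independent of any other random calls made elsewhere in the session.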

```julia
bhat = (X' * X) \ (X' * Y)
```

```
4×1 Array{Float64,2}:
  0.573196
  1.2254
 -0.477072
  1.19246
```

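The cell above computes the OLS estimate $\hat{\beta} = (X'X)^{-1}X'Y$. Instead of inverting $X'X$ explicitly, it solves the normal equations $X'X\hat{\beta} = X'Y$ with the backslash operator `\`.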
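As a numerical check, here is a sketch comparing three ways of computing the same estimate: solving the normal equations (as in the notebook), multiplying by an explicit inverse, and Julia's least-squares solve `X \ Y`, which for a tall matrix uses a QR factorization of $X$. The data are freshly simulated with an arbitrary seed and `randn`, both assumptions for illustration.

```julia
using LinearAlgebra, Random

rng = MersenneTwister(1)                       # arbitrary seed for illustration
n, K = 20, 4
X = hcat(ones(n), randn(rng, n, K - 1))        # intercept plus standard normal regressors
Y = X * [0.5, 1.0, -1.0, 1.0] + randn(rng, n)  # same true coefficients as b0

b_normal = (X' * X) \ (X' * Y)   # solve the normal equations X'X b = X'Y
b_inv    = inv(X' * X) * X' * Y  # textbook formula with an explicit inverse
b_qr     = X \ Y                 # least-squares solve via a QR factorization of X

println(b_normal ≈ b_inv && b_normal ≈ b_qr)
```

All three agree here. Solving with `\` avoids forming $(X'X)^{-1}$, and `X \ Y` also avoids forming $X'X$, whose condition number is the square of that of $X$; this matters when the regressors are nearly collinear.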

The regression residual is defined as $\hat{e} = Y - X\hat{\beta}$. We verify that $X'\hat{e} = 0$.
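The orthogonality is an algebraic consequence of the normal equations, so it holds for any data set:

$$
X'\hat{e} = X'(Y - X\hat{\beta}) = X'Y - X'X(X'X)^{-1}X'Y = X'Y - X'Y = 0.
$$

Because the first column of $X$ is a column of ones, the first entry of $X'\hat{e}$ is $\sum_{i=1}^n \hat{e}_i$, so the residuals also sum to zero.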

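```julia
ehat = Y - X * bhat  # OLS residuals
println(X' * ehat)   # should be a zero vector
```

```
[1.11022e-15; -3.88578e-16; -2.72005e-15; -6.07847e-15]
```

The entries are of order $10^{-15}$, that is, zero up to floating-point rounding error.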


## Linear Algebraic Properties

Define the projection matrix $P_X = X(X'X)^{-1}X'$ and the residual-maker (annihilator) matrix $M_X = I_n - P_X$, and show that $\hat{e} = M_X Y = M_X e$.
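The identity follows from the formula for $\hat{\beta}$ and the fact that $M_X = I_n - X(X'X)^{-1}X'$ annihilates $X$. Writing $\beta_0$ for the true coefficient (`b0` in the code), so that $Y = X\beta_0 + e$:

$$
\hat{e} = Y - X(X'X)^{-1}X'Y = \left(I_n - X(X'X)^{-1}X'\right)Y = M_X Y,
$$

$$
M_X X = X - X(X'X)^{-1}X'X = 0 \quad\Longrightarrow\quad M_X Y = M_X X\beta_0 + M_X e = M_X e.
$$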

```julia
using LinearAlgebra  # provides the identity I

PX = X * inv(X' * X) * X'   # projection matrix P_X
MX = I - PX                 # residual-maker matrix M_X = I_n - P_X
hcat(ehat, MX * Y, MX * e)  # the three columns should coincide
```

```
20×3 Array{Float64,2}:
 -0.740503    -0.740503    -0.740503
 -0.377045    -0.377045    -0.377045
  0.286604     0.286604     0.286604
 -0.737935    -0.737935    -0.737935
 -0.266004    -0.266004    -0.266004
  2.47314      2.47314      2.47314
  0.327272     0.327272     0.327272
  0.311446     0.311446     0.311446
 -0.579079    -0.579079    -0.579079
 -0.409734    -0.409734    -0.409734
  0.723551     0.723551     0.723551
 -1.12141     -1.12141     -1.12141
 -0.773775    -0.773775    -0.773775
  0.322449     0.322449     0.322449
 -0.00879343  -0.00879343  -0.00879343
  0.0934853    0.0934853    0.0934853
 -0.749232    -0.749232    -0.749232
  1.29249      1.29249      1.29249
 -0.290677    -0.290677    -0.290677
  0.223748     0.223748     0.223748
```
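$P_X$ and $M_X$ have a few further standard properties: both are symmetric and idempotent, $P_X X = X$, $M_X X = 0$, $P_X M_X = 0$, and their traces are $K$ and $n-K$, since $\operatorname{tr}(P_X) = \operatorname{tr}((X'X)^{-1}X'X) = \operatorname{tr}(I_K) = K$. The sketch below checks them numerically on a freshly simulated $X$; the seed and the `randn` draws are assumptions for illustration, and `I - PX` relies on LinearAlgebra's `UniformScaling` identity `I`.

```julia
using LinearAlgebra, Random

rng = MersenneTwister(2)   # arbitrary seed for illustration
n, K = 20, 4
X = hcat(ones(n), randn(rng, n, K - 1))

PX = X * inv(X' * X) * X'  # projection onto the column space of X
MX = I - PX                # residual maker; I is the UniformScaling identity

checks = [
    PX ≈ PX',                                      # P_X is symmetric
    PX * PX ≈ PX,                                  # P_X is idempotent
    MX * MX ≈ MX,                                  # M_X is idempotent
    isapprox(PX * MX, zeros(n, n); atol = 1e-10),  # P_X M_X = 0
    PX * X ≈ X,                                    # P_X leaves X unchanged
    isapprox(MX * X, zeros(n, K); atol = 1e-10),   # M_X annihilates X
    tr(PX) ≈ K,                                    # trace of P_X equals K
    tr(MX) ≈ n - K,                                # trace of M_X equals n - K
]
println(all(checks))
```

Absolute tolerances are used for the comparisons against zero matrices, because a relative tolerance is meaningless when one side is exactly zero.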

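## FWL Theorem

Partition $X = [X_1, X_2]$, where $X_1$ contains the first two columns (the intercept and the first random regressor) and $X_2$ the last two. The Frisch-Waugh-Lovell (FWL) theorem states that the coefficients on $X_2$ in the full regression can be recovered from the partialled-out regression

$$
(\hat{\beta}_3, \hat{\beta}_4)' = (X_2' M_{X_1} X_2)^{-1} X_2' M_{X_1} Y, \qquad M_{X_1} = I_n - X_1(X_1'X_1)^{-1}X_1',
$$

and that the residuals of this regression, $M_{X_1}Y - M_{X_1}X_2(\hat{\beta}_3, \hat{\beta}_4)'$, equal the full-regression residuals $\hat{e}$.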

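```julia
using LinearAlgebra  # provides the identity I

X1 = X[:, 1:2]                   # intercept and first regressor
PX1 = X1 * inv(X1' * X1) * X1'   # projection onto the columns of X1
MX1 = I - PX1                    # M_{X1} = I_n - P_{X1}
X2 = X[:, 3:4]                   # remaining two regressors
bhat12 = (X2' * MX1 * X2) \ (X2' * MX1 * Y)  # FWL estimate of the coefficients on X2
```

```
2×1 Array{Float64,2}:
 -0.477072
  1.19246
```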

The estimates $(\hat{\beta}_3, \hat{\beta}_4)$ from this partialled-out regression coincide with the third and fourth entries of $\hat{\beta}$ from the full regression.
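The same coefficients can be computed without building the $n\times n$ matrix $M_{X_1} = I_n - X_1(X_1'X_1)^{-1}X_1'$: residualize $Y$ and $X_2$ on $X_1$ by least squares, then regress one residual on the other. Because $M_{X_1}$ is symmetric and idempotent, $X_2'M_{X_1}Y = (M_{X_1}X_2)'Y$, so residualizing $X_2$ alone already gives the same coefficients. A sketch, with a hypothetical helper `resid` and simulated data (the seed and `randn` draws are assumptions):

```julia
using LinearAlgebra, Random

rng = MersenneTwister(3)                       # arbitrary seed for illustration
n = 20
X = hcat(ones(n), randn(rng, n, 3))
Y = X * [0.5, 1.0, -1.0, 1.0] + randn(rng, n)
X1, X2 = X[:, 1:2], X[:, 3:4]

resid(A, B) = B - A * (A \ B)   # hypothetical helper: M_A B via a least-squares fit on A

X2t = resid(X1, X2)   # M_{X1} X2
Yt  = resid(X1, Y)    # M_{X1} Y

b_full = X \ Y        # full regression
b_fwl  = X2t \ Yt     # residualized Y on residualized X2
b_fwl2 = X2t \ Y      # raw Y on residualized X2: same coefficients

println(b_fwl ≈ b_full[3:4] && b_fwl2 ≈ b_full[3:4])
```

Residualizing through `\` costs $O(nk^2)$ work for $k$ columns instead of the $O(n^2)$ memory needed to store $M_{X_1}$, which matters once $n$ is large.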

```julia
# The residuals after purging out X1 are the same as those from the full regression
ehat12 = MX1 * Y - MX1 * X2 * bhat12
hcat(ehat, ehat12)
```

```
20×2 Array{Float64,2}:
 -0.740503    -0.740503
 -0.377045    -0.377045
  0.286604     0.286604
 -0.737935    -0.737935
 -0.266004    -0.266004
  2.47314      2.47314
  0.327272     0.327272
  0.311446     0.311446
 -0.579079    -0.579079
 -0.409734    -0.409734
  0.723551     0.723551
 -1.12141     -1.12141
 -0.773775    -0.773775
  0.322449     0.322449
 -0.00879343  -0.00879343
  0.0934853    0.0934853
 -0.749232    -0.749232
  1.29249      1.29249
 -0.290677    -0.290677
  0.223748     0.223748
```
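The residual part of the FWL theorem requires residualizing $Y$ as well as $X_2$. A sketch with simulated data (arbitrary seed, `randn` draws) and a hypothetical helper `resid`: the residuals from regressing $M_{X_1}Y$ on $M_{X_1}X_2$ match the full-regression residuals, whereas applying the same coefficients $b_2$ (`b2` in the code) to the unresidualized $Y$ does not, because $Y - M_{X_1}X_2 b_2 = \hat{e} + P_{X_1}Y$ with $P_{X_1} = X_1(X_1'X_1)^{-1}X_1'$.

```julia
using LinearAlgebra, Random

rng = MersenneTwister(4)                       # arbitrary seed for illustration
n = 20
X = hcat(ones(n), randn(rng, n, 3))
Y = X * [0.5, 1.0, -1.0, 1.0] + randn(rng, n)
X1, X2 = X[:, 1:2], X[:, 3:4]

resid(A, B) = B - A * (A \ B)   # hypothetical helper: M_A B via a least-squares fit on A

ehat = resid(X, Y)                      # full-regression residuals
X2t, Yt = resid(X1, X2), resid(X1, Y)   # M_{X1} X2 and M_{X1} Y
b2 = X2t \ Yt                           # FWL coefficients on X2

e_fwl = Yt - X2t * b2   # FWL residuals
e_raw = Y - X2t * b2    # same coefficients applied to the unresidualized Y

println(e_fwl ≈ ehat)   # FWL residuals match the full regression
println(e_raw ≈ ehat)   # generally false: the gap is P_{X1} Y
```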