# OLS Regression

### Zhentao Shi

This is the first time I have written Python code on my own.

We demonstrate the OLS estimator and its algebraic properties.

```python
## Generating data

import numpy as np
from scipy import stats

n = 20  # sample size
K = 3   # number of non-constant regressors; X has K + 1 columns including the intercept

b0 = np.array([[0.5], [1.0], [-1.0], [1.0]])  # the true coefficient, a (K+1) x 1 vector

X = np.hstack((np.ones((n, 1)), stats.norm.rvs(size=(n, K))))  # the n x (K+1) regressor matrix
e = stats.norm.rvs(size=(n, 1))  # the error term

Y = X @ b0 + e  # generate the dependent variable
```

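After the data generation, we obtain an $n\times 1$ vector $Y$ and an $n\times (K+1)$ matrix $X$ whose first column is a column of ones (the intercept). Since the random number generator seed is unspecified, the generated random variables differ every time we run the code, so the numbers printed below come from one particular run.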
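To make the numbers reproducible, one option is to draw from a seeded generator. The sketch below assumes NumPy's `np.random.default_rng` in place of `scipy.stats.norm.rvs`; the helper name `generate_data` and the seed value 2024 are hypothetical, arbitrary choices, and the draws will not match the output shown in this notebook.

```python
import numpy as np


def generate_data(n, b0, seed):
    """Hypothetical helper: simulate Y = X b0 + e with a fixed seed.

    X has a leading column of ones followed by K = len(b0) - 1
    standard normal regressors; e is standard normal noise.
    """
    rng = np.random.default_rng(seed)  # seeded generator -> reproducible draws
    K = b0.shape[0] - 1
    X = np.hstack((np.ones((n, 1)), rng.standard_normal((n, K))))
    e = rng.standard_normal((n, 1))
    Y = X @ b0 + e
    return X, Y, e


b0 = np.array([[0.5], [1.0], [-1.0], [1.0]])
X_a, Y_a, _ = generate_data(20, b0, seed=2024)
X_b, Y_b, _ = generate_data(20, b0, seed=2024)
print(np.array_equal(Y_a, Y_b))  # → True: same seed, same data
```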

```python
bhat = np.linalg.solve(X.T @ X, X.T @ Y)
print(bhat)
```

```
[[ 0.11749363]
 [ 0.7619638 ]
 [-1.00237138]
 [ 0.35044278]]
```


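The cell above calculates the estimate $\hat{\beta} = (X' X)^{-1} X'Y$ by solving the normal equations $X'X\hat{\beta} = X'Y$ with `np.linalg.solve`, which avoids forming the inverse $(X'X)^{-1}$ explicitly.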
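As a cross-check, the normal-equation solution can be compared with NumPy's least-squares routine `np.linalg.lstsq`, which works from $X$ directly (SVD-based) instead of forming $X'X$. A minimal sketch on freshly simulated data with an arbitrary seed, so the numbers differ from those printed above:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
n, K = 20, 3
X = np.hstack((np.ones((n, 1)), rng.standard_normal((n, K))))
Y = X @ np.array([[0.5], [1.0], [-1.0], [1.0]]) + rng.standard_normal((n, 1))

# Normal equations: solve (X'X) b = X'Y without forming the inverse
bhat_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# Least squares on X directly; rcond=None selects NumPy's current default cutoff
bhat_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(bhat_solve, bhat_lstsq))
```

For a well-conditioned $X$ like this one the two agree to rounding error; `lstsq` is the safer choice when $X'X$ is close to singular.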

The regression residual is defined as $\hat{e} = Y - X \hat{\beta}$. Verify $X'\hat{e} = 0$; numerically, the entries are zero only up to floating-point rounding error (of order $10^{-15}$).

```python
ehat = Y - X @ bhat
print(X.T @ ehat)
```

```
[[ 1.77635684e-15]
 [-4.44089210e-15]
 [ 3.33066907e-15]
 [ 4.44089210e-16]]
```
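The identity follows from the normal equations: $X'\hat{e} = X'Y - X'X\hat{\beta} = X'Y - X'Y = 0$. Because the computed entries are only zero up to rounding, a programmatic check should compare against zero with a tolerance. A sketch using `np.allclose` on freshly simulated data; the seed and the absolute tolerance `atol=1e-10` are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for illustration only
n, K = 20, 3
X = np.hstack((np.ones((n, 1)), rng.standard_normal((n, K))))
Y = X @ np.array([[0.5], [1.0], [-1.0], [1.0]]) + rng.standard_normal((n, 1))

bhat = np.linalg.solve(X.T @ X, X.T @ Y)
ehat = Y - X @ bhat  # OLS residuals

# X'ehat is zero in exact arithmetic; allow for floating-point error
orthogonal = np.allclose(X.T @ ehat, 0.0, atol=1e-10)
# Because X contains a column of ones, the first row of X'ehat is the residual sum
sums_to_zero = abs(ehat.sum()) < 1e-10
print(orthogonal, sums_to_zero)
```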


## Linear Algebraic Properties

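Define the projection matrix $P_X = X(X'X)^{-1}X'$ and the annihilator $M_X = I_n - P_X$, and show $\hat{e} = M_X Y = M_X e$. The second equality holds because $M_X X = 0$, so $M_X Y = M_X (X\beta_0 + e) = M_X e$, where $\beta_0$ is the true coefficient `b0`.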

```python
PX = X @ np.linalg.solve(X.T @ X, X.T)
MX = np.eye(n) - PX
print(np.hstack((ehat, MX @ Y, MX @ e)))
```

```
[[-1.77482351 -1.77482351 -1.77482351]
 [-0.70161898 -0.70161898 -0.70161898]
 [ 0.66243152  0.66243152  0.66243152]
 [ 0.65106129  0.65106129  0.65106129]
 [ 1.13475278  1.13475278  1.13475278]
 [ 1.11228791  1.11228791  1.11228791]
 [-0.66888979 -0.66888979 -0.66888979]
 [-0.36795563 -0.36795563 -0.36795563]
 [-0.14861854 -0.14861854 -0.14861854]
 [-1.06233629 -1.06233629 -1.06233629]
 [ 0.19773133  0.19773133  0.19773133]
 [-0.46704015 -0.46704015 -0.46704015]
 [ 0.56069116  0.56069116  0.56069116]
 [ 0.33700027  0.33700027  0.33700027]
 [-1.60557983 -1.60557983 -1.60557983]
 [-0.69322679 -0.69322679 -0.69322679]
 [ 0.53439743  0.53439743  0.53439743]
 [ 0.28791348  0.28791348  0.28791348]
 [ 0.4938477   0.4938477   0.4938477 ]
 [ 1.51797465  1.51797465  1.51797465]]
```
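The equality $\hat{e} = M_X Y = M_X e$ rests on a few properties of the two matrices: both are symmetric and idempotent, $M_X X = 0$, $P_X M_X = 0$, and $\operatorname{tr}(P_X) = K+1$, the number of columns of $X$. The sketch below checks these numerically on freshly simulated data; the seed and tolerances are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, for illustration only
n, K = 20, 3
X = np.hstack((np.ones((n, 1)), rng.standard_normal((n, K))))

PX = X @ np.linalg.solve(X.T @ X, X.T)  # projection onto the column space of X
MX = np.eye(n) - PX                     # projection onto its orthogonal complement

checks = {
    "P_X symmetric": np.allclose(PX, PX.T),
    "P_X idempotent": np.allclose(PX @ PX, PX),
    "M_X idempotent": np.allclose(MX @ MX, MX),
    "M_X annihilates X": np.allclose(MX @ X, 0.0, atol=1e-10),
    "P_X M_X = 0": np.allclose(PX @ MX, 0.0, atol=1e-10),
    "trace(P_X) = K + 1": np.isclose(np.trace(PX), K + 1),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```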


## FWL Theorem
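For reference, the Frisch-Waugh-Lovell (FWL) theorem illustrated here can be stated as follows. Partition $X = [X_1, X_2]$ and $\hat{\beta} = (\hat{\beta}_1', \hat{\beta}_2')'$ conformably, and let $M_{X_1} = I_n - X_1(X_1'X_1)^{-1}X_1'$. Then

```latex
\hat{\beta}_2 = \left( X_2' M_{X_1} X_2 \right)^{-1} X_2' M_{X_1} Y,
\qquad
\hat{e} = M_{X_1} Y - M_{X_1} X_2 \hat{\beta}_2 .
```

In the code, $X_1$ holds the intercept and the first regressor (columns 0 and 1) and $X_2$ holds the remaining two regressors (columns 2 and 3).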

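```python
X1 = X[:, [0, 1]]  # intercept and the first regressor
PX1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
MX1 = np.eye(n) - PX1  # annihilator of X1
X2 = X[:, [2, 3]]  # the remaining two regressors
bhat12 = np.linalg.solve(X2.T @ MX1 @ X2, X2.T @ MX1 @ Y)  # FWL estimate of the X2 coefficients
print(bhat12)
```

```
[[-1.00237138]
 [ 0.35044278]]
```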


The estimates $(\hat{\beta}_3, \hat{\beta}_4)$ obtained after partialling out $X_1$ are identical to the last two entries of the full-regression $\hat{\beta}$.
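Rather than comparing printed digits by eye, the agreement can be checked with a tolerance. The sketch also illustrates a standard corollary: since $M_{X_1}$ is symmetric and idempotent, $(M_{X_1}X_2)'(M_{X_1}X_2) = X_2'M_{X_1}X_2$ and $(M_{X_1}X_2)'Y = X_2'M_{X_1}Y$, so regressing the unpurged $Y$ on $M_{X_1}X_2$ gives the same $\hat{\beta}_2$. Data are freshly simulated with an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed, for illustration only
n, K = 20, 3
X = np.hstack((np.ones((n, 1)), rng.standard_normal((n, K))))
Y = X @ np.array([[0.5], [1.0], [-1.0], [1.0]]) + rng.standard_normal((n, 1))

bhat = np.linalg.solve(X.T @ X, X.T @ Y)  # full regression

X1, X2 = X[:, [0, 1]], X[:, [2, 3]]
MX1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
X2_tilde = MX1 @ X2  # X2 with X1 partialled out

# FWL: regress the purged Y on the purged X2
b2_fwl = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ (MX1 @ Y))
# Corollary: the unpurged Y gives the same coefficients
b2_raw_y = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ Y)

print(np.allclose(b2_fwl, bhat[2:]), np.allclose(b2_raw_y, bhat[2:]))
```

Note that the residuals from the corollary regression are not the full-regression residuals; only the coefficients coincide.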

```python
# the residuals after purging out X1 are the same as those from the full regression
ehat12 = MX1 @ Y - MX1 @ X2 @ bhat12
print(np.hstack((ehat, ehat12)))
```

```
[[-1.77482351 -1.77482351]
 [-0.70161898 -0.70161898]
 [ 0.66243152  0.66243152]
 [ 0.65106129  0.65106129]
 [ 1.13475278  1.13475278]
 [ 1.11228791  1.11228791]
 [-0.66888979 -0.66888979]
 [-0.36795563 -0.36795563]
 [-0.14861854 -0.14861854]
 [-1.06233629 -1.06233629]
 [ 0.19773133  0.19773133]
 [-0.46704015 -0.46704015]
 [ 0.56069116  0.56069116]
 [ 0.33700027  0.33700027]
 [-1.60557983 -1.60557983]
 [-0.69322679 -0.69322679]
 [ 0.53439743  0.53439743]
 [ 0.28791348  0.28791348]
 [ 0.4938477   0.4938477 ]
 [ 1.51797465  1.51797465]]
```
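Forming $M_{X_1}$ explicitly costs $O(n^2)$ memory, which is harmless for $n = 20$ but wasteful for large samples. An alternative, sketched below, residualizes each variable with `np.linalg.lstsq` and never builds an $n\times n$ matrix; the helper name `residualize` is hypothetical, and the data are freshly simulated with an arbitrary seed. The final line checks the residual equality with `np.allclose` instead of by eye.

```python
import numpy as np


def residualize(A, Z):
    """Hypothetical helper: return M_Z A = A - Z (Z'Z)^{-1} Z'A via least squares."""
    coef, *_ = np.linalg.lstsq(Z, A, rcond=None)
    return A - Z @ coef


rng = np.random.default_rng(4)  # arbitrary seed, for illustration only
n, K = 20, 3
X = np.hstack((np.ones((n, 1)), rng.standard_normal((n, K))))
Y = X @ np.array([[0.5], [1.0], [-1.0], [1.0]]) + rng.standard_normal((n, 1))

# Residuals from the full regression
bhat, *_ = np.linalg.lstsq(X, Y, rcond=None)
ehat = Y - X @ bhat

# FWL residuals: purge X1 from Y and X2, then regress the purged Y on the purged X2
X1, X2 = X[:, [0, 1]], X[:, [2, 3]]
Y_t, X2_t = residualize(Y, X1), residualize(X2, X1)
b2, *_ = np.linalg.lstsq(X2_t, Y_t, rcond=None)
ehat12 = Y_t - X2_t @ b2

print(np.allclose(ehat, ehat12))
```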
