## Understanding the OLS Summary Table

- In the previous section, we showed both theoretically and in code how to derive the coefficient vector $\hat{\beta}$ purely from the objective of minimising the mean squared error (MSE).

- But notice something odd about our results: our matrix algebra gave us only the coefficient values.

- The OLS summary table, however, reports much more than this!

- How can we derive every part of the OLS summary table? Let's find out.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import make_regression

# Simulate 500 observations with 5 features, only 2 of which are informative
x, y = make_regression(
    n_samples=500,
    n_features=5,
    n_informative=2,
    n_targets=1,
    noise=5,
    bias=5,
    random_state=123,
)
# Append a column of ones so the last coefficient is the intercept
x = np.append(x, np.ones((500, 1)), axis=1)
print(x.shape)

# Closed-form OLS estimate: beta_hat = (X^T X)^{-1} X^T y
betas = np.linalg.inv(x.T @ x) @ x.T @ y
np.set_printoptions(suppress=True)
print(betas)

print('=' * 50)
# Fit the same model with statsmodels for comparison
res = sm.OLS(endog=y, exog=x, hasconst=True).fit()
res.summary()
```

```
(500, 6)
[-0.16521089  0.2381359   0.00976686 60.45175552 26.46640238  4.8924384 ]
```

|  |  |  |  |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.994 |
| Model: | OLS | Adj. R-squared: | 0.994 |
| Method: | Least Squares | F-statistic: | 17750.0 |
| Date: | Fri, 17 Jan 2025 | Prob (F-statistic): | 0.0 |
| Time: | 16:40:01 | Log-Likelihood: | -1508.0 |
| No. Observations: | 500 | AIC: | 3028.0 |
| Df Residuals: | 494 | BIC: | 3053.0 |
| Df Model: | 5 |  |  |
| Covariance Type: | nonrobust |  |  |

|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| x1 | -0.1652 | 0.233 | -0.710 | 0.478 | -0.622 | 0.292 |
| x2 | 0.2381 | 0.236 | 1.008 | 0.314 | -0.226 | 0.702 |
| x3 | 0.0098 | 0.222 | 0.044 | 0.965 | -0.426 | 0.445 |
| x4 | 60.4518 | 0.222 | 272.722 | 0.000 | 60.016 | 60.887 |
| x5 | 26.4664 | 0.227 | 116.601 | 0.000 | 26.020 | 26.912 |
| const | 4.8924 | 0.223 | 21.982 | 0.000 | 4.455 | 5.330 |

|  |  |  |  |
|---|---|---|---|
| Omnibus: | 1.207 | Durbin-Watson: | 1.76 |
| Prob(Omnibus): | 0.547 | Jarque-Bera (JB): | 1.202 |
| Skew: | 0.028 | Prob(JB): | 0.548 |
| Kurtosis: | 2.766 | Cond. No. | 1.11 |
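Explicitly inverting $X^TX$ with `np.linalg.inv`, as above, mirrors the formula but is not how least squares is usually computed numerically. The sketch below uses its own synthetic data (assumed for illustration, not the `make_regression` set) to compare the explicit-inverse estimate with `np.linalg.solve` on the normal equations and with `np.linalg.lstsq`, which works from $X$ directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3

# Synthetic design (assumed for illustration): k random features plus a column of ones
X = np.column_stack([rng.normal(size=(n, k)), np.ones(n)])
true_beta = np.array([1.5, -2.0, 0.0, 4.0])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# 1) Explicit inverse, exactly as the formula reads
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y
# 2) Solve the normal equations (X^T X) beta = X^T y without forming the inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)
# 3) Least squares on X directly (SVD-based), never forming X^T X
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(beta_inv, 3))
print(np.allclose(beta_inv, beta_solve), np.allclose(beta_inv, beta_lstsq))  # True True
```

Because `lstsq` never forms $X^TX$, it is the more robust choice when columns are nearly collinear; for a design as well conditioned as the one above (the summary reports a `Cond. No.` of 1.11), the explicit inverse gives the same answer.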


### Coefficient Standard Error

- We already know the main formula in OLS:
    $$\begin{aligned}
        \hat{\beta} &= (X^TX)^{-1} X^Ty
    \end{aligned}$$

- However, every coefficient in $\hat{\beta}$ also has a corresponding standard error estimate $\sigma_{\hat{\beta}}$, which lets us test how statistically significant it is. How is this derived? Writing the true model as $y = X\beta + \epsilon$ and treating $X$ as fixed (non-random):
    $$\begin{aligned}
        \sigma_{\hat{\beta}}^2 &= \text{Var}({\hat{\beta}}) \\
        &= \text{Var}((X^TX)^{-1} X^Ty) \\
        &= \text{Var}((X^TX)^{-1} X^T(X\beta + \epsilon)) \\
        &= \text{Var}((X^TX)^{-1} X^TX\beta + (X^TX)^{-1} X^T\epsilon) \\
        &= \text{Var}(\beta + (X^TX)^{-1} X^T\epsilon) \\
        &= \text{Var}(\beta) + \text{Var}((X^TX)^{-1} X^T\epsilon) + \text{Cov}(\beta, (X^TX)^{-1} X^T\epsilon) + \text{Cov}((X^TX)^{-1} X^T\epsilon, \beta) \\
        &= \text{Var}((X^TX)^{-1} X^T\epsilon) & \because \beta \text{ is a constant, so } \text{Var}(\beta) = 0 \text{ and both covariance terms vanish} \\
        &= (X^TX)^{-1} X^T \text{Var}(\epsilon) X (X^TX)^{-1} & \because \text{Var}(A\epsilon) = A \text{Var}(\epsilon) A^T \text{ for fixed } A = (X^TX)^{-1}X^T \text{, and } (X^TX)^{-1} \text{ is symmetric} \\
        &= (X^TX)^{-1} X^T \sigma^2 I X (X^TX)^{-1} & \because \text{Var}(\epsilon) = \sigma^2 I \text{ where } \sigma^2 \text{ is a scalar; i.e. } \epsilon \text{ is homoscedastic and uncorrelated} \\
        &= \sigma^2 (X^TX)^{-1} X^T I X (X^TX)^{-1} \\  
        &= \sigma^2 (X^TX)^{-1} X^T X (X^TX)^{-1} \\
        &= \sigma^2 (X^TX)^{-1}
    \end{aligned}$$

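- Amazingly, there is a closed form for the variance of the coefficients if we simply assume that the errors are homoscedastic and uncorrelated! The standard errors are the square roots of the diagonal entries of $\sigma^2 (X^TX)^{-1}$. Since the true $\sigma^2$ is unknown, we estimate it from the residuals $e = y - X\hat{\beta}$ as $\hat{\sigma}^2 = \frac{e^Te}{n - p}$, where $n$ is the number of observations and $p$ is the number of estimated coefficients including the constant (here $500 - 6 = 494$, the `Df Residuals` in the summary).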
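One way to convince yourself of the derivation is a quick simulation: hold $X$ fixed, draw fresh homoscedastic errors many times, refit $\hat{\beta}$ each time, and compare the empirical covariance of the estimates with $\sigma^2 (X^TX)^{-1}$. The sketch below uses its own small synthetic design, assumed true coefficients and an assumed true $\sigma = 2$; it also checks numerically that the "sandwich" $AA^T$ with $A = (X^TX)^{-1}X^T$ collapses to $(X^TX)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(42)
n, sigma = 100, 2.0   # assumed sample size and true error standard deviation
n_sims = 20_000       # number of simulated datasets

# Fixed design (held constant across simulations): two features plus an intercept
X = np.column_stack([rng.normal(size=(n, 2)), np.ones(n)])
beta = np.array([3.0, -1.0, 5.0])  # assumed true coefficients
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                  # beta_hat = A @ y

# The sandwich A A^T collapses to (X^T X)^{-1}, as in the derivation
sandwich_ok = np.allclose(A @ A.T, XtX_inv)

# Redraw homoscedastic, uncorrelated errors and refit in one vectorised step
Y = X @ beta + rng.normal(scale=sigma, size=(n_sims, n))  # each row is one simulated y
estimates = Y @ A.T                                       # each row is one beta_hat

empirical_cov = np.cov(estimates, rowvar=False)
theoretical_cov = sigma**2 * XtX_inv

print(sandwich_ok)
print(np.round(np.sqrt(np.diag(empirical_cov)), 4))    # simulated standard errors
print(np.round(np.sqrt(np.diag(theoretical_cov)), 4))  # sigma * sqrt(diag((X^T X)^{-1}))
```

The two printed rows of standard errors should agree to within simulation noise, which shrinks as `n_sims` grows.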

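```python
# Fitted values and residuals e = y - X beta_hat
yhat = x @ betas.reshape(-1, 1)
residuals = y.reshape(-1, 1) - yhat
# Unbiased estimate of sigma^2: RSS / (n - p), with n - p = 494 residual degrees of freedom
variance_hat = np.sum(residuals**2) / (x.shape[0] - x.shape[1])
# Estimated variance-covariance matrix of beta_hat: sigma_hat^2 (X^T X)^{-1}
var_covar_matrix = variance_hat * np.linalg.inv(x.T @ x)
# Standard errors are the square roots of the diagonal (same as OLS's "std err" column!)
sigma = np.sqrt(np.diagonal(var_covar_matrix))
sigma
```

```
array([0.23265304, 0.2362958 , 0.22176446, 0.22166033, 0.22698345,
       0.22256677])
```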
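The code above divides the residual sum of squares by $n - p$ rather than $n$ because the residuals come from a model with $p$ fitted coefficients, which makes them systematically a little too small. A small simulation with its own synthetic design (true $\sigma^2 = 4$, $n = 30$, $p = 6$, all assumed for illustration) shows that $\text{RSS}/(n-p)$ is unbiased, while $\text{RSS}/n$ underestimates $\sigma^2$ by roughly a factor of $(n-p)/n$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, sigma2 = 30, 6, 4.0  # small n makes the bias easy to see
n_sims = 20_000

# Fixed design: p - 1 random features plus an intercept column
X = np.column_stack([rng.normal(size=(n, p - 1)), np.ones(n)])
beta = rng.normal(size=p)
hat_matrix = X @ np.linalg.inv(X.T @ X) @ X.T  # projects y onto the column space of X

# Each row of Y is one simulated response vector
Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=(n_sims, n))
residuals = Y - Y @ hat_matrix.T  # each row: e = (I - H) y
rss = np.sum(residuals**2, axis=1)

est_unbiased = np.mean(rss / (n - p))  # should be close to sigma2 = 4
est_naive = np.mean(rss / n)           # should be close to sigma2 * (n - p) / n = 3.2
print(round(est_unbiased, 2), round(est_naive, 2))
```

With $n = 500$ and $p = 6$ as in our dataset, the two divisors differ by only about 1%, which is why the distinction is easy to overlook.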

### Hypothesis Testing the Coefficients

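- Once we have the standard errors, conducting a hypothesis test is straightforward.

- Compute the $t$-statistic for coefficient $i$ against a hypothesised value $b$:
    $$\begin{aligned}
        t &= \frac{\hat{\beta}_i - b}{\hat{\sigma}_{\hat{\beta}_i}}
    \end{aligned}$$

- Since we are testing whether each coefficient differs significantly from zero, we typically set $b = 0$. Under the null hypothesis (and with normally distributed errors), $t$ follows a Student's $t$-distribution with $n - p$ degrees of freedom.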
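The same formula handles any hypothesised value $b$, not just zero. As a small sketch, the helper below (`t_statistic` is our own hypothetical name, not a library function) reproduces the $t$ value for `x4` from the coefficient and standard error printed earlier, and then tests the different null hypothesis $\beta_4 = 60$.

```python
def t_statistic(beta_hat: float, std_err: float, b: float = 0.0) -> float:
    """t-statistic for H0: beta = b, given an estimate and its standard error."""
    return (beta_hat - b) / std_err


# Coefficient and standard error for x4, copied from the outputs above
beta_x4, se_x4 = 60.45175552, 0.22166033

print(t_statistic(beta_x4, se_x4))          # H0: beta_4 = 0
print(t_statistic(beta_x4, se_x4, b=60.0))  # H0: beta_4 = 60
```

Against $b = 60$ the statistic drops to about 2.04, just above the 5% two-sided critical value of roughly 1.96, so that hypothesis is rejected far less emphatically than $\beta_4 = 0$.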

```python
# t-statistics for H0: beta_i = 0, i.e. b = 0 (same as OLS's "t" column!)
t_values = betas / sigma
t_values
```

```
array([ -0.71011705,   1.00778727,   0.04404157, 272.7224774 ,
       116.60058235,  21.98189102])
```

### Compute Significance Values

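```python
from scipy.stats import t

# Residual degrees of freedom: n observations minus p estimated coefficients (494 here)
df_resid = x.shape[0] - x.shape[1]
# Two-tailed p-value P(|T| >= |t|) with T ~ t(df_resid); the survival function avoids
# the cancellation that 1 - (cdf(|t|) - cdf(-|t|)) suffers for large |t|
p_values = 2 * t.sf(np.abs(t_values), df=df_resid)
np.round(p_values, 3)  # same as OLS's P>|t| column
```

```
array([0.478, 0.314, 0.965, 0.   , 0.   , 0.   ])
```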
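Two equivalent ways to write the two-tailed p-value are $1 - [F(|t|) - F(-|t|)]$ and $2\,(1 - F(|t|))$, where $F$ is the CDF of the $t$-distribution with $n - p$ degrees of freedom. They agree for moderate $|t|$, but the subtraction form loses everything to floating-point cancellation once $F(|t|)$ rounds to 1, which is why very large $t$ values show a p-value of exactly 0. The sketch below uses `scipy.stats.t` with `df=494` (the residual degrees of freedom of our model) and the $t$ values for `x1` and the constant.

```python
from scipy.stats import t

df_resid = 494  # n - p = 500 - 6 in the example above
t_moderate, t_large = 0.71011705, 21.98189102  # |t| for x1 and the constant


def p_subtract(tv, df):
    # 1 - P(-|t| < T < |t|): cancels catastrophically when cdf(|t|) rounds to 1.0
    return 1 - (t.cdf(abs(tv), df) - t.cdf(-abs(tv), df))


def p_survival(tv, df):
    # 2 * P(T > |t|), computed directly from the upper tail
    return 2 * t.sf(abs(tv), df)


print(p_subtract(t_moderate, df_resid), p_survival(t_moderate, df_resid))
print(p_subtract(t_large, df_resid), p_survival(t_large, df_resid))
```

For `x1` both forms give the same p-value; for the constant the subtraction form returns 0.0 while the survival-function form returns a tiny but positive number. Either way the coefficient is overwhelmingly significant.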

### Confidence Interval

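```python
from scipy.stats import t

# Critical values of t(n - p) for a 95% two-sided interval
df_resid = x.shape[0] - x.shape[1]
lb, ub = t.ppf(0.025, df=df_resid), t.ppf(0.975, df=df_resid)
# beta_hat + t_crit * std err, stacked as [lower, upper] columns like OLS's [0.025, 0.975]
conf_int = np.column_stack([betas + lb * sigma, betas + ub * sigma])
np.round(conf_int, 3)
```

```
array([[-0.622,  0.292],
       [-0.226,  0.702],
       [-0.426,  0.445],
       [60.016, 60.887],
       [26.02 , 26.912],
       [ 4.455,  5.33 ]])
```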
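Each interval is $\hat{\beta}_i \pm t_{0.975,\,n-p}\,\hat{\sigma}_{\hat{\beta}_i}$. With 494 residual degrees of freedom the $t$ critical value is already very close to the normal 1.96, so using $n$ instead of $n - p$ barely moves the bounds (well below the three decimals the summary prints); with small samples the choice matters much more. The sketch below (`conf_interval` is our own hypothetical helper) compares critical values and rebuilds the interval for `x4` from its printed coefficient and standard error.

```python
from scipy.stats import norm, t


def conf_interval(beta_hat, std_err, df, level=0.95):
    """Two-sided interval beta_hat +/- t_crit * std_err (hypothetical helper)."""
    t_crit = t.ppf(1 - (1 - level) / 2, df=df)
    return beta_hat - t_crit * std_err, beta_hat + t_crit * std_err


# Critical values: a small-sample t, our residual df, and the normal limit
for df in (10, 494):
    print(f"t crit, df={df}: {t.ppf(0.975, df=df):.4f}")
print(f"normal crit: {norm.ppf(0.975):.4f}")

# Interval for x4 using its coefficient and standard error from the outputs above
lower, upper = conf_interval(60.45175552, 0.22166033, df=494)
print(round(lower, 3), round(upper, 3))
```

The rebuilt bounds match the `[0.025` and `0.975]` entries for `x4` in the summary table.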