In [1]:
%%javascript
MathJax.Hub.Config({
    TeX: { equationNumbers: { autoNumber: "AMS" } }
});


In [2]:
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import statsmodels.stats as smstats

import matplotlib
#matplotlib.use('nbagg')

In statistics, degrees of freedom are parameters of the chi-squared distribution and of the distributions derived from it. We need to specify one such parameter for the chi-squared and t distributions, whereas the F distribution requires two. The t distribution arises as the ratio of a standard normal variable to the square root of an independent chi-squared variable divided by its degrees of freedom, so it inherits the underlying degrees of freedom parameter. The F distribution is the ratio of two independent chi-squared variables, each divided by its degrees of freedom, and so needs both parameters for its specification. The interesting question is where the degrees of freedom come from. To answer it, we will have to look at idempotent quadratic forms.

Given an $n\times1$ vector $\mathbf{x}$, a quadratic form is a scalar value defined as
\begin{equation}
\label{eq-qform}
q = \sum_{i,j} a_{ij} x_i x_j = \mathbf{x'Ax}
\end{equation}
where $\mathbf{A}$ is an $n\times n$ symmetric matrix. Now, since $\mathbf{A}$ is symmetric, its spectral decomposition implies that
\begin{equation}
\mathbf{A = C'\Lambda C}
\end{equation}
where $\mathbf{\Lambda}$ is a diagonal matrix containing the eigenvalues of $\mathbf{A}$, the rows of $\mathbf{C}$ (equivalently, the columns of $\mathbf{C'}$) are the orthonormal eigenvectors of $\mathbf{A}$, and $\mathbf{C'C = CC' = I}$.

Plugging this into equation (\ref{eq-qform}), we get
$$
q = \mathbf{x'C'\Lambda Cx} = \mathbf{y'\Lambda y} = \sum_i \lambda_i y_i^2
$$
where we have defined $\mathbf{y=Cx}$ and $\lambda_i$ are the eigenvalues of $\mathbf{A}$.
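
The diagonalization above can be checked numerically. The following is a minimal sketch, assuming an arbitrary random symmetric matrix and vector (the values, the seed, and the names `q_direct` and `q_diag` are illustrative and not part of the derivation). Note that `np.linalg.eigh` returns a matrix whose columns are eigenvectors, so the $\mathbf{C}$ of the text corresponds to its transpose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary symmetric matrix A and vector x (illustrative values only)
n = 4
B = rng.normal(size=(n, n))
A = (B + B.T) / 2
x = rng.normal(size=n)

# eigh returns eigenvalues lam and a matrix V whose columns are eigenvectors,
# so A = V diag(lam) V'. In the notation of the text, C = V'.
lam, V = np.linalg.eigh(A)
C = V.T

q_direct = x @ A @ x          # x'Ax
y = C @ x                     # y = Cx
q_diag = np.sum(lam * y**2)   # sum_i lambda_i y_i^2

print(q_direct, q_diag)       # the two values should agree up to rounding
```
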

An idempotent matrix is one whose square is the matrix itself. We have
$$
\mathbf{A^2 = A A = C'\Lambda C C' \Lambda C = C' \Lambda^2 C }
$$

But if $\mathbf{A}$ is idempotent, $\mathbf{A^2 = A}$, which implies that $\mathbf{\Lambda^2 = \Lambda}$ or $\lambda_i^2 = \lambda_i$ for all $i$. This can only happen if every $\lambda_i$ is either 0 or 1. This means that
$$
q = \sum_{k=1}^K y_k^2
$$
where the sum runs over the $K$ components of $\mathbf{y}$ whose eigenvalues equal 1, relabelled $1,\dots,K$. Note that the total number of non-zero eigenvalues $K$ is also the sum of all the eigenvalues of $\mathbf{A}$, which is the trace of $\mathbf{A}$.
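
As a quick numerical illustration, the sketch below builds one particular symmetric idempotent matrix and inspects its eigenvalues. The construction used here (a projection onto the column space of a random matrix `Z`) is an assumption chosen for convenience; any symmetric idempotent matrix would do.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative symmetric idempotent matrix: projection onto the column
# space of a random n x K matrix Z (an assumed example, not from the text)
n, K = 6, 2
Z = rng.normal(size=(n, K))
A = Z @ np.linalg.inv(Z.T @ Z) @ Z.T

# Eigenvalues of a symmetric matrix, in ascending order
lam = np.linalg.eigvalsh(A)

print(np.round(lam, 10))                      # only zeros and ones, up to rounding
print(np.trace(A), np.linalg.matrix_rank(A))  # trace counts the unit eigenvalues
```
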

**Sample standard deviation**

The sample standard deviation involves the sum of squared deviations from the mean. A vector of deviations from the mean can be expressed in matrix form. To do this, we start with an $n\times 1$ column vector of ones
$$
\mathbf{i} =
\begin{pmatrix}
1\\
1\\
\vdots\\
1
\end{pmatrix}
$$
With this vector, $\sum_i x_i = \mathbf{i'x}$ and $\bar{x}=\frac{1}{n} \mathbf{i'x}$, so $\frac{1}{n} \mathbf{ii'x}$ is a column vector all of whose entries are $\bar{x}$. From this, we get

$$
\begin{pmatrix}
x_1-\bar{x}\\
x_2-\bar{x}\\
\vdots\\
x_n-\bar{x}
\end{pmatrix} =
\begin{pmatrix}
x_1\\
x_2\\
\vdots\\
x_n
\end{pmatrix} -
\begin{pmatrix}
\bar{x}\\
\bar{x}\\
\vdots\\
\bar{x}
\end{pmatrix} =
\left(\mathbf{I}-\frac{1}{n}\mathbf{ii'} \right)\;\mathbf{x} \equiv \mathbf{M x}
$$

$\mathbf{M}$ is symmetric by construction, and it is idempotent:
$$
\mathbf{M^2} = \left(\mathbf{I}-\frac{1}{n}\mathbf{ii'} \right)\left(\mathbf{I}-\frac{1}{n}\mathbf{ii'} \right) =
\mathbf{I} - \frac{2}{n}\mathbf{ii'} + \frac{1}{n^2}\mathbf{ii'\; ii'} = \mathbf{I} - \frac{1}{n}\mathbf{ii'}
$$
since $\mathbf{ii'\;ii'}=n\;\mathbf{ii'}$.

This implies that we can write the sum of squared deviations from the mean as

\begin{equation}
\sum_{i=1}^n (x_i-\bar{x})^2 = \mathbf{x'Mx}
\end{equation}
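
The properties of $\mathbf{M}$ are easy to verify on a small example. This is a minimal sketch with made-up data values; the names `ss_matrix` and `ss_direct` are illustrative.

```python
import numpy as np

# Small made-up sample (illustrative values only)
x = np.array([2.0, 4.0, 4.0, 5.0, 10.0])
n = x.size

# Column vector of ones and the centering matrix M = I - (1/n) ii'
i = np.ones((n, 1))
M = np.eye(n) - (i @ i.T) / n

dev = M @ x                                # deviations from the mean
ss_matrix = x @ M @ x                      # quadratic form x'Mx
ss_direct = np.sum((x - x.mean()) ** 2)    # direct sum of squared deviations

print(dev)
print(ss_matrix, ss_direct)                # the two values should agree
print(np.allclose(M @ M, M), np.allclose(M, M.T))  # idempotent and symmetric
```
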

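Now suppose the $x_i$ are independent $N(\mu,\sigma^2)$ random variables. Since $\mathbf{Mi = 0}$ (because $\mathbf{i'i} = n$), subtracting $\mu$ from every observation leaves the deviations unchanged, so $\mathbf{x'Mx} = \sigma^2\,\mathbf{z'Mz}$, where $\mathbf{z} = \frac{1}{\sigma}(\mathbf{x}-\mu\mathbf{i})$ is a vector of independent standard normal random variables. So is $\mathbf{y = Cz}$, where $\mathbf{C}$ is the eigenvector matrix for $\mathbf{M}$, because $\mathbf{C}$ is orthogonal. Thus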

$$
\sum_{i=1}^n \left(\frac{x_i-\bar{x}}{\sigma}\right)^2 = \sum_{k=1}^K y_k^2 \sim \chi^2_K
$$

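since a sum of $K$ squared independent standard normal random variables has a chi-squared distribution with $K$ degrees of freedom. All that remains is for us to determine $K$, which equals the trace of $\mathbf{M}$. Each of the $n$ diagonal entries of $\mathbf{M}$ equals $1-\frac{1}{n}$, so
$$
K = \text{trace}(\mathbf{M}) = n\left(1-\frac{1}{n}\right) = n-1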
$$
So, we conclude that
$$
\sum_{i=1}^n \left(\frac{x_i-\bar{x}}{\sigma}\right)^2 \sim \chi^2_{n-1}
$$
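
A small Monte Carlo experiment can make this result concrete. The sketch below assumes illustrative values for $n$, $\mu$, $\sigma$ and the number of replications; it compares the simulated statistic with the $\chi^2_{n-1}$ distribution and, for contrast, with $\chi^2_n$, using the Kolmogorov-Smirnov distance from `scipy.stats.kstest`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Illustrative settings (assumptions, not from the text)
n, mu, sigma = 5, 10.0, 2.0
reps = 20000

# Each row is one sample of size n from N(mu, sigma^2)
x = rng.normal(mu, sigma, size=(reps, n))

# Sum of squared deviations from the sample mean, scaled by sigma^2
q = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1) / sigma**2

# A chi-squared variable with n-1 degrees of freedom has mean n-1 and variance 2(n-1)
print(q.mean(), q.var())

# Kolmogorov-Smirnov distance to chi2(n-1) and, for contrast, to chi2(n)
ks_right = stats.kstest(q, stats.chi2(df=n - 1).cdf)
ks_wrong = stats.kstest(q, stats.chi2(df=n).cdf)
print(ks_right.statistic, ks_wrong.statistic)
```

The distance to $\chi^2_{n-1}$ should be much smaller than the distance to $\chi^2_n$, reflecting the one degree of freedom used up in estimating the mean.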

**Regression**

In multiple regression, we have a model of the form
\begin{equation}
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim N(\mathbf{0},\sigma^2\mathbf{I})
\end{equation}

where $\mathbf{y}$ is an $n\times 1$ vector, $\mathbf{X}$ is an $n \times k$ design matrix of full column rank, $\boldsymbol{\beta}$ is the vector of $k$ coefficients, and $\boldsymbol{\epsilon}$ is the $n \times 1$ vector of independent normal random errors with mean zero and variance $\sigma^2$.

We derive an estimate $\mathbf{b}$ of $\boldsymbol{\beta}$ from a sample of observations for which
\begin{equation}
\label{eq-Reg}
\mathbf{y = Xb + e}
\end{equation}
by minimizing the sum of squared residuals $\mathbf{e'e}$. The estimate is given by
\begin{equation}
\label{eq-b}
\mathbf{b = \left(X'X\right)^{-1} X'y}
\end{equation}

Plugging equation (\ref{eq-b}) into equation (\ref{eq-Reg}), we get
\begin{equation}
\label{eq-H}
\mathbf{y = X \left(X'X\right)^{-1} X'y + e \equiv Hy + e }
\end{equation}

The predicted values $\mathbf{Hy}$ are often denoted $\mathbf{\hat{y}}$, which explains the name "hat matrix" for $\mathbf{H}$.

The hat matrix $\mathbf{H}$ is a projection matrix, since it projects $\mathbf{y}$ onto the column space of $\mathbf{X}$. From equation (\ref{eq-H}), we have

$$
\mathbf{e = y - Hy = (I-H)y = (I-H)(X\boldsymbol{\beta}+\boldsymbol{\epsilon}) = (I-H)\boldsymbol{\epsilon}}
$$

where the last equality follows from the fact that $\mathbf{(I-H)X\boldsymbol{\beta} = \boldsymbol{0}}$ since $\mathbf{HX = X}$.
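
These identities can be verified on simulated data. The sketch below assumes an arbitrary design matrix with an intercept column and made-up values for $\boldsymbol{\beta}$ and $\sigma$; because the data are simulated, the true errors `eps` are known and can be compared with the residuals.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative design: intercept plus two random regressors (assumed values)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5

eps = rng.normal(0.0, sigma, size=n)   # true errors, known only in a simulation
y = X @ beta + eps

# Least squares estimate b = (X'X)^{-1} X'y, computed via a linear solve
b = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)

e = y - X @ b                          # residuals

print(np.allclose(H @ X, X))                      # HX = X
print(np.allclose(H, H.T), np.allclose(H @ H, H)) # symmetric and idempotent
print(np.allclose(e, (np.eye(n) - H) @ eps))      # e = (I - H) eps
```
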

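Since $\mathbf{H' = H}$ and $\mathbf{H^2 = X(X'X)^{-1}X'X(X'X)^{-1}X' = H}$, the hat matrix is symmetric and idempotent, and so is $\mathbf{I-H}$, because $\mathbf{(I-H)^2 = I - 2H + H^2 = I-H}$. Together with $\mathbf{e = (I-H)}\boldsymbol{\epsilon}$, this implies that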

$$
\frac{\mathbf{e'e}}{\sigma^2} = \frac{\boldsymbol{\epsilon}'}{\sigma} (\mathbf{I-H}) \frac{\boldsymbol{\epsilon}}{\sigma}
$$

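has a chi-squared distribution, since $\boldsymbol{\epsilon}/\sigma$ is a vector of independent standard normal variates and $\mathbf{I-H}$ is symmetric and idempotent. Its degrees of freedom equal the trace of $\mathbf{I-H}$, which we evaluate using the cyclic property $\text{trace}(\mathbf{AB}) = \text{trace}(\mathbf{BA})$: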

$$
\text{trace}(\mathbf{I}_{n\times n}-\mathbf{H}) = \text{trace}(\mathbf{I}_{n\times n}) - \text{trace}\left(\mathbf{X(X'X)^{-1}X'}\right) = n - \text{trace}\left(\mathbf{X'X(X'X)^{-1}}\right) = n - \text{trace}\left(\mathbf{I}_{k\times k}\right) = n-k
$$
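
The trace calculation can be confirmed numerically for a few design matrices. This is a minimal sketch; the helper name `residual_df` and the random designs are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def residual_df(X):
    """Return trace(I - H) for the hat matrix H = X (X'X)^{-1} X'."""
    n = X.shape[0]
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return np.trace(np.eye(n) - H)

# Random full-column-rank designs of several shapes (illustrative only)
for n, k in [(10, 2), (25, 4), (100, 7)]:
    X = rng.normal(size=(n, k))
    print(n, k, round(residual_df(X), 8))   # trace should equal n - k
```
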

Thus, we have established that, in multiple regression

\begin{equation}
\frac{\mathbf{e'e}}{\sigma^2} \sim \chi^2_{n-k}
\end{equation}

This leads to the definition of the **standard error of estimate** $s$:
\begin{equation}
s^2 = \frac{\mathbf{e'e}}{n-k} = \frac{\sum_i e_i^2}{n-k}
\end{equation}

so that $(n-k)s^2/\sigma^2$ has the chi-squared distribution with $n-k$ degrees of freedom. The hypothesis tests for regression coefficients all follow from this.
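
To see the $n-k$ degrees of freedom in action, the following sketch simulates many error vectors for one fixed design matrix. The sizes, $\sigma$ and the seed are assumptions chosen for illustration. If the result above holds, $\mathbf{e'e}/\sigma^2$ should have mean close to $n-k$ and variance close to $2(n-k)$, and $s^2$ should average close to $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(11)

# Illustrative settings (assumptions, not from the text)
n, k, sigma = 30, 4, 2.0
reps = 20000

# One fixed design matrix with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
H = X @ np.linalg.solve(X.T @ X, X.T)
R = np.eye(n) - H                       # I - H maps errors to residuals

# Each row of eps is one draw of the error vector; since R is symmetric,
# each row of E is the corresponding residual vector e' = eps'(I - H)
eps = rng.normal(0.0, sigma, size=(reps, n))
E = eps @ R

sse = np.sum(E**2, axis=1)              # e'e for each replication
q = sse / sigma**2                      # should behave like chi2(n - k)
s2 = sse / (n - k)                      # squared standard error of estimate

print(q.mean(), q.var())                # compare with n - k and 2(n - k)
print(s2.mean())                        # compare with sigma^2
```

Dividing by $n$ instead of $n-k$ in the definition of $s^2$ would bias the estimate of $\sigma^2$ downwards, which is why the degrees of freedom matter in practice.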

