# Estimation methods in regression analysis

Let us assume a linear relationship,

$$
\mathbf{y}=\mathbf{X}\mathbf{\beta}+\mathbf{\epsilon}
$$

With $\mathbf{y} \in \mathbb{R}^{n \times 1}$, $\mathbf{X} \in \mathbb{R}^{n \times k}$, $\mathbf{\beta} \in \mathbb{R}^{k \times 1}$ and $\mathbf{\epsilon} \in \mathbb{R}^{n \times 1}$.

| method | concept |
| --------|---------|
| least squares | $\underset{\beta}{\text{min}}\sum_{i=1}^n (\epsilon_i)^2$ |
| maximum likelihood | $\underset{\beta,\sigma^2}{\text{argmax}}\,L(\beta,\sigma^2)=\underset{\beta,\sigma^2}{\text{argmax}}\,\prod_{i=1}^n\phi(\epsilon_i)=\underset{\beta,\sigma^2}{\text{argmax}}\,\sum_{i=1}^n\ln{\phi}(\epsilon_i)$ |

Where $\phi(\epsilon_i)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{y_i-\mathbf{x}_i'\mathbf{\beta}}{\sigma}\right)^2}$ is the normal density of the $i$-th error and $\mathbf{x}_i'$ is the $i$-th row of $\mathbf{X}$.

## Ordinary least squares (OLS)

1. Objective: minimise the sum of squared errors with respect to the vector of parameters.

\begin{equation*}
\begin{aligned}
\min_{\mathbf{\beta}}SSE \Rightarrow \frac{\partial}{\partial \mathbf{\beta}} \left[ \mathbf{\epsilon}'\mathbf{\epsilon} \right]= {}& \mathbf{0} \\
\frac{\partial}{\partial \mathbf{\beta}} \left[ (\mathbf{y}-\mathbf{X}\mathbf{\beta})'(\mathbf{y}-\mathbf{X}\mathbf{\beta}) \right]= {}& \mathbf{0} \Rightarrow \hat{\mathbf{\beta}}
\end{aligned}
\end{equation*}


2. Assumptions.

- $(y_i,\mathbf{x}_i)$ are i.i.d. across observations
- $\text{rank}\left(\mathbf{X}\right)=k$ (no multicollinearity)
- $\mathbb{E}\left[\epsilon_i \mid \mathbf{X} \right]=0$ (exogeneity)
- $\text{Var}\left(\epsilon_i \mid \mathbf{X}\right)=\mathbb{E}\left[\epsilon_i^2 \mid \mathbf{X} \right]=\sigma^2<\infty$ (homoskedasticity)
- $\text{Cov}\left(\epsilon_i,\epsilon_j \mid \mathbf{X}\right)=\mathbb{E}\left[\epsilon_i\epsilon_j \mid \mathbf{X} \right]=0 \text{ for } i\neq j$ (no serial correlation)
- linearity in $\mathbf{\beta}$
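- $\mathbf{\epsilon} \mid \mathbf{X} \sim \mathcal{N}\left(\mathbf{0},\sigma^2\mathbf{I}_n\right)$ (normality; needed only for exact small-sample inference and for the maximum likelihood interpretation)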

3. Solve the gradient for the coefficients.

\begin{equation*}
\begin{aligned}
\frac{\partial}{\partial \mathbf{\beta}} \left[ (\mathbf{y}-\mathbf{X}\mathbf{\beta})'(\mathbf{y}-\mathbf{X}\mathbf{\beta}) \right] = \frac{\partial}{\partial \mathbf{\beta}} \left[ \mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{X}\mathbf{\beta} - \mathbf{\beta}'\mathbf{X}'\mathbf{y} + \mathbf{\beta}'\mathbf{X}' \mathbf{X}\mathbf{\beta} \right] = {}& \mathbf{0} \\
-2 \mathbf{X}'\mathbf{y} +2 \mathbf{X}'\mathbf{X}\mathbf{\beta} = {}& \mathbf{0} \\
\mathbf{X}'\mathbf{X}\mathbf{\beta} = {}& \mathbf{X}'\mathbf{y} \\
(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\mathbf{\beta} = {}& (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \\
\hat{\mathbf{\beta}} = {}& (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
\end{aligned}
\end{equation*}
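
The closed form above can be checked numerically. The following is a minimal sketch on simulated data; the sample size, the coefficient values and the variable names are all assumptions made for illustration. It solves the normal equations $\mathbf{X}'\mathbf{X}\mathbf{\beta}=\mathbf{X}'\mathbf{y}$ directly and compares the result with NumPy's least-squares routine.

```python
import numpy as np

# Hypothetical simulated data: n = 50 observations, intercept plus two regressors (k = 3).
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Solve the normal equations X'X beta = X'y without forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the solution the gradient -2X'y + 2X'X beta should be (numerically) zero.
gradient = -2 * X.T @ y + 2 * X.T @ X @ beta_hat
```

Solving the linear system rather than computing $(\mathbf{X}'\mathbf{X})^{-1}$ is a numerical convenience only; both give the same $\hat{\mathbf{\beta}}$ when $\mathbf{X}$ has full column rank.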

4. Compute the variance-covariance matrix.

\begin{equation*}
\begin{aligned}
\text{Var}(\mathbf{\epsilon}) ={}& \mathbb{E}[\mathbf{\epsilon}\mathbf{\epsilon}'] = \sigma^2 \mathbf{I}_n \\
\text{Var}(\hat{\mathbf{\beta}})= {}& \text{Var} \left( (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \right) \\
= {}& \left( (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \right) \text{Var}(\mathbf{y}) \left( (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \right)' \\
= {}& \left( (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \right) \sigma^2\mathbf{I}_n \left( (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \right)' \\
= {}& \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\
= {}& \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}
\end{aligned}
\end{equation*}

### Estimator and the central limit theorem

| design matrix | estimator | CLT |
| ------------- | --------- | --- |
| $\mathbf{\iota}=\left[\begin{matrix} 1 \\ \vdots \\ 1 \end{matrix}\right]$ | $\hat{\beta}=(\mathbf{\iota}'\mathbf{\iota})^{-1}\mathbf{\iota}'\mathbf{y}=\bar{y}=\frac{1}{n}\sum_{i=1}^n y_i$ | $\frac{\bar{y}-\mu}{\sqrt{\sigma^2/n}} \overset{d}{\to} \mathcal{N}(0,1)$ |
| | | $P\left(-t_{1-\alpha/2, n-k}\leq \frac{\bar{y}-\mu}{\sqrt{s^2/n}} \leq t_{1-\alpha/2, n-k}\right)=1-\alpha$ |
| $\mathbf{X}=\left[\begin{matrix} 1 & x_{11} \\ \vdots & \vdots \\ 1 & x_{n1} \end{matrix}\right]$ or $\mathbf{x}=\left[\begin{matrix} x_{11} \\ \vdots \\ x_{n1} \end{matrix}\right]$ | $\hat{\beta}=(\mathbf{x}'\mathbf{x})^{-1}\mathbf{x}'\mathbf{y}=\frac{\text{Cov}(\mathbf{x},\mathbf{y})}{\text{Var}(\mathbf{x})}$ (slope, with $\mathbf{x}$ and $\mathbf{y}$ demeaned) | $\frac{\hat{\beta}-\beta}{\sqrt{\sigma^2/\sum(x_i-\bar{x})^2}} \overset{d}{\to} \mathcal{N}(0,1)$ |
| | | $P\left(-t_{1-\alpha/2, n-k}\leq \frac{\hat{\beta}-\beta}{\sqrt{s^2/\sum(x_i-\bar{x})^2}} \leq t_{1-\alpha/2, n-k}\right)=1-\alpha$ |
| $\mathbf{X}=\left[\begin{matrix} 1 & x_{11} & \cdots & x_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nm} \end{matrix}\right]$ | $\hat{\mathbf{\beta}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ | $\hat{\mathbf{\beta}}\overset{a}{\sim} \mathcal{N}\left(\mathbf{\beta}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right)$ |
| | | $P\left(-t_{1-\alpha/2, n-k}\leq \frac{\hat{\beta}_j-\beta_j}{\text{SE}(\hat{\beta}_j)} \leq t_{1-\alpha/2, n-k}\right)=1-\alpha$ |
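
The first two rows of the table can be verified numerically. This is a minimal sketch on simulated data (the data-generating values and variable names are assumptions): regressing on a column of ones returns the sample mean, and the slope of a regression with an intercept equals the sample covariance divided by the sample variance.

```python
import numpy as np

# Hypothetical simulated data for illustration only.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(size=n)

# Row 1: the design matrix is a column of ones, so beta_hat is the sample mean.
iota = np.ones((n, 1))
beta_const = np.linalg.solve(iota.T @ iota, iota.T @ y)[0]

# Row 2: intercept plus one regressor; the slope equals Cov(x, y) / Var(x).
X = np.column_stack([np.ones(n), x])
intercept, slope = np.linalg.solve(X.T @ X, X.T @ y)
slope_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
```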

### Mean squared error (sample variance)

$$
\hat{\sigma}^2=s^2=\frac{\hat{\mathbf{\epsilon}}'\hat{\mathbf{\epsilon}}}{n-k}=\frac{\mathbf{e}'\mathbf{e}}{n-k}=\frac{\sum_{i=1}^n \left(y_i-\hat{y}_i\right)^2}{n-k}
$$

### Variance

$$
\widehat{\text{Var}}(\hat{\mathbf{\beta}})=s^2(\mathbf{X}'\mathbf{X})^{-1}
$$

### Standard error

$$
\text{SE}(\hat{\beta}_j) = \sqrt{s^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{jj}}
$$
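
The three quantities above chain together: residuals give $s^2$, $s^2$ scales $(\mathbf{X}'\mathbf{X})^{-1}$, and the square roots of the diagonal give the standard errors. A minimal sketch on simulated data follows; the data and variable names are assumptions, and $k$ is taken to count every column of $\mathbf{X}$, including the intercept.

```python
import numpy as np

# Hypothetical simulated data: k = 3 columns including the intercept.
rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                          # residuals

s2 = (e @ e) / (n - k)                        # s^2 = e'e / (n - k)
var_beta_hat = s2 * np.linalg.inv(X.T @ X)    # estimated variance-covariance matrix
se = np.sqrt(np.diag(var_beta_hat))           # SE of each coefficient
```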

### Confidence intervals

$$
\beta_j \in \hat{\beta}_j \pm c_{1-\alpha /2} \text{SE}\left(\hat{\beta}_j\right)
$$

Where $c_{1-\alpha/2}=t_{1-\alpha/2,n-k}$ for small samples and $c_{1-\alpha/2}=Z_{1-\alpha/2}$ for large samples; using the $1-\alpha/2$ quantile gives a two-tailed interval with coverage $1-\alpha$.
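
As an illustration of the large-sample case, the sketch below builds 95% intervals with the normal critical value $Z_{1-\alpha/2}$ taken from Python's standard library. The simulated data, the sample size and the choice $\alpha=0.05$ are assumptions; for small samples the $t_{1-\alpha/2,n-k}$ quantile would replace `c`.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical simulated data with a large sample (n = 500, k = 2).
rng = np.random.default_rng(3)
n, k = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
s2 = (e @ e) / (n - k)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

alpha = 0.05
c = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{1 - alpha/2}, roughly 1.96
lower = beta_hat - c * se                 # lower bounds, one per coefficient
upper = beta_hat + c * se                 # upper bounds, one per coefficient
```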

### SSE, SSR and SST

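- $SSE=\sum_{i=1}^n e_i^2$ (errors, i.e. residual sum of squares)
- $SSR=\sum_{i=1}^n (\hat{y}_i-\bar{y})^2$ (regression, i.e. explained sum of squares)
- $SST=SSE+SSR=\sum_{i=1}^n (y_i-\bar{y})^2$ (total; the decomposition requires an intercept in $\mathbf{X}$)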

### $R^2$

$$
R^2=1-\frac{SSE}{SST}
$$

$$
R_{\text{adj}}^2=1-\frac{(1-R^2)(n-1)}{(n-k)}
$$
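
The sums of squares and both versions of $R^2$ can be computed in a few lines. This is a minimal sketch on simulated data; the variable names are neutral placeholders (`ss_residual`, `ss_explained`, `ss_total`), and it assumes $k$ counts every column of $\mathbf{X}$ including the intercept, so the adjustment uses $n-k$ degrees of freedom.

```python
import numpy as np

# Hypothetical simulated data: k = 3 columns including the intercept.
rng = np.random.default_rng(4)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
e = y - y_hat

ss_residual = np.sum(e**2)                      # unexplained variation
ss_explained = np.sum((y_hat - y.mean())**2)    # variation captured by the fit
ss_total = np.sum((y - y.mean())**2)            # total variation around the mean

r2 = 1 - ss_residual / ss_total
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k)       # penalises additional regressors
```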

## Maximum likelihood

- lay out the estimator and info matrix derivation
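
A minimal sketch of that derivation, assuming $\mathbf{\epsilon} \mid \mathbf{X} \sim \mathcal{N}\left(\mathbf{0},\sigma^2\mathbf{I}_n\right)$. The symbols $\ell$ (log-likelihood), $\mathcal{I}$ (information matrix) and the ML subscripts are notation introduced here for illustration.

\begin{equation*}
\begin{aligned}
\ell(\mathbf{\beta},\sigma^2) = {}& -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\mathbf{\beta})'(\mathbf{y}-\mathbf{X}\mathbf{\beta}) \\
\frac{\partial \ell}{\partial \mathbf{\beta}} = {}& \frac{1}{\sigma^2}\mathbf{X}'(\mathbf{y}-\mathbf{X}\mathbf{\beta}) = \mathbf{0} \Rightarrow \hat{\mathbf{\beta}}_{\text{ML}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \\
\frac{\partial \ell}{\partial \sigma^2} = {}& -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(\mathbf{y}-\mathbf{X}\mathbf{\beta})'(\mathbf{y}-\mathbf{X}\mathbf{\beta}) = 0 \Rightarrow \hat{\sigma}^2_{\text{ML}} = \frac{\mathbf{e}'\mathbf{e}}{n} \\
\mathcal{I}(\mathbf{\beta},\sigma^2) = {}& -\mathbb{E}\left[ \frac{\partial^2 \ell}{\partial \mathbf{\theta}\,\partial \mathbf{\theta}'} \mid \mathbf{X} \right] = \left[\begin{matrix} \frac{1}{\sigma^2}\mathbf{X}'\mathbf{X} & \mathbf{0} \\ \mathbf{0}' & \frac{n}{2\sigma^4} \end{matrix}\right], \quad \mathbf{\theta}=(\mathbf{\beta}',\sigma^2)'
\end{aligned}
\end{equation*}

Inverting the upper-left block gives $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$, the same variance as in the OLS section, and the lower-right block gives the bound $2\sigma^4/n$ for $\sigma^2$. The coefficient estimator coincides with OLS, while $\hat{\sigma}^2_{\text{ML}}$ divides by $n$ rather than $n-k$ and is therefore biased downward in small samples.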


## Weighted least squares

- lay out the steps
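
One common way to lay out the steps, sketched under the assumption that $\text{Var}(\epsilon_i \mid \mathbf{X})=\sigma^2/w_i$ with known weights $w_i$: (1) minimise $(\mathbf{y}-\mathbf{X}\mathbf{\beta})'\mathbf{W}(\mathbf{y}-\mathbf{X}\mathbf{\beta})$ with $\mathbf{W}=\text{diag}(w_1,\dots,w_n)$, which gives $\hat{\mathbf{\beta}}_{\text{WLS}}=(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{y}$; (2) note that this equals OLS on data rescaled by $\sqrt{w_i}$; (3) estimate the variance as $s^2(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}$. The weights, the data and all variable names below are hypothetical.

```python
import numpy as np

# Hypothetical simulated data with error standard deviation proportional to x_i.
rng = np.random.default_rng(5)
n = 200
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * x

# Assumed known weights: Var(eps_i | X) = sigma^2 * x_i^2, so w_i = 1 / x_i^2.
w = 1.0 / x**2

# Step 1: closed form (X'WX)^{-1} X'Wy; broadcasting avoids building the n x n matrix W.
XtW = X.T * w
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# Step 2: the same estimate from OLS on data rescaled by sqrt(w_i).
sw = np.sqrt(w)
beta_transformed, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Step 3: estimated variance s^2 (X'WX)^{-1}, with s^2 from the weighted residuals.
e = y - X @ beta_wls
s2 = np.sum(w * e**2) / (n - X.shape[1])
se_wls = np.sqrt(np.diag(s2 * np.linalg.inv(XtW @ X)))
```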
