<a href="https://colab.research.google.com/github/yifdai/PM-520-repo/blob/main/HW/PM520_HW1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Homework 1. Linear regression and normal equations

# 1. Linear model simulation
In class we defined a Python function that simulates $N$ observations of a $P \times 1$ variable vector (i.e. an $N \times P$ matrix $X$) and an outcome $y$ as a linear function of $X$. Please include its definition here and use it for problem 2.

In [3]:
import jax.numpy as jnp
import jax.random as rdm

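def sim_linear_reg(key, N, P, r2=0.5):
  key1, key2, key3 = rdm.split(key, num = 3)
  design_mat = rdm.normal(key1, shape = (N, P))
  sim_beta = rdm.normal(key2, shape = (P, ))
  signal = design_mat @ sim_beta
  # scale the noise so the signal explains a fraction r2 of var(y)
  # (assumed role of r2, which was otherwise unused)
  noise_sd = jnp.sqrt(jnp.var(signal) * (1.0 / r2 - 1.0))
  noise = noise_sd * rdm.normal(key3, shape = (N, ))
  sim_y = signal + noise
  # return the design matrix as well, since problem 2 needs both X and y
  return design_mat, sim_y

seed = 0
key = rdm.PRNGKey(seed)
X, sim_y = sim_linear_reg(key, N = 100, P = 5)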

# 2. Just-in time decorator and ordinary least squares
Complete the definition of `ordinary_least_squares` below, which estimates the effects and their standard errors. `@jit` wraps a function so that JAX compiles it just in time with XLA on the first call, which typically makes subsequent calls faster.

Compare the run times with and without JIT.

Hint: use [`block_until_ready()`](https://jax.readthedocs.io/en/latest/_autosummary/jax.block_until_ready.html) to get correct timing estimates.

In [None]:
import jax

from jax import jit


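def ordinary_least_squares(X, y):
  r"""
  Computes the OLS solution to the linear system y ~ X.
  Returns a tuple of $\hat{\beta}$ and $\text{se}(\hat{\beta})$.
  """
  N, P = X.shape
  XtX_inv = jnp.linalg.inv(X.T @ X)
  beta_hat = XtX_inv @ X.T @ y
  # unbiased estimate of the residual variance, with N - P degrees of freedom
  resid = y - X @ beta_hat
  sigma_sq = jnp.sum(resid ** 2) / (N - P)
  # standard errors: square roots of the diagonal of sigma^2 (X^T X)^{-1}
  se = jnp.sqrt(sigma_sq * jnp.diag(XtX_inv))

  return (beta_hat, se)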

jit_ordinary_least_squares = jit(ordinary_least_squares)
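The timing comparison requested above could be sketched as follows. This is a minimal, self-contained illustration rather than a reference solution: the helper `time_call`, the local `ols` function, the problem size (2000 by 50), and the number of repeats are all assumptions chosen for the example. The jitted function is called once before timing so that compilation is not counted, and `jax.block_until_ready` forces each asynchronous result to finish before the clock is read.

```python
import time

import jax
import jax.numpy as jnp
import jax.random as rdm
from jax import jit


def ols(X, y):
  # closed-form OLS estimate and standard errors, repeated here so the
  # example runs on its own
  N, P = X.shape
  XtX_inv = jnp.linalg.inv(X.T @ X)
  beta_hat = XtX_inv @ X.T @ y
  resid = y - X @ beta_hat
  sigma_sq = jnp.sum(resid ** 2) / (N - P)
  return beta_hat, jnp.sqrt(sigma_sq * jnp.diag(XtX_inv))


def time_call(fn, *args, repeats=20):
  # hypothetical helper: average wall-clock seconds per call
  start = time.perf_counter()
  for _ in range(repeats):
    # wait for the asynchronously dispatched result before moving on
    jax.block_until_ready(fn(*args))
  return (time.perf_counter() - start) / repeats


# illustrative data; the sizes are arbitrary
key_x, key_y = rdm.split(rdm.PRNGKey(0))
X = rdm.normal(key_x, shape=(2000, 50))
y = rdm.normal(key_y, shape=(2000,))

jit_ols = jit(ols)
jax.block_until_ready(jit_ols(X, y))  # warm-up call so compilation is not timed

t_plain = time_call(ols, X, y)
t_jit = time_call(jit_ols, X, y)
print(f"without JIT: {t_plain:.6f} s per call")
print(f"with JIT:    {t_jit:.6f} s per call")
```

The measured gap depends on hardware and problem size, so the printed numbers should be read as indicative only.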

# 3. OLS derivation
Assume that $y = X \beta + \epsilon$ where $y$ is an $N \times 1$ vector, $X$ is an $N \times P$ matrix where $P < N$, and $\epsilon$ is a random variable such that $\mathbb{E}[\epsilon_i] = 0$ and $\mathbb{V}[\epsilon_i] = \sigma^2$ for all $i = 1 \dots N$. Derive the OLS "normal equations".

Before deriving the ordinary least squares (OLS) "normal equations", we briefly review the key assumptions of linear regression. We assume that the response $\mathbf{y}$ can be written as $\mathbf{y} = \mathbf{X} \mathbf{\beta} + \mathbf{\epsilon}$, where $\mathbf{\beta}$ is the true effect and $\mathbf{\epsilon}$ has mean $\mathbf{0}$ and covariance $\sigma^2 I_N$ (uncorrelated errors). It follows that $E(\mathbf{y}) = \mathbf{X} \mathbf{\beta}$ and $Var(\mathbf{y}) = \sigma^2 I_N$. If we additionally assume normal errors, $\mathbf{\epsilon} \sim N(\mathbf{0}, \sigma^2 I_N)$, the OLS estimator coincides with the maximum likelihood estimator (MLE) of $\mathbf{\beta}$, so in the following section we obtain the MLE of the true effect $\mathbf{\beta}$ by minimizing the squared error.

The objective of OLS is to minimize the sum of squared errors (SSE), defined as $SSE = \sum^N_{i=1}(y_i - \hat{y}_i)^2 = (\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}})$, where $\hat{\mathbf{y}} = \mathbf{X} \hat{\mathbf{\beta}}$. To minimize the SSE, we take its derivative with respect to $\mathbf{\beta}$; under normal errors, the minimizer is also the MLE of the true effect $\mathbf{\beta}$.

$$
\begin{aligned}
\frac{\partial SSE}{\partial \mathbf{\beta}} &= \frac{\partial (\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}})}{\partial \mathbf{\beta}} \\
&= \frac{\partial (\mathbf{y} - \mathbf{X} \mathbf{\beta})^T(\mathbf{y} - \mathbf{X} \mathbf{\beta})}{\partial \mathbf{\beta}} \\
&= \frac{\partial (\mathbf{y}^T\mathbf{y} - 2\mathbf{y}^T\mathbf{X} \mathbf{\beta} + \mathbf{\beta}^T \mathbf{X}^T \mathbf{X} \mathbf{\beta})}{\partial \mathbf{\beta}} \\
&= -2\mathbf{X}^T\mathbf{y} + 2 \mathbf{X}^T\mathbf{X} \mathbf{\beta}.
\end{aligned}
$$
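The closed-form gradient $-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\mathbf{\beta}$ can be sanity-checked against automatic differentiation. The sketch below is only an illustration: the function name `sse`, the random data, and the evaluation point `beta` are assumptions made for the example.

```python
import jax
import jax.numpy as jnp
import jax.random as rdm


def sse(beta, X, y):
  # sum of squared errors (y - X beta)^T (y - X beta)
  resid = y - X @ beta
  return resid @ resid


# arbitrary illustrative data and an arbitrary point at which to evaluate
key_x, key_y, key_b = rdm.split(rdm.PRNGKey(0), num=3)
X = rdm.normal(key_x, shape=(100, 5))
y = rdm.normal(key_y, shape=(100,))
beta = rdm.normal(key_b, shape=(5,))

# gradient with respect to the first argument (beta) via autodiff
grad_autodiff = jax.grad(sse)(beta, X, y)
# gradient from the derivation
grad_closed_form = -2.0 * X.T @ y + 2.0 * X.T @ X @ beta

print("max abs difference:", jnp.max(jnp.abs(grad_autodiff - grad_closed_form)))
```

In single precision the two gradients should agree up to floating-point rounding error.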

Setting the derivative to zero gives the normal equations $\mathbf{X}^T\mathbf{X} \hat{\mathbf{\beta}} = \mathbf{X}^T\mathbf{y}$. When $\mathbf{X}$ has full column rank, $\mathbf{X}^T\mathbf{X}$ is invertible and positive definite, so the SSE has a unique minimizer $\hat{\mathbf{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$. The expectation of this estimator is $E(\hat{\mathbf{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^TE(\mathbf{y}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X} \mathbf{\beta} = \mathbf{\beta}$, so the estimator is unbiased. Its variance is $Var(\hat{\mathbf{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T Var(\mathbf{y}) \mathbf{X} (\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$.
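The unbiasedness and variance results can also be illustrated by simulation. The following is a rough Monte Carlo sketch under assumed settings ($N = 200$, $P = 3$, $\sigma = 1.5$, 5000 replicates, normal noise); the helper `fit_once` is hypothetical and simply solves the normal equations for one simulated $\mathbf{y}$ with the design held fixed.

```python
import jax
import jax.numpy as jnp
import jax.random as rdm

# assumed simulation settings, chosen only for illustration
N, P, sigma, reps = 200, 3, 1.5, 5000

key_x, key_b, key_e = rdm.split(rdm.PRNGKey(0), num=3)
X = rdm.normal(key_x, shape=(N, P))    # design matrix, held fixed
beta = rdm.normal(key_b, shape=(P,))   # true effects, held fixed
XtX_inv = jnp.linalg.inv(X.T @ X)


def fit_once(key):
  # draw fresh noise, simulate y, and solve the normal equations
  eps = sigma * rdm.normal(key, shape=(N,))
  y = X @ beta + eps
  return XtX_inv @ X.T @ y


# one estimate per replicate, shape (reps, P)
beta_hats = jax.vmap(fit_once)(rdm.split(key_e, reps))

empirical_mean = beta_hats.mean(axis=0)
empirical_cov = jnp.cov(beta_hats, rowvar=False)
theoretical_cov = sigma ** 2 * XtX_inv

print("bias:", empirical_mean - beta)
print("max abs cov difference:", jnp.max(jnp.abs(empirical_cov - theoretical_cov)))
```

With more replicates, the empirical mean should approach $\mathbf{\beta}$ and the empirical covariance should approach $\sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$, up to Monte Carlo error.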
