# Ridge Regression

## Dependencies

In [4]:
import numpy as np
from sklearn.linear_model import Ridge

## How to Compute the Ridge Regression Coefficients $\hat {\beta}^{ridge}$

$$
\hat{\beta}^{ridge} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i = 1}^{N} \left( y_i - \beta_0 - \sum_{j = 1}^{p} x_{ij} \beta_j \right)^2 + \lambda \sum_{j = 1}^{p} \beta_{j}^2 \right\}
$$

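The intercept $\beta_0$ is not penalized. If the columns of $X$ are centered, its estimate is $\hat{\beta}_0 = \bar{y}$, and the remaining coefficients $\beta = (\beta_1, \dots, \beta_p)^T$ minimize the following criterion, where $X$ is the $N \times p$ centered input matrix without a column of ones:

$$
RSS(\lambda) = (y - X \beta)^T (y - X \beta) + \lambda \beta^T \beta
$$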

$$
= (y^T - \beta^T X^T ) (y - X \beta) + \lambda \beta^T \beta 
$$
$$
= y^T y - 2 \beta^T X^T y + \beta^T X^T X \beta + \lambda \beta^T \beta 
$$

$$
\frac {\partial RSS(\lambda)} {\partial \beta} = - 2 X^T y + 2 X^T X \beta + 2 \lambda \beta
$$
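One way to sanity-check the gradient above is to compare it with a central finite-difference approximation of $RSS(\lambda)$. The following is only a sketch under stated assumptions: the helper names `ridge_rss`, `ridge_gradient`, and `numerical_gradient`, the random test data, and the 1-D shape of `y` are illustrative and not part of the original notebook.

```python
import numpy as np


def ridge_rss(beta, X, y, lam):
    """Penalized RSS: (y - X b)^T (y - X b) + lam * b^T b."""
    resid = y - X @ beta
    return float(resid @ resid + lam * beta @ beta)


def ridge_gradient(beta, X, y, lam):
    """Analytic gradient: -2 X^T y + 2 X^T X b + 2 lam b."""
    return -2 * X.T @ y + 2 * X.T @ X @ beta + 2 * lam * beta


def numerical_gradient(f, beta, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (f(beta + step) - f(beta - step)) / (2 * eps)
    return grad


# Hypothetical test data (not the data used elsewhere in this notebook)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = rng.standard_normal(50)
beta = rng.standard_normal(3)
lam = 10.0

analytic = ridge_gradient(beta, X, y, lam)
numeric = numerical_gradient(lambda b: ridge_rss(b, X, y, lam), beta)

# The two gradients should agree up to floating-point error
max_abs_diff = float(np.max(np.abs(analytic - numeric)))
print(f'Largest absolute difference: {max_abs_diff:.2e}')
```

Because $RSS(\lambda)$ is quadratic in $\beta$, the central difference has no truncation error, so any remaining discrepancy comes from rounding.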

Setting the first derivative to zero,

$$
- 2 X^T y + 2 X^T X \beta + 2 \lambda \beta = 0
$$
$$
- X^T y + X^T X \beta + \lambda \beta = 0
$$
$$
X^T X \beta + \lambda \beta = X^T y
$$
$$
(X^T X + \lambda I) \beta = X^T y
$$
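For $\lambda > 0$ the matrix $X^T X + \lambda I$ is positive definite and therefore invertible, so

$$
(X^T X + \lambda I)^{-1} (X^T X + \lambda I) \beta = (X^T X + \lambda I)^{-1} X^T y
$$
$$
\hat{\beta}^{ridge} = (X^T X + \lambda I)^{-1} X^T y
$$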
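As a minimal sketch of the closed form above, the snippet below solves the linear system $(X^T X + \lambda I) \beta = X^T y$ with `np.linalg.solve` rather than forming the inverse explicitly, then checks that the gradient vanishes at the solution. The function name `ridge_closed_form`, the random data, and the 1-D shape of `y` are assumptions made for illustration only.

```python
import numpy as np


def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for beta."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# Hypothetical data; the inputs are centered as the derivation assumes
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
X = X - X.mean(axis=0)
y = rng.standard_normal(100)
lam = 10.0

beta_hat = ridge_closed_form(X, y, lam)

# The gradient -2 X^T y + 2 X^T X beta + 2 lam beta should be zero at beta_hat
gradient = -2 * X.T @ y + 2 * X.T @ X @ beta_hat + 2 * lam * beta_hat
print('Gradient is zero:', np.allclose(gradient, 0.0, atol=1e-8))

# Solving the system gives the same result as the explicit inverse
beta_inv = np.linalg.inv(X.T @ X + lam * np.eye(2)) @ X.T @ y
print('Matches explicit inverse:', np.allclose(beta_hat, beta_inv))

# With lam = 0 the formula reduces to ordinary least squares
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print('lam = 0 matches OLS:', np.allclose(ridge_closed_form(X, y, 0.0), beta_ols))
```

Solving the system directly is usually preferred over calling `np.linalg.inv` for numerical stability, although for a small, well-conditioned problem like this one both give the same answer.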

In [58]:
np.random.seed(0)

# Number of samples
n = 100
# Number of features
p = 2
# Ridge penalty (lambda in the formula above, alpha in sklearn)
lam = 10

# Generate random inputs and targets
X = np.random.randn(n, p)
y = np.random.randn(n, 1)

# Center the inputs so the intercept can be estimated separately
X = X - np.mean(X, axis=0)

# Compute ridge regression manually.
# With centered inputs the intercept is the mean of y, so the design matrix
# does not need a column of ones.
intercept = np.mean(y)
coefficients = np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T @ y
coefficients = coefficients.reshape((1, p))

print(f'Intercept by manual: {intercept:.4f}')
print(f'Coefficients by manual: {np.round(coefficients, 4)}')
print()

# Compute ridge regression with sklearn, using the same penalty
ridge = Ridge(alpha=lam)
ridge.fit(X, y)
print(f'Intercept by sklearn: {ridge.intercept_[0]:.4f}')
print(f'Coefficient by sklearn: {np.round(ridge.coef_, 4)}')
print()

Intercept by manual: -0.0592
Coefficients by manual: [[ 0.1009 -0.0495]]

Intercept by sklearn: -0.0592
Coefficient by sklearn: [[ 0.1009 -0.0495]]
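The comment in the code above notes that, with centered inputs, the intercept is the mean of `y` and no column of ones is needed. The sketch below illustrates why this is equivalent to fitting an unpenalized intercept on uncentered inputs. It is an illustration under stated assumptions: the data are random and deliberately shifted, and the names `X_raw`, `Z`, `D`, and `theta` are hypothetical and not from the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 100, 2, 10.0
X_raw = rng.standard_normal((n, p)) + 3.0  # deliberately not centered
y = rng.standard_normal(n)

# Route 1: center the inputs, solve the intercept-free ridge problem,
# then express the intercept in terms of the uncentered inputs.
x_mean = X_raw.mean(axis=0)
Xc = X_raw - x_mean
beta_c = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ y)
intercept_c = y.mean() - x_mean @ beta_c

# Route 2: prepend a column of ones and penalize every coefficient
# except the intercept (first diagonal entry of the penalty matrix is 0).
Z = np.column_stack([np.ones(n), X_raw])
D = np.eye(p + 1)
D[0, 0] = 0.0
theta = np.linalg.solve(Z.T @ Z + lam * D, Z.T @ y)

print('Intercepts agree:', np.allclose(theta[0], intercept_c))
print('Coefficients agree:', np.allclose(theta[1:], beta_c))
```

When the inputs are already centered, `x_mean` is zero and the intercept in Route 1 reduces to `y.mean()`, which is the value used in the manual computation above.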




## Reference
- [Statistical Modeling and Analysis of Neural Data (NEU 560)](http://pillowlab.princeton.edu/teaching/statneuro2018/slides/notes03b_LeastSquaresRegression.pdf)