# Regression Notes

# Linear Regression and Ridge Regression Derivations

## Linear Regression

### 1. Linear Regression Model

The linear regression model with \( p \) predictors is given by:

$$ Y = X \beta + \epsilon $$

where:
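- \( Y \) is the \( n \times 1 \) vector of observed responses.
- \( X \) is the \( n \times (p+1) \) design matrix: a leading column of ones (for the intercept) followed by one column per predictor variable.
- \( \beta \) is the \( (p+1) \times 1 \) vector of coefficients to be estimated.
- \( \epsilon \) is the \( n \times 1 \) vector of unobserved random errors (noise).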
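As a minimal sketch of the model, the NumPy snippet below simulates data from \( Y = X\beta + \epsilon \). The sample size, the "true" coefficients `beta_true`, and the noise scale are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical setup: n = 50 observations, p = 2 predictors (illustrative values only)
rng = np.random.default_rng(0)
n, p = 50, 2

# Design matrix: a leading column of ones (intercept) followed by p random predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

# Assumed "true" coefficients [beta_0, beta_1, beta_2] and Gaussian noise epsilon
beta_true = np.array([1.0, 2.0, -0.5])
epsilon = rng.normal(scale=0.1, size=n)

# The model: Y = X beta + epsilon
Y = X @ beta_true + epsilon
print(X.shape, Y.shape)  # (50, 3) (50,)
```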

### 2. Least Squares Objective

The goal is to minimize the sum of squared residuals:

$$ \text{Minimize} \; \| Y - X \beta \|^2 $$

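Write the squared norm as an inner product:

$$ \| Y - X \beta \|^2 = (Y - X \beta)^T (Y - X \beta) $$

Expanding the product, and using \( \beta^T X^T Y = Y^T X \beta \) (both are the same scalar), gives: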

$$ (Y - X \beta)^T (Y - X \beta) = Y^T Y - 2 Y^T X \beta + \beta^T X^T X \beta $$
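The expansion can be sanity-checked numerically. This sketch draws an arbitrary (random, hypothetical) \( X \), \( Y \), and \( \beta \) and compares both sides of the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))   # arbitrary 6 x 3 design matrix
Y = rng.normal(size=6)        # arbitrary response vector
beta = rng.normal(size=3)     # arbitrary coefficient vector

# Left side: squared Euclidean norm of the residual vector
lhs = np.sum((Y - X @ beta) ** 2)

# Right side: Y^T Y - 2 Y^T X beta + beta^T X^T X beta
rhs = Y @ Y - 2 * Y @ X @ beta + beta @ X.T @ X @ beta

print(np.isclose(lhs, rhs))  # True
```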

### 3. Finding the Minimum

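To find the minimum, take the derivative (gradient) with respect to \( \beta \) and set it to zero. The objective is a convex quadratic in \( \beta \) (its Hessian \( 2 X^T X \) is positive semidefinite), so any point where the gradient vanishes is a global minimum: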

$$ \frac{\partial}{\partial \beta} \left[ Y^T Y - 2 Y^T X \beta + \beta^T X^T X \beta \right] = 0 $$

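Using \( \partial (a^T \beta) / \partial \beta = a \) and \( \partial (\beta^T A \beta) / \partial \beta = 2 A \beta \) for symmetric \( A \), the derivative simplifies to: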

$$ -2 X^T Y + 2 X^T X \beta = 0 $$

Divide by 2 and rearrange to obtain the normal equations:

$$ X^T X \beta = X^T Y $$
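A standard way to check a derived gradient is to compare it with central finite differences. The sketch below does this for the least-squares objective on random data; the data, the step `h`, and the tolerance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))
Y = rng.normal(size=8)
beta = rng.normal(size=3)

def sse(b):
    """Least-squares objective ||Y - X b||^2."""
    r = Y - X @ b
    return r @ r

# Analytic gradient from the derivation: -2 X^T Y + 2 X^T X beta
grad_analytic = -2 * X.T @ Y + 2 * X.T @ X @ beta

# Central finite differences, one coordinate direction at a time
h = 1e-6
grad_numeric = np.array([
    (sse(beta + h * e) - sse(beta - h * e)) / (2 * h)
    for e in np.eye(3)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # True
```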

### 4. Solving for \(\beta\)

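If \( X^T X \) is invertible (that is, \( X \) has full column rank), solving the normal equations gives the least-squares estimate: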

$$ \beta = (X^T X)^{-1} X^T Y $$
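In code, it is usually preferable to solve the linear system \( X^T X \beta = X^T Y \) rather than form \( (X^T X)^{-1} \) explicitly. The sketch below, on hypothetical simulated data, compares that approach with `np.linalg.lstsq`, which solves the least-squares problem from \( X \) directly.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
# Hypothetical data: intercept column plus two random predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=0.1, size=n)

# Solve the normal equations X^T X beta = X^T Y without forming the inverse
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Reference: NumPy's least-squares solver, which works on X itself
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(beta_normal, beta_lstsq))  # True
```

Forming \( X^T X \) squares the condition number of \( X \), so for ill-conditioned designs a solver that works on \( X \) itself is generally more accurate than the normal equations.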

**Note:** When the model includes an intercept, the coefficient vector has \( p + 1 \) entries:

$$ \beta = \begin{bmatrix}
\beta_0 \\
\beta_1 \\
\vdots \\
\beta_p
\end{bmatrix} $$

where \(\beta_0\) is the intercept (the coefficient of the column of ones in \( X \)) and \(\beta_1, \beta_2, \ldots, \beta_p\) are the coefficients of the \( p \) predictor variables.
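Including the intercept amounts to prepending a column of ones to the raw predictors. A minimal sketch with made-up predictor values `Z` and responses `Y` (hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical raw predictors: 4 observations of p = 2 variables
Z = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 7.0],
              [6.0, 8.0]])
Y = np.array([2.0, 3.0, 5.0, 6.0])

# Prepend a column of ones so that beta_0 acts as the intercept
X = np.column_stack([np.ones(len(Z)), Z])

beta = np.linalg.solve(X.T @ X, X.T @ Y)
beta_0, slopes = beta[0], beta[1:]
print(X.shape)  # (4, 3)

# At the least-squares solution the residuals are orthogonal to every column of X
print(np.allclose(X.T @ (Y - X @ beta), 0.0))  # True
```

The last check is the normal equations restated: \( X^T (Y - X\beta) = 0 \).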

### Example

Consider a small regression problem with 2 predictors, where each column of \( X \) holds a separate variable. For simplicity the example omits the intercept column, so \( \beta \) has only 2 entries:

$$ X = \begin{bmatrix}
1 & 2 \\
3 & 4 \\
5 & 6
\end{bmatrix} $$

and

$$ Y = \begin{bmatrix}
1 \\
2 \\
3
\end{bmatrix} $$

For this example:
- The design matrix \( X \) includes 2 predictors (or features) for each of 3 observations.
- \( Y \) is the vector of target values.

To find the coefficients \( \beta \):

1. Compute \( X^T X \):

$$ X^T X = \begin{bmatrix}
1 & 3 & 5 \\
2 & 4 & 6
\end{bmatrix} \begin{bmatrix}
1 & 2 \\
3 & 4 \\
5 & 6
\end{bmatrix} = \begin{bmatrix}
35 & 44 \\
44 & 56
\end{bmatrix} $$

2. Compute \( X^T Y \):

$$ X^T Y = \begin{bmatrix}
1 & 3 & 5 \\
2 & 4 & 6
\end{bmatrix} \begin{bmatrix}
1 \\
2 \\
3
\end{bmatrix} = \begin{bmatrix}
22 \\
28
\end{bmatrix} $$

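3. Compute \( (X^T X)^{-1} \). The determinant is \( 35 \cdot 56 - 44^2 = 1960 - 1936 = 24 \), so

$$ (X^T X)^{-1} = \begin{bmatrix}
35 & 44 \\
44 & 56
\end{bmatrix}^{-1} = \frac{1}{24} \begin{bmatrix}
56 & -44 \\
-44 & 35
\end{bmatrix} \approx \begin{bmatrix}
2.3333 & -1.8333 \\
-1.8333 & 1.4583
\end{bmatrix} $$

4. Finally, compute \( \beta = (X^T X)^{-1} X^T Y \):

$$ \beta = \frac{1}{24} \begin{bmatrix}
56 & -44 \\
-44 & 35
\end{bmatrix} \begin{bmatrix}
22 \\
28
\end{bmatrix} = \frac{1}{24} \begin{bmatrix}
1232 - 1232 \\
-968 + 980
\end{bmatrix} = \begin{bmatrix}
0 \\
0.5
\end{bmatrix} $$

Since \( X \beta = (1, 2, 3)^T = Y \), this model fits the data exactly and all residuals are zero.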
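The hand computation can be reproduced with NumPy using the same \( X \) and \( Y \). The exact answer is \( \beta = (0, 0.5)^T \), because \( X (0, 0.5)^T = (1, 2, 3)^T = Y \):

```python
import numpy as np

# Data from the worked example (no intercept column)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
Y = np.array([1.0, 2.0, 3.0])

XtX = X.T @ X                  # [[35, 44], [44, 56]]
XtY = X.T @ Y                  # [22, 28]
XtX_inv = np.linalg.inv(XtX)   # (1/24) * [[56, -44], [-44, 35]]
beta = XtX_inv @ XtY           # close to [0.0, 0.5], up to floating-point rounding

print(beta)
print(X @ beta)                # fitted values, equal to Y up to rounding
```

`np.linalg.inv` mirrors the hand calculation; beyond a small illustration like this one, `np.linalg.solve(XtX, XtY)` is the better choice.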

## Ridge Regression

### 1. Ridge Regression Model

Ridge regression adds an \( L_2 \) penalty on the coefficients to the least-squares objective:

$$ \text{Minimize} \; \| Y - X \beta \|^2 + \lambda \| \beta \|^2 $$

where:
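- \( \lambda \geq 0 \) is the regularization parameter; larger values shrink the coefficients more strongly toward zero, and \( \lambda = 0 \) recovers ordinary least squares.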
- \( \| \beta \|^2 \) is the squared \( L_2 \) norm of the coefficient vector \( \beta \).

### 2. Expanding the Objective Function

Start from the ridge regression objective:

$$ \text{Objective} = \| Y - X \beta \|^2 + \lambda \| \beta \|^2 $$

Write each squared norm as an inner product:

$$ \| Y - X \beta \|^2 = (Y - X \beta)^T (Y - X \beta) $$

$$ \| \beta \|^2 = \beta^T \beta $$

Thus:

$$ \text{Objective} = (Y - X \beta)^T (Y - X \beta) + \lambda \beta^T \beta $$

Expand the product as in the least-squares case, then combine the two quadratic terms in \( \beta \):

$$ \text{Objective} = Y^T Y - 2 Y^T X \beta + \beta^T X^T X \beta + \lambda \beta^T \beta $$

$$ = Y^T Y - 2 Y^T X \beta + \beta^T (X^T X + \lambda I) \beta $$

where \( I \) is the identity matrix with the same dimensions as \( X^T X \).
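As with least squares, the combined form can be checked numerically on random, hypothetical inputs; `lam` stands for \( \lambda \) and its value is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(6, 3))
Y = rng.normal(size=6)
beta = rng.normal(size=3)
lam = 0.7                      # arbitrary regularization strength
I = np.eye(X.shape[1])         # identity sized like X^T X

# Ridge objective as originally written: ||Y - X beta||^2 + lambda ||beta||^2
direct = np.sum((Y - X @ beta) ** 2) + lam * beta @ beta

# Combined form: Y^T Y - 2 Y^T X beta + beta^T (X^T X + lambda I) beta
combined = Y @ Y - 2 * Y @ X @ beta + beta @ (X.T @ X + lam * I) @ beta

print(np.isclose(direct, combined))  # True
```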

### 3. Finding the Minimum

Take the derivative with respect to \( \beta \) and set it to zero:

$$ \frac{\partial}{\partial \beta} \left[ Y^T Y - 2 Y^T X \beta + \beta^T (X^T X + \lambda I) \beta \right] = 0 $$

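Using the same derivative rules as for least squares (\( X^T X + \lambda I \) is symmetric), this simplifies to: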

$$ -2 X^T Y + 2 (X^T X + \lambda I) \beta = 0 $$

Divide by 2:

$$ -X^T Y + (X^T X + \lambda I) \beta = 0 $$

Rearrange to solve for \( \beta \):

$$ (X^T X + \lambda I) \beta = X^T Y $$
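The gradient \( -2 X^T Y + 2 (X^T X + \lambda I) \beta \) is also what an iterative solver would use. As an illustrative alternative to the closed form (not part of the derivation itself), the sketch below runs plain gradient descent with that gradient on random data and checks that it reaches the same \( \beta \) as solving the linear system. The step size \( 1/(2\mu_{\max}) \), where \( \mu_{\max} \) is the largest eigenvalue of \( X^T X + \lambda I \), and the iteration count are assumptions chosen for stability, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 3))
Y = rng.normal(size=30)
lam = 2.0                        # arbitrary regularization strength
A = X.T @ X + lam * np.eye(3)    # the matrix X^T X + lambda I
b = X.T @ Y                      # the vector X^T Y

def grad(beta):
    """Ridge gradient: -2 X^T Y + 2 (X^T X + lambda I) beta."""
    return -2 * b + 2 * A @ beta

# Step 1 / (2 * mu_max) equals 1 / L, where L = 2 * mu_max bounds the gradient's Lipschitz constant
step = 1.0 / (2 * np.linalg.eigvalsh(A).max())
beta = np.zeros(3)
for _ in range(5000):
    beta = beta - step * grad(beta)

# Compare with solving (X^T X + lambda I) beta = X^T Y directly
beta_direct = np.linalg.solve(A, b)
print(np.allclose(beta, beta_direct))  # True
```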

### 4. Solving for \(\beta\)

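For \( \lambda > 0 \), the matrix \( X^T X + \lambda I \) is positive definite and therefore invertible, even when \( X^T X \) is singular (for example, with collinear predictors or \( p > n \)). The ridge estimate is: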

$$ \beta = (X^T X + \lambda I)^{-1} X^T Y $$
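A minimal closed-form ridge solver, again using `np.linalg.solve` rather than an explicit inverse. The helper name `ridge` is hypothetical; the data are the small linear-regression example from earlier in these notes, and the \( \lambda \) values are arbitrary:

```python
import numpy as np

def ridge(X, Y, lam):
    """Closed-form ridge estimate: solve (X^T X + lam I) beta = X^T Y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Small example data (no intercept column)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
Y = np.array([1.0, 2.0, 3.0])

for lam in [0.0, 1.0, 10.0, 100.0]:
    beta = ridge(X, Y, lam)
    print(lam, beta, np.linalg.norm(beta))
```

With `lam = 0.0` the result matches the least-squares solution \( (0, 0.5)^T \) of that example (there \( X^T X \) happens to be invertible), and the norm of \( \beta \) shrinks as \( \lambda \) grows.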

**Note:** As in linear regression, when the model includes an intercept the coefficient vector is:

$$ \beta = \begin{bmatrix}
\beta_0 \\
\beta_1 \\
\vdots \\
\beta_p
\end{bmatrix} $$

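where \(\beta_0\) is the intercept and \(\beta_1, \beta_2, \ldots, \beta_p\) are the predictor coefficients, shrunk toward zero by the penalty. The formula above penalizes every entry of \( \beta \), including \( \beta_0 \); in practice the intercept is usually left unpenalized, for example by centering the predictors and the response and fitting the slopes without a column of ones.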
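One common way to leave the intercept unpenalized is to center the predictors and the response, fit the ridge slopes on the centered data, and then recover \( \beta_0 = \bar{y} - \bar{z}^T \beta_{1:p} \). A sketch under these assumptions; `ridge_with_intercept` is a hypothetical helper, `Z` denotes the raw predictors without the column of ones, and the simulated data are illustrative:

```python
import numpy as np

def ridge_with_intercept(Z, Y, lam):
    """Hypothetical helper: ridge on centered data, intercept left unpenalized.

    Z holds the raw predictors (no column of ones).
    """
    z_mean = Z.mean(axis=0)
    y_mean = Y.mean()
    Zc = Z - z_mean                     # center each predictor
    Yc = Y - y_mean                     # center the response
    p = Z.shape[1]
    slopes = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ Yc)
    beta_0 = y_mean - z_mean @ slopes   # recover the intercept
    return beta_0, slopes

rng = np.random.default_rng(6)
Z = rng.normal(size=(40, 2))
Y = 3.0 + Z @ np.array([1.0, -1.0]) + rng.normal(scale=0.1, size=40)

beta_0, slopes = ridge_with_intercept(Z, Y, lam=5.0)
print(beta_0, slopes)
```

This gives the same estimates as solving the augmented system with penalty matrix \( \operatorname{diag}(0, \lambda, \ldots, \lambda) \) in place of \( \lambda I \), i.e. penalizing every coefficient except \( \beta_0 \).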
