# Linear Regression Direct Solution

## Step 1: Define the Cost Function

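We want the weights $w$ that minimize the squared error between the predictions and the true values. Here $X$ is the $N \times D$ design matrix (one row per data point), $w$ is the $D$-dimensional weight vector, and $t$ is the $N$-dimensional target vector.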

$$
J(w) = \frac{1}{2} \| Xw - t \|^2
$$

✅ Meaning:
- $Xw$: predicted values
- $t$: true target values
- $Xw - t$: error vector
- $\| \cdot \|^2$: sum of squared errors
- $\frac{1}{2}$: a constant factor that cancels the 2 produced by differentiation; it does not change the minimizer
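As a rough illustration of the cost function, here is a minimal NumPy sketch. The function name `cost` and the toy data are assumptions made for this example, not part of the derivation.

```python
import numpy as np

def cost(X, w, t):
    """Half the sum of squared errors: J(w) = 0.5 * ||Xw - t||^2."""
    residual = X @ w - t              # error vector, one entry per data point
    return 0.5 * residual @ residual  # dot product of the residual with itself

# Hypothetical toy data: 3 points, a bias column plus one feature x = 0, 1, 2.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
t = np.array([1.0, 3.0, 5.0])         # generated by t = 1 + 2x with no noise

J_perfect = cost(X, np.array([1.0, 2.0]), t)  # weights that fit the data exactly
J_zero = cost(X, np.zeros(2), t)              # weights that predict 0 everywhere
```

The exact-fit weights give zero cost, while the all-zero weights leave every target as error, so `J_zero` is half the sum of the squared targets.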

## Step 2: Expand $\| Xw - t \|^2$

Use the rule:

$$
\| v \|^2 = v^\top v
$$

Expand:

$$
\| Xw - t \|^2 = (Xw - t)^\top (Xw - t)
$$

Distribute:

$$
= (Xw)^\top (Xw) - (Xw)^\top t - t^\top (Xw) + t^\top t
$$

Since $(Xw)^\top t$ is a scalar, it equals its own transpose, so $(Xw)^\top t = t^\top (Xw)$:

$$
= (Xw)^\top (Xw) - 2 t^\top (Xw) + t^\top t
$$

## Step 3: Rewrite in Matrix Terms

Using the transpose rule $(AB)^\top = B^\top A^\top$:

- $(Xw)^\top (Xw) = w^\top X^\top X w$
- $t^\top (Xw) = t^\top X w$

Thus:

$$
J(w) = \frac{1}{2} \left( w^\top X^\top X w - 2 t^\top Xw + t^\top t \right)
$$
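The expansion can be sanity-checked numerically. The sketch below compares the compact and expanded forms on random data; the shapes and the random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # 6 data points, 3 features (arbitrary sizes)
w = rng.normal(size=3)
t = rng.normal(size=6)

# Compact form: 0.5 * ||Xw - t||^2
compact = 0.5 * np.sum((X @ w - t) ** 2)

# Expanded form: 0.5 * (w^T X^T X w - 2 t^T X w + t^T t)
expanded = 0.5 * (w @ X.T @ X @ w - 2 * t @ X @ w + t @ t)

forms_agree = np.isclose(compact, expanded)
```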

## Step 4: Take the Gradient $\nabla_w J(w)$

Differentiate term-by-term:

- The derivative of $w^\top X^\top X w$ with respect to $w$ is $2X^\top Xw$ (this uses the fact that $X^\top X$ is symmetric).
- The derivative of $-2 t^\top Xw$ with respect to $w$ is $-2 X^\top t$.
- The derivative of $t^\top t$ (a constant) is $0$.

The $\frac{1}{2}$ factor in front cancels the 2 in each nonzero derivative.

Thus:

$$
\nabla_w J(w) = X^\top X w - X^\top t
$$
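One way to gain confidence in this gradient is a finite-difference check. The sketch below compares the analytic formula with a central-difference approximation; the helper names and the step size `eps` are assumptions for this example.

```python
import numpy as np

def cost(X, w, t):
    r = X @ w - t
    return 0.5 * r @ r

def gradient(X, w, t):
    # Analytic gradient: X^T X w - X^T t, written as X^T (Xw - t)
    return X.T @ (X @ w - t)

def numerical_gradient(X, w, t, eps=1e-6):
    # Central differences, perturbing one coordinate of w at a time
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (cost(X, w + step, t) - cost(X, w - step, t)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
w = rng.normal(size=3)
t = rng.normal(size=8)

analytic = gradient(X, w, t)
numeric = numerical_gradient(X, w, t)
max_abs_diff = np.max(np.abs(analytic - numeric))  # should be tiny
```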

## Step 5: Set the Gradient to Zero (Find the Minimum)

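Because $J$ is a convex quadratic (its Hessian $X^\top X$ is positive semidefinite), any point where the gradient vanishes is a global minimum. Setting the gradient to zero: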

$$
X^\top X w - X^\top t = 0
$$

Rearranging:

$$
X^\top X w = X^\top t
$$

✅ This is called the **normal equation**.

## Step 6: Solve for $w$

Assuming $X^\top X$ is invertible, left-multiply both sides by $(X^\top X)^{-1}$:

$$
w^* = (X^\top X)^{-1} X^\top t
$$

✅ This gives the **optimal weights**.
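A minimal NumPy sketch of this solution follows. It solves the normal equation as a linear system with `np.linalg.solve` instead of forming the inverse explicitly, which is a common numerical choice rather than something the derivation requires. The function name `fit_normal_equation` and the toy data are assumptions for illustration.

```python
import numpy as np

def fit_normal_equation(X, t):
    """Solve (X^T X) w = X^T t for w. Assumes X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ t)

# Hypothetical toy data: a bias column plus one feature, with slightly noisy targets.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
t = np.array([1.0, 3.1, 4.9, 7.0])

w_star = fit_normal_equation(X, t)

# Cross-check against NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, t, rcond=None)

# At the optimum, the gradient X^T X w - X^T t should vanish
grad_at_optimum = X.T @ (X @ w_star - t)
```

Here `w_star` and `w_lstsq` should agree up to floating-point error, and `grad_at_optimum` should be numerically zero.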

# Final Steps Summary

1. Write the cost function: $J(w) = \frac{1}{2} \|Xw - t\|^2$
2. Expand the square: $\|Xw - t\|^2 = w^\top X^\top Xw - 2 t^\top Xw + t^\top t$
3. Differentiate term-by-term:
   - $w^\top X^\top Xw \to 2X^\top Xw$
   - $-2 t^\top Xw \to -2 X^\top t$
   - $t^\top t \to 0$
4. Set the gradient to zero: $X^\top Xw = X^\top t$
5. Solve for $w$: $w^* = (X^\top X)^{-1} X^\top t$

# Important Notes

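✅ $X^\top X$ must be invertible, which holds exactly when $X$ has full column rank (its columns are linearly independent).  
✅ If $X^\top X$ is singular (no inverse), you can add Ridge regularization, i.e. a penalty $\frac{\lambda}{2}\|w\|^2$ with $\lambda > 0$, which makes $X^\top X + \lambda I$ invertible: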

$$
w^* = (X^\top X + \lambda I)^{-1} X^\top t
$$
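To illustrate the Ridge formula, the sketch below builds a design matrix with two identical columns, so $X^\top X$ is singular, and then solves the regularized system. The function name `fit_ridge`, the data, and the value `lam=0.1` are assumptions chosen for this example.

```python
import numpy as np

def fit_ridge(X, t, lam):
    """Solve (X^T X + lam * I) w = X^T t. Assumes lam > 0."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ t)

# Hypothetical data with a duplicated column: X does not have full column rank.
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
t = np.array([2.0, 4.0, 6.0])

rank = np.linalg.matrix_rank(X.T @ X)  # smaller than 2, so X^T X has no inverse

w_ridge = fit_ridge(X, t, lam=0.1)     # still well defined thanks to lam * I
```

Because the two columns are identical, the penalty splits the weight evenly between them. A larger $\lambda$ shrinks both weights further toward zero.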

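✅ Linear regression has this direct solution because its cost is quadratic in $w$.  
Models such as logistic regression or neural networks generally have no closed-form solution and need iterative optimization methods (e.g., gradient descent).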

# Final Formula

The closed-form solution for linear regression is:

$$
w^* = (X^\top X)^{-1} X^\top t
$$
