# I. Algorithm

## 1. Mathematics:
### Linear Regression: 
$$ y \approx  f(\mathbf{x}) = \hat{y} $$
$$ \text{where: } f(\mathbf{x}) =w_1 x_1 + w_2 x_2 + w_3 x_3 + w_0$$
$$ \Rightarrow y \approx \mathbf{\bar{x}}\mathbf{w} = \hat{y} ~~~(1)$$
Note:

- $\mathbf{\bar{x}} = [1, x_1, x_2, x_3]$ (row vector)
- $\mathbf{w} = [w_0, w_1, w_2, w_3]^T$ (column vector)
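
As an illustration of formula $(1)$, the sketch below builds $\mathbf{\bar{x}}$ by prepending a 1 to a feature vector and compares the matrix product with the explicit sum. The numbers in `w` and `x` are made up for the example and do not come from any dataset.

```python
import numpy as np

# Hypothetical weights [w_0, w_1, w_2, w_3]^T as a column vector (made-up values)
w = np.array([[0.5], [2.0], [-1.0], [3.0]])
# Hypothetical feature vector [x_1, x_2, x_3] (made-up values)
x = np.array([1.0, 2.0, 3.0])

# Row vector x_bar = [1, x_1, x_2, x_3]; the leading 1 multiplies the bias w_0
x_bar = np.concatenate(([1.0], x)).reshape(1, -1)

# Formula (1): y_hat = x_bar w, a 1x1 matrix
y_hat = x_bar @ w

# The same value written out term by term
y_explicit = w[1, 0] * x[0] + w[2, 0] * x[1] + w[3, 0] * x[2] + w[0, 0]

print(y_hat[0, 0], y_explicit)
```

Prepending the 1 lets the bias $w_0$ be handled by the same matrix product as the other weights.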

###  Loss function:
$$ \frac{1}{2}e^2 = \frac{1}{2}(y - \hat{y})^2 = \frac{1}{2}(y - \mathbf{\bar{x}}\mathbf{w})^2 $$
Note:
- $\frac{1}{2}$ is included to simplify the calculation: it cancels the factor of 2 that appears when the derivative is taken.
- We use $e^2$ instead of $|e|$ because $e^2$ is differentiable everywhere, while the derivative of $|e|$ is undefined at 0.

### Cost function:
$$\mathcal{L}(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^N (y_i - \mathbf{\bar{x}}_i\mathbf{w})^2$$
$$= \frac{1}{2}\|\mathbf{y} - \mathbf{\bar{X}}\mathbf{w} \|_2^2$$
Note:
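- $\| \mathbf{z} \|_2^2$ is the sum of the squares of all entries of $\mathbf{z}$.
- $\mathbf{y} = [y_1, \dots, y_N]^T$ is the column vector of targets, and $\mathbf{\bar{X}}$ is the matrix whose $i$-th row is $\mathbf{\bar{x}}_i$.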
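
The two forms of the cost function can be compared numerically. The sketch below uses random data (an assumption made only for this check) and evaluates the cost once as a sum over samples and once with the squared 2-norm.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 3  # made-up sizes: 5 samples, 3 features

X = rng.normal(size=(N, d))
X_bar = np.hstack([np.ones((N, 1)), X])  # prepend the bias column of ones
y = rng.normal(size=(N, 1))
w = rng.normal(size=(d + 1, 1))

# Sum form: 1/2 * sum_i (y_i - x_bar_i w)^2
cost_sum = 0.5 * sum((y[i, 0] - (X_bar[i] @ w)[0]) ** 2 for i in range(N))

# Norm form: 1/2 * ||y - X_bar w||_2^2
cost_norm = 0.5 * np.linalg.norm(y - X_bar @ w) ** 2

print(cost_sum, cost_norm)
```

Both values should agree up to floating-point rounding.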

### Derivative:
To find the weights $\mathbf{w}$ (called `theta` in the code) that minimize the cost function, we first compute its derivative.
$$\frac{\partial{\mathcal{L}(\mathbf{w})}}{\partial{\mathbf{w}}} = \frac{1}{2}(\|\mathbf{y} - \mathbf{\bar{X}}\mathbf{w} \|_2^2)'$$
$$= \frac{1}{2}(\|\mathbf{\bar{X}}\mathbf{w} - \mathbf{y} \|_2^2)'$$
$$= \frac{1}{2}2\mathbf{\bar{X}}^T(\mathbf{\bar{X}}\mathbf{w} - \mathbf{y})$$
$$= \mathbf{\bar{X}}^T(\mathbf{\bar{X}}\mathbf{w} - \mathbf{y})$$
Note:
- $(\|\mathbf{A}\mathbf{x} - \mathbf{b} \|_2^2)' = 2\mathbf{A}^T(\mathbf{A}\mathbf{x} - \mathbf{b})$
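
The identity in the note can be checked with central finite differences. This is only a numerical sanity check on random data (the matrix `A` and the vectors `b` and `x` are made up), not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))  # made-up matrix
b = rng.normal(size=6)       # made-up vector
x = rng.normal(size=4)       # point at which the gradient is evaluated

def squared_norm(x):
    """Return ||A x - b||_2^2."""
    r = A @ x - b
    return r @ r

# Analytic gradient from the identity: 2 A^T (A x - b)
analytic = 2 * A.T @ (A @ x - b)

# Numerical gradient: perturb one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(x)
for j in range(x.size):
    step = np.zeros_like(x)
    step[j] = eps
    numeric[j] = (squared_norm(x + step) - squared_norm(x - step)) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

A very small `max_abs_diff` indicates that the analytic and numerical gradients agree.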

### Theta:
After finding the derivative, we set it equal to 0 and solve for $\mathbf{w}$.
$$\mathbf{\bar{X}}^T(\mathbf{\bar{X}}\mathbf{w} - \mathbf{y})=0$$
$$\mathbf{\bar{X}}^T\mathbf{\bar{X}}\mathbf{w} = \mathbf{\bar{X}}^T\mathbf{y}$$
$$\mathbf{w}= (\mathbf{\bar{X}}^T\mathbf{\bar{X}})^{\dagger} \mathbf{\bar{X}}^T\mathbf{y} ~~~(2)$$
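
In formula $(2)$, $\dagger$ denotes the pseudo-inverse, which still exists when $\mathbf{\bar{X}}^T\mathbf{\bar{X}}$ is singular (for example, when features are collinear). As a rough check, the sketch below evaluates $(2)$ on synthetic data and compares it with NumPy's least-squares solver `np.linalg.lstsq`. The data and the weights `w_true` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 20, 3  # made-up sizes

X = rng.normal(size=(N, d))
X_bar = np.hstack([np.ones((N, 1)), X])

# Hypothetical ground-truth weights [w_0, w_1, w_2, w_3]^T plus a little noise
w_true = np.array([[1.0], [2.0], [-3.0], [0.5]])
y = X_bar @ w_true + 0.01 * rng.normal(size=(N, 1))

# Formula (2): w = (X_bar^T X_bar)^dagger X_bar^T y
w_pinv = np.linalg.pinv(X_bar.T @ X_bar) @ X_bar.T @ y

# Reference solution from NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X_bar, y, rcond=None)

# At the solution the gradient X_bar^T (X_bar w - y) should be close to zero
gradient_at_solution = X_bar.T @ (X_bar @ w_pinv - y)

print(w_pinv.ravel())
print(w_lstsq.ravel())
```

On well-conditioned data like this, the two solutions should match closely and both should land near `w_true`.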

## 2. Code:
- Step 1: Use formula $(2)$ to calculate $\mathbf{w}$.
- Step 2: Use formula $(1)$ to calculate $\hat{y}$ from $\mathbf{X}$ and $\mathbf{w}$.

In [5]:
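import numpy as np

# Calculate theta with the normal equation (2)
def calculate_theta(X_train, y_train):
    # Prepend a single column of ones so that theta[0] is the bias w_0
    X_bar = np.c_[np.ones((X_train.shape[0], 1)), X_train]
    theta = np.dot(np.linalg.pinv(np.dot(X_bar.T, X_bar)), np.dot(X_bar.T, y_train))
    return theta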

In [3]:
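# Predict values for new inputs with formula (1)
def predict(X_test, theta):
    # Prepend the same single column of ones that was used during training
    X_bar = np.c_[np.ones((X_test.shape[0], 1)), X_test]
    y_pre = np.dot(X_bar, theta)
    return y_pre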

# II. Practice

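Predict weight from height. The rows for heights 155 cm and 160 cm are held out as test points; the remaining 13 rows are used for training.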
<table>
  <thead>
    <tr>
      <th style="text-align: center">Height (cm)</th>
      <th style="text-align: center">Weight (kg)</th>
      <th style="text-align: center">Height (cm)</th>
      <th style="text-align: center">Weight (kg)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center">147</td>
      <td style="text-align: center">49</td>
      <td style="text-align: center">168</td>
      <td style="text-align: center">60</td>
    </tr>
    <tr>
      <td style="text-align: center">150</td>
      <td style="text-align: center">50</td>
      <td style="text-align: center">170</td>
      <td style="text-align: center">62</td>
    </tr>
    <tr>
      <td style="text-align: center">153</td>
      <td style="text-align: center">51</td>
      <td style="text-align: center">173</td>
      <td style="text-align: center">63</td>
    </tr>
    <tr>
      <td style="text-align: center">155</td>
      <td style="text-align: center">52</td>
      <td style="text-align: center">175</td>
      <td style="text-align: center">64</td>
    </tr>
    <tr>
      <td style="text-align: center">158</td>
      <td style="text-align: center">54</td>
      <td style="text-align: center">178</td>
      <td style="text-align: center">66</td>
    </tr>
    <tr>
      <td style="text-align: center">160</td>
      <td style="text-align: center">56</td>
      <td style="text-align: center">180</td>
      <td style="text-align: center">67</td>
    </tr>
    <tr>
      <td style="text-align: center">163</td>
      <td style="text-align: center">58</td>
      <td style="text-align: center">183</td>
      <td style="text-align: center">68</td>
    </tr>
    <tr>
      <td style="text-align: center">165</td>
      <td style="text-align: center">59</td>
      <td style="text-align: center">&nbsp;</td>
      <td style="text-align: center">&nbsp;</td>
    </tr>
  </tbody>
</table>

In [4]:
# Load training data: heights (cm) as input, weights (kg) as target
X_train = np.array([147, 150, 153, 158, 163, 165, 168, 170, 173, 175, 178, 180, 183]).reshape(-1,1)
y_train = np.array([ 49, 50, 51,  54, 58, 59, 60, 62, 63, 64, 66, 67, 68]).reshape(-1,1)
# Calculate theta
theta = calculate_theta(X_train, y_train)
# Predict the weights for the two held-out heights
X_test = np.array([155, 160]).reshape(-1,1)
y_pre = predict(X_test, theta)
print(y_pre)

[[52.94135889]
 [55.7373837 ]]
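
As a cross-check of the printed predictions, the sketch below fits the same straight line with `np.polyfit` (a different NumPy routine from the one used above) and compares the predictions with the table values for 155 cm and 160 cm, which are 52 kg and 56 kg. The variable names are chosen for this example only.

```python
import numpy as np

# Same training data as above: heights (cm) and weights (kg)
heights = np.array([147, 150, 153, 158, 163, 165, 168, 170, 173, 175, 178, 180, 183], dtype=float)
weights = np.array([49, 50, 51, 54, 58, 59, 60, 62, 63, 64, 66, 67, 68], dtype=float)

# Degree-1 least-squares fit; coefficients are returned highest power first
slope, intercept = np.polyfit(heights, weights, deg=1)

# Held-out heights and their weights taken from the table
test_heights = np.array([155.0, 160.0])
actual = np.array([52.0, 56.0])

predicted = slope * test_heights + intercept
errors = predicted - actual

print(predicted)
print(errors)
```

The predictions should agree with the output above, and both differ from the table values by less than 1 kg.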


# III. Reference

Machine Learning cơ bản (Basic Machine Learning), Lesson 3: Linear Regression [https://machinelearningcoban.com/2016/12/28/linearregression/]