### Multiple features (variables)
When there is more than one input variable (feature) $x$ used to predict $y$.

e.g:

Size (ft^2)|Number of bedrooms|Number of floors|Age of home (years)|Price (1000 dollars)
----|------------------|----------------|------------|-----
2104|5|1|45|460
...|...|...|...|...

> Notation:

> n : number of features

>$x^{(i)}$ input(features) of $i^{th}$ training example

>$x^{(i)}_j$ value of feature j in $i^{th}$ training example

**Hypothesis**:

$$h_\theta(x) = \theta_0 + \theta_1x_1 +\theta_2x_2 +...+ \theta_nx_n$$

For convenience of notation, define $x_0 = 1$.

Then, $x = \begin{bmatrix}
x_0 \\
x_1 \\
x_2 \\
\vdots \\
x_n \\
\end{bmatrix}$   and    $\theta = \begin{bmatrix}
\theta_0 \\
\theta_1 \\
\theta_2 \\
\vdots \\
\theta_n \\
\end{bmatrix}$

Hence, 

$$\color{red}{h_{\theta}(x) = \theta^Tx}$$
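
As a minimal sketch of the vectorized hypothesis, the snippet below prepends $x_0 = 1$ and takes the dot product $\theta^Tx$. The function name `hypothesis` and the parameter values in `example_theta` are made up for illustration; only the feature values come from the table above.

```python
def hypothesis(theta, x):
    """Return h_theta(x) = theta^T x for a single example.

    theta: list of n + 1 parameters [theta_0, ..., theta_n]
    x:     list of n feature values [x_1, ..., x_n] (without x_0)
    """
    x_with_bias = [1.0] + list(x)  # prepend x_0 = 1
    if len(theta) != len(x_with_bias):
        raise ValueError("theta must have exactly one more entry than x")
    return sum(t * xi for t, xi in zip(theta, x_with_bias))


# Features of the first training example: size, bedrooms, floors, age.
example_x = [2104, 5, 1, 45]
# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
example_theta = [80.0, 0.1, 10.0, 5.0, -2.0]

prediction = hypothesis(example_theta, example_x)
print(prediction)  # 80 + 0.1*2104 + 10*5 + 5*1 - 2*45, roughly 255.4
```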

### Gradient descent for Multiple Variables

**Cost function**

$$ J(\theta) = \frac{1}{2m}\sum_{i=1}^m(h_{\theta}(x^{(i)}) - y^{(i)})^2$$

**Gradient descent** :

Repeat $ \{ \\
\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j}J(\theta) \\
\}$ (simultaneously update for every $j = 0, \dots, n$)

**New algorithm** (n ≥ 1):

Repeat $ \{ \\
\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i = 1}^m(h_{\theta}(x^{(i)}) - y^{(i)})x^{(i)}_j \\
\}$ (simultaneously update for every $j = 0, \dots, n$)
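
The update rule can be sketched in plain Python as follows. This is an illustrative implementation, not a reference one: the function names, the toy dataset (generated from $y = 1 + 2x_1 + 3x_2$), the learning rate, and the iteration count are all assumptions. Note that every gradient is computed from the current $\theta$ before any parameter is changed, which is what "simultaneous update" means.

```python
def predict(theta, row):
    """h_theta(x) for one row that already includes x_0 = 1."""
    return sum(t * v for t, v in zip(theta, row))


def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent for linear regression.

    X: list of rows, each row = [1, x_1, ..., x_n]
    y: list of targets
    """
    m = len(y)
    theta = [0.0] * len(X[0])
    for _ in range(num_iters):
        errors = [predict(theta, row) - yi for row, yi in zip(X, y)]
        # Compute all partial derivatives first (simultaneous update).
        grads = [
            sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(len(theta))
        ]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta


# Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]

theta = gradient_descent(X, y, alpha=0.1, num_iters=5000)
print([round(t, 4) for t in theta])
```

On this toy data the parameters should approach $[1, 2, 3]$; on real data the required $\alpha$ and number of iterations will differ.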


### Feature Scaling
Idea: Make sure features are on a similar scale.

E.g. $x_1 = $ size (0-2000 ft$^2$)

$x_2 = $ number of bedrooms (1-5)

The two features have very different scales, so divide each by its maximum value: $x_1 = \frac{size}{2000}$ and $x_2 = \frac{number\ of\ bedrooms}{5}$

**General rule**:
Get every feature into approximately the range $-1 \le x_i \le 1$

### Mean Normalization
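Replace $x_i$ with $x_i - \mu_i$ so that each feature has approximately zero mean (do not apply this to $x_0 = 1$), then divide by the range of that feature:

$$x_i \leftarrow \frac{x_i - \mu_i}{\max(x_i) - \min(x_i)}$$

where $\mu_i$ is the mean of feature $i$ over the training set.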
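
A small sketch of mean normalization for a single feature column is shown below. The helper name `mean_normalize` is an assumption; the sample values are the house sizes from the normal equation example later in these notes.

```python
def mean_normalize(values):
    """Return (v - mean) / (max - min) for every value in one feature column."""
    mu = sum(values) / len(values)
    spread = max(values) - min(values)
    if spread == 0:
        raise ValueError("feature is constant, so its range is zero")
    return [(v - mu) / spread for v in values]


sizes = [2104, 1416, 1534, 852]  # size in ft^2
scaled = mean_normalize(sizes)
print([round(s, 4) for s in scaled])
```

After this transformation the column has mean zero and spans a range of exactly 1, so every value lies between $-1$ and $1$.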

### Debugging

**1. Make sure that gradient descent is working correctly**

$J(\theta)$ should decrease after every iteration (plot $J(\theta)$ against the number of iterations to check).

If $J(\theta)$ is increasing, use a smaller $\alpha$.

To choose $\alpha$, try: 0.001, 0.01, 0.1 and 1
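
The check described above can be sketched by recording $J(\theta)$ at every iteration and testing whether it ever goes up. Everything here is illustrative: the helper names, the toy dataset (generated from $y = 1 + 2x_1 + 3x_2$), and the choice of 50 iterations are assumptions, and which $\alpha$ values diverge depends entirely on the data.

```python
def cost_history(X, y, alpha, num_iters):
    """Run batch gradient descent and return J(theta) recorded before each update."""
    m = len(y)
    theta = [0.0] * len(X[0])
    history = []
    for _ in range(num_iters):
        errors = [
            sum(t * v for t, v in zip(theta, row)) - yi
            for row, yi in zip(X, y)
        ]
        history.append(sum(e * e for e in errors) / (2 * m))
        grads = [
            sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(len(theta))
        ]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return history


def is_decreasing(history):
    """True if the cost never increases from one iteration to the next."""
    return all(later <= earlier for earlier, later in zip(history, history[1:]))


# Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]

results = {
    alpha: is_decreasing(cost_history(X, y, alpha, 50))
    for alpha in (0.001, 0.01, 0.1, 1.0)
}
for alpha, ok in results.items():
    print(alpha, "decreasing" if ok else "increasing somewhere: try a smaller alpha")
```

On this particular toy data the three smaller values decrease steadily while $\alpha = 1$ makes the cost grow. A very small $\alpha$ is safe but converges slowly.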


### Features and Polynomial Regression

Choice of features:
- $h_\theta(x) = \theta_0 +\theta_1(size) + \theta_2(size)^2$
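
Polynomial regression reuses the linear machinery by treating powers of a feature as extra features. A minimal sketch, with a hypothetical helper name `polynomial_features`:

```python
def polynomial_features(size, degree=2):
    """Return [1, size, size^2, ..., size^degree] as one feature row."""
    return [1.0] + [float(size) ** p for p in range(1, degree + 1)]


row = polynomial_features(2104)
print(row)  # [1.0, 2104.0, 4426816.0]
```

The squared feature is thousands of times larger than the original one, so feature scaling matters even more here if gradient descent is used.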

### Normal Equation
**definition**: a method to solve for $\theta$ analytically

**intuition**: in the 1D case ($\theta \in \mathbb{R}$), set

$\frac{d}{d\theta}J(\theta) = 0$ then solve for $\theta$

In this context

$\theta \in \mathbb{R}^{n+1} $ and 
$$J(\theta_0, \theta_1,..., \theta_n) = \frac{1}{2m}\sum^{m}_{i = 1}(h_\theta(x^{(i)}) - y^{(i)})^2$$

Set $\frac{\partial}{\partial\theta_j}J(\theta) = 0$ (for every $j$)

Then solve for $\theta_0, \theta_1,..., \theta_n$

#### Example
**m=4**

$x_0$ | Size(ft^2)|Number of bedrooms|number of floors|Age of home(years)| Price(1000 dollars)
----|----|---|----|---|---
1|2104|5|1|45|460
1|1416|3|2|40|232
1|1534|3|2|30|315
1|852|2|1|36|178

$X = \begin{bmatrix}
1 & 2104 & 5 & 1 & 45  \\
1 & 1416 & 3 & 2 & 40  \\
1 & 1534 & 3 & 2 & 30  \\
1 & 852 & 2 & 1 & 36  \\
\end{bmatrix}$

$X$ is the design matrix: each row holds the features (including $x_0 = 1$) of one training example, which we use to predict the price ($y$)

$y = \begin{bmatrix}
460 \\
232 \\
315 \\
178 \\
\end{bmatrix}$

$$ \theta = (X^TX)^{-1}X^Ty$$
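
A short sketch of where this formula comes from, assuming $X^TX$ is invertible. Writing the cost function in matrix form and setting its gradient to zero gives:

$$J(\theta) = \frac{1}{2m}(X\theta - y)^T(X\theta - y)$$

$$\nabla_\theta J(\theta) = \frac{1}{m}X^T(X\theta - y) = 0 \;\Rightarrow\; X^TX\theta = X^Ty \;\Rightarrow\; \theta = (X^TX)^{-1}X^Ty$$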

**Matlab**
>pinv(X' * X) * X' * y
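
For comparison, here is a dependency-free Python sketch that solves $X^TX\theta = X^Ty$ directly by Gauss-Jordan elimination. The function names, the singularity threshold, and the toy dataset (generated from $y = 1 + 2x_1 + 3x_2$) are assumptions for illustration. Unlike `pinv`, this sketch requires $X^TX$ to be invertible. In the house example above, $m = 4$ is smaller than $n + 1 = 5$, so $X^TX$ is singular and a pseudo-inverse is what makes the Matlab line work there.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # crude threshold, an assumption
            raise ValueError("matrix is singular or nearly so")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def normal_equation(X, y):
    """Return theta solving (X^T X) theta = X^T y."""
    cols = list(zip(*X))  # columns of X, i.e. rows of X^T
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * yi for a, yi in zip(ci, y)) for ci in cols]
    return solve_linear_system(XtX, Xty)


# Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]

theta = normal_equation(X, y)
print([round(t, 6) for t in theta])
```

No learning rate and no iterations are involved: the parameters come out of a single linear solve.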

**NOTE**
> Feature scaling is not needed when using the normal equation.
>
> **Gradient descent:**
>* Need to choose $\alpha$
>* Needs many iterations
>* Works well even with a large n
>
> **Normal Equation**
>* Slow when n is large, because it must compute $(X^TX)^{-1}$; practical for roughly n < 10k

In [1]:
(89 + 72 +94 +69)/4

81.0

In [3]:
94-69

25

In [4]:
(94-81)/25

0.52

In [5]:
50*15

750