# Part I: Multiple Features

**Multiple Variables**
![image.png](attachment:image.png)
Notation:
- $x_{j}: j^{th}$ feature
- n: number of features
- $\vec{x}^{(i)}$: features of $i^{th}$ training example, as a row vector
- $x_{j}^{(i)}$: value of variable j in the $i^{th}$ training example
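
As a rough illustration of this notation, the sketch below stores a small training set in a NumPy array. The name `X_train` and all of its values are hypothetical and are not taken from these notes. NumPy indexes from 0, while the notation above counts from 1, so $x_{3}^{(2)}$ lives at `X_train[1, 2]`.

```python
import numpy as np

# Hypothetical training set: 3 examples (rows), n = 4 features (columns).
X_train = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [852, 2, 1, 35],
])

m, n = X_train.shape   # m training examples, n features

# x^(2): all features of the 2nd training example (row index 1).
x_2 = X_train[1]

# x_3^(2): value of feature 3 in the 2nd training example.
x_3_2 = X_train[1, 2]

print(n, x_2, x_3_2)
```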

**Model**: $f_{w, b}(x) = w_{1}x_{1} + w_{2}x_{2} + ... + w_{n}x_{n} + b$

Redefine the variables:
- $\vec{w}: [w_{1}, w_{2}, ..., w_{n}]$
- b: bias
- $\vec{x}: [x_{1}, x_{2}, ..., x_{n}]$

Then: $f_{\vec{w}, b}(\vec{x}) = \vec{w} \cdot \vec{x} + b = w_{1}x_{1} + w_{2}x_{2} + ... + w_{n}x_{n} + b$

# Part II: Vectorization

In [25]:
import numpy as np

**Parameters and Features**

- $\vec{w}: [w_{1}, w_{2}, w_{3}]$
- b: bias
- $\vec{x} = [x_{1}, x_{2}, x_{3}]$

In [26]:
w = np.array([1.0, 2.0, 3.0])
n = w.shape[0]  # number of features (3); must match the length of w and x
b = 4
x = np.array([10, 20, 30])

**Without Vectorization**

In [27]:
#Method 1
f = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b

In [28]:
#Method 2
f = 0
for i in range(n):
    f += w[i] * x[i]
f += b

**With Vectorization**

In [29]:
f = np.dot(w, x) + b
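
A quick sanity check, as a sketch: when the loop runs over every feature, it should give the same prediction as `np.dot`. The names `f_loop` and `f_vec` are introduced here only for the comparison.

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])
b = 4
x = np.array([10, 20, 30])

# Explicit loop over all features.
f_loop = 0.0
for j in range(w.shape[0]):
    f_loop += w[j] * x[j]
f_loop += b

# Vectorized dot product.
f_vec = np.dot(w, x) + b

print(f_loop, f_vec)  # both are 144.0 (10 + 40 + 90 + 4)
```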

**Gradient Descent**
- $\vec{w} = (w_{1}, w_{2}, ..., w_{6})$
- $\vec{d} = (d_{1}, d_{2}, ..., d_{6})$  
where $d_{j}$ is the derivative of the cost with respect to $w_{j}$

In [30]:
w = np.array([3.0, 2.0, 6.0, 3.0, 1.0, 8.0])  # floats, so element-wise updates are not truncated to integers
d = np.array([1.0, 1.5, 6.5, 3.4, 6.0, 4.1])
alpha = 0.1

Task: Compute $w_{j} = w_{j} - \alpha * d_{j}$ for j = 1, ..., 6  
where $\alpha$ is the learning rate

In [31]:
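# Without vectorization: update a float copy one element at a time
w_loop = w.astype(float)
for j in range(len(w_loop)):
    w_loop[j] = w_loop[j] - alpha * d[j]
# With vectorization: update all six parameters in one operation
w = w - alpha * d  # same values as w_loop; each update is applied only once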

# Part III: Gradient Descent for multiple linear regression

**Vector Notation**
- $\vec{w} = [w_{1} ... w_{n}]$
- b: bias (a single number)
- $f_{\vec{w}, b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$  

**Gradient Descent**  
repeat, simultaneously update:
- $w_{j} = w_{j} - \alpha \frac{\partial}{\partial w_{j}} J(\vec{w}, b)$ for $j = 1, ..., n$
- $b = b - \alpha \frac{\partial}{\partial b} J(\vec{w}, b)$
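
The update rule above can be sketched in NumPy as follows. This is only a minimal sketch: it assumes that $J$ is the squared-error cost $J(\vec{w}, b) = \frac{1}{2m}\sum_{i}(f_{\vec{w}, b}(\vec{x}^{(i)}) - y^{(i)})^2$, which these notes do not state explicitly, and the function names and toy data are hypothetical.

```python
import numpy as np

def compute_gradient(X, y, w, b):
    """Gradient of the assumed squared-error cost with respect to w and b."""
    m = X.shape[0]
    err = X @ w + b - y          # prediction error for every example
    dj_dw = X.T @ err / m        # one partial derivative per feature
    dj_db = err.sum() / m
    return dj_dw, dj_db

def gradient_descent(X, y, w, b, alpha, num_iters):
    for _ in range(num_iters):
        # Compute both gradients first, then update w and b together.
        dj_dw, dj_db = compute_gradient(X, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# Hypothetical toy data generated from y = 2*x1 + 3*x2 + 1 (no noise).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [0.0, 1.0]])
y = X @ np.array([2.0, 3.0]) + 1.0

w, b = gradient_descent(X, y, np.zeros(2), 0.0, alpha=0.05, num_iters=5000)
print(w, b)  # approaches [2, 3] and 1
```

Computing `dj_dw` and `dj_db` before changing either parameter is what makes the update simultaneous.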

**An alternative to Gradient Descent**  
- Normal Equation:  
Only for linear regression  
Solve for w, b without iterations  
- Disadvantages:  
Does not generalize to other learning algorithms  
Slow when the number of features is large (> 10,000)  
- What we need to know:  
The normal equation may be used in machine learning libraries that implement linear regression  
Gradient descent is the recommended method for finding the parameters w, b
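
For reference, one common way to write the normal equation is $\theta = (A^{T}A)^{-1}A^{T}\vec{y}$, where $A$ is the feature matrix with an extra column of ones so that $b$ becomes the last entry of $\theta$. These notes do not give the formula, so the sketch below is an assumption about the intended method, and the toy data and the names `A` and `theta` are hypothetical.

```python
import numpy as np

# Hypothetical toy data generated from y = 2*x1 + 3*x2 + 1 (no noise).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [0.0, 1.0]])
y = X @ np.array([2.0, 3.0]) + 1.0

# Append a column of ones so that b is learned as one extra parameter.
A = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve (A^T A) theta = A^T y directly, with no iterations.
theta = np.linalg.solve(A.T @ A, A.T @ y)
w, b = theta[:-1], theta[-1]

print(w, b)  # approximately [2. 3.] and 1.0
```

The linear system has one row and one column per parameter, which is why this approach becomes slow as the number of features grows.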