In [1]:
# 1.1 Goals
# Extend our regression model routines to support multiple features:
# - Extend our data structures to support multiple features.
# - Rewrite the prediction, cost, and gradient routines to support multiple features.
# - Utilize NumPy np.dot to vectorize their implementations for speed and simplicity.


In [2]:
import copy, math
import numpy as np
import matplotlib.pyplot as plt
np.set_printoptions(precision = 2)

In [None]:
## 1.3 Notation
Here is a summary of some of the notation you will encounter, updated for multiple features.

| General Notation | Description | Python (if applicable) |
|:-----------------|:------------|:-----------------------|
| $a$ | scalar, non-bold | |
| $\mathbf{a}$ | vector, bold | |
| $\mathbf{A}$ | matrix, bold capital | |
| **Regression** | | |
| $\mathbf{X}$ | training example matrix | `X_train` |
| $\mathbf{y}$ | training example targets | `y_train` |
| $\mathbf{x}^{(i)}$, $y^{(i)}$ | $i^{th}$ training example | `X[i]`, `y[i]` |
| $m$ | number of training examples | `m` |
| $n$ | number of features in each example | `n` |
| $\mathbf{w}$ | parameter: weight | `w` |
| $b$ | parameter: bias | `b` |
| $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ | The result of the model evaluation at $\mathbf{x}^{(i)}$ parameterized by $\mathbf{w},b$: $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)}+b$ | `f_wb` |

# 2 Problem Statement
You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors, and age) shown in the table below. Note that, unlike the earlier labs, size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next lab!

| Size (sqft) | Number of Bedrooms  | Number of floors | Age of  Home | Price (1000s dollars)  |   
| ----------------| ------------------- |----------------- |--------------|-------------- |  
| 2104            | 5                   | 1                | 45           | 460           |  
| 1416            | 3                   | 2                | 40           | 232           |  
| 852             | 2                   | 1                | 35           | 178           |  

You will build a linear regression model using these values so you can then predict the price for other houses, for example a house with 1200 sqft, 3 bedrooms, and 1 floor that is 40 years old. Please run the following code cells to create your `X_train` and `y_train` variables.


In [12]:
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
X_train


array([[2104,    5,    1,   45],
       [1416,    3,    2,   40],
       [ 852,    2,    1,   35]])

In [13]:
y_train = np.array([460, 232, 178])

# 2.1 Matrix X containing our examples
Similar to the table above, examples are stored in a NumPy matrix `X_train`. Each row of the matrix represents one example. When you have $m$ training examples ( $m$ is three in our example), and there are $n$ features (four in our example), $\mathbf{X}$ is a matrix with dimensions ($m$, $n$) (m rows, n columns).


$$\mathbf{X} = 
\begin{pmatrix}
x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\ 
x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\
\cdots \\
x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1} 
\end{pmatrix}
$$


Notation:
- $\mathbf{x}^{(i)}$ is the vector containing example $i$: $\mathbf{x}^{(i)} = (x^{(i)}_0, x^{(i)}_1, \cdots, x^{(i)}_{n-1})$
- $x^{(i)}_j$ is element $j$ in example $i$. The superscript in parentheses indicates the example number, while the subscript indicates the element.

Display the input data.



In [15]:
# data is stored in a NumPy array/matrix
print(f"X Shape: {X_train.shape}, X Type: {type(X_train)}")

X Shape: (3, 4), X Type: <class 'numpy.ndarray'>


In [16]:
print(X_train)

[[2104    5    1   45]
 [1416    3    2   40]
 [ 852    2    1   35]]


In [17]:
print(f"y Shape: {y_train.shape}, y Type: {type(y_train)}")

y Shape: (3,), y Type: <class 'numpy.ndarray'>


In [18]:
print(y_train)

[460 232 178]
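
The indexing convention from section 2.1 maps directly onto NumPy indexing. A minimal sketch, re-creating `X_train` and `y_train` so the cell runs on its own (the names `x_1` and `x_1_0` are chosen here for illustration only):

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

# m examples (rows) and n features (columns)
m, n = X_train.shape

# x^(1): the feature vector of example 1 (the second row)
x_1 = X_train[1]

# x^(1)_0: feature 0 (size in sqft) of example 1
x_1_0 = X_train[1, 0]

print(f"m = {m}, n = {n}")
print(f"x^(1) = {x_1}, y^(1) = {y_train[1]}")
print(f"x^(1)_0 = {x_1_0}")
```

Both the example index and the feature index start at 0, which matches the $0 \ldots m-1$ and $0 \ldots n-1$ ranges in the matrix above.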


# 2.2 Parameter vector w, b
* $\mathbf{w}$ is a vector with $n$ elements.
  - Each element contains the parameter associated with one feature.
  - In our dataset, $n$ is 4.
  - Notionally, we draw this as a column vector:

$$\mathbf{w} = \begin{pmatrix}
w_0 \\ 
w_1 \\
\cdots\\
w_{n-1}
\end{pmatrix}
$$
* $b$ is a scalar parameter.

In [20]:
b_init = 745.1811367994083
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.421131618])
print(f"w_init shape: {w_init.shape}, b_init type: {type(b_init)}")

w_init shape: (4,), b_init type: <class 'float'>
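
Since each element of $\mathbf{w}$ pairs with one feature, the length of `w_init` should equal the number of columns of `X_train`. A small sanity check along those lines; the `assert` statements are an illustration added here, not part of the original lab:

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.421131618])
b_init = 745.1811367994083

n = X_train.shape[1]            # number of features (columns of X)
assert w_init.shape == (n,)     # one weight per feature
assert np.isscalar(b_init)      # the bias is a single number
print(f"n = {n}, w has {w_init.shape[0]} elements")
```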



# 3 Model Prediction With Multiple Variables
The model's prediction with multiple variables is given by the linear model:
$$ f_{\mathbf{w},b}(\mathbf{x}) =  w_0x_0 + w_1x_1 + \cdots + w_{n-1}x_{n-1} + b \tag{1}$$
or in vector notation:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b  \tag{2} $$ 
where $\cdot$ is the vector dot product. To demonstrate the dot product, we will implement prediction using both (1) and (2).
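
As a worked instance of (1), take the first training example, $\mathbf{x}^{(0)} = (2104, 5, 1, 45)$, with the `w_init` and `b_init` values defined in section 2.2. The individual terms below are rounded to two decimals, so the result is approximate:

$$
\begin{aligned}
f_{\mathbf{w},b}(\mathbf{x}^{(0)}) &= 0.39133535 \cdot 2104 + 18.75376741 \cdot 5 - 53.36032453 \cdot 1 - 26.421131618 \cdot 45 + 745.1811368 \\
&\approx 823.37 + 93.77 - 53.36 - 1188.95 + 745.18 \\
&\approx 420.01
\end{aligned}
$$

The training target for this example is 460 (in 1000s of dollars).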



In [25]:
def predict_single_loop(x, w, b):
    """
    Single prediction using linear regression.

    Args:
        x (ndarray): Shape (n,) example with multiple features
        w (ndarray): Shape (n,) model parameters
        b (scalar):  model parameter

    Returns:
        p (scalar): prediction
    """
    n = x.shape[0]
    p = 0
    for i in range(n):
        p_i = x[i] * w[i]   # contribution of feature i
        p = p + p_i
    p = p + b               # add the bias once, after the loop
    return p

In [28]:
# get a row from the training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")

# make a prediction 
f_wb = predict_single_loop(x_vec, w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb:.4f}")


x_vec shape (4,), x_vec value: [2104    5    1   45]
f_wb shape (), prediction: 420.0083


In [29]:
print(X_train[0])
print(y_train[0])

[2104    5    1   45]
460


In [31]:
def predict(x, w, b):
    """Single prediction using the vectorized model: p = w . x + b."""
    p = np.dot(x, w) + b
    return p

In [32]:
# get a row from the training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")

# make a prediction 
f_wb = predict(x_vec, w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")

x_vec shape (4,), x_vec value: [2104    5    1   45]
f_wb shape (), prediction: 420.00830290940826
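
The same dot product extends to the whole training set: multiplying the matrix `X_train` by `w_init` yields one prediction per row. A minimal sketch, assuming the `w_init` and `b_init` values from section 2.2. The names `predict_all` and `x_house` are hypothetical; `x_house` encodes the example house from the problem statement (1200 sqft, 3 bedrooms, 1 floor, 40 years old):

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
b_init = 745.1811367994083
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.421131618])

def predict_all(X, w, b):
    """Predict every row of X at once: (m, n) @ (n,) + scalar -> (m,)."""
    return X @ w + b

# one prediction per training example
f_all = predict_all(X_train, w_init, b_init)
print(f"predictions: {f_all}")
print(f"targets:     {y_train}")

# the row-wise result agrees with the single-example dot product
f_first = np.dot(X_train[0], w_init) + b_init
print(f"row 0 matches single prediction: {np.isclose(f_all[0], f_first)}")

# hypothetical new house: 1200 sqft, 3 bedrooms, 1 floor, 40 years old
x_house = np.array([1200, 3, 1, 40])
f_house = np.dot(x_house, w_init) + b_init
print(f"example house: {f_house:.2f} (1000s of dollars)")
```

With these particular parameter values, each training prediction falls roughly 40 below its target, so the estimate for the example house should be read as an illustration of the mechanics rather than a fitted result.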


We wrote a Python program that predicts house prices in Oregon using a linear regression model. The training examples are stored as a matrix and the parameters as a vector, so each prediction is the dot product of the parameter vector $\mathbf{w}$ with a feature vector $\mathbf{x}$, plus the bias $b$. We implemented this prediction twice: with an explicit loop over the features, and in vector notation with `np.dot`.