## Exercise 1

Let $x$ be a data vector describing a set of $N$ houses by a single feature: the area of each house (in sq. feet).

Also let $y$ be a vector of size $N$ with the price of each house in $x$.

If we plot the total area of each house against its price, we see a linear trend: the bigger the house, the more expensive it is. Given this intuition, we want to build a model to estimate the price of a new house given its area.

To achieve this goal we're going to build a linear regression model from scratch using bare matrix multiplication operations. A linear regression model is defined by the following formula:
$$y=wX^T$$

where $X$ is the data matrix ($x$ with a bias column added), $y$ is the price vector and $w$ contains the weights of the linear regression model. Since we know $x$ and $y$, we can calculate $w$ with the following formula:

$$w = (X^TX)^{-1}X^Ty$$
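
As a quick sanity check of the two formulas, here is a minimal sketch on toy data. The arrays `x_toy` and `y_toy` are hypothetical and are not the exercise data; they follow $y = 1 + 2x$ exactly, so the recovered weights should come out close to an intercept of 1 and a slope of 2.

```python
import numpy as np

# Toy data (hypothetical, not the exercise data): y = 1 + 2 * x exactly
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
y_toy = 1.0 + 2.0 * x_toy

# Prepend a column of ones so the first weight acts as the intercept (bias)
X_toy = np.c_[np.ones(len(x_toy)), x_toy]

# Normal equation: w = (X^T X)^{-1} X^T y
w_toy = np.linalg.inv(X_toy.T @ X_toy) @ X_toy.T @ y_toy

# Predictions follow y = w X^T
y_pred = w_toy @ X_toy.T

print("weights (intercept, slope):", w_toy)
print("predictions:", y_pred)
```

Because `w_toy` is a 1-D NumPy array, `w_toy @ X_toy.T` and `X_toy @ w_toy` give the same result, which is why the row-style formula $y=wX^T$ and the column-style formula for $w$ can be used together here.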

Follow these instructions:
 1. Load both `x` and `y` from _data/x1d.npy_ and _data/y.npy_ using the `numpy.load` function
 1. Use the function `plot_data()` to represent data in a scatter plot
 1. Create a function called `fit` that trains a linear regression model (calculates the weights). This function must:
     1. Add a bias term to `x` with the provided function `add_bias`. Store the result in `X` (uppercase)
     1. Calculate `w` using the formula $w = (X^TX)^{-1}X^Ty$  _(Hint: to calculate the inverse of a matrix, remember the `np.linalg.inv` function)_

    ```python
    def fit(x: np.ndarray, y: np.ndarray) -> np.ndarray:
        # Add bias term
        X = add_bias(x)
        # Calculate weights
        w = ...
        return w
    ``` 

 1. Once trained, you can use the function `plot_data_w_model()` to represent data and the trained linear regression model
 1. Create a new function called `predict` that, given a list of areas (in sq. ft.), returns the estimated price for each one.
    ```python
    def predict(x: np.ndarray, w: np.ndarray) -> np.ndarray:
        # Add bias term
        X = add_bias(x)
        # Calculate predictions
        y = ...
        return y
    ```
 
 **Question**: What would be the price of a house with 13478 sq. feet?

In [46]:
import numpy as np
import plotly.express as px

def plot_data(x, y):
    return px.scatter(x=x, y=y, title="Sale Price -vs- Area", labels={"x":"Area (sq. feet)", "y":"Sale Price ($)"}, template="none")

def plot_data_w_model(x, y, w):
    X = add_bias(x)
    y_new = w.dot(X.T)
    fig = px.scatter(x=X[:,1], y=y, title="Sale Price -vs- Area", labels={"x":"Area (sq. feet)", "y":"Sale Price ($)"}, template="none")
    fig.add_scatter(x=X[:,1], y=y_new, name="Estimated LR Model")
    return fig
    
def add_bias(X):
    return np.c_[np.ones(len(X)),X]

In [15]:
# 1. Load both x and y from data/x1d.npy and data/y.npy using the numpy.load function
x = np.load("data/x1d.npy")
y = np.load("data/y.npy")

In [16]:
# 2. Use the function plot_data() to represent data in a scatter plot
plot_data(x,y)

In [45]:
# 3. Calculate the weights for the linear regression model using the formula w = (X^T X)^{-1} X^T y

def fit(x, y):
    X = add_bias(x)
    w = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
    return w

w = fit(x,y)

In [47]:
# 4. Use the function plot_data_w_model() to represent data and the trained linear regression model
plot_data_w_model(x,y,w)

In [49]:
# 5. Create a new function called `predict` that, given a list of areas (in sq. ft.), returns the estimated price for each one.
def predict(x, w):
    X = add_bias(x)
    return w.dot(X.T)

In [54]:
y_hat = predict(np.array([13478]), w)
y_hat

array([203554.98842139])

In [53]:
print("Estimated price for a 13478 sq. ft. house: $", round(y_hat[0],2))

Estimated price for a 13478 sq. ft. house: $ 203554.99
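
The solution above inverts $X^TX$ explicitly, as the exercise asks. As a side note, here is a hedged sketch of an alternative: `np.linalg.lstsq` solves the same least-squares problem without forming the inverse, which is usually considered more robust numerically. The data below is synthetic and hypothetical (the names `x_demo`, `y_demo` and the chosen slope, intercept and noise level are assumptions for illustration only), so the numbers say nothing about the real dataset.

```python
import numpy as np

# Synthetic data (hypothetical): roughly linear prices with some noise
rng = np.random.default_rng(0)
x_demo = rng.uniform(500, 5000, size=50)
y_demo = 50.0 * x_demo + 10000.0 + rng.normal(0, 1000.0, size=50)

# Same bias construction as add_bias in this notebook
X_demo = np.c_[np.ones(len(x_demo)), x_demo]

# Weights via the explicit normal equation, as in fit()
w_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Weights via NumPy's least-squares solver (no explicit inverse)
w_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print("normal equation:", w_inv)
print("lstsq:          ", w_lstsq)
```

On a small, well-behaved problem like this one both approaches should agree to within floating-point tolerance; the difference matters mainly when $X^TX$ is close to singular.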
