- Learn to implement the model $f_{w,b}$ for linear regression with one variable

#### Notation
Here is a summary of some of the notation you will encounter.

| Notation | Description | Python (if applicable) |
|:--|:--|:--|
| $a$ | scalar, non bold | |
| $\mathbf{a}$ | vector, bold | |
| **Regression** | | |
| $\mathbf{x}$ | Training example feature values | `x_train` |
| $\mathbf{y}$ | Training example targets | `y_train` |
| $x^{(i)}, y^{(i)}$ | $i^{th}$ training example | `x_i`, `y_i` |
| $m$ | Number of training examples | `m` |
| $w$ | parameter: weight | `w` |
| $b$ | parameter: bias | `b` |
| $f_{w,b}(x^{(i)})$ | The result of the model evaluation at $x^{(i)}$ parameterized by $w,b$: $f_{w,b}(x^{(i)}) = wx^{(i)} + b$ | |
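As a rough illustration of how this notation maps onto Python, the sketch below uses the Python names from the notation summary above. The data values and the hand-picked `w` and `b` are made-up placeholders, not the grocery-store dataset, and `f_wb_i` is a hypothetical variable name.

```python
import numpy as np

# Hypothetical placeholder values, used only to illustrate the notation.
x_train = np.array([1.0, 2.0, 3.0])   # x: training example feature values
y_train = np.array([3.0, 5.0, 7.0])   # y: training example targets

m = x_train.shape[0]                  # m: number of training examples

i = 1                                 # zero-based index of one example
x_i, y_i = x_train[i], y_train[i]     # (x^(i), y^(i)): the i-th training example

w, b = 2.0, 1.0                       # parameters, chosen by hand for this demo
f_wb_i = w * x_i + b                  # f_{w,b}(x^(i)) = w * x^(i) + b
print(m, x_i, y_i, f_wb_i)
```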


#### Packages
- NumPy, for scientific computing
- Matplotlib, for plotting data

##### Problem
- A grocery store wants to predict the price of rice bags based on their weight. You are given the following dataset:

In [1]:
import numpy as np
import matplotlib.pyplot as plt

<a name="4"></a>
## 4 - Refresher on linear regression

In this section, we will fit the linear regression parameters $(w,b)$ to our dataset.
- The model function for linear regression, which maps from `x` to `y`, is represented as $$f_{w,b}(x) = wx + b$$
    

- To train a linear regression model, we want to find the parameters $(w,b)$ that best fit our dataset.

    - To compare how one choice of $(w,b)$ is better or worse than another choice, we can evaluate it with a cost function $J(w,b)$
      - $J$ is a function of $(w,b)$. That is, the value of the cost $J(w,b)$ depends on the value of $(w,b)$.
  
    - The choice of $(w,b)$ that fits our data the best is the one that has the smallest cost $J(w,b)$.


- To find the values of $(w,b)$ that give the smallest possible cost $J(w,b)$, we can use a method called **gradient descent**.
  - With each step of gradient descent, and provided the learning rate is small enough, our parameters $(w,b)$ move closer to the optimal values that achieve the lowest cost $J(w,b)$.
  

- The trained linear regression model can then take the input feature $x$ and output a prediction $f_{w,b}(x)$.
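To make the refresher concrete, here is a minimal sketch of a single gradient descent step. The function name `gradient_step`, the learning rate `alpha`, and the three data points are illustrative assumptions rather than part of this notebook. The gradient expressions are the partial derivatives of a squared-error cost with respect to $w$ and $b$.

```python
import numpy as np

def gradient_step(x, y, w, b, alpha):
    """Return (w, b) after one gradient descent update with learning rate alpha."""
    m = x.shape[0]
    error = w * x + b - y            # prediction minus target, per example
    dj_dw = np.dot(error, x) / m     # partial derivative of the cost w.r.t. w
    dj_db = np.sum(error) / m        # partial derivative of the cost w.r.t. b
    return w - alpha * dj_dw, b - alpha * dj_db

# Made-up data lying exactly on the line y = 2x + 1 (for illustration only).
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

w0, b0 = 0.0, 0.0                    # arbitrary starting point
w1, b1 = gradient_step(x, y, w0, b0, alpha=0.1)
print(w0, b0, "->", w1, b1)
```

On this toy data, one step moves both parameters from the starting point toward the values used to generate the data ($w = 2$, $b = 1$).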

<a name="5"></a>
## 5 - Compute Cost

Gradient descent involves repeated steps that adjust the values of our parameters $(w,b)$ to gradually reach a smaller and smaller cost $J(w,b)$.
- At each step of gradient descent, it will be helpful for us to monitor our progress by computing the cost $J(w,b)$ as $(w,b)$ gets updated. 
- In this section, we will implement a function to calculate $J(w,b)$ so that we can check the progress of our gradient descent implementation.

#### Cost function
For one variable, the cost function for linear regression $J(w,b)$ is defined as

$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2$$ 

- We can think of $f_{w,b}(x^{(i)})$ as the model's prediction for example $i$, and $y^{(i)}$ as the corresponding target value
- $m$ is the number of training examples in the dataset
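The formula above can be sketched directly as a loop over the training examples. The function name `compute_cost` and the toy data are assumptions made for illustration. The loop index runs from `0` to `m - 1`, matching the limits of the sum.

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Cost J(w, b) = (1 / (2m)) * sum over i of (f_wb(x[i]) - y[i])^2."""
    m = x.shape[0]
    total = 0.0
    for i in range(m):                    # i = 0, ..., m - 1, as in the sum
        f_wb = w * x[i] + b               # model prediction for example i
        total += (f_wb - y[i]) ** 2       # squared error for example i
    return total / (2 * m)

# Made-up data lying exactly on the line y = 2x + 1 (for illustration only).
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([3.0, 5.0, 7.0])

cost_exact = compute_cost(x_train, y_train, w=2.0, b=1.0)  # line passes through every point
cost_off = compute_cost(x_train, y_train, w=0.0, b=0.0)    # predicts 0 for every example
print(cost_exact, cost_off)
```

A line that passes through every point has zero cost, while a poorer choice of $(w,b)$ gives a larger value.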

#### Model prediction

- For linear regression with one variable, the prediction of the model $f_{w,b}$ for an example $x^{(i)}$ is represented as:

$$ f_{w,b}(x^{(i)}) = wx^{(i)} + b$$

This is the equation of a line with intercept $b$ and slope $w$.
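As a small sketch of the prediction step, the function below evaluates $f_{w,b}$ for a scalar or a NumPy array. The name `predict` and the values of `w`, `b`, and `x` are illustrative assumptions. The last two lines check the roles of the intercept and the slope numerically.

```python
import numpy as np

def predict(x, w, b):
    """Evaluate f_{w,b}(x) = w * x + b for a scalar or a NumPy array x."""
    return w * x + b

w, b = 2.0, 1.0                       # hand-picked values for this demo
x = np.array([0.0, 1.0, 2.0])
y_hat = predict(x, w, b)

intercept = y_hat[0]                  # the prediction at x = 0 equals b
slope = y_hat[1] - y_hat[0]           # the change in prediction per unit of x equals w
print(y_hat, intercept, slope)
```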


In [None]:
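class LR:
    """Single-variable linear regression trained with batch gradient descent."""

    def __init__(self, lr=0.01, iters=1000):
        self.lr = lr              # learning rate (step size)
        self.epochs = iters       # number of gradient descent iterations
        self.weight = 0.0         # w, replaced by a random value in train()
        self.bias = 0.0           # b, replaced by a random value in train()
        self.mse_history = []     # cost J(w,b) recorded at every iteration

    def train(self, X, y):
        # X and y are 1-D NumPy arrays of the same length m.
        # Fixed seed so the random starting point is reproducible.
        np.random.seed(42)
        self.weight = np.random.randn()
        self.bias = np.random.randn()
        # Reset the history so repeated calls to train() do not mix runs.
        self.mse_history = []
        m = X.shape[0]

        for _ in range(self.epochs):
            # Predictions and per-example errors for the current (w, b)
            y_pred = self.weight * X + self.bias
            error = y_pred - y

            # Partial derivatives of J(w,b) with respect to w and b
            dj_dw = (1/m) * np.dot(error, X)
            dj_db = (1/m) * np.sum(error)

            # Gradient descent update
            self.weight -= self.lr * dj_dw
            self.bias -= self.lr * dj_db

            # Cost J(w,b) = (1/(2m)) * sum(error^2), measured before this update
            cost = (1/(2*m)) * np.sum(error**2)
            self.mse_history.append(cost)

    def predict(self, X):
        # Evaluate f_{w,b}(X) = w * X + b with the learned parameters
        return self.weight * X + self.bias

    def get_cost_history(self):
        # One cost value per training iteration, in order
        return self.mse_history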
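One way to sanity-check a training loop like the one in `LR.train` is to run it on data whose answer is known in advance. The sketch below re-implements the same update rule as a standalone function so that it is self-contained. The name `fit_line`, the zero starting point, the learning rate, the iteration count, and the noiseless toy data are all assumptions chosen for this demo, not settings taken from the notebook.

```python
import numpy as np

def fit_line(x, y, lr=0.05, iters=5000):
    """Batch gradient descent for f(x) = w * x + b; returns w, b and the cost history."""
    w, b = 0.0, 0.0                   # assumed starting point for this demo
    m = x.shape[0]
    cost_history = []
    for _ in range(iters):
        error = w * x + b - y                               # prediction minus target
        cost_history.append(np.sum(error ** 2) / (2 * m))   # J(w, b) before the update
        w -= lr * np.dot(error, x) / m                      # step along -dJ/dw
        b -= lr * np.sum(error) / m                         # step along -dJ/db
    return w, b, cost_history

# Noiseless made-up data generated from y = 2x + 1 (for illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b, history = fit_line(x, y)
print(w, b, history[0], history[-1])
```

With these settings the recovered parameters should land very close to $w = 2$ and $b = 1$, and the last recorded cost should be far below the first. A larger learning rate or fewer iterations may not converge as well, so the cost history is worth inspecting whenever the settings change.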
