# LINEAR REGRESSION

**Example: Linear regression - house sizes and prices**

![Screenshot%202024-01-21%20201123.png](attachment:Screenshot%202024-01-21%20201123.png)

- From this dataset we can build a linear regression model, which fits a straight line to the data
- Based on that line, a house of 1250 sq ft would be predicted to sell for about 220k dollars
- This is an example of a supervised learning model, because the training data includes the actual price for every house
- Linear regression is one type of regression model

**Regression vs. Classification model**
- Classification model:
  + Predicts categories (discrete classes), such as whether a picture shows a cat or a dog, or whether a patient has a particular disease given their medical record
  + There are only a small number of possible outputs. If the model distinguishes cats from dogs, there are two possible outputs, so the set of outputs is discrete and finite
- Regression model:
  + Predicts numbers
  + There are infinitely many possible numbers that the model could output

**Terminology:**

![Screenshot%202024-01-21%20202034.png](attachment:Screenshot%202024-01-21%20202034.png)

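- Training set: the data used to train the model, consisting of input features and output targets
- Notation:
  + $x$ = "input" variable, also called the feature
  + $y$ = "output" or "target" variable
  + $m$ = number of training examples
  + $(x, y)$ = a single training example
  + $(x^{(i)}, y^{(i)})$ = the $i$-th training example (the superscript is an index, not an exponent)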
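
The notation above can be mirrored in a few lines of Python. This is only a sketch: the sizes and prices are made-up values (not the dataset in the screenshot), and the helper name `example` is hypothetical.

```python
# Hypothetical training set: house sizes (sq ft) and prices (thousands of dollars).
x_train = [1000.0, 1500.0, 2000.0]  # input feature x
y_train = [200.0, 260.0, 320.0]     # target y

m = len(x_train)  # number of training examples


def example(i):
    """Return the i-th training example (x^(i), y^(i)), with i starting at 1 as in the notes."""
    return x_train[i - 1], y_train[i - 1]


print(m)           # 3
print(example(2))  # (1500.0, 260.0)
```

Note that Python lists are 0-indexed, so the helper shifts the index by one to match the 1-based superscript used in the notation.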

![Screenshot%202024-01-21%20202829.png](attachment:Screenshot%202024-01-21%20202829.png)

- To train the model, you feed the training set (both the input features and the output targets) to your learning algorithm
- The supervised learning algorithm then produces a function $f$ (also called a hypothesis)
- The function $f$ takes an input $x$ and outputs an estimate, or prediction, written $\hat{y}$ ("y-hat")
- $\hat{y}$ is the prediction for $y$, where $y$ is the actual target value
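
The flow "training set in, function $f$ out" can be sketched as below. The notes do not say which learning algorithm is used, so this example swaps in the closed-form least-squares fit for a single feature as a stand-in. The function name `fit_line` and the data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Stand-in learning algorithm: closed-form least squares for one feature.

    Takes the training inputs and targets and returns a function f.
    """
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    # Slope: covariance of x and y divided by the variance of x.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = s_xy / s_xx
    b = y_mean - w * x_mean

    def f(x):
        return w * x + b

    return f


# Hypothetical data: size in 1000 sq ft, price in thousands of dollars.
x_train = [1.0, 2.0, 3.0]
y_train = [150.0, 250.0, 350.0]

f = fit_line(x_train, y_train)  # the learning algorithm produces f
y_hat = f(1.25)                 # f maps a new input x to a prediction y-hat
print(y_hat)
```

With this made-up data the fitted line is $f(x) = 100x + 50$, so the prediction for $x = 1.25$ is 175.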

**How to represent f**

![Screenshot%202024-01-21%20203211.png](attachment:Screenshot%202024-01-21%20203211.png)

- $f$ is written as $f(x) = wx + b$
- This model is called linear regression; more specifically, linear regression with one variable
- Another name for a linear model with one input variable is univariate linear regression
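
As a minimal sketch, the model can be written as a one-line function. The parameter values below are arbitrary and only show how $w$ and $b$ act on the output.

```python
def f(x, w, b):
    """Univariate linear model: f(x) = w*x + b."""
    return w * x + b


# Hypothetical parameter values, chosen only for illustration.
w, b = 2.0, 1.0

print(f(0.0, w, b))                 # 1.0: at x = 0 the output equals b (the intercept)
print(f(3.0, w, b) - f(2.0, w, b))  # 2.0: one extra unit of x changes the output by w (the slope)
```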

# COST FUNCTION

**Training Set**

![Screenshot%202024-01-22%20235151.png](attachment:Screenshot%202024-01-22%20235151.png)

- Model: $f_{w, b}(x) = wx + b$
- $w$, $b$: parameters (or coefficients)

**What do w and b do**

![Screenshot%202024-01-22%20235619.png](attachment:Screenshot%202024-01-22%20235619.png)

![Screenshot%202024-01-22%20235718.png](attachment:Screenshot%202024-01-22%20235718.png)

- For a given input $x^{(i)}$, the function $f$ produces a predicted value for $y$, denoted $\hat{y}^{(i)}$:
  + $\hat{y}^{(i)} = f_{w, b}(x^{(i)})$
  + $f_{w, b}(x^{(i)}) = wx^{(i)} + b$
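
Applied to a whole training set, this gives one prediction per example. The inputs and the parameter values in this sketch are assumptions, not values taken from the screenshots.

```python
# Hypothetical inputs and parameters.
x_train = [1.0, 2.0, 3.0]
w, b = 0.5, 1.0

# y_hat[i] holds the prediction for the (i+1)-th training example.
y_hat = [w * x_i + b for x_i in x_train]

print(y_hat)  # [1.5, 2.0, 2.5]
```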

**Cost Function: Squared Error Cost Function**

$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2$$
where:
- $(\hat{y}^{(i)} - y^{(i)})$: the error, i.e. the difference between the prediction and the actual target for example $i$
- $m$: number of training examples

- This function, denoted $J(w, b)$, computes the average of the squared errors over the training set; the extra factor of 2 in the denominator is a convention that keeps later calculations tidier
- Different people use different cost functions for different applications, but the squared error cost function is by far the most commonly used one for linear regression
- Since $\hat{y}^{(i)} = f_{w, b}(x^{(i)})$, we can rewrite the cost function as:
$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (f_{w, b}(x^{(i)}) - y^{(i)})^2$$
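
The formula translates almost term by term into code. This is a sketch under assumptions: the function name `compute_cost` and the toy data are chosen for illustration only.

```python
def compute_cost(xs, ys, w, b):
    """Squared error cost J(w, b) = 1/(2m) * sum((f(x_i) - y_i)^2)."""
    m = len(xs)
    total = 0.0
    for x_i, y_i in zip(xs, ys):
        error = (w * x_i + b) - y_i  # prediction minus actual target
        total += error ** 2
    return total / (2 * m)


# Hypothetical toy data lying exactly on the line y = x.
x_train = [1.0, 2.0, 3.0]
y_train = [1.0, 2.0, 3.0]

print(compute_cost(x_train, y_train, w=1.0, b=0.0))  # 0.0, a perfect fit
print(compute_cost(x_train, y_train, w=0.5, b=1.0))  # positive, the line misses two points
```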

# COST FUNCTION INTUITION

**Cost Function Recall**
- Given the model $f_{w, b}(x) = wx + b$
- Different values of the parameters $w$ and $b$ give different models (different straight lines)
- Measure the error with the cost function: $$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (f_{w, b}(x^{(i)}) - y^{(i)})^2$$
- Goal: find the $w$ and $b$ that minimize the cost, $\min_{w, b} J(w, b)$

**Simplified**
- Set $b = 0$, so the model becomes $f_{w}(x) = wx$
- Cost function: $$J(w) = \frac{1}{2m} \sum_{i=1}^{m} (f_{w}(x^{(i)}) - y^{(i)})^2$$
- Goal: $\min_{w} J(w)$

**Example:**

![Screenshot%202024-03-16%20204246.png](attachment:Screenshot%202024-03-16%20204246.png)

At w = 1:
- $f_{w}(x)$ fits the training set exactly, so the cost function is 0
- Correspondingly, on the graph of $J(w)$, the point at $w = 1$ has $J(1) = 0$

![Screenshot%202024-03-16%20204509.png](attachment:Screenshot%202024-03-16%20204509.png)

At w = 0.5:
- $f_{w}(x)$ no longer fits the data exactly, and the cost function evaluates to about 0.58
- So $J(0.5) \approx 0.58$
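
As a worked check, assume the training set is $(1, 1), (2, 2), (3, 3)$ with $m = 3$. This is an assumption: the data is not written out in these notes, but it is consistent with both $J(1) = 0$ and $J(0.5) \approx 0.58$. With $f_{w}(x) = 0.5x$ the predictions are $0.5$, $1$, and $1.5$, so:

$$J(0.5) = \frac{1}{2 \cdot 3}\left[(0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2\right] = \frac{0.25 + 1 + 2.25}{6} = \frac{3.5}{6} \approx 0.58$$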

![Screenshot%202024-03-16%20204618.png](attachment:Screenshot%202024-03-16%20204618.png)

At w = 0, the cost is computed the same way (see the screenshot above).

- Each value of $w$ corresponds to a different straight-line fit, and to one point on the graph of $J(w)$
- We can keep computing the cost for different values of $w$ to trace out what the function $J$ looks like
- The goal is to choose the $w$ that makes $J(w)$ as small as possible
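
Tracing $J(w)$ by trying several values of $w$ can be sketched as follows. The toy data $(1, 1), (2, 2), (3, 3)$ is an assumption, as is the particular grid of candidate values.

```python
def cost(w, xs, ys):
    """Simplified cost J(w) for the model f_w(x) = w*x (b fixed at 0)."""
    m = len(xs)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Candidate values of w: -0.5, 0.0, 0.5, ..., 2.5
candidates = [i * 0.5 for i in range(-1, 6)]
costs = {w: cost(w, xs, ys) for w in candidates}

for w, j in costs.items():
    print(f"w = {w:4.1f}  J(w) = {j:.2f}")

# Pick the candidate with the smallest cost.
best_w = min(costs, key=costs.get)
print("best w:", best_w)  # best w: 1.0
```

A grid search like this only illustrates the shape of $J$; it finds the best value among the candidates tried, not necessarily the exact minimizer.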

# VISUALIZING THE COST FUNCTION

![image.png](attachment:image.png)

- Now the cost function $J$ takes two variables, $w$ and $b$
- Plotting it gives a 3D surface, with $w$ and $b$ on the horizontal axes and $J(w, b)$ as the height
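
The surface is just the cost evaluated over a grid of $(w, b)$ pairs. The sketch below builds such a grid without plotting it; the toy data and the grid ranges are assumptions.

```python
def cost(w, b, xs, ys):
    """Squared error cost J(w, b) for the model f(x) = w*x + b."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

w_values = [i * 0.25 for i in range(0, 9)]  # 0.0 to 2.0
b_values = [j * 0.5 for j in range(-4, 5)]  # -2.0 to 2.0

# surface[r][c] is the height J(w_values[c], b_values[r]) of the 3D plot.
surface = [[cost(w, b, xs, ys) for w in w_values] for b in b_values]

# The lowest point of the grid.
best_cost, best_w, best_b = min(
    (cost(w, b, xs, ys), w, b) for b in b_values for w in w_values
)
print(best_w, best_b, best_cost)  # 1.0 0.0 0.0
```

A nested list like `surface` is the kind of 2D array of heights that plotting libraries take as input for surface and contour plots.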

![image.png](attachment:image.png)

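- The contour plot shows the same cost function in 2D, with $w$ and $b$ on the axes
- Each contour comes from slicing the 3D surface horizontally at a fixed value of $J$, which produces a set of ellipses; every point on one ellipse has the same cost, and the minimum lies at the center of the smallest ellipse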
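
To make "same contour, same cost" concrete, the sketch below evaluates the cost at two different $(w, b)$ pairs that lie on one contour, and at the center. The toy data and the chosen pairs are assumptions made for illustration.

```python
def cost(w, b, xs, ys):
    """Squared error cost J(w, b) for the model f(x) = w*x + b."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical toy data; the minimum of J is at w = 1, b = 0.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Two different lines that sit on opposite sides of the minimum.
j_a = cost(1.5, -1.0, xs, ys)
j_b = cost(0.5, 1.0, xs, ys)
# The center of the contours.
j_min = cost(1.0, 0.0, xs, ys)

print(j_a, j_b)  # equal values: both points are on the same contour
print(j_min)     # 0.0
```

The two lines look quite different, yet they have the same cost, which is why they land on the same ellipse of the contour plot.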