
# Linear Regression with One Variable

**Example:** Housing Prices (Portland, OR)

<div>
<img src="https://github.com/thomouvic/SENG474/raw/main/images/housing.png" width="500"/>
</div>

**Supervised Learning:**
We are given the "right answer" (the label) for each example (training instance) in the data. The task is to predict a real-valued output (**regression**) or a discrete-valued output (**classification**). Predicting house prices is a regression problem.

**Formally:**

Training set:

| Size in feet² (x) | Price in 1000s of dollars (y) |
|--|--|
| 2104 | 460 |
| 1416 | 232 |
| 1534 | 315 |

**Notation:**

$m$: number of training examples (instances)

$x$: input variable/feature

$y$: output variable / target variable / label variable

$(x,y)$: a generic training example

$\left( x^{(i)}, y^{(i)} \right)$: the $i$-th training example.


**E.g.**

$x^{(1)} = 2104$

$x^{(2)} = 1416$

$y^{(1)} = 460$

$y^{(2)} = 232$
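To connect the notation to code, here is a minimal Python sketch that holds the three rows of the training set above in plain lists. The helper `example(i)` is a hypothetical name introduced only for illustration: it shows that the 1-based math index $i$ in $x^{(i)}$ corresponds to index `i - 1` in a 0-based Python list.

```python
# Training set from the table above: house sizes (x) and prices (y).
x = [2104, 1416, 1534]
y = [460, 232, 315]

m = len(x)  # number of training examples

def example(i):
    """Return the i-th training example (x^(i), y^(i)), counting from i = 1 as in the notes."""
    # Python lists are 0-based, so x^(i) lives at index i - 1.
    return x[i - 1], y[i - 1]

print(m)           # 3
print(example(1))  # (2104, 460)
print(example(2))  # (1416, 232)
```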



---


<div>
<img src="https://github.com/thomouvic/SENG474/raw/main/images/hypothesis.png" width="300"/>
</div>

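$h$, the *hypothesis*, maps from $x$'s to $y$'s: given an input $x$ (e.g., the size of a house), it outputs a predicted value of $y$ (its price).

How do we represent $h$? In linear regression with one variable, $h$ is a straight line: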

$$
h_{\theta}(x) = \theta_0 + \theta_1 x
$$

$\theta$ is the parameter vector represented as a *column matrix*:

$$
\theta = 
\begin{bmatrix}
    \theta_0 \\
    \theta_1
\end{bmatrix}
$$
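A minimal NumPy sketch of the hypothesis, assuming $\theta$ is stored as a $2 \times 1$ column array as above. The function name `h` and the parameter values are illustrative assumptions, not learned values.

```python
import numpy as np

def h(theta, x):
    """Hypothesis h_theta(x) = theta_0 + theta_1 * x.

    theta: 2x1 column array [[theta_0], [theta_1]].
    x: a scalar or a 1-D array of inputs (house sizes).
    """
    x = np.asarray(x, dtype=float)
    return theta[0, 0] + theta[1, 0] * x

# Hypothetical parameter values, chosen only for illustration (not learned).
theta = np.array([[50.0],
                  [0.25]])

print(h(theta, 2104))          # 50 + 0.25 * 2104 = 576.0
print(h(theta, [2104, 1416]))  # [576. 404.]
```

Equivalently, $h_{\theta}(x) = \theta^T \begin{bmatrix} 1 \\ x \end{bmatrix}$, which in NumPy is `(theta.T @ np.array([[1.0], [x]]))[0, 0]`; the explicit form above simply keeps the two parameters visible.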


We will learn the best $\theta$. 

A given $\theta$ vector defines a line. What's the best line?

<div>
<img src="https://github.com/thomouvic/SENG474/raw/main/images/hypothesis_line.png" width="200"/>
</div>




---


# Cost Function

How do we choose the parameters $\theta_0, \theta_1$?

Choose $\theta_0, \theta_1$ so that $h_{\theta}(x)$ is close to $y$ for our training examples. 

$$
\min_{\theta_0, \theta_1} \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)})-y^{(i)}  \right)^2
$$

That is, find the values of $\theta_0, \theta_1$ that minimize the average squared difference between the predictions $h_{\theta}(x^{(i)})$ and the targets $y^{(i)}$.

This average is called the Mean Squared Error (MSE), and we use it as our cost function, denoted $J(\theta_0, \theta_1)$. Its variables are $\theta_0$ and $\theta_1$; $x$ and $y$ are data, i.e. constants.

So, the cost function is:
$$
J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)})-y^{(i)}  \right)^2
$$
Sometimes a factor of 2 is added to the denominator:
$$
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)})-y^{(i)}  \right)^2
$$
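The same $\theta_0, \theta_1$ that minimize the second version also minimize the first, and vice versa, because the second version is exactly half of the first and scaling a function by a positive constant does not move its minimum. The extra 2 is there only to make later formulas look nicer.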
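To see where the extra 2 helps, differentiate the $\frac{1}{2m}$ version with respect to $\theta_1$, using $\frac{\partial}{\partial \theta_1} h_{\theta}(x^{(i)}) = x^{(i)}$ (since $h_{\theta}(x) = \theta_0 + \theta_1 x$). The 2 brought down by the chain rule cancels the 2 in the denominator:

$$
\frac{\partial}{\partial \theta_1} \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)})-y^{(i)} \right)^2
= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_{\theta}(x^{(i)})-y^{(i)} \right) x^{(i)}
= \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)})-y^{(i)} \right) x^{(i)}
$$

With the $\frac{1}{m}$ version, the same derivative would carry an extra factor of 2, i.e. $\frac{2}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)})-y^{(i)} \right) x^{(i)}$.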

## Cost function intuition

Let's consider a simplified hypothesis where $\theta_0 = 0$, so that $h_{\theta}(x) = \theta_1 x$. Using the $\frac{1}{2m}$ version, the cost function is:

$$
J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_1 x^{(i)}-y^{(i)}  \right)^2
$$

and we want to solve $\min_{\theta_1} J(\theta_1)$.

### Comparing $h_{\theta}(x)$ and $J(\theta_1)$

<div>
<img src="https://github.com/thomouvic/SENG474/raw/main/images/hypothesis_vs_cost.png" width="600"/>
</div>

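For a fixed $\theta_1$, $h_{\theta}(x)$ is a function of $x$; $J(\theta_1)$, in contrast, is a function of the parameter $\theta_1$. In the left figure, each line corresponds to one value of $\theta_1$, and there are three training points, $(1,1)$, $(2,2)$ and $(3,3)$. The middle line has $\theta_1=1$ and passes through all three points, so it makes zero error, i.e. $J(1)=0$. More precisely,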

$$
J(1) = \frac{1}{2m} \sum_{i=1}^{m} \left( 1 \cdot x^{(i)}-y^{(i)}  \right)^2 = \frac{1}{2 \cdot 3}(0^2+0^2+0^2) = 0
$$

The lower line, with $\theta_1=0.5$, misses every point. Its cost is:
$$
J(0.5) = \frac{1}{2m} \sum_{i=1}^{m} \left( 0.5\, x^{(i)}-y^{(i)}  \right)^2 = \frac{1}{2 \cdot 3}\left((0.5-1)^2+(1-2)^2+(1.5-3)^2\right) = \frac{3.5}{6} \approx 0.58
$$ 

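The upper line, with $\theta_1=1.5$, also makes errors. Its cost is also $\approx 0.58$, since its errors at the three points ($0.5$, $1$ and $1.5$) have the same magnitudes as those of the lower line.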
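A minimal Python check of the arithmetic above. The toy data $(1,1), (2,2), (3,3)$ is read off the worked sums (e.g. the terms $(0.5-1)^2, (1-2)^2, (1.5-3)^2$), and the helper `J` is an illustrative name that uses the $\frac{1}{2m}$ convention of those sums. Calling `J(0.0, x, y)` is one way to check your answer to the exercise below.

```python
import numpy as np

# Toy training set inferred from the worked sums: (1, 1), (2, 2), (3, 3).
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def J(theta1, x, y):
    """Simplified cost with theta_0 = 0, using the 1/(2m) convention."""
    m = len(x)
    return np.sum((theta1 * x - y) ** 2) / (2 * m)

for theta1 in (0.5, 1.0, 1.5):
    print(theta1, round(J(theta1, x, y), 2))
# 0.5 0.58
# 1.0 0.0
# 1.5 0.58
```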

Exercise: What is the cost value for $\theta_1=0$?

The chart on the right plots these cost values against $\theta_1$: each line on the left becomes one point on the right. Repeating this for many values of $\theta_1$ traces out a parabola, as shown there, because $J(\theta_1)$ is a quadratic function of $\theta_1$.
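A sketch that makes the parabola explicit, using the same toy data inferred from the worked sums and the $\frac{1}{2m}$ cost. Expanding the square gives $J(\theta_1) = a\theta_1^2 + b\theta_1 + c$ with $a = \frac{1}{2m}\sum_i (x^{(i)})^2$, $b = -\frac{1}{m}\sum_i x^{(i)} y^{(i)}$ and $c = \frac{1}{2m}\sum_i (y^{(i)})^2$. Since $a > 0$ whenever some $x^{(i)} \neq 0$, the graph opens upward with its minimum at $-b/(2a)$. The grid range is an arbitrary choice.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

# Evaluate the 1/(2m) cost on a grid of theta_1 values (each value = one line on the left).
thetas = np.linspace(-0.5, 2.5, 301)
costs = np.array([np.sum((t * x - y) ** 2) / (2 * m) for t in thetas])

# Coefficients of the quadratic obtained by expanding the square.
a = np.sum(x ** 2) / (2 * m)
b = -np.sum(x * y) / m
c = np.sum(y ** 2) / (2 * m)
print(np.allclose(costs, a * thetas ** 2 + b * thetas + c))  # True

# The vertex of the parabola, -b / (2a), is the minimizing theta_1.
print(-b / (2 * a))              # 1.0
print(thetas[np.argmin(costs)])  # grid point closest to the minimum
```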

---

Now let $\theta_0$ vary as well. Since $J(\theta_0, \theta_1)$ then depends on two parameters, we can plot it as a surface in 3D:

<div>
<img src="https://github.com/thomouvic/SENG474/raw/main/images/cost_in_3d.png" width="500"/>
</div>

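Each point on the 3D surface corresponds to a pair $(\theta_0, \theta_1)$, i.e. to one line in the 2D plot of the data, and its height is the cost $J$ of that line. The surface is bowl-shaped. For a fixed $\theta_0$, the slice through the surface is still a parabola, as in the case $\theta_0=0$.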
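A NumPy sketch of the surface: evaluate $J(\theta_0, \theta_1)$ (with the $\frac{1}{2m}$ convention) on a grid, using the three-row table from the start of these notes as a stand-in for the data behind the figure, which is not given here. The grid ranges are arbitrary assumptions. Fixing $\theta_0$ picks out one column of the grid; along it, constant positive second differences confirm the upward parabola.

```python
import numpy as np

# Stand-in data: the three rows of the housing table (not the dataset behind the figure).
x = np.array([2104.0, 1416.0, 1534.0])
y = np.array([460.0, 232.0, 315.0])
m = len(x)

def J(theta0, theta1):
    """1/(2m) squared-error cost, evaluated elementwise over arrays of parameter values."""
    theta0 = np.asarray(theta0, dtype=float)
    theta1 = np.asarray(theta1, dtype=float)
    # A trailing axis pairs every (theta0, theta1) with all m examples.
    preds = theta0[..., None] + theta1[..., None] * x
    return np.sum((preds - y) ** 2, axis=-1) / (2 * m)

# Arbitrary grid ranges, chosen only for illustration.
theta0_vals = np.linspace(-200.0, 400.0, 61)
theta1_vals = np.linspace(-0.2, 0.5, 71)
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)  # both have shape (71, 61)
Z = J(T0, T1)                                    # surface heights, shape (71, 61)

# Fixing theta_0 selects one column of the grid. Along it J is quadratic in theta_1,
# so the second differences are constant (up to floating-point error).
column = Z[:, 30]                  # theta_0 = theta0_vals[30] = 100.0
second_diff = np.diff(column, n=2)
print(np.allclose(second_diff, second_diff[0]))  # True
```

A plotting library can render `Z`; for instance, matplotlib's 3D axes provide `plot_surface(T0, T1, Z)`, and `plt.contour(T0, T1, Z)` draws a contour plot of the same grid.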

Contour plots are a way to show three-dimensional data on a two-dimensional graph. They use lines or colors to represent different values of a third variable. 

<div>
<img src="https://github.com/thomouvic/SENG474/raw/main/images/countour.png" width="400"/>
</div>

Each point in the contour plot corresponds to a pair $(\theta_0, \theta_1)$, i.e. to a line.

All the points on a contour line have the same $J$ value. 

The point $\theta_0 = 230, \theta_1 = 0.15$ lies at the center of the concentric contours and is the point that minimizes $J(\theta_0, \theta_1)$.
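One way to locate the center of the contours numerically, sketched with NumPy on the three-row table from the start of these notes. It will not reproduce the figure's $(230, 0.15)$, since the figure's data is not given here. The sketch swaps in NumPy's least-squares solver `np.linalg.lstsq`, which minimizes the same sum of squared errors directly; this is an illustrative shortcut, not necessarily how these notes go on to minimize $J$.

```python
import numpy as np

x = np.array([2104.0, 1416.0, 1534.0])
y = np.array([460.0, 232.0, 315.0])
m = len(x)

def J(theta0, theta1):
    """1/(2m) squared-error cost for the hypothesis theta0 + theta1 * x."""
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)

# Design matrix: a column of ones (multiplies theta_0) and a column of sizes (multiplies theta_1).
X = np.column_stack([np.ones(m), x])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution = minimizer of J
theta0_best, theta1_best = theta
print(theta0_best, theta1_best)

# Sanity check: nudging either parameter away from the minimizer increases the cost.
for d0, d1 in [(1.0, 0.0), (-1.0, 0.0), (0.0, 1e-3), (0.0, -1e-3)]:
    assert J(theta0_best + d0, theta1_best + d1) > J(theta0_best, theta1_best)
```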



