# Topic 1 -- Linear Regression

Welcome, everyone! In this workshop, we will learn about one of the fundamental algorithms of supervised learning: linear regression. This notebook contains a step-by-step guide to implementing linear regression in scikit-learn, as well as a collection of pictures and interactive graphs that will give you an intuition for how linear regression works.

## Table of Contents ...

## Before We Begin...

There are some **components** that must be downloaded before we continue with the notebook. First, your instructor will tell you to zip the `datasets` folder and upload it to **Google Colab**. Once it is uploaded, please run the cell below to **unzip** the archive and install the other modules needed.

In [1]:
!unzip assets.zip
# The PyPI package is named scikit-learn (it is imported as sklearn)
!pip install numpy scikit-learn pandas matplotlib bokeh
from disp_utils import *

Archive:  assets.zip


## What is Linear Regression?

Linear regression is one of the simplest forms of supervised learning, and it is used to tackle regression problems. A regression problem is one where you try to predict a continuous output from one or more input features. For example, you might predict someone's height from their age, ethnicity, and biological sex. Notice that height can take any value within a reasonable range (there are no humans who are 10 meters tall). Another example is predicting the age of a stone from the amount of carbon-14 present. Problems like these can be approached with linear regression.

### The Car Price Problem

Say you are given a **dataset** containing the **mileage** of cars as well as their respective prices. Your task is to **train** a model on this dataset so that in the future, you can input a certain mileage, and the model will return the predicted price of that car. The dataset looks something like this:


In [2]:
show_price_vs_mileage()

Disregarding some outliers, you can see that there is a downward trend in the data. Now the question is: ***How do we predict the price of a car given its mileage in km?*** Linear regression will try to find a function that best **fits** the data. From here on, we will use the word "**fit**" to refer to the training of a machine learning model. And that is really what "training an ML algorithm" is doing -- it is trying to fit a function to the training data! For the most basic linear regression problem, we have one input variable $x$ and one output variable $y$. The function that linear regression will be fitting is shown below:


<div style="text-align: center">
    <div>&nbsp;</div>
    $\hat{y} = wx + b$
</div>

Where:
- $\hat{y}$ is your hypothesis function, AKA the function you are trying to fit
- $x$ is the input data
- $w$ is the weight (slope)
- $b$ is the bias (y-intercept)
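As a minimal sketch, the formula above can be written as a small Python function. The name `predict` and the example values of `w` and `b` are made up for illustration; they are not part of `disp_utils` or fitted to the car dataset.

```python
import numpy as np

def predict(x, w, b):
    """Return y_hat = w * x + b for a scalar or an array of inputs."""
    return w * np.asarray(x, dtype=float) + b

# Illustrative inputs (not taken from the car dataset)
x = [0, 1, 2, 3]

# Hypothetical parameters: a positive slope and a negative slope,
# both with the same y-intercept
y_a = predict(x, w=2.0, b=1.0)   # line rises as x grows
y_b = predict(x, w=-1.0, b=1.0)  # line falls as x grows

print("w=2,  b=1:", y_a)
print("w=-1, b=1:", y_b)
```

Both lines pass through the same point at $x = 0$ because they share the same bias $b$; only the slope $w$ differs.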

In linear regression, the weight and bias are the **parameters** we get to tinker with. By changing the weight and bias, you change the nature of the function. See for yourself by playing around with the weight and bias sliders below, and try to fit the function to the data:


In [3]:
show_pvm_with_sliders()


By playing around with the sliders, you have fit a function to the data. This is basically what all supervised learning algorithms are doing, just that usually they are fitting much, **MUCH** more complicated functions. Once you fit a function, you can use that function to make predictions! For example, if you input a mileage of $300,000$ km, this function will output a price of around \$5000.
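To make the prediction step concrete, here is a small sketch of using a fitted line to predict a price. The values of `w_fit` and `b_fit` are hypothetical: they were picked so the result lines up with the example above, and they are not the values the sliders or a trained model would necessarily give you.

```python
def predict_price(mileage_km, w, b):
    """Apply the fitted line y_hat = w * x + b to a single mileage."""
    return w * mileage_km + b

# Hypothetical parameters, assumed only for illustration
w_fit = -0.05    # price change in dollars per km driven
b_fit = 20_000   # predicted price in dollars at 0 km

price = predict_price(300_000, w_fit, b_fit)
print(f"Predicted price at 300,000 km: ${price:,.0f}")
```

With a negative weight, every extra kilometer lowers the predicted price, which matches the downward trend in the plot.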

### Finding the Optimal $w$ and $b$

You might have the question: *How does a machine figure out which $w$ and $b$ are the correct values?*