# Linear regression

This notebook covers the theory of linear regression and implements it with NumPy.

## Theory

In linear regression, a straight line is fit to the data. The $i$-th input
vector is denoted by $\pmb{x}^{(i)}$ and the corresponding output (also called the
label) is denoted by $y^{(i)}$. The goal of linear regression is to optimize
the parameters $\theta_j$, with $j$ ranging from zero to the number of features $n$, in
the function

$$\hat{y}^{(i)} = \sum_{j=0}^{n} \theta_j x^{(i)}_j = \Theta \cdot \pmb{x}^{(i)}$$

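For convenience, the first element of every $\pmb{x}^{(i)}$ is set to one, so that $\theta_0$ acts as the intercept.
To quantify how well the parameters fit the data, a cost function is used.
One example is the mean squared error, here scaled by $\frac{1}{2}$ so that the factor cancels when differentiating: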

$$ J\left(\Theta \right) = \frac{1}{2m} \sum^m_{i=1} \left(\hat{y}^{(i)} - y^{(i)}\right)^2 $$

where $m$ is the number of training examples.
The best parameters of the linear regression model are found by minimizing the 
cost function. This can be done by using gradient descent, which is an iterative 
algorithm.
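
As a minimal sketch of the two formulas above, the prediction and the cost can be written in vectorized NumPy form. The names `predict`, `cost`, `X`, and `y` are illustrative assumptions; `X` stacks the input vectors row by row, each with a leading one.

```python
import numpy as np


def predict(theta, X):
    """Return the predictions X @ theta, one per row of X."""
    return X @ theta


def cost(theta, X, y):
    """Return J(theta) = 1/(2m) * sum of squared residuals."""
    m = len(y)
    residuals = predict(theta, X) - y
    return residuals @ residuals / (2 * m)


# Three points on the line y = 1 + 2x; the first column is the constant 1
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

print(cost(np.array([1.0, 2.0]), X, y))  # parameters of the exact line
print(cost(np.zeros(2), X, y))           # all-zero parameters: (1 + 9 + 25) / 6
```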

Repeat until convergence, updating all $\theta_j$ simultaneously:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J\left(\theta_0, \theta_1\right) \qquad \text{for } j \in \{0, 1\}$$

where $\alpha$ is the learning rate. Working out the derivative of the cost function gives

Repeat until convergence:

$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)x^{(i)}_j$$
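
In vector form, the sum in the update rule is the $j$-th component of $\frac{1}{m} X^T (X\Theta - \pmb{y})$, where the rows of $X$ are the input vectors. The sketch below performs one update step this way and compares the analytic gradient with a finite-difference estimate of the cost. The functions `cost` and `gradient`, the learning rate, and the toy data are assumptions made for illustration.

```python
import numpy as np


def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared residuals."""
    residuals = X @ theta - y
    return residuals @ residuals / (2 * len(y))


def gradient(theta, X, y):
    """Vector of partial derivatives dJ/dtheta_j = 1/m * sum((y_hat - y) * x_j)."""
    return X.T @ (X @ theta - y) / len(y)


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = np.zeros(2)
alpha = 0.1  # learning rate, chosen arbitrarily for this toy data

# Central finite differences as an independent check of the derivative
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(2)
])
analytic = gradient(theta, X, y)
print(np.allclose(analytic, numeric))

# One simultaneous update of all parameters
new_theta = theta - alpha * analytic
print(cost(theta, X, y), cost(new_theta, X, y))
```

For a suitably small learning rate, the cost after the step should be lower than before it.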



## Implementation

In [1]:
import numpy as np
import plotly.graph_objects as go
import random

In [27]:
def create_dataset(datapoints, variance, correlation=None, step=2):
    """
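    Create a random dataset of (X, y) pairs, where X = [1, x]

    :param datapoints: number of datapoints
    :param variance: noise level; a random integer from [-variance, variance)
        is added to each y (must be at least 1)
    :param correlation: 'pos', 'neg' or None for no correlation (default is None)
    :param step: absolute slope of the underlying line (default is 2)
    :return: object array with one (X, y) pair per row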
    """
    val = 1
    points = []

    for x in range(datapoints):
        y = val + random.randrange(-variance, variance)
        X = np.asarray([1, x])
        points.append((X, y))
        if correlation == "pos":
            val += step
        elif correlation == "neg":
            val -= step

    return np.asarray(points, dtype=object)
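
The returned object array holds one `(X, y)` pair per row, which is convenient for looping but not for vectorized arithmetic. As a sketch, such an array can be unpacked into a design matrix and a label vector. Here `to_matrices` is a hypothetical helper, and the small array is built by hand in the same layout rather than by `create_dataset`.

```python
import numpy as np


def to_matrices(points):
    """Split an object array of (x_vector, y) rows into a float matrix and vector."""
    X = np.stack(list(points[:, 0])).astype(float)  # shape (m, number of parameters)
    y = points[:, 1].astype(float)                  # shape (m,)
    return X, y


# Hand-built stand-in with the same layout as the create_dataset output
points = np.asarray(
    [(np.asarray([1, x]), 1 + 2 * x) for x in range(4)], dtype=object
)

X, y = to_matrices(points)
print(X.shape, y.shape)
print(y)
```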

In [28]:
def compute_MSE(theta, points):
    """
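    Compute the mean squared error (without the factor 1/2 used in the cost function J)

    :param theta: regression parameters
    :param points: array of (x, y) pairs, where x includes the leading 1
    """

    # Sum of squared residuals over all points
    total_error = sum(((theta @ x) - y) ** 2 for x, y in points)

    return total_error / len(points)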

In [48]:
def gradient_descent(
    points, learning_rate, num_iterations, threshold=1e-3
):
    """
    Use gradient descent to optimize regression parameters theta in order to find the best straight 
    line for the given points

    :param points: points to fit the line
    :param learning_rate: learning rate used in the algorithm
    :param num_iterations: maximum number of iterations
    :param threshold: stop once the mean squared error changes by no more than
        this amount between two consecutive iterations (default is 1e-3)
    :return: the optimized parameters theta
    """

    # Init values
    theta = np.zeros(points[0, 0].shape[0])
    m = len(points)
    iteration = 0

    J = compute_MSE(theta, points)
    prev_J = np.inf

    # Loop until convergence or until the maximum number of iterations is reached
    while iteration < num_iterations and abs(J - prev_J) > threshold:
        
        new_thetas = np.zeros(len(theta))
        prev_J = J

        # Compute all new thetas from the old theta (simultaneous update)
        for j, theta_j in enumerate(theta):
            gradient_j = np.sum([(theta @ x - y) * x[j] for x, y in points]) / m
            new_thetas[j] = theta_j - learning_rate * gradient_j

        theta = new_thetas
        # Compute new MSE
        J = compute_MSE(theta, points)
        iteration += 1

    return theta
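
The loop over `j` and the list comprehension over `points` can be replaced by matrix operations. The following is a sketch of such a variant, not a replacement for the function above. The name `gradient_descent_vectorized` is hypothetical, the function expects a design matrix `X` and a label vector `y` instead of the `points` array, and it uses the same stopping rule on the change in mean squared error.

```python
import numpy as np


def gradient_descent_vectorized(X, y, learning_rate, num_iterations, threshold=1e-3):
    """Gradient descent for linear regression using matrix operations."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    mse = np.mean((X @ theta - y) ** 2)
    prev_mse = np.inf
    iteration = 0

    while iteration < num_iterations and abs(mse - prev_mse) > threshold:
        prev_mse = mse
        # Gradient of J: 1/m * X^T (X theta - y), all parameters updated at once
        theta = theta - learning_rate / m * (X.T @ (X @ theta - y))
        mse = np.mean((X @ theta - y) ** 2)
        iteration += 1

    return theta


# Noise-free toy data on the line y = 1 + 2x (an assumption for this example)
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x

theta = gradient_descent_vectorized(
    X, y, learning_rate=0.01, num_iterations=20000, threshold=1e-12
)
print(theta)
```

On this toy data the result should be close to an intercept of 1 and a slope of 2. A looser threshold stops the loop earlier, which typically leaves the slowly converging intercept less accurate.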

In [49]:
def fit(points):
    """
    Fit a regression line to the given points using gradient descent and plot it

    :param points: datapoints to fit a straight line
    """

    # hyperparameters
    learning_rate = 0.001

    # Optimize theta (intercept and slope) using gradient descent
    theta = gradient_descent(
        points, learning_rate, num_iterations=1000
    )

    # Plot the data and the regression line
    xs, ys = points[:, 0], points[:, 1]
    regression_line = [theta @ x for x in xs]
    
    fig = go.Figure()

    fig.add_traces([
        go.Scatter(x=[x[1] for x in xs], y=ys, mode='markers'),
        go.Scatter(x=[x[1] for x in xs], y=regression_line, mode='lines', name=f"{theta[1]:.3f}x + {theta[0]:.3f}")
    ])

    fig.update_layout(autosize=False, width=700, height=500, margin=dict(l=60, r=50, b=70, t=30),
        xaxis_title='x', yaxis_title='y')

    fig.show()

In [55]:
points = create_dataset(datapoints=40, variance=1, correlation='neg')
fit(points)

In [56]:
points = create_dataset(datapoints=40, variance=1, correlation='pos')
fit(points)

In [57]:
points = create_dataset(datapoints=40, variance=10, correlation='neg')
fit(points)

In [59]:
points = create_dataset(datapoints=40, variance=10)
fit(points)

Source: https://www.coursera.org/specializations/machine-learning-introduction