# Basic Regression

Here we will build a few simple functions that are used in ML *all the time*.

In [None]:
import numpy as np
#Our input data ;-)
x = np.array([1.,2.,3,4,5,6,7,8,9])
y = np.array([1,3,2,3,5,4,6,4,7])

## 1. Calculate Mean and Variance

The first step is to estimate the mean and the variance of data variables.

```
mean(x) = sum(x) / count(x)
```
Below is a function named mean() that implements this behavior for a list of numbers.

In [None]:
# Calculate the mean value of a list of numbers
def mean(values):
    return sum(values) / float(len(values))

In [None]:
mean(x)

In [None]:
mean(y)

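The variance measures how far the values spread around the mean. The helper below computes the sum of squared differences of each value from the mean; it skips the usual division by the number of values, because that factor cancels when the coefficients are estimated later.

This sum for a list of numbers can be calculated as: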
```
variance = sum( (x - mean(x))^2 )
```


In [None]:
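# Calculate the sum of squared differences from the mean
# (not divided by the count, matching the formula above)
def variance(val):
    m = mean(val)
    var = 0.0
    for v in val:
        var += (v - m) ** 2
    return var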

In [None]:
variance(x)
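As a sanity check, the hand-written helpers can be compared against NumPy. This is a minimal sketch that assumes the same `x` array as above. Note that `np.var` divides by the number of values by default, so the result is multiplied back by `len(x)` to match the unnormalized sum returned by `variance()`.

```python
import numpy as np

x = np.array([1., 2., 3, 4, 5, 6, 7, 8, 9])

# np.mean matches sum(values) / len(values)
mean_x = np.mean(x)

# np.var divides by n by default (ddof=0); multiplying by n recovers
# the plain sum of squared differences used in this notebook
sum_sq_x = np.var(x) * len(x)

print(mean_x, sum_sq_x)
```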

## 2. Calculate Covariance

The covariance of two groups of numbers describes how those numbers change together.

Covariance and correlation are closely related. Covariance depends on the scale of the data, whereas correlation is covariance normalized by the spread of each variable, which gives a value between -1 and 1.

As with the variance above, we work with the unnormalized sum here and calculate the covariance between two variables as follows:

```
covariance = sum((x(i) - mean(x)) * (y(i) - mean(y)))
```

Below is a function named covariance() that implements this statistic. 


In [None]:
# Calculate covariance between x and y
def covariance(x, y):
    mean_x = mean(x)
    mean_y = mean(y)
    covar = 0.0
    for i in range(len(x)):
        covar += (x[i] - mean_x) * (y[i] - mean_y)
    return covar


In [None]:
covar = covariance(x, y)
print('Covariance: %.3f' % (covar))
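To illustrate how covariance can be normalized into a correlation value, here is a minimal sketch using NumPy. It assumes the same `x` and `y` arrays as above and the unnormalized sums used in this notebook; the variable names `dx`, `dy` and `corr` are introduced only for this example.

```python
import numpy as np

x = np.array([1., 2., 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 3, 5, 4, 6, 4, 7])

# Differences from the mean for each variable
dx = x - x.mean()
dy = y - y.mean()

# Unnormalized covariance, same quantity as covariance(x, y)
covar = np.sum(dx * dy)

# Dividing by the square root of the product of the two sums of
# squared differences gives the Pearson correlation coefficient
corr = covar / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation between x and y
print(corr, np.corrcoef(x, y)[0, 1])
```

The normalization factors cancel in the ratio, so it does not matter whether the sums are divided by the count or not.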


## 3. Estimate Coefficients
Simple linear regression has two coefficients to estimate: W and b.

The first is W, the slope of the line, which can be estimated as:

```
W = sum((x(i) - mean(x)) * (y(i) - mean(y))) / sum( (x(i) - mean(x))^2 )
```
The numerator is the covariance and the denominator is the variance calculated above, so this simplifies to:

```
W = covariance(x, y) / variance(x)
```
We already have functions to calculate covariance() and variance().

Next, we need to estimate a value for b, also called the intercept because it is the value at which the line crosses the y-axis.

```
b = mean(y) - W * mean(x)
```
Again, we know how to estimate W and we have a function to estimate mean().

We can put all of this together into a function named coefficients() that takes the dataset as an argument and returns the coefficients.

In [None]:
# Calculate coefficients
def coefficients(x,y):
    x_mean, y_mean = mean(x), mean(y)
    W = covariance(x, y) / variance(x)
    b = y_mean - W * x_mean
    return [W, b]

In [None]:
W, b = coefficients(x,y)
print('Coefficients: W=%.3f, b=%.3f' % (W,b))
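One way to double-check these coefficients is a degree-1 least-squares fit with `np.polyfit`, which returns the coefficients from the highest power down, so the slope comes first and the intercept second. This is only a cross-check under the assumption that the same `x` and `y` arrays are used; the names `W_check` and `b_check` are introduced for this example.

```python
import numpy as np

x = np.array([1., 2., 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 3, 5, 4, 6, 4, 7])

# Fit a straight line (polynomial of degree 1) by least squares
W_check, b_check = np.polyfit(x, y, 1)

print('Check: W=%.3f, b=%.3f' % (W_check, b_check))
```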

Now we can check whether everything works as expected!

Let's build our prediction function.

In [None]:
def predict(x,W,b):
    p = list()
    for i in x:
        y = W*i + b
        p.append(y)
    return p

In [None]:
y_ = predict(x,W,b)
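Because `x` is a NumPy array, the same predictions can also be obtained without an explicit loop. The sketch below assumes the data from above and recomputes W and b so that it runs on its own; `predict_vectorized` is a hypothetical name, not part of the notebook.

```python
import numpy as np

x = np.array([1., 2., 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 3, 5, 4, 6, 4, 7])

def predict_vectorized(x, W, b):
    # Broadcasting applies W * x_i + b to every element at once
    return W * np.asarray(x, dtype=float) + b

# Slope and intercept, computed with the same formulas as above
dx = x - x.mean()
W = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
b = y.mean() - W * x.mean()

y_vec = predict_vectorized(x, W, b)
print(y_vec)
```

The result is a NumPy array rather than a list, which plots the same way with matplotlib.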

Let's display our output data!

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.scatter(x, y , color='#2F08EC', marker="s")
plt.scatter(x, y_ , color='#FF082C')
plt.plot(x, y_ , color='#FF0000')

We just built a linear regression!

Let's make it more "official".

We will also add a function called evaluate_algorithm() to manage the evaluation of the predictions, and another called rmse_metric() to estimate the Root Mean Squared Error of the predictions.



In [None]:
from math import sqrt
# Calculate root mean squared error
def rmse_metric(actual, predicted):
    sum_error = 0.0
    for i in range(len(actual)):
        prediction_error = predicted[i] - actual[i]
        sum_error += (prediction_error ** 2)
    mean_error = sum_error / float(len(actual))
    return sqrt(mean_error)
 

In [None]:
# Evaluate regression algorithm on training dataset
def evaluate_algorithm(x, y, algorithm):
    predicted = algorithm(x, y)
    print(predicted)
    rmse = rmse_metric(y, predicted)
    return rmse,predicted
 

In [None]:
# Simple linear regression algorithm
def simple_linear_regression(x,y):
    predictions = list()
    W, b = coefficients(x,y)
    for row in x:
        yhat = b + W * row
        predictions.append(yhat)
    return predictions

In [None]:
rmse,predictions = evaluate_algorithm(x, y, simple_linear_regression)
print('RMSE: %.3f' % (rmse))
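The RMSE can also be written in a single vectorized expression. This is a minimal sketch, not a replacement for `rmse_metric()`; the name `rmse_np` is hypothetical and the inputs in the usage lines are made-up values chosen so the result is easy to verify by hand.

```python
import numpy as np

def rmse_np(actual, predicted):
    # Convert to float arrays so lists and arrays are both accepted
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Square the errors, average them, then take the square root
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Errors of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5)
print(rmse_np([0, 0], [3, 4]))
```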

In [None]:
y

In [None]:
predictions

## Use the scikit-learn library

We can do the same using scikit-learn!

One trick here: scikit-learn expects x to be a 2D array of shape (n_samples, n_features), so we expand our x using numpy.newaxis.
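To see what the extra axis does, the short sketch below prints the shape before and after. It assumes the same `x` array as above; `x_2d` is a name introduced for this example.

```python
import numpy as np

x = np.array([1., 2., 3, 4, 5, 6, 7, 8, 9])

# Indexing with np.newaxis adds a second axis of length 1,
# turning 9 values into 9 rows with 1 column each
x_2d = x[:, np.newaxis]

print(x.shape)     # (9,)
print(x_2d.shape)  # (9, 1)

# reshape(-1, 1) is an equivalent way to get the same column layout
same = np.array_equal(x_2d, x.reshape(-1, 1))
print(same)
```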

In [None]:

from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)

model.fit(x[:, np.newaxis], y)

xfit = np.linspace(0, 10, 1000)
yfit = model.predict(xfit[:, np.newaxis])

plt.scatter(x, y)
plt.plot(xfit, yfit, color='#FF0000');

In [None]:
model

In [None]:
print(model.intercept_)
print(model.coef_)

print('Coefficients: W=%.3f, b=%.3f' % (model.coef_[0],model.intercept_))