# Basic Regression


In general, machine learning lets us tackle tasks that are too difficult to solve with fixed programs written and designed by humans.

For example, if we want a robot to be able to walk, we could program the robot to learn to walk, or we could attempt to write a program by hand that specifies how to walk. Machine learning tasks are usually described in terms of how the machine learning system should process an example. An example is a collection of features that have been measured from some object or event that we want the system to process. For example, the features of an image are usually the values of its pixels.

Regression is the task of predicting a numerical value given some input. Linear regression does this with a linear function of the input.

An example of a regression task is the prediction of the expected claim amount that an insured person will make (used to set insurance premiums), or the prediction of future prices of securities. These kinds of predictions are also used for algorithmic trading.


Here we will build a few simple functions that come up in ML *all the time*.


## Simple start

Our assumption here is that we are dealing with very simple linear data that could be described as:
```
 y = W*x + b
```

In [None]:
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt

import seaborn; seaborn.set()  # set plot style

#Our input data ;-)
x = np.array([1.,2.,3,4,5,6,7,8,9], float)
y = np.array([1,3,2,3,5,4,6,4,7], float)

## 1. Calculate Mean and Variance

The first step is to estimate the mean and the variance of data variables.

```
mean(x) = sum(x) / count(x)
```
Below is a function named mean() that implements this behavior for a list of numbers.

In [None]:
# Calculate the mean value of a list of numbers
def mean(values):
    return sum(values) / float(len(values))

In [None]:
mean(x)

In [None]:
mean(y)

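The quantity we need here is the sum of squared differences between each value and the mean. Strictly speaking, the variance is this sum divided by the number of values (or by n - 1 for the sample variance), but that normalization cancels out when we compute the slope later, so we skip it.

For a list of numbers it can be calculated as:
```
variance = sum( (x(i) - mean(x))^2 )
```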


**Variance** is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of numbers are spread out from their average value. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common. The variance is the square of the standard deviation, the second central moment of a distribution, and the covariance of the random variable with itself.
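To connect the unnormalized sum used in this notebook with the textbook definitions, here is a small sketch using Python's standard `statistics` module. The name `x_demo` is introduced only for this illustration; it holds the same values as `x`.

```python
import statistics

# Same values as the notebook's x, kept under a separate name for illustration
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
n_demo = len(x_demo)

pop_var_demo = statistics.pvariance(x_demo)    # population variance: divides by n
sample_var_demo = statistics.variance(x_demo)  # sample variance: divides by n - 1

# Multiplying the population variance by n recovers the sum of squared deviations
sum_sq_demo = pop_var_demo * n_demo

print("population variance: %.3f" % pop_var_demo)
print("sample variance:     %.3f" % sample_var_demo)
print("sum of squared deviations: %.1f" % sum_sq_demo)
```

The last value is the unnormalized sum that the `variance()` function below is expected to return.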




![Example of samples from two populations with the same mean but different variances. The red population has mean 100 and variance 100 (SD=10) while the blue population has mean 100 and variance 2500 (SD=50).](../images/variance.png)
Example of samples from two populations with the same mean but different variances. The red population has mean 100 and variance 100 (SD=10) while the blue population has mean 100 and variance 2500 (SD=50).


In [None]:
def variance(val):
    # challenge! what is mean of x
    mean_x = None 
    var = .0
    for x in val:
        #challenge! tip:remember ^2 in python is **2
        var += None 
    return var

In [None]:
variance(x)

expected: 60.0

## 2. Calculate Covariance

The covariance of two groups of numbers describes how those numbers change together.

Covariance is closely related to correlation. Covariance depends on the units and scale of the data, whereas correlation is covariance divided by the standard deviations of both groups, which gives a unitless value between -1 and 1.

As with the variance above, we use the unnormalized sum here; the missing division by the number of values cancels out when we estimate the slope.

We can calculate the covariance between two variables as follows:

```
covariance = sum((x(i) - mean(x)) * (y(i) - mean(y)))
```

Below is a function named covariance() that implements this statistic. 


In [None]:
# Calculate covariance between x and y
def covariance(x, y):
    # challenge : 2 lines on means :-)
    mean_x = None
    mean_y = None
    
    covar = 0.0
    for i in range(len(x)):
        # challenge! i is your index so accessing x of index i is x[i] 
        covar += None
    return covar


In [None]:
covar = covariance(x, y)
print('Covariance: %.3f' % (covar))


Expected: 36.000
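As mentioned above, covariance can be normalized to produce a correlation value. The sketch below does this using only the sums for the notebook's data, so it does not depend on your implementation. The `*_demo` names are introduced for this illustration, and the sum of squared deviations of `y` (260/9) was worked out by hand from the `y` values.

```python
from math import sqrt

# Unnormalized sums for the notebook's data
sum_xy_demo = 36.0         # sum((x(i) - mean(x)) * (y(i) - mean(y))), the expected covariance
sum_xx_demo = 60.0         # sum((x(i) - mean(x))^2), the expected variance of x
sum_yy_demo = 260.0 / 9.0  # sum((y(i) - mean(y))^2), computed by hand from y

# Pearson correlation: the 1/n factors cancel, so the raw sums are enough
correlation_demo = sum_xy_demo / sqrt(sum_xx_demo * sum_yy_demo)
print("correlation: %.3f" % correlation_demo)
```

A value close to 1 means that x and y increase together almost linearly, which is why a straight line is a reasonable model here.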

## 3. Estimate Coefficients
We must estimate the values for two coefficients in simple linear regression.

The first is W which can be estimated as:

```
W = sum((x(i) - mean(x)) * (y(i) - mean(y))) / sum( (x(i) - mean(x))^2 )
```
Looking above we can simplify this to:

```
W = covariance(x, y) / variance(x)
```
We already have functions to calculate covariance() and variance().

Next, we need to estimate a value for b, also called the intercept as it controls the starting point of the line where it intersects the y-axis.

```
b = mean(y) - W * mean(x)
```
Again, we know how to estimate W and we have a function to estimate mean().
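The formulas for W and b are stated above without derivation. One standard way to obtain them, sketched here under the assumption that the "best" line is the one minimizing the sum of squared errors (ordinary least squares), is to set both partial derivatives of that sum to zero:

$$
S(W, b) = \sum_i \left(y_i - W x_i - b\right)^2
$$

$$
\frac{\partial S}{\partial b} = -2 \sum_i \left(y_i - W x_i - b\right) = 0
\;\Rightarrow\; b = \bar{y} - W \bar{x}
$$

$$
\frac{\partial S}{\partial W} = -2 \sum_i x_i \left(y_i - W x_i - b\right) = 0
\;\Rightarrow\; \sum_i x_i \left[(y_i - \bar{y}) - W (x_i - \bar{x})\right] = 0
$$

Because deviations from the mean sum to zero, $x_i$ in the last sum can be replaced by $(x_i - \bar{x})$, which gives

$$
W = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
$$

For the data in this notebook this works out to $W = 36 / 60 = 0.6$ and $b = 35/9 - 0.6 \cdot 5 \approx 0.889$.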

We can put all of this together into a function named coefficients() that takes the dataset as an argument and returns the coefficients.

In [None]:
# Calculate coefficients
def coefficients(x,y):
    x_mean, y_mean = mean(x), mean(y)
    # challenge! 2 lines - see above
    W = None
    b = None
    return [W, b]

In [None]:
W, b = coefficients(x,y)
print('Coefficients: W=%.3f, b=%.3f' % (W,b))

Expected: W=0.600, b=0.889

Now we can check if everything is working as expected!

Let's build our prediction function.

In [None]:
def predict(x,W,b):
    p = list()
    for i in x:
        #challenge! do you remember our initial assumption? 
        y = None
        p.append(y)
    return p

In [None]:
y_ = predict(x,W,b)

Let's display our output data!

In [None]:
plt.scatter(x, y , color='#2F08EC', marker="s")
plt.scatter(x, y_ , color='#FF082C')
plt.plot(x, y_ , color='#FF0000')

**We just built a linear regression model!**

Let's make it more "official".

We will also add in a function to manage the evaluation of the predictions called evaluate_algorithm() and another function to estimate the Root Mean Squared Error of the predictions called rmse_metric().



Root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample and population values) predicted by a model or an estimator and the values actually observed. The RMSE represents the sample standard deviation of the differences between predicted values and observed values. These individual differences are called residuals when the calculations are performed over the data sample that was used for estimation, and are called prediction errors when computed out-of-sample. The RMSE serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. RMSE is a measure of accuracy used to compare forecasting errors of different models for a particular dataset and not between datasets, as it is scale-dependent.

RMSE is the square root of the average of squared errors. The effect of each error on RMSE is proportional to the size of the squared error; thus larger errors have a disproportionately large effect on RMSE. Consequently, RMSE is sensitive to outliers.
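To make the sensitivity to outliers concrete, the sketch below compares RMSE with the mean absolute error (MAE) on two hand-made error lists. MAE and the helper names `rmse_of` and `mae_of` are not part of this notebook; they are introduced only for comparison.

```python
from math import sqrt

def rmse_of(errors):
    """Root of the mean of squared errors."""
    return sqrt(sum(e ** 2 for e in errors) / len(errors))

def mae_of(errors):
    """Mean of absolute errors (shown only for comparison)."""
    return sum(abs(e) for e in errors) / len(errors)

even_errors = [1.0, 1.0, 1.0, 1.0]  # four equally sized errors
one_outlier = [0.0, 0.0, 0.0, 4.0]  # same total error, concentrated in one point

print("even errors: MAE=%.1f RMSE=%.1f" % (mae_of(even_errors), rmse_of(even_errors)))  # MAE=1.0 RMSE=1.0
print("one outlier: MAE=%.1f RMSE=%.1f" % (mae_of(one_outlier), rmse_of(one_outlier)))  # MAE=1.0 RMSE=2.0
```

Both lists have the same total absolute error, yet the RMSE doubles when that error sits in a single prediction.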


![](../images/rsme.svg)

In [None]:
from math import sqrt
# Calculate root mean squared error
def rmse_metric(actual, predicted):
    sum_error = 0.0
    for i in range(len(actual)):
        prediction_error = predicted[i] - actual[i]
        sum_error += (prediction_error ** 2)
    mean_error = sum_error / float(len(actual))
    return sqrt(mean_error)
 

In [None]:
# Evaluate regression algorithm on training dataset
def evaluate_algorithm(x, y, algorithm):
    predicted = algorithm(x, y)
    rmse = rmse_metric(y, predicted)
    return rmse,predicted
 

In [None]:
# Simple linear regression algorithm
def simple_linear_regression(x,y):
    predictions = list()
    W, b = coefficients(x,y)
    for row in x:
        yhat = W * row + b
        predictions.append(yhat)
    return predictions

In [None]:
rmse,predictions = evaluate_algorithm(x, y, simple_linear_regression)
print('RMSE: %.3f' % (rmse))

In [None]:
y

In [None]:
predictions

## Use the scikit-learn library

We can do the same using scikit-learn!

*One trick here: scikit-learn expects x to be a 2D array of shape (n_samples, n_features), so we expand our x using numpy.newaxis.*
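A quick look at what that indexing trick does to the array shape, using a small throwaway array (`x_small` is a name made up for this illustration):

```python
import numpy as np

x_small = np.array([1.0, 2.0, 3.0])

# Adding a new axis turns the 1D array into a single-column 2D array
x_column = x_small[:, np.newaxis]

print(x_small.shape)   # (3,)
print(x_column.shape)  # (3, 1)

# reshape(-1, 1) is an equivalent way to get the same column layout
same_layout = np.array_equal(x_column, x_small.reshape(-1, 1))
print(same_layout)     # True
```

Each row is then one sample and the single column is the one feature.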

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
model = LinearRegression(fit_intercept=True)

model.fit(x[:, np.newaxis], y)

xfit = x.copy()
yfit = model.predict(x[:, np.newaxis])

In [None]:
plt.scatter(x, y)
plt.plot(x, yfit, color='#FF0000');


In [None]:
model


In [None]:
print('Coefficients: W=%.3f, b=%.3f' % (model.coef_[0],model.intercept_))

print(sqrt(mean_squared_error(y,yfit)))


## And the TensorFlow version


Let's see two versions here:
- using the LinearRegressor estimator
- using GradientDescentOptimizer (how it works is covered in the next lesson)


In the first example we will see our first hyperparameter: the number of epochs. An epoch is one full pass over the training set, so the number of epochs is how many times the algorithm sees the whole dataset. It directly affects the result of training: with too few epochs the optimizer stops before it gets close to a minimum of the cost, and the model underfits.
With too many epochs you can, depending on the algorithm, get overfitting, but for a simple model like this one, more epochs generally helps ;-).
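As a rough sketch of what the epoch count means in terms of work done, the snippet below counts parameter updates for the settings used in the estimator example in this section (`num_epochs=100`, `batch_size=1`, 9 samples). It assumes one update per batch and that each epoch is split into batches on its own; the variable names are illustrative.

```python
import math

n_samples_demo = 9     # length of x in this notebook
batch_size_demo = 1    # as passed to the input function
num_epochs_demo = 100  # as passed to the input function

# One update per batch; the last batch of an epoch may be smaller than the rest
steps_per_epoch_demo = math.ceil(n_samples_demo / batch_size_demo)
total_steps_demo = steps_per_epoch_demo * num_epochs_demo

print("updates per epoch:", steps_per_epoch_demo)  # 9
print("total updates:", total_steps_demo)          # 900
```

Raising the epoch count to 400 would therefore mean four times as many updates on the same nine points.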


*Note: TensorFlow generally feels more complicated because it is quite low level, but it allows you to build more advanced architectures. Later we will see how Keras is used on top of TensorFlow to simplify development!*

In [None]:
import tensorflow as tf  # note: the cells below use the TensorFlow 1.x API

column =  tf.feature_column.numeric_column('x')
lin_reg = tf.estimator.LinearRegressor(feature_columns=[column])

# Train the estimator
train_input = tf.estimator.inputs.numpy_input_fn(x={"x": x}, y=y, shuffle=False, num_epochs=100, batch_size=1)
lin_reg.train(train_input)

# Make predictions
predict_input = tf.estimator.inputs.numpy_input_fn(x={"x": x}, num_epochs=1, shuffle=False)
results = lin_reg.predict(predict_input)


tf_pred = list()
# Print each prediction and collect it
for value in results:
    print(value['predictions'])
    tf_pred.append(value['predictions'])
        
print(sqrt(mean_squared_error(y,tf_pred)))

In [None]:
# challenge!
# can you spot the difference? if so, change num_epochs to 400
plt.scatter(x, y)
plt.plot(x, yfit, color='#FF0000');
plt.plot(x, tf_pred);

**Gradient descent** is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.

![](../images/gradient1.png)

**Note:** Gradient descent works on a cost function: it adjusts the parameters step by step to make the cost as small as possible.
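Before the TensorFlow version, here is a minimal pure-Python sketch of the same idea: batch gradient descent on the mean squared error of `y = w*x + b`. It is an illustration under simplifying assumptions (the whole dataset in every step, a hand-picked learning rate and step count), not what TensorFlow does internally. The names `x_gd`, `y_gd`, `w_gd` and `b_gd` are made up so they do not overwrite the notebook's variables.

```python
# Same data as the notebook, under separate names
x_gd = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y_gd = [1.0, 3.0, 2.0, 3.0, 5.0, 4.0, 6.0, 4.0, 7.0]
n_gd = len(x_gd)

w_gd, b_gd = 0.0, 0.0     # start from zero, like the TensorFlow variables
learning_rate_gd = 0.01   # illustrative choice
steps_gd = 5000           # illustrative choice

for _ in range(steps_gd):
    # Prediction error for every sample with the current parameters
    errors = [(w_gd * xi + b_gd) - yi for xi, yi in zip(x_gd, y_gd)]
    # Gradient of mean(error^2) with respect to w and b
    grad_w = 2.0 * sum(e * xi for e, xi in zip(errors, x_gd)) / n_gd
    grad_b = 2.0 * sum(errors) / n_gd
    # Step against the gradient
    w_gd -= learning_rate_gd * grad_w
    b_gd -= learning_rate_gd * grad_b

print("w=%.3f, b=%.3f" % (w_gd, b_gd))
```

With these settings the result should land very close to the closed-form coefficients from section 3 (W=0.600, b=0.889).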

In [None]:
tx = tf.placeholder(tf.float32, [None,1])
ty_ = tf.placeholder(tf.float32, [None,1])

tw = tf.Variable(tf.zeros([1]))
tb = tf.Variable(tf.zeros([1]))

ty = tw*tx + tb

cost  = tf.reduce_mean(tf.square(ty_- ty))

# learning rate: 0.0001
train_step = tf.train.GradientDescentOptimizer(0.0001).minimize(cost)

# initialize global variables and start session
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

for epoch in range(100):
    for i in range(len(x)):
        xs = np.array([[x[i]]])
        ys = np.array([[y[i]]])
        sess.run(train_step,feed_dict={ tx: xs, ty_:ys })
    
print("after %d epochs" % (epoch + 1))
print("w: %f" % sess.run(tw)[0])
print("b: %f" % sess.run(tb)[0])

tgy = x*sess.run(tw)+sess.run(tb)

In [None]:
# challenge
# can you spot the difference? if so, change learning_rate to 0.001 and epochs to 1000
plt.scatter(x, y)
plt.plot(x, yfit, color='#FF0000');
plt.plot(x, tgy);

Here we have another hyperparameter: **learning rate**.

It's one of the most important parameters. In simple words, the learning rate determines how fast the coefficients (in the case of linear or logistic regression) or the weights (in the case of a neural network) change.

If c is a cost function with variables (or weights) w1, w2, ..., wn, then with stochastic gradient descent, where we update the weights sample by sample, each weight is moved against its derivative:

```
for every sample(
   w1new = w1 - (learning rate) * (derivative of cost function with respect to w1)
)
```

If the learning rate is too high, the updates may overshoot the point where the slope is 0; if it is too low, it may take forever to reach that point.

![](../images/gradient2.png)


So we need to figure out that balanced learning rate.  
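A toy sketch of that trade-off: gradient descent on the one-variable cost `w**2` (whose minimum is at `w = 0`), run with three learning rates. The cost function, the helper name `minimize_quadratic` and the chosen rates are all assumptions made for this illustration.

```python
def minimize_quadratic(learning_rate, steps=50, w_start=1.0):
    """Run gradient descent on cost(w) = w**2, whose derivative is 2*w."""
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * (2.0 * w)
    return w

# Too low: barely moves. Moderate: ends up near 0. Too high: overshoots and grows.
for lr in (0.001, 0.1, 1.1):
    print("learning_rate=%.3f -> w=%.6g" % (lr, minimize_quadratic(lr)))
```

After 50 steps the smallest rate is still close to the starting point, the middle one is practically at the minimum, and the largest one has moved far away from it because every step overshoots.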

*Note: On top of gradient descent there are several more advanced optimizers (RMSProp, Adam, Momentum, Adadelta ...) which have additional hyperparameters.*


## H2O version


In [None]:
import h2o
h2o.init(max_mem_size = "2G")             # cap the memory H2O may use; uses all cores by default


In [None]:
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
import pandas as pd
#create input dataframe
train = h2o.H2OFrame(pd.DataFrame(data={'x':x,'y':y}))

glm = H2OGeneralizedLinearEstimator()
glm.train(['x'],'y',training_frame=train) #inputs : list of X column, y column, and train frame
glm

In [None]:
h2oY = glm.predict(train)


In [None]:
plt.scatter(x, y)
plt.plot(x, yfit, color='#FF0000');
plt.plot(x, h2oY.as_data_frame().predict)


In [None]:
# Optional challenge:  what parameters can you change: solver, alpha, lambda ...?? 

help(H2OGeneralizedLinearEstimator)