1.  Pull the Boston Housing notebook I've created for this assignment.

2.  Implement scikit-learn's r2 and mse methods to measure the performance of my linear regressor.

3.  Implement either sklearn.linear_model.Ridge or sklearn.linear_model.Lasso.

4.  Optimize the regression model you pick by reviewing the r2 and mse scores and adjusting the regularization parameter.

5.  Turn in the GitHub link to your work.

In [100]:
from sklearn import datasets
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
import numpy as np
import math

In [4]:
bean = datasets.load_boston()  # removed in scikit-learn 1.2; requires an older release
print(bean.DESCR)

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [5]:
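def load_boston():
    scaler = StandardScaler()
    boston = datasets.load_boston()
    X = boston.data    # 506 x 13 feature matrix
    y = boston.target  # median home value, the regression target
    # Standardize every feature to mean 0 and standard deviation 1.
    # Note: the scaler is fit on all rows here, before the train/test split.
    X = scaler.fit_transform(X)
    # Returns X_train, X_test, y_train, y_test; the rows are shuffled on every call.
    return train_test_split(X, y)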

The load_boston function prepares the data in two steps. First, the object returned by datasets.load_boston() has both data and target attributes, which makes creating X and y easy. The features in X are then standardized with StandardScaler, a scaler that is already built into sklearn: fit_transform fits the scaler to X (learning each column's mean and standard deviation) and returns the scaled data. Second, train_test_split is called to return X_train, X_test, y_train, and y_test (in that order). Now we have both a set of data for our linear regressor to learn from and data to test it against.
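One caveat with this function: the scaler is fit on all 506 rows before train_test_split runs, so the test rows contribute to the means and standard deviations used for scaling, a mild form of data leakage. A minimal sketch of an alternative wraps the scaler and the regressor in a scikit-learn Pipeline, so the scaler only ever sees training data. It assumes a recent scikit-learn (where train_test_split lives in sklearn.model_selection) and uses synthetic make_regression data as a stand-in, since load_boston is no longer shipped.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Boston data (assumption): 506 rows, 13 features.
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# Split first, so the test rows never influence the scaler's statistics.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits StandardScaler on X_train only, then reuses those
# means and standard deviations when it transforms X_test.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

scaler = model.named_steps["standardscaler"]
X_train_scaled = scaler.transform(X_train)
print(X_train_scaled.mean(axis=0).round(6))  # every column is ~0
print(X_train_scaled.std(axis=0).round(6))   # every column is ~1
print(model.score(X_test, y_test))           # R^2 on the held-out rows
```

Calling model.predict(X_test) applies the same training-set scaling automatically, so there is no separate scaling step to forget.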

In [20]:
X_train, X_test, y_train, y_test = load_boston()

train_test_split() shuffles the rows before splitting them into training and testing sets, so every call to load_boston() produces a different split. Expect the linear regression's performance scores to vary from run to run.
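If repeatable runs are wanted, train_test_split accepts a random_state argument that fixes the shuffle. The default test_size of 0.25 also accounts for the shapes printed in the next cells: 25% of 506 rows is 126.5, which is rounded up to 127 test rows, leaving 379 for training. A small sketch using placeholder arrays with the Boston shape (the value 42 is an arbitrary choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the Boston data (506 rows, 13 features).
X = np.arange(506 * 13, dtype=float).reshape(506, 13)
y = np.arange(506, dtype=float)

# With a fixed random_state, the shuffle (and therefore the split) is repeatable.
first = train_test_split(X, y, random_state=42)
second = train_test_split(X, y, random_state=42)
repeatable = all(np.array_equal(a, b) for a, b in zip(first, second))

X_train, X_test, y_train, y_test = first
print(repeatable)                   # True
print(X_train.shape, X_test.shape)  # (379, 13) (127, 13)
```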

In [7]:
X_train.shape

(379L, 13L)

In [11]:
X_test.shape

(127L, 13L)

In [27]:
y_train.shape

(379L,)

In [28]:
y_test.shape

(127L,)

In [21]:
clf = LinearRegression()
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

LinearRegression() is an 'ordinary least squares linear regression'. 
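Ordinary least squares chooses the intercept and coefficients that minimize the sum of squared residuals between the targets and the predictions. A minimal sketch, assuming small synthetic data, that reproduces LinearRegression's fit with NumPy's least-squares solver by appending a column of ones for the intercept:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): 3 features, known coefficients, intercept 5.0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + 5.0 + rng.normal(scale=0.1, size=200)

# A column of ones lets the intercept be estimated as one more coefficient.
X1 = np.column_stack([np.ones(len(X)), X])
# np.linalg.lstsq returns the w that minimizes ||X1 @ w - y||^2.
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

clf = LinearRegression().fit(X, y)
print(np.isclose(w[0], clf.intercept_), np.allclose(w[1:], clf.coef_))  # True True
```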

In [22]:
list(zip(y_test, clf.predict(X_test)))  # pair each true value with its prediction

[(31.199999999999999, 28.773818371197478),
 (19.600000000000001, 19.68296604273441),
 (37.299999999999997, 34.311345425128337),
 (50.0, 33.811933294514034),
 (30.5, 30.46752284747928),
 (23.100000000000001, 9.1089541487993646),
 (15.6, 15.720272076712888),
 (18.399999999999999, 18.38605556393054),
 (33.299999999999997, 36.407232083453088),
 (28.699999999999999, 25.270560926685093),
 (10.5, 5.540196112020066),
 (17.399999999999999, 17.353026029316233),
 (20.5, 20.492023136606811),
 (20.0, 22.220409621931612),
 (23.600000000000001, 31.002771091945302),
 (50.0, 41.196147455822675),
 (27.5, 24.910665603917938),
 (18.600000000000001, 16.773375723172876),
 (23.300000000000001, 26.054210518275404),
 (8.3000000000000007, 12.477265456411889),
 (16.800000000000001, 20.63164894538988),
 (12.5, 19.103410806247496),
 (22.800000000000001, 28.689359760453684),
 (22.5, 28.903889758616224),
 (20.600000000000001, 20.670980855491841),
 (18.100000000000001, 17.706545486600703),
 (23.600000000000001, 29.08

r2_score is scikit-learn's R-squared (coefficient of determination) function, which scores how much of the variance in the target a regression's predictions explain. Its parameters are y_true, y_pred, and the optional sample_weight. We only need the same two arrays we zipped above, y_test and clf.predict(X_test), to see how the linear regression performed.
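R-squared is defined as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y_true from its mean. The sketch below recomputes it by hand for the first five (actual, predicted) pairs printed above. Five points are far too few to judge the model, so this only illustrates the formula.

```python
import numpy as np
from sklearn.metrics import r2_score

# First five (actual, predicted) pairs from the zip output above.
y_true = np.array([31.2, 19.6, 37.3, 50.0, 30.5])
y_pred = np.array([28.773818371197478, 19.68296604273441, 34.311345425128337,
                   33.811933294514034, 30.46752284747928])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
manual_r2 = 1 - ss_res / ss_tot

print(np.isclose(manual_r2, r2_score(y_true, y_pred)))  # True

# A "model" that always predicts the mean of y_true scores 0.
baseline = np.full_like(y_true, y_true.mean())
print(np.isclose(r2_score(y_true, baseline), 0.0))      # True
```

So 1.0 means perfect predictions, 0 means no better than predicting the average, and the score can go negative for a model that does worse than that.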

In [23]:
r2_score(y_test, clf.predict(X_test))

0.73184372885394744

In [19]:
math.sqrt(mean_squared_error(y_test, clf.predict(X_test)))

4.583909221172794

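The R-squared score is 0.732, which is not too bad! House price estimation does not need to be precise, considering that a final price is always negotiated; a ballpark number is all that is needed. The root mean squared error is 4.58, in the same units as the target (median home value in thousands of dollars), so a typical prediction is off by roughly $4,600. Note, though, that the RMSE cell's execution count (In [19]) is lower than those of the cells that drew the current split (In [20]) and fit the model (In [21]), so that 4.58 was most likely computed on an earlier random split and is not directly comparable with the R-squared above or with the Ridge RMSE values below.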
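MSE is the mean of the squared residuals, and RMSE is its square root, which puts the error back in the target's units. Because errors are squared, a single large miss can dominate. The sketch below uses the same first five pairs from the zip output, where the 50.0 house predicted at 33.8 accounts for most of the squared error:

```python
import math
import numpy as np
from sklearn.metrics import mean_squared_error

# First five (actual, predicted) pairs from the zip output above.
y_true = np.array([31.2, 19.6, 37.3, 50.0, 30.5])
y_pred = np.array([28.773818371197478, 19.68296604273441, 34.311345425128337,
                   33.811933294514034, 30.46752284747928])

sq_err = (y_true - y_pred) ** 2
mse_manual = sq_err.mean()                            # MSE: mean of squared residuals
rmse = math.sqrt(mean_squared_error(y_true, y_pred))  # RMSE: back in target units

print(np.isclose(mse_manual, mean_squared_error(y_true, y_pred)))  # True
print(round(rmse, 2))                                # 7.44
print(round(sq_err.max() / sq_err.sum(), 2))         # 0.95 (share from the 50.0 row)
```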

In [24]:
clf.coef_

array([-0.8105614 ,  0.93334006,  0.06284285,  0.6253368 , -1.87563712,
        3.06195251, -0.17073159, -3.09544965,  2.06377857, -1.80216947,
       -2.06302984,  0.80723566, -3.47400453])

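Looking at the coefficients: because every feature was standardized, they are on a comparable scale. Each one is the change in predicted price for a one-standard-deviation increase in that feature, holding the other features fixed.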
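Since the features share a scale, one rough way to read the coefficients is to rank them by absolute value. The sketch below does that with the values copied from the clf.coef_ output above. The first 11 names come from the DESCR listing; the last two are placeholders because that listing is cut off. Correlated features can trade weight between each other, so treat the ranking as a rough guide rather than a measure of importance.

```python
import numpy as np

# Coefficients copied from the clf.coef_ output above (standardized features).
coef = np.array([-0.8105614, 0.93334006, 0.06284285, 0.6253368, -1.87563712,
                 3.06195251, -0.17073159, -3.09544965, 2.06377857, -1.80216947,
                 -2.06302984, 0.80723566, -3.47400453])

# First 11 names from the DESCR listing; the last two are placeholders
# because their names are cut off in that output.
names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD",
         "TAX", "PTRATIO", "attribute 12", "attribute 13"]

# Sort feature indices by absolute coefficient, largest first.
order = np.argsort(-np.abs(coef))
for i in order[:5]:
    print(f"{names[i]:>12} {coef[i]:+.3f}")
```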

Ridge regression as defined on the scikit-learn.org site: Ridge regression addresses some of the problems of ordinary least squares by imposing a penalty on the size of coefficients. The ridge coefficients minimize a penalized residual sum of squares.

The site continues by saying that this model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm, also known as Ridge Regression or Tikhonov regularization. Regularization (per Wikipedia) refers to a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. In Ridge, the alpha parameter sets the strength of that penalty: the larger alpha is, the more the coefficients are shrunk toward zero, and alpha = 0 gives back ordinary least squares.
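In symbols, the penalized residual sum of squares is the objective below, where X is the (centered) feature matrix, y the target, w the coefficients, and α ≥ 0 the regularization strength. The intercept is not penalized. Setting the gradient to zero gives a closed form:

$$\min_{w} \; \lVert X w - y \rVert_2^2 + \alpha \lVert w \rVert_2^2 \qquad\Longrightarrow\qquad w = (X^\top X + \alpha I)^{-1} X^\top y$$

A minimal sketch that checks this formula against scikit-learn's Ridge, assuming small synthetic data rather than the Boston set:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption): 5 features, intercept 4.0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + 4.0 + rng.normal(scale=0.5, size=100)

alpha = 1.0
# Center X and y so the unpenalized intercept drops out of the problem.
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean

# Solve (Xc^T Xc + alpha*I) w = Xc^T yc instead of forming an explicit inverse.
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - X_mean @ w  # recover the intercept from the means

ridge = Ridge(alpha=alpha).fit(X, y)
print(np.allclose(w, ridge.coef_), np.isclose(b, ridge.intercept_))  # True True
```

The αI term is what keeps the system solvable even when features are highly correlated and XᵀX is close to singular.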

In [31]:
clfRidge = Ridge()
clfRidge.fit(X_train, y_train)

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, solver='auto', tol=0.001)

In [32]:
r2_score(y_test, clfRidge.predict(X_test))

0.73113736785921202

In [33]:
math.sqrt(mean_squared_error(y_test, clfRidge.predict(X_test)))

4.145111836744919

With the default alpha of 1.0, our Ridge regression does not appear to perform any better than the original ordinary least squares regression: the R-squared values very closely match (0.7311 versus 0.7318). So let's change the regularization parameter, starting with smaller values of alpha.

In [56]:
clfRidge = Ridge(alpha = 0.1)
clfRidge.fit(X_train, y_train)

Ridge(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, solver='auto', tol=0.001)

In [57]:
clfRidge.score(X_test, y_test)  # another way to calculate R-squared!

0.73177258255384947
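As the comment in that cell says, the score method of a scikit-learn regressor returns R-squared; it is equivalent to calling r2_score on the model's own predictions. A minimal check on synthetic data (an assumption; any fitted regressor would do):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=1)
model = Ridge(alpha=0.1).fit(X, y)

# For regressors, score(X, y) is R^2 of model.predict(X) against y.
print(np.isclose(model.score(X, y), r2_score(y, model.predict(X))))  # True
```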

In [58]:
math.sqrt(mean_squared_error(y_test, clfRidge.predict(X_test)))

4.140212321756897

In [69]:
clfRidge = Ridge(alpha = 0.01)
clfRidge.fit(X_train, y_train)

Ridge(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, solver='auto', tol=0.001)

In [60]:
clfRidge.score(X_test, y_test)

0.73183660939164619

In [61]:
math.sqrt(mean_squared_error(y_test, clfRidge.predict(X_test)))

4.139718150567489

In [74]:
clfRidge = Ridge(alpha = 10)
clfRidge.fit(X_train, y_train)

Ridge(alpha=10, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, solver='auto', tol=0.001)

In [75]:
clfRidge.score(X_test, y_test)

0.72538511192273902

In [76]:
clfRidge = Ridge(alpha = 100)
clfRidge.fit(X_train, y_train)
clfRidge.score(X_test, y_test)

0.69406703511719203
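The manual alpha changes above can be condensed into a single loop. Below is a sketch that assumes synthetic make_regression data in place of the Boston set, so the numbers will not match the notebook. It also prints the size of the coefficient vector to show the shrinkage that alpha controls:

```python
import math
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=506, n_features=13, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = []
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    pred = model.predict(X_test)
    results.append((alpha,
                    r2_score(y_test, pred),
                    math.sqrt(mean_squared_error(y_test, pred)),
                    np.linalg.norm(model.coef_)))  # size of the coefficient vector

for alpha, r2, rmse, norm in results:
    print(f"alpha={alpha:>6}: R^2={r2:.4f}  RMSE={rmse:.3f}  ||w||={norm:.2f}")
```

On a fixed training set the coefficient norm falls steadily as alpha grows; whether R-squared improves along the way depends on the data.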

Changing alpha has not made a significant improvement in our R-squared. As alpha gets smaller, the Ridge scores converge toward the ordinary least squares score (0.7318), which makes sense because Ridge with alpha = 0 is ordinary least squares. As alpha gets larger, the model gets worse and worse: R-squared drops to 0.725 at alpha = 10 and 0.694 at alpha = 100. Let's try the Lasso method now and see what kinds of results we can get! Lasso stands for least absolute shrinkage and selection operator.
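For comparison with Ridge, the scikit-learn documentation gives the Lasso objective as follows, with n_samples the number of training rows and ‖w‖₁ the sum of the absolute values of the coefficients:

$$\min_{w} \; \frac{1}{2\, n_{\text{samples}}} \lVert X w - y \rVert_2^2 + \alpha \lVert w \rVert_1$$

Two consequences are worth keeping in mind. The 1/(2 n_samples) factor means the same numeric alpha is not the same penalty strength in Ridge and Lasso, so their alpha values are not directly comparable. And the L1 penalty can push coefficients to exactly zero, which is the "selection" in the name.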

In [79]:
clfLasso = Lasso()
clfLasso.fit(X_train, y_train)

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [80]:
clfLasso.score(X_test, y_test)

0.64901939271161391

In [81]:
math.sqrt(mean_squared_error(y_test, clfLasso.predict(X_test)))

4.736009774142978

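I hope that Lasso and Ridge were not supposed to be better models for this particular problem. Compared to our very first set of results, they do not appear to be more effective: with the default alpha of 1.0, Lasso scores an R-squared of 0.649 and an RMSE of 4.74. Let's try some other values of the regularization parameter.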
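One plausible reason the default Lasso does worse here is that the L1 penalty sets some coefficients to exactly zero, and a large alpha can zero out features that actually carry signal. The sketch below counts the zeroed coefficients at several alphas. It assumes synthetic data in which only 5 of the 13 features are informative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (assumption): only 5 of the 13 features actually affect y.
X, y = make_regression(n_samples=506, n_features=13, n_informative=5,
                       noise=10.0, random_state=0)

zero_counts = {}
for alpha in [0.001, 0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    # Coordinate descent soft-thresholds coefficients, producing exact zeros.
    zero_counts[alpha] = int(np.sum(lasso.coef_ == 0.0))
    print(f"alpha={alpha}: {zero_counts[alpha]} of 13 coefficients are exactly zero")
```

On the Boston data, where presumably most features carry some signal, dropping features this way would cost accuracy, which fits the lower score at alpha = 1.0.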

In [82]:
clfLasso = Lasso(alpha = 0.1)
clfLasso.fit(X_train, y_train)

Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [83]:
clfLasso.score(X_test, y_test)

0.71136423983232944

In [84]:
math.sqrt(mean_squared_error(y_test, clfLasso.predict(X_test)))

4.2948311831899515

In [85]:
clfLasso = Lasso(alpha = 0.01)
clfLasso.fit(X_train, y_train)

Lasso(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [86]:
clfLasso.score(X_test, y_test)

0.73033478152108788

In [87]:
math.sqrt(mean_squared_error(y_test, clfLasso.predict(X_test)))

4.1512940478489355

In [93]:
clfLasso = Lasso(alpha = 0.001)
clfLasso.fit(X_train, y_train)

Lasso(alpha=0.001, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [94]:
clfLasso.score(X_test, y_test)

0.73169374704770207

In [95]:
math.sqrt(mean_squared_error(y_test, clfLasso.predict(X_test)))

4.14082070802403

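Lasso appears to be much more sensitive to the regularization parameter than Ridge: its R-squared rose from 0.649 at alpha = 1.0 to 0.7317 at alpha = 0.001. However, it does not appear that we have been able to get an R-squared score better than the 0.73184 from our original linear regression. The best Lasso score (0.73169 at alpha = 0.001) falls just short of it, and the best Ridge score (0.731837 at alpha = 0.01) essentially ties it.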
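The alphas above were chosen by checking scores on the test set. That lets the test set influence the model choice, so the final test score can be somewhat optimistic. A common alternative is to pick alpha by cross-validation on the training data only, using RidgeCV and LassoCV. The sketch below assumes synthetic stand-in data and reuses the notebook's alpha grid:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=506, n_features=13, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

alphas = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]

# Each *CV estimator selects alpha by cross-validation on the training set,
# so the test set stays untouched until the final score.
ridge = RidgeCV(alphas=alphas).fit(X_train, y_train)
lasso = LassoCV(alphas=alphas, cv=5).fit(X_train, y_train)

print("Ridge alpha:", ridge.alpha_, "test R^2:", round(ridge.score(X_test, y_test), 4))
print("Lasso alpha:", lasso.alpha_, "test R^2:", round(lasso.score(X_test, y_test), 4))
```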