![alt text](images/BostonHousing.png "comparators")
##  Boston Housing Data

Sources:
   (a) Origin:  This dataset was taken from the StatLib library which is
                maintained at Carnegie Mellon University.
   (b) Creator:  Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the 
                 demand for clean air', J. Environ. Economics & Management,
                 vol.5, 81-102, 1978.
   (c) Date: July 7, 1993

Past Usage:
   -   Used in Belsley, Kuh & Welsch, 'Regression diagnostics ...', Wiley, 
       1980.   N.B. Various transformations are used in the table on
       pages 244-261.
   -   Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning.
       In Proceedings on the Tenth International Conference of Machine 
       Learning, 236-243, University of Massachusetts, Amherst. Morgan
       Kaufmann.

Relevant Information:

   Concerns housing values in suburbs of Boston.

Number of Instances: 506

Number of Attributes: 13 continuous attributes (including "class"
                         attribute "MEDV"), 1 binary-valued attribute.

Attribute Information:

- CRIM      per capita crime rate by town
- ZN        proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS     proportion of non-retail business acres per town
- CHAS      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX       nitric oxides concentration (parts per 10 million)
- RM        average number of rooms per dwelling
- AGE       proportion of owner-occupied units built prior to 1940
- DIS       weighted distances to five Boston employment centres
- RAD       index of accessibility to radial highways
- TAX       full-value property-tax rate per $10,000
- PTRATIO   pupil-teacher ratio by town
- B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT    % lower status of the population
- MEDV     Median value of owner-occupied homes in $1000's

Missing Attribute Values:  None.




### Objectives:

Regression
- Simple linear regression using Scikit-Learn
- Graphics using Matplotlib with plots and scatter plots
- Simple linear regression using the statsmodels api
- Multiple Regression using statsmodels formula.api
- Cross Validation using k-folds

In [None]:
%matplotlib inline


import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, explained_variance_score, r2_score
from sklearn.utils import shuffle
from sklearn import datasets, linear_model
import pandas as pd
plt.style.use('fivethirtyeight')

In [None]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release
boston = datasets.load_boston() # Load the Boston dataset bundled with sklearn
columns = boston.feature_names
df_boston = pd.DataFrame(boston.data, columns=columns) # load the dataset as a pandas data frame
y = boston.target # define the target variable (dependent variable) as y

In [None]:
df_boston.head()

In [None]:
y

In [None]:
boston_rooms = df_boston[['RM']] # select the rooms column by name; the columns are labelled, so [[5]] raises a KeyError

In [None]:
boston_rooms.head()

In [None]:
y = y.reshape(-1,1)

In [None]:
y

In [None]:
type(y)

In [None]:
df_rooms = df_boston[['RM']] # average number of rooms per dwelling, kept as a one-column DataFrame

In [None]:
df_rooms.head()

In [None]:
plt.scatter(df_rooms, y)
plt.ylabel('Value of a home')
plt.xlabel('Number of Rooms')
plt.title('Boston home prices')

In [None]:
reg = linear_model.LinearRegression()

In [None]:
reg.fit(df_rooms, y)

In [None]:
plt.scatter(df_rooms, y, color='green')

In [None]:
predictions = reg.predict(df_rooms)

In [None]:
plt.scatter(df_rooms, y, color='green')
plt.plot(df_rooms, predictions, color='black')

In [None]:
plt.style.use('dark_background')
plt.scatter(y, predictions, color='blue')

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
df_boston.head(15)

In [None]:
df_boston['Price'] = boston.target

In [None]:
df_boston.head(5)

In [None]:
df_boston.describe()

In [None]:
X = df_boston.drop('Price', axis=1)
Y = df_boston['Price']

In [None]:
X.head()

In [None]:
Y.head()

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.3, random_state = 42)

In [None]:
X_train.head(15)

In [None]:
Y_train.head(15)

In [None]:
reg_all = linear_model.LinearRegression()

In [None]:
reg_all.fit(X_train, Y_train)

In [None]:
y_pred = reg_all.predict(X_test)

In [None]:
reg_all.score(X_test, Y_test)

In [None]:
reg_all.coef_

## xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
## Multiple Regression

Using statsmodels to fit regressions with one and then multiple independent variables.


In [None]:
import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [None]:
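# sm.OLS takes the dependent variable first (endog), then the predictors (exog).
# statsmodels does not add an intercept by default, so add one explicitly.
results = sm.OLS(y, sm.add_constant(df_rooms)).fit()
results.summary()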

In [None]:
# Same model on the test split: price (endog) regressed on RM plus an intercept
results = sm.OLS(Y_test, sm.add_constant(X_test['RM'])).fit()
results.summary()

In [None]:
est = smf.ols(formula='Price ~ CRIM + ZN + INDUS + CHAS + NOX + RM + AGE + DIS + RAD + TAX + PTRATIO + B + LSTAT', data=df_boston).fit()

In [None]:
est.summary()

## xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
## Lasso Regression
#### Lasso: 
- Among highly correlated features, it tends to select one more or less arbitrarily and reduce the coefficients of the rest to zero. Which feature is chosen can also change unpredictably when the model parameters change. In such cases it generally does not work as well as ridge regression.
- Since it provides sparse solutions, it is generally the model of choice (or some variant of it) when the number of features is in the millions or more. In such cases, a sparse solution is a great computational advantage, as features with zero coefficients can simply be ignored.
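
The sparsity described above can be seen on a small synthetic dataset. This is a minimal sketch, not part of the Boston analysis: the data, the names `X_demo`, `y_demo` and `lasso_demo`, and the choice `alpha=0.1` are all assumptions made for illustration. Two of the four features are nearly identical copies of each other, one is independent and relevant, and one is pure noise.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (illustrative assumption, not the Boston dataset)
rng = np.random.RandomState(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)  # almost an exact copy of x1
x3 = rng.normal(size=n)              # independent and relevant
x4 = rng.normal(size=n)              # independent and irrelevant
X_demo = np.column_stack([x1, x2, x3, x4])
y_demo = 3.0 * x1 + 2.0 * x3 + 0.1 * rng.normal(size=n)

# Fit a Lasso model; the L1 penalty can set coefficients exactly to zero
lasso_demo = Lasso(alpha=0.1).fit(X_demo, y_demo)

print("coefficients:", np.round(lasso_demo.coef_, 3))
print("number of zero coefficients:", int(np.sum(lasso_demo.coef_ == 0)))
```

The irrelevant feature should receive a coefficient of exactly zero, and the weight for the correlated pair `x1`/`x2` typically ends up concentrated on one of the two. Which one is kept is not something to rely on.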

In [None]:
from sklearn.linear_model import Lasso

In [None]:
lasso = Lasso(alpha=0.1, normalize=True)

In [None]:
lasso.fit(X_train, Y_train)

In [None]:
lasso_pred = lasso.predict(X_test)

In [None]:
lasso.score(X_test, Y_test)

In [None]:
df_boston.head()

In [None]:
names = df_boston.columns

In [None]:
names

In [None]:
lasso_coef = lasso.fit(X, y).coef_

In [None]:
lasso_coef

In [None]:
df_lasso = pd.DataFrame([lasso_coef])

In [None]:
df_lasso

In [None]:
df_lasso.columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT']

In [None]:
df_lasso

## xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
## Ridge Regression
#### Ridge: 
- It generally works well even in the presence of highly correlated features: it includes all of them in the model, with the coefficients distributed among them depending on the correlation.
- It is mainly used to prevent overfitting. Since it includes all the features, it is not very useful when the number of features is extremely high, say in the millions, as it poses computational challenges.
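
How ridge spreads weight across correlated features can be checked on synthetic data. This is a minimal sketch under stated assumptions: the data, the names `X_demo`, `y_demo` and `ridge_demo`, and `alpha=1.0` are made up for illustration and are not part of the Boston analysis. The target depends only on `x1`, but `x2` is almost an exact copy of it.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (illustrative assumption, not the Boston dataset)
rng = np.random.RandomState(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)  # almost an exact copy of x1
X_demo = np.column_stack([x1, x2])
y_demo = 3.0 * x1 + 0.1 * rng.normal(size=n)

# The L2 penalty shrinks coefficients but does not set them to zero
ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)

print("coefficients:", np.round(ridge_demo.coef_, 3))
print("sum of coefficients:", round(float(ridge_demo.coef_.sum()), 3))
```

Both coefficients should come out non-zero and of similar size, with their sum close to the true effect of 3, rather than one feature carrying all the weight.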

In [None]:
from sklearn.linear_model import Ridge

In [None]:
ridge = Ridge(alpha=0.1, normalize=True)

In [None]:
ridge.fit(X_train, Y_train)

In [None]:
ridge_pred = ridge.predict(X_test)

In [None]:
ridge.score(X_test, Y_test)

In [None]:
ridge_coef = ridge.fit(X, y).coef_

In [None]:
df_ridge = pd.DataFrame(ridge_coef)

In [None]:
df_ridge.columns = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT']

In [None]:
df_ridge

In [None]:
import seaborn as sns
colormap = plt.cm.viridis
plt.figure(figsize=(14,12))
plt.title('Pearson Correlation of Features', y=1.05, size=15)
sns.heatmap(df_boston.astype(float).corr(),linewidths=0.1,vmax=1.0, square=True, cmap=colormap, linecolor='white', annot=True)

## xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
## Decision Tree Regressor and Adaboost

#### Decision Tree Regressor
- A decision tree builds regression or classification models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed.
- The final result is a tree with decision nodes and leaf nodes.
- A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing values for the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target.
- The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
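
The node structure can be made concrete with a tiny tree. This is a minimal sketch on toy data, where `X_demo`, `y_demo` and `tree_demo` are hypothetical names introduced for illustration. Note that scikit-learn trees always split in two, so each decision node here has exactly two branches.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy data: the target jumps from 10 to 20 once the single feature reaches 5
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
y_demo = np.where(X_demo.ravel() < 5, 10.0, 20.0)

# max_depth=1 allows one decision node (the root) and two leaf nodes
tree_demo = DecisionTreeRegressor(max_depth=1).fit(X_demo, y_demo)

# Text rendering of the tree: the root's threshold test and the leaf values
print(export_text(tree_demo, feature_names=["x"]))

# Each prediction is the mean target of the leaf the sample falls into
print(tree_demo.predict([[2.0], [7.0]]))  # → [10. 20.]
```

The printed tree shows the root node testing `x` against a threshold, followed by the two leaves that hold the predicted values.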

#### Adaboost
- Boosting is a general ensemble method that creates a strong classifier from a number of weak classifiers.
- This is done by building a model from the training data, then creating a second model that attempts to correct the errors of the first. Models are added until the training set is predicted perfectly or the maximum number of models has been added.
- AdaBoost was the first really successful boosting algorithm developed for binary classification. It is the best starting point for understanding boosting.
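
The same idea carries over to regression, which is how it is used in this notebook. The sketch below compares one shallow tree (the weak learner) with an AdaBoost ensemble of such trees. It is only an illustration: the noisy sine data, the names `weak` and `boosted`, and the settings `max_depth=2` and `n_estimators=50` are assumptions, not values from the Boston analysis.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D data (illustrative assumption): a noisy sine wave
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 6, size=(200, 1))
y_demo = np.sin(X_demo.ravel()) + 0.1 * rng.normal(size=200)

# A single shallow tree acts as the weak learner
weak = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_demo, y_demo)

# AdaBoost fits a sequence of such trees, each one giving more weight
# to the samples that the previous trees predicted poorly
boosted = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=2), n_estimators=50, random_state=0
).fit(X_demo, y_demo)

# Compare the fit of the single tree and the ensemble on the training data
mse_weak = mean_squared_error(y_demo, weak.predict(X_demo))
mse_boosted = mean_squared_error(y_demo, boosted.predict(X_demo))
print("single tree MSE:", round(mse_weak, 4))
print("AdaBoost MSE:   ", round(mse_boosted, 4))
```

On data like this the ensemble typically reaches a lower training error than the single tree. A fair comparison of generalization would use a held-out test set, as in the cells that follow.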

In [None]:
dt_regressor = DecisionTreeRegressor(max_depth=4)
dt_regressor.fit(X_train, Y_train)

In [None]:
y_pred_dt = dt_regressor.predict(X_test)
mse = mean_squared_error(Y_test, y_pred_dt)
evs = explained_variance_score(Y_test, y_pred_dt)

In [None]:
print('\n#### Decision Tree performance ####')
print('Mean squared error =', round(mse, 2))
print('Explained variance score =', round(evs, 2))

In [None]:
ab_regressor = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=400, random_state=7)
ab_regressor.fit(X_train, Y_train)

In [None]:
y_pred_ab = ab_regressor.predict(X_test)
mse_a = mean_squared_error(Y_test, y_pred_ab)
evs_a = explained_variance_score(Y_test, y_pred_ab)


print('\n#### AdaBoost performance ####')
print('Mean squared error =', round(mse_a, 2))
print('Explained variance score =', round(evs_a, 2))

In [None]:
def plot_feature_importances(feature_importances, title, feature_names):
    # Normalize the importance values
    feature_importances = 100.0 * (feature_importances / max(feature_importances))

    # Sort the values and flip them
    index_sorted = np.flipud(np.argsort(feature_importances))

    # Arrange the X ticks
    pos = np.arange(index_sorted.shape[0]) + 0.5

    # Plot the bar graph
    plt.figure()
    plt.bar(pos, feature_importances[index_sorted], align='center')
    plt.xticks(pos, feature_names[index_sorted])
    plt.ylabel('Relative Importance')
    plt.title(title)
    plt.show()

In [None]:
dt_regressor.feature_importances_

In [None]:
ab_regressor.feature_importances_

In [None]:
plt.style.use('fivethirtyeight')
plot_feature_importances(dt_regressor.feature_importances_,'Decision Tree regressor', boston.feature_names)
plot_feature_importances(ab_regressor.feature_importances_,'AdaBoost regressor', boston.feature_names)