# Results


After tuning several regression models, Lasso was selected as the best model for this dataset. Its scores (R², reported as percentages) are shown below:

Lasso parameters:  {'alpha': 1048}

Best Mean Cross-validation score: 88.62%

Lasso Test Performance:  88.67%

Lasso Train Performance: 90.50%

## Data PreProcessing

In [28]:
from math import sqrt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

pd.set_option('display.max_columns', None)  # show every column when displaying DataFrames
%matplotlib inline

### Load Datasets

In [30]:
data = pd.read_csv("C:/Users/tharu/OneDrive/Desktop/MS-Sem 2/Applied Machine Learning/Lectures/Module -2/houseprice.csv")

### Types of variables



In [31]:
# we have an Id variable, that we should not use for predictions:

print('Number of House Id labels: ', len(data.Id.unique()))
print('Number of Houses in the Dataset: ', len(data))

Number of House Id labels:  1460
Number of Houses in the Dataset:  1460


#### Find categorical variables

In [32]:
# find categorical variables (hint: dtype == 'O')

categorical = [var for var in data.columns if data[var].dtype=='O']

print(f'There are {len(categorical)} categorical variables')

There are 43 categorical variables


#### Find temporal variables

In [33]:
# make a list of the numerical variables first (hint: dtype != 'O')
numerical = [var for var in data.columns if data[var].dtype!='O']

# list of variables that contain year information (hint: variable name contains 'Yr' or 'Year')
year_vars = [var for var in numerical if 'Yr' in var or 'Year' in var]

year_vars

['YearBuilt', 'YearRemodAdd', 'GarageYrBlt', 'YrSold']

#### Find discrete variables

Discrete variables are identified here as numerical variables with fewer than 20 unique values.

In [34]:
# collect the discrete variables, excluding the year variables
discrete = [var for var in numerical if len(data[var].unique()) < 20 and var not in year_vars]

print(f'There are {len(discrete)} discrete variables')

There are 14 discrete variables


#### Continuous variables

In [35]:
# find continuous variables (hint: numerical variables not in discrete or year_vars)
# Also remove the Id variable and the target variable SalePrice,
# which are both numerical as well

continuous = [var for var in numerical if var not in discrete and var not in [
    'Id', 'SalePrice'] and var not in year_vars]

print('There are {} numerical and continuous variables'.format(len(continuous)))

There are 18 numerical and continuous variables
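The four variable groups built above can also be produced in one pass. The following is a minimal sketch on a made-up toy frame, not the house price data. The helper name `split_variable_types` and its parameters are hypothetical and are not part of the notebook.

```python
import pandas as pd


def split_variable_types(df, max_discrete=20, year_tokens=("Yr", "Year"), exclude=()):
    """Group column names into categorical, temporal, discrete and continuous.

    Hypothetical helper that mirrors the rules used in the cells above.
    """
    categorical = [c for c in df.columns if df[c].dtype == "O"]
    numerical = [c for c in df.columns if df[c].dtype != "O"]
    temporal = [c for c in numerical if any(tok in c for tok in year_tokens)]
    # count NaN as a value, like len(df[c].unique()) does
    discrete = [c for c in numerical
                if df[c].nunique(dropna=False) < max_discrete and c not in temporal]
    continuous = [c for c in numerical
                  if c not in discrete and c not in temporal and c not in exclude]
    return {"categorical": categorical, "temporal": temporal,
            "discrete": discrete, "continuous": continuous}


# toy data, invented for illustration only
toy = pd.DataFrame({
    "Id": range(1, 31),
    "Street": pd.Series(["Pave", "Grvl"] * 15, dtype="object"),
    "YrSold": [2008, 2009, 2010] * 10,
    "Fireplaces": [0, 1, 2] * 10,
    "LotArea": [8000 + 37 * i for i in range(30)],
    "SalePrice": [100000 + 1500 * i for i in range(30)],
})

groups = split_variable_types(toy, exclude=("Id", "SalePrice"))
for name, cols in groups.items():
    print(name, cols)
```

Keeping the rules in one function makes it harder to print the length of the wrong list when reporting the counts.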


### Separate train and test set

In [36]:
# Let's separate into train and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data.drop(['Id', 'SalePrice'], axis=1),
                                                    data['SalePrice'],
                                                    test_size=0.1,
                                                    random_state=0)

X_train.shape, X_test.shape

((1314, 79), (146, 79))

**Next, we engineer the features of this dataset, which is the most important part of this course.**

### Create New Variables

Replace 'YearBuilt', 'YearRemodAdd' and 'GarageYrBlt' with the time elapsed between each of them and 'YrSold'.
For example, YearBuilt = YrSold - YearBuilt.

Transform 'YearRemodAdd' and 'GarageYrBlt' in the same way.
After the transformation, drop 'YrSold'.

In [37]:
# function to calculate elapsed time

def elapsed_years(df, var):
    # capture difference between year variable and
    # year the house was sold
    
    df[var] = df['YrSold'] - df[var]
    return df

In [38]:
for var in ['YearBuilt', 'YearRemodAdd', 'GarageYrBlt']:
    X_train = elapsed_years(X_train, var)
    X_test = elapsed_years(X_test, var)

In [39]:
# drop YrSold
X_train.drop('YrSold', axis=1, inplace=True)
X_test.drop('YrSold', axis=1, inplace=True)
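The elapsed-time transformation above modifies the frames in place. As a minimal sketch, the same idea can be written so that it returns a new frame and leaves its input untouched. The function name `elapsed_years_copy` and the toy values are hypothetical and used only for illustration.

```python
import pandas as pd


def elapsed_years_copy(df, year_cols, ref_col="YrSold"):
    """Return a copy where each column in year_cols holds ref_col minus its
    original value, with ref_col dropped afterwards (hypothetical helper)."""
    out = df.copy()
    for col in year_cols:
        out[col] = out[ref_col] - out[col]
    return out.drop(columns=[ref_col])


# toy data, invented for illustration only
toy = pd.DataFrame({
    "YearBuilt": [2000, 1995],
    "YearRemodAdd": [2005, 1995],
    "GarageYrBlt": [2000.0, float("nan")],  # a missing garage year stays missing
    "YrSold": [2008, 2010],
})

aged = elapsed_years_copy(toy, ["YearBuilt", "YearRemodAdd", "GarageYrBlt"])
print(aged)
```

Missing years propagate as NaN, which is why 'GarageYrBlt' still needs imputation later in the pipeline.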

In [40]:
year_vars.remove('YrSold')

In [41]:
# capture the column names for use later in the notebook
final_columns = X_train.columns
final_columns

Index(['MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley',
       'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope',
       'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle',
       'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'RoofStyle',
       'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea',
       'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond',
       'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2',
       'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating', 'HeatingQC',
       'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF',
       'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath',
       'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual', 'TotRmsAbvGrd',
       'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType', 'GarageYrBlt',
       'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond',
       'PavedDrive', 'Wo

### Feature Engineering Pipeline

In [42]:
# I will treat discrete variables as if they were categorical
# to treat discrete as categorical using Feature-engine
# we need to re-cast them as object

X_train[discrete] = X_train[discrete].astype('O')
X_test[discrete] = X_test[discrete].astype('O')

In [43]:
pip install feature_engine

Note: you may need to restart the kernel to use updated packages.


In [44]:
# import relevant modules for feature engineering
# (feature_engine module paths as used by the version installed for this notebook;
# newer releases of the library organise these modules under different names)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from feature_engine import missing_data_imputers as mdi
from feature_engine import categorical_encoders as ce
from feature_engine.variable_transformers import YeoJohnsonTransformer
from feature_engine.discretisers import DecisionTreeDiscretiser

In [45]:
house_preprocess = Pipeline([

    # missing data imputation
    ('missing_ind', mdi.AddNaNBinaryImputer(
        variables=['LotFrontage', 'MasVnrArea', 'GarageYrBlt'])),
    ('imputer_num', mdi.MeanMedianImputer(
        imputation_method='mean',
        variables=['LotFrontage', 'MasVnrArea', 'GarageYrBlt'])),
    ('imputer_cat', mdi.CategoricalVariableImputer(variables=categorical)),

    # categorical encoding
    ('rare_label_enc', ce.RareLabelCategoricalEncoder(
        tol=0.01, n_categories=6, variables=categorical + discrete)),
    ('categorical_enc', ce.MeanCategoricalEncoder(variables=categorical + discrete)),

    # transforming numerical variables
    ('yjt', YeoJohnsonTransformer(
        variables=['LotFrontage', 'MasVnrArea', 'GarageYrBlt'])),

    # discretisation and encoding
    ('treeDisc', DecisionTreeDiscretiser(
        cv=2, scoring='neg_mean_squared_error',
        regression=True,
        param_grid={'max_depth': [1, 2, 3, 4, 5, 6]})),

    # feature scaling
    ('scaler', StandardScaler()),
])
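To make the two categorical encoding steps more concrete, here is a minimal pandas sketch of the idea behind them: labels seen in less than a fraction `tol` of the training rows are grouped into a single 'Rare' label, and each label is then replaced by the mean of the target for that label. It is a simplified illustration under stated assumptions, not the Feature-engine implementation. The function names and toy values are hypothetical, and the `n_categories` setting used in the pipeline is ignored here.

```python
import pandas as pd


def fit_rare_and_mean_encoding(x, y, tol=0.01):
    """Learn the frequent labels and the per-label target means from training data."""
    freq = x.value_counts(normalize=True)
    frequent = set(freq[freq >= tol].index)
    grouped = x.where(x.isin(frequent), "Rare")  # infrequent labels become 'Rare'
    means = y.groupby(grouped).mean().to_dict()
    return frequent, means


def apply_rare_and_mean_encoding(x, frequent, means):
    """Encode labels with the mapping learned on the training data."""
    grouped = x.where(x.isin(frequent), "Rare")  # unseen labels also become 'Rare'
    return grouped.map(means)


# toy data, invented for illustration only
train_x = pd.Series(["A"] * 5 + ["B"] * 4 + ["C"])
train_y = pd.Series([10.0] * 5 + [20.0] * 4 + [100.0])
test_x = pd.Series(["A", "C", "D"])

frequent, means = fit_rare_and_mean_encoding(train_x, train_y, tol=0.2)
encoded = apply_rare_and_mean_encoding(test_x, frequent, means)
print(encoded.tolist())
```

The mapping is learned on the training data only and then reused for the test data, which is the same fit-then-transform pattern the pipeline follows.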

In [46]:
house_preprocess.fit(X_train,y_train)

Pipeline(memory=None,
         steps=[('missing_ind',
                 AddNaNBinaryImputer(variables=['LotFrontage', 'MasVnrArea',
                                                'GarageYrBlt'])),
                ('imputer_num',
                 MeanMedianImputer(imputation_method='mean',
                                   variables=['LotFrontage', 'MasVnrArea',
                                              'GarageYrBlt'])),
                ('imputer_cat',
                 CategoricalVariableImputer(variables=['MSZoning', 'Street',
                                                       'Alley', 'LotShape',
                                                       'LandContour',
                                                       'Utilities', '...
                                                    'Utilities', 'LotConfig',
                                                    'LandSlope', 'Neighborhood',
                                                    'Condition1', 'Condition2',
    

In [47]:
# Apply Transformations
X_train=house_preprocess.transform(X_train)
X_test=house_preprocess.transform(X_test)

## Regression Models: Tune Different Models One by One

In [48]:
import warnings
warnings.filterwarnings('ignore')

### Linear Regression with Cross-Validation

In [49]:

#Linear Regression with Cross val
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


lr = LinearRegression().fit(X_train, y_train)
cv_scores = cross_val_score(lr, X_train, y_train,cv=3)

# Mean Cross validation Score
print("Cross-validation scores: {}".format(cv_scores))
print("Mean Cross-validation scores: {:.2f}".format(cv_scores.mean()))
# Print Co-efficients
print("lr.coef_:", lr.coef_)
print("lr.intercept_:", lr.intercept_)

# Check test data set performance
print("LR Performance Test: ", lr.score(X_test,y_test))


Cross-validation scores: [ 8.76542321e-01 -5.50178080e+23  8.94787033e-01]
Mean Cross-validation scores: -183392693292153526288384.00
lr.coef_: [ 8.74692295e+02  9.72879918e+02  1.43598424e+03  2.41530789e+03
  1.55922446e+03  3.31539942e+02  5.84804509e+02  1.20332634e+03
  1.48520143e+03  2.33009441e+03  1.17630965e+03  1.13744157e+04
  1.26748073e+03  2.03977054e+03  1.11272495e+03 -1.03635447e+03
  1.58434519e+04 -2.93676378e+02 -5.41610661e+03  3.98221701e+03
  3.68590080e+02 -1.04048824e+03  3.19187631e+03 -2.39551624e+03
 -8.64923875e+02  2.87894951e+02  2.67846806e+03  7.08957253e+02
  1.02948711e+02  2.34865476e+03  5.71438886e+02  3.60875596e+03
  1.23986809e+03  5.64889112e+03 -1.47594682e+03  1.93575359e+03
 -2.70259491e+02  7.61385836e+03  2.15847425e+02  1.45627677e+03
  1.18463265e+03 -7.40189368e+02  1.22327269e+04  1.12776771e+04
  4.16965516e+03  5.78069444e+03  2.85514634e+03 -1.32019971e+03
  4.18391534e+03  5.24407111e+03  6.05123160e+02  3.06070473e+03
  3.7470485

### Linear Regression with SGD and Cross-Validation

In [50]:

#Linear Regression with SGD and cross validation

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import SGDRegressor

sgd=SGDRegressor(max_iter=1000, tol = 1e-5, eta0=0.01)
sgd.fit(X_train, y_train)
cv_scores = cross_val_score(sgd, X_train, y_train,cv=2)
# Mean Cross validation Score
print("Cross-validation scores: {}".format(cv_scores))
print("Mean Cross-validation scores: {}".format(cv_scores.mean()))
# Print Co-efficients
print("lr.coef_:", sgd.coef_)
print("lr.intercept_:", sgd.intercept_)

# Check test data set performance
print("SGD Performance Test: ", sgd.score(X_test,y_test))

Cross-validation scores: [0.86124974 0.88779009]
Mean Cross-validation scores: 0.8745199134958315
lr.coef_: [  770.9348899    941.06650267  1577.68212881  2437.14161925
  1859.16972791   415.23418883   480.63109964  1224.90962257
  1481.47162386  2417.68940026   919.11353591 11557.97564088
  1511.43603974  2446.86020361   899.46431389 -1131.16411756
 15826.95943569  -229.65524507 -5143.57850428  4005.72207858
   553.03987212  -698.62478132  3150.68871604 -2052.09837307
  -641.47878547   102.81864424  2706.29739266   853.60837667
  -109.39576131  2468.2736187    504.00066741  3442.27170692
  1179.6486113   5560.60751743 -1647.6221293   1896.72049254
  -225.35549057  8019.3017401    102.71527379  1486.55316522
  1291.83484427  -737.79376026 12436.08141618 11294.3446184
  4759.05575972  5913.88563226  2545.35659818 -1461.3773921
  4459.9378136   5117.04174257   689.09391386  2901.45268157
  3736.79712654  3390.00833819  2960.67101983   822.99366855
  1673.3592507  -2056.58898251 -2143.386

### Linear Regression with SGD and GridSearchCV

In [51]:
# Linear Regression with SGD and GridSearchCv

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
from math import sqrt


reg_sgd_pipe = Pipeline([

    # feature Scaling
    ('scaler', MinMaxScaler()),
    # regression
    ('sgd_reg', SGDRegressor(max_iter=10000, tol = 1e-6))
])

param_sgd = {'sgd_reg__eta0':[0.01, 0.05, 0.1 ,0.5]}
grid_sgd = GridSearchCV(reg_sgd_pipe, param_sgd,cv=5, n_jobs=-1, return_train_score = True,scoring='neg_mean_squared_error')
grid_sgd.fit(X_train, y_train)

X_train_preds = grid_sgd.predict(X_train)
X_test_preds = grid_sgd.predict(X_test)

print('train mse: {}'.format(mean_squared_error(y_train, X_train_preds)))
print('train rmse: {}'.format(sqrt(mean_squared_error(y_train, X_train_preds))))
print('train r2: {}'.format(r2_score(y_train, X_train_preds)))
print()
print('test mse: {}'.format(mean_squared_error(y_test, X_test_preds)))
print('test rmse: {}'.format(sqrt(mean_squared_error(y_test, X_test_preds))))
print('test r2: {}'.format(r2_score(y_test, X_test_preds)))
print()

print("Best parameters: {}".format(grid_sgd.best_params_))
print("Best cross-validation score: {:.2f}".format(grid_sgd.best_score_))
print()
print("Best estimator:\n{}".format(grid_sgd.best_estimator_))

train mse: 692584040.0558541
train rmse: 26316.991470452205
train r2: 0.8890770837930482

test mse: 1075038011.726254
test rmse: 32787.77228977678
test r2: 0.8435651954479455

Best parameters: {'sgd_reg__eta0': 0.1}
Best cross-validation score: -763056882.65

Best estimator:
Pipeline(memory=None,
         steps=[('scaler', MinMaxScaler(copy=True, feature_range=(0, 1))),
                ('sgd_reg',
                 SGDRegressor(alpha=0.0001, average=False, early_stopping=False,
                              epsilon=0.1, eta0=0.1, fit_intercept=True,
                              l1_ratio=0.15, learning_rate='invscaling',
                              loss='squared_loss', max_iter=10000,
                              n_iter_no_change=5, penalty='l2', power_t=0.25,
                              random_state=None, shuffle=True, tol=1e-06,
                              validation_fraction=0.1, verbose=0,
                              warm_start=False))],
         verbose=False)
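Because this search uses `scoring='neg_mean_squared_error'`, `best_score_` is a negated MSE, so it is negative and in squared price units. Taking the square root of its negation gives a cross-validated RMSE (roughly 27,600 for the score printed above), which is comparable with the train and test RMSE values. The sketch below shows the conversion on synthetic data. The data and the names `X_demo`, `y_demo` and `grid_demo` are made up for illustration and are not part of the notebook.

```python
from math import sqrt

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# synthetic data, used only to make the sketch self-contained
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

grid_demo = GridSearchCV(
    Ridge(),
    {"alpha": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="neg_mean_squared_error",
)
grid_demo.fit(X_demo, y_demo)

cv_mse = -grid_demo.best_score_  # undo the sign flip applied by the scorer
cv_rmse = sqrt(cv_mse)           # back to the units of the target

print("best neg MSE:", grid_demo.best_score_)
print("cross-validated RMSE:", cv_rmse)
```

scikit-learn also offers a `'neg_root_mean_squared_error'` scorer. Note that it averages the per-fold RMSE, so its value can differ slightly from the square root of the averaged MSE.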


### Ridge Regression

In [52]:
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge
ridge = Ridge()


#define a list of parameters
param_ridge = {'alpha':np.logspace(-4,1,30)}

grid_ridge = GridSearchCV(ridge, param_ridge, cv=4, return_train_score = True)
grid_ridge.fit(X_train, y_train)

# Mean Cross Validation Score
print("Best Mean Cross-validation score: {}".format(grid_ridge.best_score_))

print()

#find best parameters
print('Ridge parameters: ', grid_ridge.best_params_)

# print co-eff

print("Ridge.coef_:", grid_ridge.best_estimator_.coef_)
print("Ridge.intercept_:", grid_ridge.best_estimator_.intercept_)

# Check test data set performance

print("Ridge Test Performance: ", grid_ridge.score(X_test,y_test))

Best Mean Cross-validation score: 0.8824748308049819

Ridge parameters:  {'alpha': 10.0}
Ridge.coef_: [ 8.29549172e+02  1.00520329e+03  1.41907291e+03  2.45597808e+03
  1.56476556e+03  3.15846034e+02  5.91322052e+02  1.20599829e+03
  1.48362551e+03  2.32371115e+03  1.20526043e+03  1.11935611e+04
  1.26236156e+03  2.05403204e+03  1.12076053e+03 -9.72818777e+02
  1.54617799e+04 -3.19815798e+02 -5.01931325e+03  3.85901413e+03
  4.23980960e+02 -9.99184855e+02  2.93013237e+03 -2.14991039e+03
 -8.68642665e+02  3.53138591e+02  2.78127261e+03  7.02910945e+02
  2.89525147e+01  2.39174264e+03  5.55247534e+02  3.58250149e+03
  1.27614162e+03  5.69204212e+03 -1.44936748e+03  1.91368069e+03
 -2.68899524e+02  7.66517621e+03  1.95080463e+02  1.45940911e+03
  1.20773749e+03 -7.11187851e+02  1.20416757e+04  1.11416366e+04
  4.12582377e+03  5.86228267e+03  2.79856151e+03 -1.28528512e+03
  4.13908324e+03  5.16003152e+03  6.18912987e+02  3.04868546e+03
  3.80554780e+03  3.21918000e+03  3.00001538e+03  7.9

### Lasso Regression

In [53]:
from sklearn.linear_model import Lasso
lasso = Lasso(random_state=42)

#define a list of parameters
param_lasso = {'alpha':np.logspace(-4,4,50) }

grid_lasso = GridSearchCV(lasso, param_lasso, cv=4, return_train_score = True)
grid_lasso.fit(X_train, y_train)

# Mean Cross Validation Score
print("Best Mean Cross-validation score: {:}".format(grid_lasso.best_score_))
print()

#find best parameters
print('Lasso parameters: ', grid_lasso.best_params_)

# print co-eff

print("Lasso.coef_:", grid_lasso.best_estimator_.coef_)
print("Lasso.intercept_:", grid_lasso.best_estimator_.intercept_)

# Check test data set performance
print("Lasso Train Performance: ", grid_lasso.score(X_train,y_train))
print("Lasso Test Performance: ", grid_lasso.score(X_test,y_test))

Best Mean Cross-validation score: 0.8861779353224504

Lasso parameters:  {'alpha': 1048.1131341546852}
Lasso.coef_: [ 4.98696002e+02  7.57017519e+02  5.90379777e+02  2.95246614e+03
  3.46289623e+02  0.00000000e+00  2.22483500e+02  5.51470410e+02
  1.72744436e+02  1.42355195e+03  6.88994713e+02  1.06998115e+04
  3.88019570e+02  1.27196643e+03  1.05218376e+03  0.00000000e+00
  1.60919646e+04 -0.00000000e+00 -0.00000000e+00  2.51095187e+03
  0.00000000e+00 -0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  1.72759469e+03  0.00000000e+00
 -0.00000000e+00  1.22854720e+03  0.00000000e+00  3.01660427e+03
  0.00000000e+00  6.23338056e+03 -0.00000000e+00  8.76217415e+02
 -0.00000000e+00  8.15480315e+03  0.00000000e+00  4.28630003e+02
  9.67409601e+02  0.00000000e+00  1.20342814e+04  1.13569827e+04
  2.63429944e+03  6.08850239e+03  2.23450454e+03 -2.30186553e+02
  2.06467024e+03  3.82770997e+03  2.79840542e+02  1.75612416e+03
  4.08797263e+03  2.58396561e+03  1.608

### ElasticNet

In [54]:
#ElasticNet
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
elasticnet = ElasticNet()


param_elasticnet = {'alpha':np.logspace(-4,1,30), 'l1_ratio' :[0.1,0.2,0.3,0.4]}

grid_elasticnet = GridSearchCV(elasticnet , param_elasticnet, cv=5, return_train_score = True)
grid_elasticnet.fit(X_train, y_train)

grid_elasticnet_train_score = grid_elasticnet.score(X_train, y_train)
grid_elasticnet_test_score = grid_elasticnet.score(X_test, y_test)

print('Training set score: ', grid_elasticnet_train_score)
print('Test score: ', grid_elasticnet_test_score)

#find best parameters
print('Best parameters: ', grid_elasticnet.best_params_)
print('Best cross-validation score:', grid_elasticnet.best_score_)

Training set score:  0.9094017502609215
Test score:  0.8716313335123038
Best parameters:  {'alpha': 0.18873918221350977, 'l1_ratio': 0.4}
Best cross-validation score: 0.8841708558036789


### KNN Regressor

In [32]:
#KNN
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV

pipe_knn=Pipeline([ 
    ('scaler',MinMaxScaler()),
    ('knnreg', KNeighborsRegressor())
    
])

param_knn = {'knnreg__n_neighbors': range(1,8)}

#apply grid search
grid_knn = GridSearchCV(pipe_knn, param_knn, cv=5, return_train_score=True)
grid_knn.fit(X_train, y_train)

print('train score: ', grid_knn.score(X_train, y_train))
print('test score: ', grid_knn.score(X_test, y_test))

#find best parameters
print('Best parameters: ', grid_knn.best_params_)
print('Best cross-validation score:', grid_knn.best_score_)



train score:  0.8516070137143069
test score:  0.7587863099604379
Best parameters:  {'knnreg__n_neighbors': 6}
Best cross-validation score: 0.78587417331835


### Polynomial Regression

In [44]:
#Polynomial 
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing  import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score
from math import sqrt


pipe_poly=Pipeline([ 
    ('polynomialfeatures', PolynomialFeatures()),
    ('norm_reg', LinearRegression())
    
])
#define a list of parameters
param_poly = {'polynomialfeatures__degree':range(1,4)}

grid_poly = GridSearchCV(pipe_poly, param_poly,cv=5, n_jobs=-1, return_train_score = True,scoring='neg_mean_squared_error')


grid_poly.fit(X_train, y_train)

# let's get the predictions
X_train_preds = grid_poly.predict(X_train)
X_test_preds = grid_poly.predict(X_test)

# check model performance:

print('train mse: {}'.format(mean_squared_error(y_train, X_train_preds)))
print('train rmse: {}'.format(sqrt(mean_squared_error(y_train, X_train_preds))))
print('train r2: {}'.format(r2_score(y_train, X_train_preds)))
print()
print('test mse: {}'.format(mean_squared_error(y_test, X_test_preds)))
print('test rmse: {}'.format(sqrt(mean_squared_error(y_test, X_test_preds))))
print('test r2: {}'.format(r2_score(y_test, X_test_preds)))

#find best parameters
print('Best parameters: ', grid_poly.best_params_)
print('Best cross validation score: ', grid_poly.best_score_)



train mse: 27492.40537606851
train rmse: 165.8083392838506
train r2: 0.9999955968696916

test mse: 1.2179477524995892e+24
test rmse: 1103606701909.5115
test r2: -177230401658953.0
Best parameters:  {'polynomialfeatures__degree': 2}
Best cross validation score:  -8.290251389947242e+25


### Decision Tree Regressor

In [56]:
from sklearn.tree import DecisionTreeRegressor
dtree = DecisionTreeRegressor(random_state=0)

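#define a list of parameters
# note: 'mse' and 'mae' are the criterion names accepted by the scikit-learn version used here;
# recent releases call them 'squared_error' and 'absolute_error'
param_dtree = {'max_depth': range(1,10),'min_samples_split':range(2,10,1),'criterion':['mse','mae']}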

#apply grid search
grid_dtree = GridSearchCV(dtree, param_dtree, cv=5, return_train_score = True)
grid_dtree.fit(X_train, y_train)

# Mean Cross Validation Score
print("Best Mean Cross-validation score: {:.2f}".format(grid_dtree.best_score_))
print()

#find best parameters
print('Decision Tree parameters: ', grid_dtree.best_params_)

# Check test data set performance
print("Decision Tree Performance: ", grid_dtree.score(X_test,y_test))

Best Mean Cross-validation score: 0.77

Decision Tree parameters:  {'criterion': 'mse', 'max_depth': 6, 'min_samples_split': 7}
Decision Tree Performance:  0.7946089841830359


### SVR

In [24]:
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
svr=SVR()
param_grid = [{'kernel': ['rbf'],
               'C': [0.001, 0.01, 0.1, 1,10],
               'gamma': [ 0.01, 0.1, 1,10,100]},
              {'kernel': ['poly'],
               'degree':[1,2],
               'C': [0.001, 0.01, 0.1, 1,10],
               'gamma': [ 0.01, 0.1, 1,10,100]},
              {'kernel': ['sigmoid'],
               'C': [0.001, 0.01, 0.1, 1,10],
               'gamma': [ 0.01, 0.1, 1,10,100]}]
grid_svr = GridSearchCV(svr, param_grid, cv=3,
                          return_train_score=True)
grid_svr.fit(X_train, y_train)
print('train score: ', grid_svr.score(X_train, y_train))
print('test score: ', grid_svr.score(X_test, y_test))
print("Best parameters: {}".format(grid_svr.best_params_))
print("Best cross-validation score: {:}".format(grid_svr.best_score_))

train score:  0.9016336525800952
test score:  0.8702806425506886
Best parameters: {'C': 10, 'degree': 1, 'gamma': 100, 'kernel': 'poly'}
Best cross-validation score: 0.8883364269292818


### Tune Multiple Models with One GridSearch

In [55]:
model_gs = Pipeline([("regressor", LinearRegression())])


In [56]:
model_parm_gd = [
    { 'regressor': [LinearRegression()]},
    
    { 'regressor': [Ridge()],
      'regressor__alpha':np.logspace(-4,10,30) },
    
    { 'regressor': [Lasso(random_state=42)],
      'regressor__alpha':np.logspace(-4,4,50)}

    

    
 
]

In [57]:

grid_search_house_pipe = GridSearchCV(model_gs, model_parm_gd,cv=4)


In [58]:
grid_search_house_pipe.fit(X_train,y_train)

GridSearchCV(cv=4, error_score=nan,
             estimator=Pipeline(memory=None,
                                steps=[('regressor',
                                        LinearRegression(copy_X=True,
                                                         fit_intercept=True,
                                                         n_jobs=None,
                                                         normalize=False))],
                                verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid=[{'regressor': [LinearRegression(copy_X=True,
                                                         fit_intercept=True,
                                                         n_jobs=None,
                                                         normalize=False)]},
                         {'regressor': [Ridge(alpha=1.0, copy_X=...
       1.67683294e+01, 2.44205309e+01, 3.55648031e+01, 5.17947468e+01,
       7.54312006e+01, 1.09854114e+02, 1.599858

In [59]:
print(grid_search_house_pipe.best_params_)

{'regressor': Lasso(alpha=1048.1131341546852, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=42,
      selection='cyclic', tol=0.0001, warm_start=False), 'regressor__alpha': 1048.1131341546852}


In [60]:
# let's get the predictions
X_train_preds = grid_search_house_pipe.predict(X_train)
X_test_preds = grid_search_house_pipe.predict(X_test)

In [61]:
print("Best Mean Cross-validation score: {}".format(grid_search_house_pipe.best_score_))

Best Mean Cross-validation score: 0.8861779353224504


In [62]:
# check model performance:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

print('train mse: {}'.format(mean_squared_error(y_train, X_train_preds)))
print('train rmse: {}'.format(sqrt(mean_squared_error(y_train, X_train_preds))))
print('train r2: {}'.format(r2_score(y_train, X_train_preds)))
print()
print('test mse: {}'.format(mean_squared_error(y_test, X_test_preds)))
print('test rmse: {}'.format(sqrt(mean_squared_error(y_test, X_test_preds))))
print('test r2: {}'.format(r2_score(y_test, X_test_preds)))

train mse: 593136246.5846152
train rmse: 24354.388651424106
train r2: 0.9050044494616056

test mse: 778873594.3317508
test rmse: 27908.306905503076
test r2: 0.8866617392399055
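A search over several estimators reports only the overall winner through `best_params_`. To see how each candidate model fared, `cv_results_` can be grouped by estimator type. The following is a minimal sketch of that idea on synthetic data. The data, the shortened alpha grids and the names `X_demo`, `y_demo`, `search` and `best_per_model` are assumptions made for illustration and do not reproduce the notebook's results.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# synthetic data, used only to make the sketch self-contained
X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=0)

pipe = Pipeline([("regressor", LinearRegression())])
param_grid = [
    {"regressor": [LinearRegression()]},
    {"regressor": [Ridge()], "regressor__alpha": np.logspace(-2, 2, 5)},
    {"regressor": [Lasso(random_state=42)], "regressor__alpha": np.logspace(-2, 2, 5)},
]

search = GridSearchCV(pipe, param_grid, cv=4)
search.fit(X_demo, y_demo)

# one row per parameter combination; label each row with its estimator class
results = pd.DataFrame(search.cv_results_)
results["model"] = results["param_regressor"].map(lambda est: type(est).__name__)

# best mean cross-validation R² reached by each model family
best_per_model = (
    results.groupby("model")["mean_test_score"].max().sort_values(ascending=False)
)
print(best_per_model)
```

The top entry of `best_per_model` equals `search.best_score_`, and the remaining entries show how close the other model families came.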
