# Project 2: Ames Housing Prices

## Model Tuning

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import Ridge, Lasso, ElasticNet, LinearRegression, RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

In [2]:
# Read in data
X_train = pd.read_csv('../datasets/model_tuning/X_train_processed.csv', keep_default_na=False, index_col='Id')
X_test = pd.read_csv('../datasets/model_tuning/X_test_processed.csv', keep_default_na=False, index_col='Id')
y_train = pd.read_csv('../datasets/model_tuning/y_train.csv', keep_default_na=False, index_col='Id')
y_test = pd.read_csv('../datasets/model_tuning/y_test.csv', keep_default_na=False, index_col='Id')

# Read in top Lasso coefficients from Feature Set 1
top_lasso_coef = pd.read_csv('../datasets/model_tuning/top_lasso_coef.csv', index_col = 0)

### Feature Set 2

For Feature Set 2, we can try retaining only the top 50 features of the previous Lasso Regression model.

In [3]:
# Examine top Lasso coefficients
top_lasso_coef

Unnamed: 0,feature,coef,abs_coef
8,Gr Liv Area,24095.70094,24095.70094
185,Kitchen Qual_Ex,23847.440798,23847.440798
150,Bsmt Qual_Ex,18516.320628,18516.320628
47,Neighborhood_NridgHt,17397.100149,17397.100149
162,Bsmt Exposure_Gd,15177.173105,15177.173105
141,Exter Qual_Ex,14577.094016,14577.094016
46,Neighborhood_NoRidge,13415.796923,13415.796923
35,Neighborhood_Crawfor,11487.809499,11487.809499
11,Overall Qual,9065.137909,9065.137909
6,Total Bsmt SF,8556.796445,8556.796445


In [4]:
# Create Feature Set 2
feature_set_2 = list(top_lasso_coef['feature'])

# Create new train and test sets
X_train_2 = X_train[feature_set_2]
X_test_2 = X_test[feature_set_2]
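The ranking in top_lasso_coef was produced in an earlier notebook that is not shown here. As a minimal sketch of how such a ranking might be built and used to subset columns, the example below fits a Lasso on synthetic data. The data, the alpha value, and the names `lasso_demo`, `coef_table`, and `top_k` are illustrative assumptions, not the project's actual settings.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in for the processed training data (hypothetical)
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(X.shape[1])])

# Fit a Lasso model; alpha=1.0 is an arbitrary choice for the demo
lasso_demo = Lasso(alpha=1.0).fit(X, y)

# Rank features by the absolute size of their coefficients
coef_table = pd.DataFrame({
    "feature": X.columns,
    "coef": lasso_demo.coef_,
    "abs_coef": np.abs(lasso_demo.coef_),
}).sort_values("abs_coef", ascending=False)

# Keep only the top_k features and subset the design matrix
top_k = 5
top_features = list(coef_table.head(top_k)["feature"])
X_reduced = X[top_features]
print(X_reduced.shape)
```

Ranking by absolute coefficient is only meaningful when the features are on comparable scales, so this sketch assumes the inputs were standardized beforehand.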

#### Ridge Regression

In [5]:
# Create Ridge Regression CV of training data of Feature Set 2
ridge_alphas = np.logspace(0, 5, 200)

optimal_ridge = RidgeCV(alphas=ridge_alphas, cv=5)
optimal_ridge.fit(X_train_2, y_train)

print(optimal_ridge.alpha_)

2.0022003718155847


In [6]:
# Create Ridge Regression model with optimal alpha
ridge = Ridge(alpha=optimal_ridge.alpha_)
ridge.fit(X_train_2, y_train)

Ridge(alpha=2.0022003718155847)

In [7]:
# Check cross val score (RMSE)
abs(cross_val_score(ridge, X_train_2, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

20014.65003038865

In [8]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, ridge.predict(X_test_2), squared=False)

19112.26129192974

On the test set, the Ridge Regression model for Feature Set 2 performs much better than the one for Feature Set 1 (RMSE of 19112 versus 58395), while the cross-validation RMSE is almost unchanged.

Feature Set 2 uses 50 features compared to 220 in Feature Set 1, so the model overfits less and generalizes better to unseen data.

| Ridge (RMSE)     | Feature Set 1 | Feature Set 2 |
|------------------|---------------|---------------|
| Cross validation | 19950         | 20014         |
| Test predictions | 58395         | 19112         |
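The gap between the cross-validation RMSE and the test RMSE is what signals overfitting here: for Feature Set 1 the test RMSE is roughly three times the cross-validation RMSE. The helper below is a sketch of how the two numbers could be computed together. `rmse_report` is a hypothetical name, and the synthetic data only stands in for the housing data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split


def rmse_report(model, X_tr, y_tr, X_te, y_te, cv=5):
    """Return (mean CV RMSE on the training set, RMSE on the held-out set)."""
    cv_scores = cross_val_score(model, X_tr, y_tr, cv=cv,
                                scoring="neg_root_mean_squared_error")
    cv_rmse = float(-cv_scores.mean())  # scores are negated RMSE values
    model.fit(X_tr, y_tr)
    test_rmse = float(np.sqrt(mean_squared_error(y_te, model.predict(X_te))))
    return cv_rmse, test_rmse


# Synthetic data for illustration only
X, y = make_regression(n_samples=300, n_features=10, noise=15.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

cv_rmse, test_rmse = rmse_report(Ridge(alpha=2.0), X_tr, y_tr, X_te, y_te)
print(f"CV RMSE: {cv_rmse:.1f}, test RMSE: {test_rmse:.1f}, "
      f"ratio: {test_rmse / cv_rmse:.2f}")
```

A ratio close to 1 suggests the model generalizes about as well as cross validation predicts, while a ratio far above 1 points to overfitting or to a mismatch between the training and test data.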

#### Lasso Regression

In [9]:
# Create Lasso Regression CV of training data of Feature Set 2
optimal_lasso = LassoCV(n_alphas=500, cv=5)
optimal_lasso.fit(X_train_2, np.ravel(y_train))

print(optimal_lasso.alpha_)

82.23289058522015


In [10]:
# Create Lasso Regression model with optimal alpha
lasso = Lasso(alpha=optimal_lasso.alpha_)
lasso.fit(X_train_2, y_train)

Lasso(alpha=82.23289058522015)

In [11]:
# Check cross val score (RMSE)
abs(cross_val_score(lasso, X_train_2, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

20151.096318317526

In [12]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, lasso.predict(X_test_2), squared=False)

19172.911723312995

The Lasso Regression of Feature Set 2 performs similarly to the Ridge Regression above.

| Lasso (RMSE)     | Feature Set 1 | Feature Set 2 |
|------------------|---------------|---------------|
| Cross validation | 20837         | 20151         |
| Test predictions | 50134         | 19172         |

#### Elastic Net Regression

In [13]:
# Create Enet Regression CV of training data of Feature Set 2
l1_ratios = np.linspace(0.01, 1.0, 25)
optimal_enet = ElasticNetCV(l1_ratio=l1_ratios, n_alphas=100, cv=5)
optimal_enet.fit(X_train_2, np.ravel(y_train))

ElasticNetCV(cv=5,
             l1_ratio=array([0.01   , 0.05125, 0.0925 , 0.13375, 0.175  , 0.21625, 0.2575 ,
       0.29875, 0.34   , 0.38125, 0.4225 , 0.46375, 0.505  , 0.54625,
       0.5875 , 0.62875, 0.67   , 0.71125, 0.7525 , 0.79375, 0.835  ,
       0.87625, 0.9175 , 0.95875, 1.     ]))

In [14]:
print(optimal_enet.alpha_, optimal_enet.l1_ratio_)

82.23289058522015 1.0


In [15]:
# Enet Regression should perform similarly to Lasso, given optimal l1 = 1
enet = ElasticNet(alpha=optimal_enet.alpha_, l1_ratio=optimal_enet.l1_ratio_)
enet.fit(X_train_2, y_train)

ElasticNet(alpha=82.23289058522015, l1_ratio=1.0)

In [16]:
# Check cross val score (RMSE)
abs(cross_val_score(enet, X_train_2, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

20151.096318317526

In [17]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, enet.predict(X_test_2), squared=False)

19172.911723312995

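The Elastic Net regression of Feature Set 2 gives the same scores as the Lasso Regression above: the selected l1_ratio is 1, which makes the Elastic Net penalty a pure L1 (Lasso) penalty, and the selected alpha is the same.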

| Enet (RMSE)      | Feature Set 1 | Feature Set 2 |
|------------------|---------------|---------------|
| Cross validation | 20837         | 20151         |
| Test predictions | 50134         | 19172         |
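The matching Lasso and Elastic Net scores follow from how the Elastic Net penalty is defined: with l1_ratio = 1 the L2 part of the penalty vanishes and only the L1 part remains. The sketch below checks this on synthetic data. The data and the alpha value are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data for illustration only
X, y = make_regression(n_samples=200, n_features=15, noise=5.0, random_state=1)

alpha = 0.5  # arbitrary regularization strength for the demo
lasso_demo = Lasso(alpha=alpha).fit(X, y)
enet_demo = ElasticNet(alpha=alpha, l1_ratio=1.0).fit(X, y)

# With l1_ratio=1.0 both models minimize the same objective
same_coefs = np.allclose(lasso_demo.coef_, enet_demo.coef_)
print(same_coefs)
```

When ElasticNetCV selects l1_ratio = 1, fitting a separate Elastic Net therefore adds no new information beyond the Lasso result.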

#### Summary (Feature Set 2)

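Overall, Feature Set 2 performed much better than Feature Set 1 on the test set while using only the top 50 features (based on Lasso coefficients). The cross-validation RMSE is similar for both feature sets, so the gain comes from reduced overfitting.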

| Ridge (RMSE)     | Feature Set 1 | Feature Set 2 |
|------------------|---------------|---------------|
| Cross validation | 19950         | 20014         |
| Test predictions | 58395         | 19112         |

| Lasso (RMSE)     | Feature Set 1 | Feature Set 2 |
|------------------|---------------|---------------|
| Cross validation | 20837         | 20151         |
| Test predictions | 50134         | 19172         |

| Enet (RMSE)      | Feature Set 1 | Feature Set 2 |
|------------------|---------------|---------------|
| Cross validation | 20837         | 20151         |
| Test predictions | 50134         | 19172         |

We can try to further reduce the number of features to 30 for Feature Set 3.

### Feature Set 3

In [18]:
# Create Feature Set 3 based on top 30 coefficients
feature_set_3 = list(top_lasso_coef.head(30)['feature'])

In [19]:
# Create new train and test sets
X_train_3 = X_train[feature_set_3]
X_test_3 = X_test[feature_set_3]

#### Ridge Regression

In [20]:
# Create Ridge Regression CV of training data of Feature Set 3
ridge_alphas = np.logspace(0, 5, 200)

optimal_ridge = RidgeCV(alphas=ridge_alphas, cv=5)
optimal_ridge.fit(X_train_3, y_train)

print(optimal_ridge.alpha_)

1.414991297434576


In [21]:
# Create Ridge Regression model with optimal alpha
ridge = Ridge(alpha=optimal_ridge.alpha_)
ridge.fit(X_train_3, y_train)

Ridge(alpha=1.414991297434576)

In [22]:
# Check cross val score (RMSE)
abs(cross_val_score(ridge, X_train_3, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

20165.271527197277

In [23]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, ridge.predict(X_test_3), squared=False)

19213.26387075005

The Ridge Regression model for Feature Set 3 performs similarly to Feature Set 2.

| Ridge (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 |
|------------------|---------------|---------------|---------------|
| Cross validation | 19950         | 20014         | 20165         |
| Test predictions | 58395         | 19112         | 19213         |

#### Lasso Regression

In [24]:
# Create Lasso Regression CV of training data of Feature Set 3
optimal_lasso = LassoCV(n_alphas=500, cv=5)
optimal_lasso.fit(X_train_3, np.ravel(y_train))

print(optimal_lasso.alpha_)

82.23289058522015


In [25]:
# Create Lasso Regression model with optimal alpha
lasso = Lasso(alpha=optimal_lasso.alpha_)
lasso.fit(X_train_3, y_train)

Lasso(alpha=82.23289058522015)

In [26]:
# Check cross val score (RMSE)
abs(cross_val_score(lasso, X_train_3, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

20262.486172452464

In [27]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, lasso.predict(X_test_3), squared=False)

19326.15432645396

The Lasso Regression of Feature Set 3 performs similarly to Feature Set 2.

| Lasso (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 |
|------------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         |
| Test predictions | 50134         | 19172         | 19326         |

#### Elastic Net Regression

In [28]:
# Create Enet Regression CV of training data of Feature Set 3
l1_ratios = np.linspace(0.01, 1.0, 25)
optimal_enet = ElasticNetCV(l1_ratio=l1_ratios, n_alphas=100, cv=5)
optimal_enet.fit(X_train_3, np.ravel(y_train))

ElasticNetCV(cv=5,
             l1_ratio=array([0.01   , 0.05125, 0.0925 , 0.13375, 0.175  , 0.21625, 0.2575 ,
       0.29875, 0.34   , 0.38125, 0.4225 , 0.46375, 0.505  , 0.54625,
       0.5875 , 0.62875, 0.67   , 0.71125, 0.7525 , 0.79375, 0.835  ,
       0.87625, 0.9175 , 0.95875, 1.     ]))

In [29]:
print(optimal_enet.alpha_, optimal_enet.l1_ratio_)

82.23289058522015 1.0


In [30]:
# Enet Regression should perform similarly to Lasso, given optimal l1 = 1
enet = ElasticNet(alpha=optimal_enet.alpha_, l1_ratio=optimal_enet.l1_ratio_)
enet.fit(X_train_3, y_train)

ElasticNet(alpha=82.23289058522015, l1_ratio=1.0)

In [31]:
# Check cross val score (RMSE)
abs(cross_val_score(enet, X_train_3, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

20262.486172452464

In [32]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, enet.predict(X_test_3), squared=False)

19326.15432645396

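The Elastic Net regression of Feature Set 3 gives the same scores as the Lasso Regression above: the selected l1_ratio is 1, which makes the Elastic Net penalty a pure L1 (Lasso) penalty, and the selected alpha is the same.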

| Enet (RMSE)      | Feature Set 1 | Feature Set 2 | Feature Set 3 |
|------------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         |
| Test predictions | 50134         | 19172         | 19326         |

#### Summary (Feature Set 3)

Overall, Feature Set 3 performed similarly to Feature Set 2 (RMSE scores are slightly higher, by roughly 100 to 150), which is expected as its 30 features are a subset of the 50 features previously used.

| Ridge (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 |
|------------------|---------------|---------------|---------------|
| Cross validation | 19950         | 20014         | 20165         |
| Test predictions | 58395         | 19112         | 19213         |

| Lasso (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 |
|------------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         |
| Test predictions | 50134         | 19172         | 19326         |

| Enet (RMSE)      | Feature Set 1 | Feature Set 2 | Feature Set 3 |
|------------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         |
| Test predictions | 50134         | 19172         | 19326         |

### Feature Set 4

For Feature Set 4, we can try using Polynomial Features to see if the model performs better with interaction features.

We can set interaction_only = True for now to test only the interaction between features and to reduce complexity.

In [33]:
# Instantiate Polynomial Features with interaction only
poly = PolynomialFeatures(interaction_only=True)

In [34]:
X_train_4 = poly.fit_transform(X_train_3)
X_test_4 = poly.transform(X_test_3)

In [35]:
# The number of columns has increased significantly, increasing the chances of overfitting.
X_train_4.shape

(1458, 466)
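The 466 columns reported above can be checked by hand. With the default degree of 2 and interaction_only=True, PolynomialFeatures keeps a constant bias column, the original features, and one product term for each pair of distinct features. The short calculation below is a sanity check under those default settings.

```python
from math import comb

n_features = 30                 # size of Feature Set 3
n_bias = 1                      # constant column (include_bias=True by default)
n_pairs = comb(n_features, 2)   # one interaction term per pair of distinct features

n_columns = n_bias + n_features + n_pairs
print(n_columns)  # 466
```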

#### Ridge Regression

In [36]:
# Create Ridge Regression CV of training data of Feature Set 4
ridge_alphas = np.logspace(0, 5, 200)

optimal_ridge = RidgeCV(alphas=ridge_alphas, cv=5)
optimal_ridge.fit(X_train_4, y_train)

print(optimal_ridge.alpha_)

121.7382727739662


In [37]:
# Create Ridge Regression model with optimal alpha
ridge = Ridge(alpha=optimal_ridge.alpha_)
ridge.fit(X_train_4, y_train)

Ridge(alpha=121.7382727739662)

In [38]:
# Check cross val score (RMSE)
abs(cross_val_score(ridge, X_train_4, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

18581.69354963295

In [39]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, ridge.predict(X_test_4), squared=False)

18173.034009579802

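The Ridge Regression model for Feature Set 4 performs better than Feature Set 3 on both cross validation and test predictions. The much larger optimal alpha (about 122, up from about 1.4) suggests that stronger regularization is needed to keep the 466 columns in check.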

| Ridge (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 |
|------------------|---------------|---------------|---------------|---------------|
| Cross validation | 19950         | 20014         | 20165         | 18581         |
| Test predictions | 58395         | 19112         | 19213         | 18173         |

#### Lasso Regression

In [40]:
# Create Lasso Regression CV of training data of Feature Set 4
optimal_lasso = LassoCV(n_alphas=500, cv=5)
optimal_lasso.fit(X_train_4, np.ravel(y_train))

print(optimal_lasso.alpha_)

374.7157283649545


In [41]:
# Create Lasso Regression model with optimal alpha
lasso = Lasso(alpha=optimal_lasso.alpha_)
lasso.fit(X_train_4, y_train)

Lasso(alpha=374.7157283649545)

In [42]:
# Check cross val score (RMSE)
abs(cross_val_score(lasso, X_train_4, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

18783.71985654378

In [43]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, lasso.predict(X_test_4), squared=False)

18472.730984884824

The Lasso Regression of Feature Set 4 performs similarly to the Ridge Regression above.

| Lasso (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 |
|------------------|---------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         | 18783         |
| Test predictions | 50134         | 19172         | 19326         | 18472         |

#### Elastic Net Regression

In [44]:
# Create Enet Regression CV of training data of Feature Set 4
l1_ratios = np.linspace(0.01, 1.0, 25)
optimal_enet = ElasticNetCV(l1_ratio=l1_ratios, n_alphas=100, cv=5)
optimal_enet.fit(X_train_4, np.ravel(y_train))

ElasticNetCV(cv=5,
             l1_ratio=array([0.01   , 0.05125, 0.0925 , 0.13375, 0.175  , 0.21625, 0.2575 ,
       0.29875, 0.34   , 0.38125, 0.4225 , 0.46375, 0.505  , 0.54625,
       0.5875 , 0.62875, 0.67   , 0.71125, 0.7525 , 0.79375, 0.835  ,
       0.87625, 0.9175 , 0.95875, 1.     ]))

In [45]:
print(optimal_enet.alpha_, optimal_enet.l1_ratio_)

374.7157283649545 1.0


In [46]:
# Enet Regression should perform similarly to Lasso, given optimal l1 = 1
enet = ElasticNet(alpha=optimal_enet.alpha_, l1_ratio=optimal_enet.l1_ratio_)
enet.fit(X_train_4, y_train)

ElasticNet(alpha=374.7157283649545, l1_ratio=1.0)

In [47]:
# Check cross val score (RMSE)
abs(cross_val_score(enet, X_train_4, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

18783.71985654378

In [48]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, enet.predict(X_test_4), squared=False)

18472.730984884824

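The Elastic Net regression of Feature Set 4 gives the same scores as the Lasso Regression above: the selected l1_ratio is 1, which makes the Elastic Net penalty a pure L1 (Lasso) penalty, and the selected alpha is the same.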

| Enet (RMSE)      | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 |
|------------------|---------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         | 18783         |
| Test predictions | 50134         | 19172         | 19326         | 18472         |

#### Summary (Feature Set 4)

Overall, Feature Set 4 performed slightly better than Feature Set 3, with the inclusion of interaction features.

| Ridge (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 |
|------------------|---------------|---------------|---------------|---------------|
| Cross validation | 19950         | 20014         | 20165         | 18581         |
| Test predictions | 58395         | 19112         | 19213         | 18173         |

| Lasso (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 |
|------------------|---------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         | 18783         |
| Test predictions | 50134         | 19172         | 19326         | 18472         |

| Enet (RMSE)      | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 |
|------------------|---------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         | 18783         |
| Test predictions | 50134         | 19172         | 19326         | 18472         |

### Feature Set 5

In [49]:
# Obtain new set of best Lasso coefficients
lasso_coefs = pd.DataFrame({'feature': poly.get_feature_names(X_train_3.columns),
                            'coef':lasso.coef_,
                            'abs_coef':np.abs(lasso.coef_)})
lasso_coefs.sort_values('abs_coef', inplace=True, ascending=False)
lasso_coefs.head(30)

Unnamed: 0,feature,coef,abs_coef
46,Gr Liv Area Bldg Type_1Fam,3995.50001,3995.50001
258,Total Bsmt SF BsmtFin Type 1_GLQ,3562.932523,3562.932523
145,Bsmt Exposure_Gd Total Bsmt SF,3404.750463,3404.750463
240,Overall Qual Exterior 1st_BrkFace,2647.844422,2647.844422
225,Neighborhood_Crawfor Overall Cond,2610.817782,2610.817782
239,Overall Qual Year Built,2265.466465,2265.466465
38,Gr Liv Area Overall Qual,2254.036962,2254.036962
9,Overall Qual,2093.026051,2093.026051
119,Neighborhood_NridgHt Overall Qual,2047.273924,2047.273924
236,Overall Qual Neighborhood_StoneBr,1993.223509,1993.223509


In [50]:
# Create feature_set_5 from the top 30 interaction features
feature_set_5 = list(lasso_coefs.head(30)['feature'])

# Create train and test datasets using Feature Set 5
X_train_5 = pd.DataFrame(X_train_4, columns=poly.get_feature_names(X_train_3.columns))[feature_set_5]
X_test_5 = pd.DataFrame(X_test_4, columns=poly.get_feature_names(X_test_3.columns))[feature_set_5]

#### Ridge Regression

In [51]:
# Create Ridge Regression CV of training data of Feature Set 5
ridge_alphas = np.logspace(0, 5, 200)

optimal_ridge = RidgeCV(alphas=ridge_alphas, cv=5)
optimal_ridge.fit(X_train_5, y_train)

print(optimal_ridge.alpha_)

4.768611697714471


In [52]:
# Create Ridge Regression model with optimal alpha
ridge = Ridge(alpha=optimal_ridge.alpha_)
ridge.fit(X_train_5, y_train)

Ridge(alpha=4.768611697714471)

In [53]:
# Check cross val score (RMSE)
abs(cross_val_score(ridge, X_train_5, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

18415.197524725267

In [54]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, ridge.predict(X_test_5), squared=False)

18474.92082659599

The Ridge Regression model for Feature Set 5 performs similarly to Feature Set 4.

| Ridge (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 | Feature Set 5 |
|------------------|---------------|---------------|---------------|---------------|---------------|
| Cross validation | 19950         | 20014         | 20165         | 18581         | 18415         |
| Test predictions | 58395         | 19112         | 19213         | 18173         | 18474         |

#### Lasso Regression

In [55]:
# Create Lasso Regression CV of training data of Feature Set 5
optimal_lasso = LassoCV(n_alphas=500, cv=5)
optimal_lasso.fit(X_train_5, np.ravel(y_train))

print(optimal_lasso.alpha_)

374.7157283649552


In [56]:
# Create Lasso Regression model with optimal alpha
lasso = Lasso(alpha=optimal_lasso.alpha_)
lasso.fit(X_train_5, y_train)

Lasso(alpha=374.7157283649552)

In [57]:
# Check cross val score (RMSE)
abs(cross_val_score(lasso, X_train_5, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

18660.02669649224

In [58]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, lasso.predict(X_test_5), squared=False)

18514.929784812284

The Lasso Regression of Feature Set 5 performs similarly to the Ridge Regression above.

| Lasso (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 | Feature Set 5 |
|------------------|---------------|---------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         | 18783         | 18660         |
| Test predictions | 50134         | 19172         | 19326         | 18472         | 18514         |

#### Elastic Net Regression

In [59]:
# Create Enet Regression CV of training data of Feature Set 5
l1_ratios = np.linspace(0.01, 1.0, 25)
optimal_enet = ElasticNetCV(l1_ratio=l1_ratios, n_alphas=100, cv=5)
optimal_enet.fit(X_train_5, np.ravel(y_train))

ElasticNetCV(cv=5,
             l1_ratio=array([0.01   , 0.05125, 0.0925 , 0.13375, 0.175  , 0.21625, 0.2575 ,
       0.29875, 0.34   , 0.38125, 0.4225 , 0.46375, 0.505  , 0.54625,
       0.5875 , 0.62875, 0.67   , 0.71125, 0.7525 , 0.79375, 0.835  ,
       0.87625, 0.9175 , 0.95875, 1.     ]))

In [60]:
print(optimal_enet.alpha_, optimal_enet.l1_ratio_)

374.7157283649552 1.0


In [61]:
# Enet Regression should perform similarly to Lasso, given optimal l1 = 1
enet = ElasticNet(alpha=optimal_enet.alpha_, l1_ratio=optimal_enet.l1_ratio_)
enet.fit(X_train_5, y_train)

ElasticNet(alpha=374.7157283649552, l1_ratio=1.0)

In [62]:
# Check cross val score (RMSE)
abs(cross_val_score(enet, X_train_5, y_train, cv=5, scoring='neg_root_mean_squared_error')).mean()

18660.02669649224

In [63]:
# Compare model predictions to test data (RMSE)
mean_squared_error(y_test, enet.predict(X_test_5), squared=False)

18514.929784812284

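The Elastic Net regression of Feature Set 5 gives the same scores as the Lasso Regression above: the selected l1_ratio is 1, which makes the Elastic Net penalty a pure L1 (Lasso) penalty, and the selected alpha is the same.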

| Enet (RMSE)      | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 | Feature Set 5 |
|------------------|---------------|---------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         | 18783         | 18660         |
| Test predictions | 50134         | 19172         | 19326         | 18472         | 18514         |

#### Summary (Feature Set 5)

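Overall, Feature Set 5 offers the best trade-off: its RMSE scores are on par with Feature Set 4 (lower on cross validation, marginally higher on test predictions) while using only 30 features instead of 466.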

| Ridge (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 | Feature Set 5 |
|------------------|---------------|---------------|---------------|---------------|---------------|
| Cross validation | 19950         | 20014         | 20165         | 18581         | 18415         |
| Test predictions | 58395         | 19112         | 19213         | 18173         | 18474         |

| Lasso (RMSE)     | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 | Feature Set 5 |
|------------------|---------------|---------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         | 18783         | 18660         |
| Test predictions | 50134         | 19172         | 19326         | 18472         | 18514         |

| Enet (RMSE)      | Feature Set 1 | Feature Set 2 | Feature Set 3 | Feature Set 4 | Feature Set 5 |
|------------------|---------------|---------------|---------------|---------------|---------------|
| Cross validation | 20837         | 20151         | 20262         | 18783         | 18660         |
| Test predictions | 50134         | 19172         | 19326         | 18472         | 18514         |

We will use Feature Set 5 to build our final model for Kaggle submission. Based on the cross validation and test prediction RMSE scores, the Ridge Regression model of Feature Set 5 seems to have performed the best out of the 3.
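To make the comparison across feature sets easier to scan, the Ridge scores reported above can be collected into a single DataFrame. The sketch below copies the numbers from the tables in this notebook. The name `ridge_scores` and the derived `test_minus_cv` column are illustrative additions.

```python
import pandas as pd

# Ridge RMSE values copied from the tables above
ridge_scores = pd.DataFrame(
    {
        "cv_rmse": [19950, 20014, 20165, 18581, 18415],
        "test_rmse": [58395, 19112, 19213, 18173, 18474],
    },
    index=[f"Feature Set {i}" for i in range(1, 6)],
)

# A large positive gap means the model does worse on unseen data than CV suggests
ridge_scores["test_minus_cv"] = ridge_scores["test_rmse"] - ridge_scores["cv_rmse"]

print(ridge_scores)
print("Lowest CV RMSE:", ridge_scores["cv_rmse"].idxmin())
print("Lowest test RMSE:", ridge_scores["test_rmse"].idxmin())
```

Feature Set 5 has the lowest cross-validation RMSE and Feature Set 4 the lowest test RMSE, with the two only a few hundred apart on either measure.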

## Model Production and Kaggle Submission

In [64]:
# Read in data
train_X = pd.read_csv('../datasets/kaggle_submission/train_X_processed.csv', keep_default_na=False, index_col='Id')
train_y = pd.read_csv('../datasets/kaggle_submission/train_y.csv', keep_default_na=False, index_col='Id')
test_X = pd.read_csv('../datasets/kaggle_submission/test_X_processed.csv', keep_default_na=False, index_col='Id')

In [65]:
# Create train and test sets based on 30 features from Feature Set 3
train_X_3 = train_X[feature_set_3]
test_X_3 = test_X[feature_set_3]

In [66]:
# Get interaction features of train_X_3 and test_X_3
poly = PolynomialFeatures(interaction_only=True)
train_X_4 = poly.fit_transform(train_X_3)
test_X_4 = poly.transform(test_X_3)

In [67]:
# Create final train and test datasets using Feature Set 5
train_X_5 = pd.DataFrame(train_X_4, columns=poly.get_feature_names(train_X_3.columns))[feature_set_5]
test_X_5 = pd.DataFrame(test_X_4, columns=poly.get_feature_names(test_X_3.columns))[feature_set_5]

In [68]:
# Create Ridge Regression CV of training data of Feature Set 5
ridge_alphas = np.logspace(0, 5, 200)

optimal_ridge = RidgeCV(alphas=ridge_alphas, cv=5)
optimal_ridge.fit(train_X_5, train_y)

print(optimal_ridge.alpha_)

18.041864093920726


In [69]:
# Create Ridge Regression model with optimal alpha
ridge = Ridge(alpha=optimal_ridge.alpha_)
ridge.fit(train_X_5, train_y)

Ridge(alpha=18.041864093920726)

In [70]:
# Check cross val score (RMSE)
abs(cross_val_score(ridge, train_X_5, train_y, cv=5, scoring='neg_root_mean_squared_error')).mean()

18449.870248425577

In [71]:
submission_final = pd.DataFrame(test_X.index, columns=['Id'])

In [72]:
submission_final['SalePrice'] = ridge.predict(test_X_5)

In [73]:
submission_final.to_csv('../datasets/kaggle_submission/submission_final.csv', index=False)

The final submission to Kaggle achieved a Private Score of 29167 and a Public Score of 24979.