# Lab: Model Selection for Boston housing data

In this lab, you will apply polynomial regression with model order selection to the Boston housing dataset. 

Before doing this lab, you should review the ideas in the [polynomial model selection demo](./polyfit.ipynb).  In addition to the concepts in that demo, you will learn to:
* Load data
* Fit a polynomial model for a given model order 
* Select the model order via K-fold cross-validation and the one-standard-error rule
  

## Loading the data

We first load the standard packages.

In [12]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

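We now load the Boston housing dataset, which ships with scikit-learn (`sklearn`). Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in version 1.2, so the cell below requires an earlier scikit-learn release.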

In [13]:
from sklearn.datasets import load_boston
boston = load_boston()

Let's see what the dataset contains.

In [None]:
print(boston.keys())

The object `boston` is a dictionary-like `Bunch` containing:
* `data`: the feature values, with one row per Boston-area census tract
* `target`: the corresponding median home values (`MEDV`), in thousands of dollars
* `feature_names`: the names of the feature columns
* `DESCR`: a text description of the dataset
* `filename`: the path to the data file on your local machine

Let's print the description.

In [None]:
print(boston.DESCR)

Let's turn this dataset into a `pandas` dataframe for ease of handling.

In [None]:
import pandas as pd
df = pd.DataFrame(boston.data)
df.columns = boston.feature_names
df['MEDV'] = boston.target
df.head(6)

Now, use `numpy.array` to create the target vector `y` from the values in the `MEDV` column.
Similarly, create the feature vector `x` from the values in the `LSTAT` column.
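As a minimal sketch of this step, the cell below applies the same two conversions to a tiny DataFrame, `df_toy`. Its column names match the lab, but `df_toy` and its values are hypothetical and made up; in the lab itself you would index `df` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: same column names as df, invented values
df_toy = pd.DataFrame({"LSTAT": [5.0, 10.0, 20.0], "MEDV": [30.0, 22.0, 15.0]})

y = np.array(df_toy["MEDV"])   # target vector, one entry per row
x = np.array(df_toy["LSTAT"])  # single feature, so x is 1-D like y

print(x.shape, y.shape)  # (3,) (3,)
```

`df_toy["MEDV"].to_numpy()` is an equivalent, more explicit alternative in recent pandas versions. Keeping `x` one-dimensional matters because `polyfit` expects 1-D sample arrays.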

In [6]:
# TODO
# y = ...
# x = ...


## Fitting Models with Different Orders
For illustration, we now fit polynomial models of order $d=1$ and $d=15$ to the data.

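For this, use the `polyfit` function from the `numpy.polynomial.polynomial` module, which returns the fitted coefficients in order of increasing power; the same module's `polyval` function evaluates the fitted polynomial.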
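Here is a hedged sketch of the fitting step on synthetic data (not the Boston set), so it runs without the dataset. One deviation from the lab is our own assumption: the feature is standardized before fitting, because raw powers up to $x^{15}$ make the least-squares problem badly conditioned. A degree-$d$ polynomial in the standardized variable is still a degree-$d$ polynomial in $x$, so this changes only the numerics, not the family of curves.

```python
import numpy as np
import numpy.polynomial.polynomial as poly

# Synthetic stand-in for (LSTAT, MEDV): a decreasing, curved trend plus noise
rng = np.random.default_rng(0)
x = rng.uniform(2.0, 38.0, size=100)
y = 5.0 + 45.0 * np.exp(-x / 12.0) + rng.normal(0.0, 2.0, size=x.shape)

# Standardize the feature (our addition) to keep x**15 numerically tame
mu, sd = x.mean(), x.std()
z = (x - mu) / sd

d1, d2 = 1, 15
beta1 = poly.polyfit(z, y, d1)  # coefficients ordered from lowest power up
beta2 = poly.polyfit(z, y, d2)

xp = np.linspace(x.min(), x.max(), 200)  # grid of feature values
zp = (xp - mu) / sd                      # apply the same scaling to the grid
yp_hat1 = poly.polyval(zp, beta1)        # prediction curve, order d1
yp_hat2 = poly.polyval(zp, beta2)        # prediction curve, order d2

# Training RSS cannot increase with the order, since the models are nested
rss1 = np.sum((y - poly.polyval(z, beta1)) ** 2)
rss2 = np.sum((y - poly.polyval(z, beta2)) ** 2)
print(f"training RSS: d={d1}: {rss1:.1f}, d={d2}: {rss2:.1f}")
```

A low training RSS for $d=15$ says nothing about how well that model predicts new data; measuring that is the job of the cross-validation below.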

In [14]:
# TODO 
# Import polynomial library
# d1 = 1
# d2 = 15
# beta1 = 
# beta2 =
# xp = # grid of feature values
# yp_hat1 = # target prediction on grid using polynomial order d1
# yp_hat2 = # target prediction on grid using polynomial order d2
# Make a scatterplot and superimpose prediction curves for d1 and d2
# Add grid lines, axis labels, and a legend


How do these two model orders perform? Does either one underfit or overfit the data, as you would expect? Can we do better?

## K-fold Cross-Validation

We now optimize the polynomial model order using 5-fold cross-validation, following the method used in the polynomial demo. The first step is to fill a matrix `RSSts` of test RSS values with one row per candidate model order and one column per fold.
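A minimal sketch of the RSS matrix computation, assuming synthetic cubic data in place of the Boston arrays and candidate orders 0 through 10 (the lab leaves `dtest` for you to choose). Each fold's model is fit on the training indices only and scored on the held-out indices.

```python
import numpy as np
import numpy.polynomial.polynomial as poly
from sklearn import model_selection

# Synthetic data (not the Boston set): a cubic trend plus Gaussian noise
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=120)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(0.0, 0.5, size=x.shape)

k = 5
kfo = model_selection.KFold(n_splits=k, shuffle=True, random_state=0)

dtest = np.arange(0, 11)   # candidate model orders (an assumed range)
nd = len(dtest)
RSSts = np.zeros((nd, k))  # rows: model orders, columns: folds

for itsplit, (Itr, Its) in enumerate(kfo.split(x)):
    # Training and test data for this split
    xtr, ytr = x[Itr], y[Itr]
    xts, yts = x[Its], y[Its]
    for it, d in enumerate(dtest):
        beta = poly.polyfit(xtr, ytr, d)  # fit on training data only
        yhat = poly.polyval(xts, beta)    # predict the held-out fold
        RSSts[it, itsplit] = np.sum((yts - yhat) ** 2)

print(np.round(RSSts.mean(axis=1), 2))
```

The sketch uses the plain sum of squared errors; dividing by the fold size instead gives a per-sample error that is easier to compare when folds have unequal sizes.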

In [8]:
from sklearn import model_selection

# TODO
# Create a k-fold object
# k = 5
# kfo = ...
# Model orders to be tested
# dtest = 
# nd = len(dtest)
# RSSts = np.zeros((nd,k))
# Loop over the folds
    # Get the training data in the split
    # Loop over the model order
        # Fit data on training data
        # Measure RSS on test data
        # RSSts[it,itsplit] = 



Next, compute the mean and the standard error (SE) of the test RSS over the folds for each model order. The SE is the standard deviation of the $K$ per-fold RSS values divided by $\sqrt{K}$, where $K$ is the number of folds. Use `ddof=1` in `np.std` so that the standard deviation uses the sample normalization (dividing by $K-1$ rather than $K$).
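Written out, with $\mathrm{RSS}_{d,i}$ denoting the test RSS of model order $d$ on fold $i$, the quantities are as follows (the symbols $\bar{R}_d$, $s_d$, and $\mathrm{SE}_d$ are our notation, not the lab's):

$$
\bar{R}_d = \frac{1}{K}\sum_{i=1}^{K} \mathrm{RSS}_{d,i}, \qquad
s_d = \sqrt{\frac{1}{K-1}\sum_{i=1}^{K} \left(\mathrm{RSS}_{d,i} - \bar{R}_d\right)^2}, \qquad
\mathrm{SE}_d = \frac{s_d}{\sqrt{K}}
$$

With `RSSts` shaped (model orders, folds), these correspond to `RSSts.mean(axis=1)` and `RSSts.std(axis=1, ddof=1) / np.sqrt(k)`.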

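With these RSS statistics, use the one-standard-error rule to select the model order: find the order with the minimum mean test RSS, set a target RSS equal to that minimum plus its SE, and choose the lowest order whose mean test RSS does not exceed the target. Print the model order that minimizes mean test RSS, as well as the model order selected by the one-standard-error rule.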
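The rule can be sketched as a small helper. The function name `one_se_rule` and the toy RSS matrix are hypothetical, chosen so the answer can be checked by hand: the minimum mean RSS is 5.5 at order 3 with SE ≈ 0.71, so the target is ≈ 6.21, and order 2 (mean 6.0) is the lowest order under it. The helper assumes `dtest` is sorted in increasing order.

```python
import numpy as np

def one_se_rule(dtest, RSSts):
    """Return (order minimizing mean test RSS, order chosen by the one-SE rule).

    RSSts has one row per model order in dtest (sorted ascending) and one column per fold.
    """
    k = RSSts.shape[1]
    RSS_mean = RSSts.mean(axis=1)
    RSS_se = RSSts.std(axis=1, ddof=1) / np.sqrt(k)

    imin = np.argmin(RSS_mean)               # order with the smallest mean test RSS
    RSS_tgt = RSS_mean[imin] + RSS_se[imin]  # target RSS: minimum plus one SE

    # Lowest order whose mean RSS is within the target; imin itself always qualifies
    i1se = np.nonzero(RSS_mean <= RSS_tgt)[0][0]
    return dtest[imin], dtest[i1se]


# Hypothetical toy matrix: 5 model orders (rows) x 5 folds (columns)
dtest = np.arange(5)
RSSts = np.array([
    [10.0, 10.0, 10.0, 10.0, 10.0],
    [5.0, 6.0, 7.0, 8.0, 9.0],
    [4.0, 5.0, 6.0, 7.0, 8.0],
    [3.5, 4.5, 5.5, 6.5, 7.5],
    [4.5, 5.5, 6.5, 7.5, 8.5],
])
d_min, d_1se = one_se_rule(dtest, RSSts)
print(f"min-RSS order: {d_min}, one-SE order: {d_1se}")  # min-RSS order: 3, one-SE order: 2
```

Preferring the lowest qualifying order encodes the rule's bias toward simpler models when the cross-validation estimates cannot reliably distinguish them.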

In [17]:
# TODO
# compute mean and standard error of RSS
# find model order that minimizes test RSS
# print("The model order that minimizes mean test RSS is ...")
# estimate model order according to the one-standard-error rule
# print("The model order estimated by the one-standard-error rule is ...")

Next, illustrate the one-standard-error-rule procedure by making a plot that shows the following:
* the mean test RSS curve with errorbars
* a vertical dashed line at the model order that minimizes mean test RSS
* a horizontal dashed line at the target RSS (the minimum mean test RSS plus its SE)
* a vertical dashed line at the model order selected by the one-standard-error rule

Also, add a grid and axis labels to your plot. If needed, use `plt.ylim` to zoom into the relevant range.
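A hedged sketch of this plot, using made-up summary statistics (`RSS_mean`, `RSS_se`) in place of your computed arrays. The `matplotlib.use("Agg")` line is only there so the sketch runs without a display; omit it in the notebook, where `%matplotlib inline` is already active.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside Jupyter; omit in the notebook
import matplotlib.pyplot as plt

# Made-up statistics standing in for the mean and SE of the test RSS per order
dtest = np.arange(5)
RSS_mean = np.array([10.0, 7.0, 6.0, 5.5, 6.5])
RSS_se = np.array([0.0, 0.71, 0.71, 0.71, 0.71])

imin = np.argmin(RSS_mean)
RSS_tgt = RSS_mean[imin] + RSS_se[imin]       # target RSS for the one-SE rule
i1se = np.nonzero(RSS_mean <= RSS_tgt)[0][0]  # lowest order meeting the target

fig, ax = plt.subplots()
ax.errorbar(dtest, RSS_mean, yerr=RSS_se, fmt="o-", label="mean test RSS")
ax.axvline(dtest[imin], color="r", linestyle="--", label="min mean RSS order")
ax.axhline(RSS_tgt, color="g", linestyle="--", label="target RSS")
ax.axvline(dtest[i1se], color="k", linestyle="--", label="one-SE order")
ax.set_xlabel("Model order")
ax.set_ylabel("Test RSS")
ax.set_ylim(0, 12)  # zoom into the relevant range
ax.grid(True)
ax.legend()
```

`axvline` and `axhline` span the whole axes regardless of the data limits, which is why they suit reference markers like these.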

In [18]:
# TODO


Finally, make a scatter plot of the data and superimpose
* the polynomial prediction curve with the order that minimizes mean test RSS
* the polynomial prediction curve with the order selected by the one-standard-error rule

As usual, add grid lines, axis labels, and a legend to your plot.
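A minimal sketch of the final figure on synthetic data. The orders `d_min` and `d_1se` are hypothetical placeholders for the values your cross-validation returns, and we assume each selected model is refit on the full dataset before plotting. As before, the `matplotlib.use("Agg")` line only makes the sketch run headless.

```python
import numpy as np
import numpy.polynomial.polynomial as poly
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside Jupyter; omit in the notebook
import matplotlib.pyplot as plt

# Synthetic data and placeholder orders (not results from the Boston data)
rng = np.random.default_rng(2)
x = rng.uniform(-2.0, 2.0, size=80)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(0.0, 0.5, size=x.shape)
d_min, d_1se = 5, 3

xp = np.linspace(x.min(), x.max(), 200)  # grid for smooth prediction curves

fig, ax = plt.subplots()
ax.scatter(x, y, s=10, alpha=0.6, label="data")
for d, style, name in [(d_min, "-", "min mean RSS"), (d_1se, "--", "one-SE rule")]:
    beta = poly.polyfit(x, y, d)  # refit the chosen order on all samples
    ax.plot(xp, poly.polyval(xp, beta), style, label=f"{name}, order {d}")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.grid(True)
ax.legend()
```

Evaluating on a dense, sorted grid `xp` rather than on the unsorted samples keeps each curve smooth and free of zig-zag artifacts.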

In [19]:
# fit the model
# compute the polynomial prediction curve 
# make scatterplot and superimpose curves
# add legend, axis labels, grid