<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Instructions" data-toc-modified-id="Instructions-1">Instructions</a></span></li><li><span><a href="#Challenge-Prompts" data-toc-modified-id="Challenge-Prompts-2">Challenge Prompts</a></span></li></ul></div>

Instructions
------

- Complete this individually. 

    You must be able to fit fundamental machine learning models by yourself.
    <br>
- Type every command. 

    You learn almost nothing by copying and pasting code. Typing the commands builds procedural fluency, and the small typos you make will force you to debug common mistakes. Tab completion is awesome, use it!
<br>
- Complete the activity in Deepnote. 

    After completion, send the link as a private message in Zoom to Brian. Time permitting, Brian might give you a quick code review. It also signals who in the class is done.
    <br>
- Any random seed should be set to `42` so we can compare results amongst ourselves.

In [17]:
reset -fs

In [18]:
# TODO: 
# - Write import statement to load boston housing data
# - Inspect the data set with each of the following:
#    - help() on the load function
#    - keys
#    - DESCR
#    - feature_names
#    - shape
# - Load boston housing data into correctly named variables 



In [19]:
## Solutions

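# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn release.
from sklearn.datasets import load_boston
help(load_boston)
X, y = load_boston(return_X_y=True)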

Help on function load_boston in module sklearn.datasets._base:

load_boston(*, return_X_y=False)
    Load and return the boston house-prices dataset (regression).
    
    Samples total               506
    Dimensionality               13
    Features         real, positive
    Targets           real 5. - 50.
    
    Read more in the :ref:`User Guide <boston_dataset>`.
    
    Parameters
    ----------
    return_X_y : bool, default=False.
        If True, returns ``(data, target)`` instead of a Bunch object.
        See below for more information about the `data` and `target` object.
    
        .. versionadded:: 0.18
    
    Returns
    -------
    data : :class:`~sklearn.utils.Bunch`
        Dictionary-like object, with the following attributes.
    
        data : ndarray of shape (506, 13)
            The data matrix.
        target : ndarray of shape (506, )
            The regression target.
        filename : str
            The physical location of boston csv dataset.
   

In [20]:
# Tests to keep you honest 
 
assert X.shape == (506, 13)
assert y.shape == (506,)
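
The inspection steps listed in the TODO (keys, `DESCR`, `feature_names`, shape) can be sketched on any bundled dataset. The example below is only an illustration: it assumes `load_diabetes` as a stand-in, because `load_boston` is not available in scikit-learn 1.2 and later, and the `X_demo`/`y_demo` names are made up for this sketch.

```python
from sklearn.datasets import load_diabetes

# Stand-in dataset (an assumption for this sketch): it ships with
# scikit-learn, so nothing has to be downloaded.
bunch = load_diabetes()

print(sorted(bunch.keys()))                  # attributes stored in the Bunch
print(bunch.DESCR[:300])                     # start of the dataset description
print(bunch.feature_names)                   # column names, in data order
print(bunch.data.shape, bunch.target.shape)  # (rows, columns) and (rows,)

# Same unpacking pattern as the solution cell above.
X_demo, y_demo = load_diabetes(return_X_y=True)
```

The same attribute names exist on the Bunch returned by `load_boston()` in releases that still include it.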

In [21]:
# TODO: 
# - Write import statement to split your data into train and validation 
# - Assign train and validation data. Use seed set to 42 so you can compare answers with classmates.

In [22]:
## Solutions

from sklearn.model_selection import train_test_split

X_train, X_validation, y_train, y_validation = train_test_split(X, y, random_state=42)
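
To see what the seed buys you, here is a small sketch on synthetic arrays (the arrays and the `*_demo` names are assumptions, sized to match the 506 rows of the Boston data). With no `test_size` given, `train_test_split` holds out 25% of the rows, and two calls with the same `random_state` return identical splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as the Boston data.
X_demo = np.arange(506 * 2).reshape(506, 2)
y_demo = np.arange(506)

# Two independent calls with the same seed.
split_a = train_test_split(X_demo, y_demo, random_state=42)
split_b = train_test_split(X_demo, y_demo, random_state=42)

X_tr, X_val, y_tr, y_val = split_a
print(X_tr.shape, X_val.shape)  # sizes of the train and validation parts

# True when every returned array is identical across the two calls.
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)
```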

In [23]:
# TODO: Fit a baseline model 
# - Import linear regression algorithm, MSE metric, pipeline, and standardize function
# - Create a pipeline
# - Train algorithm in pipeline 
# - Find performance metric on validation data

In [24]:
## Solutions

from sklearn.linear_model  import LinearRegression
from sklearn.metrics       import mean_squared_error
from sklearn.pipeline      import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([('scaler', StandardScaler()), 
                 ('lr',     LinearRegression())])

pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_validation)

mse = mean_squared_error(y_validation, y_pred)
print(f"Mean squared error: {mse:,.2f}")  

Mean squared error: 22.10
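
One reason to put the scaler inside the pipeline: `pipe.fit` learns the scaling statistics from the training rows only, and `pipe.predict` reuses them on the validation rows. The sketch below checks this on synthetic data from `make_regression` (the data and the `demo_pipe` name are assumptions for illustration, not part of the exercise).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem standing in for the housing data.
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X_demo, y_demo, random_state=42)

demo_pipe = Pipeline([("scaler", StandardScaler()),
                      ("lr", LinearRegression())])
demo_pipe.fit(X_tr, y_tr)

# The fitted scaler stores the training means, not the validation means.
scaler = demo_pipe["scaler"]
means_match_train = np.allclose(scaler.mean_, X_tr.mean(axis=0))

# Training features end up with mean 0 and standard deviation 1;
# validation features are only approximately centered.
X_tr_scaled = scaler.transform(X_tr)
X_val_scaled = scaler.transform(X_val)
print(means_match_train)
print(X_tr_scaled.mean(axis=0).round(2), X_tr_scaled.std(axis=0).round(2))
print(X_val_scaled.mean(axis=0).round(2))
```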


In [25]:
# TODO: Find out which of the following linear_model algorithms is the best:
# - Lasso
# - Ridge
# - ElasticNet
# - HuberRegressor
# - Another one of your choice that is appropriate for the data

# Make your personal prediction before running the code. Which one do you think, a priori, will perform best?

In [26]:
## Solutions

from sklearn.linear_model  import Lasso, Ridge, ElasticNet, HuberRegressor, BayesianRidge

# Fit each algorithm in the same pipeline and record its validation MSE
algorithms = [LinearRegression(), Lasso(), Ridge(), ElasticNet(), HuberRegressor(), BayesianRidge()]
results = dict()

for algo in algorithms:
    pipe = Pipeline([('scaler', StandardScaler()), 
                     ('lm',     algo)])

    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_validation)
    mse = mean_squared_error(y_validation, y_pred)
    results[algo.__class__.__name__] = mse  # keep the scores for later comparison
    print(f"{algo.__class__.__name__:<17} - mean squared error: {mse:,.2f}")

LinearRegression  - mean squared error: 22.10
Lasso             - mean squared error: 26.01
Ridge             - mean squared error: 22.12
ElasticNet        - mean squared error: 26.54
HuberRegressor    - mean squared error: 26.23
BayesianRidge     - mean squared error: 22.21
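
Keep in mind that every estimator above was built with its default hyperparameters, so the comparison is between untuned models. A quick way to see which defaults were in play is `get_params()`; the sketch below prints a few of them (the `defaults` dictionary is just a helper name for this example).

```python
from sklearn.linear_model import ElasticNet, HuberRegressor, Lasso, Ridge

# Collect the regularization-related defaults of each estimator.
defaults = {}
for model in [Lasso(), Ridge(), ElasticNet(), HuberRegressor()]:
    params = model.get_params()
    # Only show the hyperparameters that the estimator actually has.
    shown = {k: params[k] for k in ("alpha", "l1_ratio", "epsilon") if k in params}
    defaults[model.__class__.__name__] = shown
    print(f"{model.__class__.__name__:<15} {shown}")
```

A different `alpha` could change the ranking, which is what the hand-tuning challenge at the end explores.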


In [27]:
# TODO: Finalize model 
# - Retrain best algorithm on all the data 
# - Print the parameters of the final model with associated feature names

In [36]:
## Solutions

pipe = Pipeline([('scaler', StandardScaler()), 
                 ('regr',  LinearRegression())])
pipe.fit(X, y)

boston_data = load_boston() # Get data bunch to display feature names

print(f"Intercept:              {pipe['regr'].intercept_:.2f}")
print("Coefficients:", end=" ")
for feature_name, coef in zip(boston_data.feature_names, pipe['regr'].coef_):
    print(f"{feature_name.title():>7} {coef:>7,.2f}", end="\n              ")

Intercept:              22.53
Coefficients:    Crim   -0.93
                   Zn    1.08
                Indus    0.14
                 Chas    0.68
                  Nox   -2.06
                   Rm    2.67
                  Age    0.02
                  Dis   -3.10
                  Rad    2.66
                  Tax   -2.08
              Ptratio   -2.06
                    B    0.85
                Lstat   -3.74
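
Because the regression sits behind a `StandardScaler`, the coefficients above are in units of "per one standard deviation of the feature". If you want them per original unit, divide by the scaler's `scale_` and adjust the intercept. The sketch below checks that conversion on synthetic data (the data, the rescaling factors, and the `demo_pipe` name are assumptions for illustration).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features, stretched and shifted so the columns have different units.
X_demo, y_demo = make_regression(n_samples=200, n_features=3,
                                 noise=5.0, random_state=42)
X_demo = X_demo * np.array([1.0, 10.0, 100.0]) + np.array([0.0, 5.0, 50.0])

demo_pipe = Pipeline([("scaler", StandardScaler()),
                      ("regr", LinearRegression())])
demo_pipe.fit(X_demo, y_demo)
scaler, regr = demo_pipe["scaler"], demo_pipe["regr"]

# y = b0 + sum_j w_j * (x_j - mean_j) / scale_j, rearranged in terms of x_j.
coef_original = regr.coef_ / scaler.scale_
intercept_original = regr.intercept_ - np.sum(regr.coef_ * scaler.mean_ / scaler.scale_)

# Reference: ordinary least squares fitted directly on the unscaled features.
raw = LinearRegression().fit(X_demo, y_demo)
print(coef_original.round(3), raw.coef_.round(3))
print(round(intercept_original, 3), round(raw.intercept_, 3))
```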
              

In [None]:
# TODO: Why was the champion algorithm the best?

In [None]:
## Solutions

"""
The dataset has relatively few features (13), and each one contributes to the final performance of the model.
Thus, with the default hyperparameters used above, regularization does not improve validation performance.

We should proceed with an ordinary linear regression.
"""
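
One way to build intuition for this answer: ridge regression is ordinary least squares plus a penalty whose strength is `alpha`. As `alpha` approaches zero the two coincide, and as `alpha` grows the coefficients shrink. The sketch below shows this on synthetic data (the data and the chosen `alphas` are assumptions for illustration, not values tuned for the housing data).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=42)
X_scaled = StandardScaler().fit_transform(X_demo)

ols = LinearRegression().fit(X_scaled, y_demo)

# Coefficient norm for increasing penalty strength.
alphas = [1e-6, 1.0, 100.0, 10_000.0]
norms = [np.linalg.norm(Ridge(alpha=a).fit(X_scaled, y_demo).coef_) for a in alphas]
for a, n in zip(alphas, norms):
    print(f"alpha={a:>10g}  ||coef||={n:.3f}")

# With a near-zero penalty, ridge is numerically the same model as OLS.
nearly_ols = Ridge(alpha=1e-6).fit(X_scaled, y_demo)
print(np.allclose(nearly_ols.coef_, ols.coef_, atol=1e-4))
```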

Challenge Prompts
------

If you have extra time, try these:

- Understand the data and how `StandardScaler` changes it. Visualize the univariate distribution of features before and after applying `StandardScaler`.
- Start hand-tuning the hyperparameters for the algorithms. See which hyperparameter combination yields the lowest mean squared error on the validation data.
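
As a starting point for the hand-tuning prompt, here is a minimal sketch that sweeps `alpha` for `Lasso` and keeps the value with the lowest validation MSE. It runs on synthetic data from `make_regression`, and the grid of `alpha` values is an arbitrary assumption; swap in the housing split and your own grid.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data.
X_demo, y_demo = make_regression(n_samples=300, n_features=8,
                                 noise=15.0, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X_demo, y_demo, random_state=42)

# Validation MSE for each candidate penalty strength.
sweep = {}
for alpha in [0.01, 0.1, 1.0, 10.0]:
    candidate = Pipeline([("scaler", StandardScaler()),
                          ("lm", Lasso(alpha=alpha))])
    candidate.fit(X_tr, y_tr)
    sweep[alpha] = mean_squared_error(y_val, candidate.predict(X_val))
    print(f"alpha={alpha:<5} - mean squared error: {sweep[alpha]:,.2f}")

# Lower MSE is better, so pick the minimum.
best_alpha = min(sweep, key=sweep.get)
print(f"Best alpha on validation data: {best_alpha}")
```

Tuning against the validation set this way makes the validation score optimistic, so a separate test set or cross-validation is worth considering once you settle on a model.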

<br>
<br> 
<br>

----