# Boosting, Classification Metrics and Advanced Sklearn

<b>Goals</b>

- Follow up our lesson on ensemble methods with boosting, what it is and how it works.
- Use the Adaptive (ADA) Boosting Classifier.
- Refresher lesson on model evaluation tools beyond accuracy score: sensitivity (recall), precision, and roc_auc
- How to use high-powered tools in sklearn to optimize your models and minimize your workload and time

## Boosting

- Boosting is an ensemble method where a model is composed of a sequence of models, as opposed to a set of parallel models as with Random Forest.
- Unlike bagging, the original formulation of boosting draws random subsets of the training data <b>WITHOUT</b> replacement.
- It is an iterative process. It begins by training a simple model on the data, pinpoints the inaccuracies, and trains a new model to target those inaccuracies (misclassifications or residuals). Each new model tries to predict what the previous ones were unable to predict correctly. This repeats until a stopping-point parameter is reached. The whole set of models is then used to make predictions.
- Boosting process:
    - Randomly select a batch of data from the training dataset without replacement to train a "weak learner."
    - Randomly select a second batch of data from the training dataset without replacement AND add around half of the samples that were misclassified by the previous model, then train a second weak learner.
    - Go back to the original training dataset and retrieve the data points on which the two models had differing classifications, then train a third weak learner on them.
    - Make predictions by combining the weak learners and taking the vote (classification) or average (regression).
    
- Can be used both for regression and classification

### AdaBoost Classifier

- The AdaBoost (Adaptive Boosting) algorithm fits sequential weak classifiers, which are classifiers that are only slightly better than random chance. These classifiers are usually tree-based models with a low depth. AdaBoost actually uses the whole training dataset instead of a sample. The data is reweighted in each iteration of modeling to help each new model learn from the mistakes of the previous models.

- The weak learners in the AdaBoost algorithm are Decision Trees with one depth-level aka "Decision Stumps." They literally only use one decision.

- Each data point in the training data is assigned a weight. In the first model, every point has the same weight, which equals 1 / number of data points.

- The first Decision Stump is fit on the whole dataset using weighted samples. In this basic form, AdaBoost only works with binary classification problems. The model outputs either 1 or -1, regardless of the class labels in the target variable.

- Error is determined by the misclassification rate, which is 1 - accuracy score. An accuracy score of 0.71 means an error rate of 0.29.

- However, the error changes significantly when different weights are introduced.

- With weights, error = sum(w(i) * terror(i)) / sum(w). Here terror(i) is 1 if the prediction for point i is wrong and 0 if it is correct.
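
To make the bullets above concrete, here is a minimal sketch that fits one decision stump with equal sample weights and computes its weighted error. The toy data and the names `X_toy`, `y_toy`, and `stump` are made up for illustration; they are not part of the lesson's datasets.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data (made up for illustration): one feature, labels coded as 1 / -1
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_toy = np.array([1, 1, -1, 1, -1, -1])

# Every point starts with the same weight: 1 / number of data points
w = np.full(len(y_toy), 1.0 / len(y_toy))

# A decision stump is a decision tree with a single split (max_depth=1)
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X_toy, y_toy, sample_weight=w)
p = stump.predict(X_toy)

# terror is 1 for a wrong prediction and 0 for a correct one
terror = (p != y_toy).astype(int)
error = np.sum(w * terror) / np.sum(w)

# A stump has one decision node and two leaves, so three nodes in total
print(stump.tree_.node_count, error)
```

With equal weights the weighted error matches the plain misclassification rate; the two only diverge once the weights differ.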

AdaBoost visually explained:

![a](https://www.analyticsvidhya.com/wp-content/uploads/2015/11/bigd.png)

Source: [Analytics Vidhya](https://www.analyticsvidhya.com/blog/2015/11/quick-introduction-boosting-algorithms-machine-learning/)

Box 1: Each data point has equal weight, and a decision stump (the vertical line) is fit on the data.

<br>

Box 2: The three plus signs that were incorrectly classified in Box 1 have been enlarged (weighted) and the model has been retrained.

<br>

Box 3: Three minus signs have been given bigger weight values and the new model (horizontal line) has been fit to account for that.

<br>

Box 4: Combines the three Decision Stump models into one model that outperforms any of the individual stumps.

Let's manually calculate weights

In [None]:
#List of weights
w = [0.2, 0.2, 0.2, 0.2, 0.2]
#List of actual values
y = [1,  1, -1, 1, -1]
#List of predictions
p = [-1, 1, 1, 1, -1]
#List of terrors (1 = wrong prediction, 0 = correct prediction)
t = [1, 0, 1, 0 , 0]

In [None]:
#Regular error rate calculation
(1 + 0 + 1 + 0 + 0)/(1 + 1 + 1 + 1 + 1.)

In [None]:
#Error calculation with weights (same result as above because all weights are equal)
e = (0.2 * 1 + 0.2 * 0 + 0.2 * 1 + 0.2 * 0 + 0.2 * 0)/ (0.2 + 0.2 + 0.2 + 0.2 + 0.2)
e

In the next part we pass the error rate through this function: 0.5 * log((1 - e)/e)

This gives a coefficient: a

In [None]:
#Import numpy
import numpy as np

a = 0.5 * np.log((1 - e)/ e)
a

We use this value to update our weights.

The formula is the old weight times the exponential of negative a times the prediction times the actual value: new weight = old weight * exp(-a * prediction * actual)

In [None]:
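#New weight = old weight * exp(-a * actual value * prediction)

#Point 1: actual 1, predicted -1 (wrong)
w1 = 0.2 * np.exp(-a * 1 * -1)

#Point 2: actual 1, predicted 1 (correct)
w2 = 0.2 * np.exp(-a * 1 * 1)

#Point 3: actual -1, predicted 1 (wrong)
w3 = 0.2 * np.exp(-a * -1 * 1)

#Point 4: actual 1, predicted 1 (correct)
w4 = 0.2 * np.exp(-a * 1 * 1)

#Point 5: actual -1, predicted -1 (correct)
w5 = 0.2 * np.exp(-a * -1 * -1)

print (w1, w2, w3, w4, w5)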

Weights go up for wrong predictions and go down for correct ones.

We're not finished yet.

Then we normalize the weights by dividing each weight by the sum of the weights

In [None]:
weight_sum = w1 + w2 + w3 + w4 + w5

#New weights
w1 = w1/(weight_sum)
w2 = w2/(weight_sum)
w3 = w3/(weight_sum)
w4 = w4/(weight_sum)
w5 = w5/(weight_sum)

print (w1, w2, w3, w4, w5)

These are our new weights which we'll use in the next round of modeling
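
The same round can be written more compactly with NumPy arrays. This is only a sketch of the steps worked through by hand above, reusing the same five weights, actual values, and predictions; the name `w_new` is introduced here for illustration.

```python
import numpy as np

w = np.array([0.2, 0.2, 0.2, 0.2, 0.2])   # current weights
y = np.array([1, 1, -1, 1, -1])           # actual values
p = np.array([-1, 1, 1, 1, -1])           # predictions

# terror: 1 where the prediction is wrong, 0 where it is correct
t = (p != y).astype(int)

# Weighted error rate and the coefficient a
e = np.sum(w * t) / np.sum(w)
a = 0.5 * np.log((1 - e) / e)

# Update every weight at once, then normalize so the weights sum to 1
w_new = w * np.exp(-a * p * y)
w_new = w_new / w_new.sum()

print(e, a)
print(w_new)
```

After normalizing, the two misclassified points carry a weight of 0.25 each, so together they account for half of the total weight in the next round.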

In the follow-up model, a second Decision Stump is trained using our new weights. The weights are used to determine the split in the decision tree. This process continues until we reach the n_estimators parameter we set.

Increasing the weights of the misclassified data points forces later models to train more heavily on the data that earlier models classified incorrectly.

### Predictions

- AdaBoost makes predictions by calculating the weighted average of the sequence of Decision Stumps. 

- When you pass in a new data point, the model predicts 1 or -1.

- Each model's prediction is weighted by that model's stage value (the coefficient a calculated earlier). The final prediction is derived from the sum of the weighted predictions. If the sum is > 0, return the first class; otherwise return the second class.

In [None]:
#Five model predictions
preds = np.array([-1, -1, 1, -1, 1])
preds.sum()

Without weighting the prediction would be -1.

In [None]:
weights = np.array([.2, .4, .8, .3, .9])

sum(weights * preds)

Prediction with weighted models equals 1.
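
As a small sketch of the rule above, the sign of the sum gives the final vote. The arrays are the same ones used in the two cells above; `unweighted_vote` and `weighted_vote` are names introduced here for illustration.

```python
import numpy as np

# Predictions from five models and their (stage value) weights
preds = np.array([-1, -1, 1, -1, 1])
weights = np.array([0.2, 0.4, 0.8, 0.3, 0.9])

# Simple majority vote: sign of the plain sum
unweighted_vote = int(np.sign(preds.sum()))

# Weighted vote: sign of the weighted sum
weighted_vote = int(np.sign(np.sum(weights * preds)))

print(unweighted_vote, weighted_vote)
```

The two models that predict 1 have the largest weights, so they outvote the three lower-weighted models that predict -1.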

<b>Warnings</b>

- Requires clean data: noisy data can negatively influence the model because, by design, boosting keeps increasing the weight of points that are hard to classify.
- The same goes for outliers: the model will chase them.

### Coding AdaBoost

1. Visualize the decision boundaries of AdaBoost

2. Use AdaBoost on the spotify dataset

In [None]:
#Imports
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix, recall_score, precision_score, roc_auc_score, roc_curve

Generate and visualize fake data

In [None]:

#Generate fake data that is 400 x 2.
data = make_classification(n_samples=400, n_features=2, n_informative=2, n_redundant=0, 
                    class_sep=.54, random_state = 8)

df = pd.DataFrame(data[0], columns=["feature1", "feature2"])
#Add target variable to df 
df["target"] = data[1]

#Call scatter plot of feature1 vs feature2 with color-encoded target variable
plt.style.use("fivethirtyeight")
plt.figure(figsize=(11, 8))
#Color encode target variable
colors = df.target.map({0:"b", 1:"r"})
plt.scatter(df.feature1, df.feature2, c = colors, s = 100, alpha=.5);

In [None]:
#Assign X and y
X = 
y = 

#Fit a Decision Tree model with max_depth = 2 on the data.

dt = 



In [None]:
#Decision boundary function
def plot_decision_boundary(model, X, y):
    X_max = X.max(axis=0)
    X_min = X.min(axis=0)
    xticks = np.linspace(X_min[0], X_max[0], 100)
    yticks = np.linspace(X_min[1], X_max[1], 100)
    xx, yy = np.meshgrid(xticks, yticks)
    ZZ = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = ZZ >= 0.5
    Z = Z.reshape(xx.shape)
    plt.rcParams["figure.figsize"] = (10,7)
    fig, ax = plt.subplots()
    ax.contourf(xx, yy, Z, cmap=plt.cm.bwr, alpha=0.2)
    ax.scatter(X[:,0], X[:,1], c=y, alpha=0.4, s = 50)

In [None]:
#Feed dt model, features and colors
;

In [None]:
#Train AdaBoost model on the same data and visualize it

#Initialize AdaBoost model with 20 estimators
ada = 

#Fit model


#Visualize model boundaries


## <b> Using Spotify data to predict whether or not I will like a song </b>

<b> Attributes </b>


    Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
    
    Danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

    Instrumentalness: Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
    
    Loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
    
    Mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

    Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
    
    Tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

    Energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
    
More information here https://developer.spotify.com/web-api/get-audio-features/

Link to my article about the project: https://opendatascience.com/blog/a-machine-learning-deep-dive-into-my-spotify-data/

In [None]:
#Import spotify data

spotify = pd.read_csv("../../data/spotify_data.csv", index_col=[0])


Compare and contrast Decision Trees and AdaBoost

In [None]:

#Initialize AdaBoost with 300 estimators
ada = 

In [None]:
#Assign X and y

X = 

y = 

In [None]:
#Null accuracy



In [None]:
#Train test split with random state = 20



In [None]:
#Fit AdaBoost model on training data and score it on testing data





Cross validate 

Cross validation shows AdaBoost is a decent but not great model.

Perhaps we chose the wrong number of estimators.

Let's make a validation curve to determine the best value for n_estimators.

<br>

This will take a while.

In [None]:
#We're going to time our code

#Import time tool
from time import time

In [None]:
#Initialize time variable
t = time()

#Create list of estimator values
estimators = range(50, 1050, 100)

#Initialize cross validation scores list


#Iterate over estimators values, fit models, and then append scores to cv_scores

    
    
#Print difference in time

print (time() - t)

In [None]:
    
#Plot estimators versus scores

plt.figure(figsize= (8, 7))
plt.plot(estimators, cv_scores, linewidth = 3)
plt.xlabel("N Estimators")
plt.ylabel("Cross Validated Accuracy Scores");

In [None]:
#Derive best estimator value


## Classification Model Evaluation Techniques

![s](http://www.dataschool.io/content/images/2015/01/confusion_matrix2.png)

Source: Data School

**True Positives (TP):** Number of correct positive predictions

**True Negatives (TN):** Number of correct negative predictions

**False Positives (FP):** Number of incorrect positive predictions

**False Negatives (FN):** Number of incorrect negative predictions

**Recall aka sensitivity aka the True Positive Rate:** The number of correct positive predictions divided by number of positive instances

**Precision:** The number of correct positive predictions divided by number of positive predictions

**False Positive Rate aka Fall Out:** The number of incorrect positive predictions divided by number of negative instances

**True Negative Rate aka Specificity:** The number of correct negative predictions divided by number of negative instances 
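
As a rough sketch, the definitions above can be checked on a small set of made-up labels. The arrays `y_true` and `y_pred` are invented for illustration and are not the Spotify data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels: 4 positive instances and 6 negative instances
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels, sklearn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

recall = tp / (tp + fn)        # True Positive Rate / sensitivity
precision = tp / (tp + fp)     # correct positives / positive predictions
fpr = fp / (fp + tn)           # False Positive Rate / fall out
tnr = tn / (tn + fp)           # True Negative Rate / specificity

print(tp, tn, fp, fn)
print(recall, precision, fpr, tnr)

# The hand calculations should agree with sklearn's helpers
print(recall_score(y_true, y_pred), precision_score(y_true, y_pred))
```

Note that the False Positive Rate and the True Negative Rate share the same denominator, so they always add up to 1.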

Formula table:
![a](http://www.chioka.in/wp-content/uploads/2013/08/Metrics-Table.png)

Confusion matrix with metrics:

![s](https://eus-www.sway-cdn.com/s/4YEmvTlyess2YF1M/images/VfcIF1yrYJrvLl?quality=1071&allowAnimation=true)

Super confusion matrix:
![q](https://image.ibb.co/bXkGxm/Screen_Shot_2017_11_28_at_12_03_48_PM.png)

Source: Wikipedia

Think about how these metrics can tell us more about the efficacy of a model than the accuracy score alone.

Is one metric more useful than the others? In which context would it make sense to evaluate a model based on FPR vs FNR?

In [None]:
#Train an AdaBoost model with 50 estimators and make predictions using the test set

model = 

preds =

In [None]:
#Null accuracy of y_test


In [None]:
#Pass the predictions and y_test into a confusion matrix
cm = 
cm

Let's try calculating the TPR, TNR, FPR, and FNR rates manually

In [None]:
#TPR
cm[1,1]/float(cm.sum(axis=1)[1])

In [None]:
#TNR
cm[0,0]/float(cm.sum(axis=1)[0])

In [None]:
#FPR
cm[0,1]/float(cm.sum(axis=1)[0])

In [None]:
#FNR
cm[1,0]/float(cm.sum(axis=1)[1])

If you were a Spotify data scientist, would you want a model that produces more false negatives or more false positives?

In [None]:
#Calculate precision and recall scores with sklearn
ps = 
rs = 

print ("The precision score is {:.2f} and the recall score is {:.2f}".format(ps, rs))

Cross validate with precision and recall

In [None]:
#Precision


In [None]:
#Recall


![w](https://chrisalbon.com/images/machine_learning_flashcards/Receiver_Operating_Characteristic_print.png)

The ROC (receiver operating characteristic) curve is a commonly used way to visualize the performance of a binary classifier.

AUC (area under the curve) is arguably the best way to summarize a model's performance in a single number.
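
Before applying this to the Spotify model, here is a minimal sketch on four made-up points. The labels and scores are invented for illustration; `roc_curve` and `roc_auc_score` are the same sklearn functions imported earlier in the lesson.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up true labels and predicted probabilities of class 1
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve returns one (FPR, TPR) pair per candidate threshold
fpr, tpr, thres = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)

print(fpr)
print(tpr)
print(auc)
```

One way to read the AUC here: of the four possible (positive, negative) pairs, the positive point gets the higher score in three of them, which gives 3/4 = 0.75.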

In [None]:
#Derive probabilities of class 1 from the test set
test_probs = 
#Pass in the test_probs variable and the true test labels aka y_test in the roc_curve function
fpr, tpr, thres = 
#Outputs the fpr, tpr, for varying thresholds

In [None]:
#Plotting False Positive Rates vs the True Positive Rates
#Dotted line represents a useless model
plt.figure(figsize=(13,10))
plt.plot(fpr, tpr, linewidth=8)
#Line of randomness
plt.plot([0,1], [0,1], "--", alpha=.7)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.show()

How do you rate this model?

In [None]:

#Calculate the area under the curve score using roc_auc_score


In [None]:
#Cross validated roc_auc score



Plotting threshold vs FPR/TPR on the same plot

In [None]:
plt.figure(figsize=(12,9))
plt.plot(thres, fpr, linewidth=5, label = "FPR Line")
plt.plot(thres, tpr, linewidth=5, label = "TPR line")
plt.xlabel("Thresholds")
plt.ylabel("Rate")
plt.legend()
plt.show();

What do you see here? Why is there a negative correlation in both lines?

## Advanced Sklearn tools

Overview:

- Grid search
- Pipelining
- Imputation
- Feature unions
- Feature selections

In [None]:
#More imports
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline, make_pipeline

### Gridsearch

Grid search is an exhaustive search that cross-validates every combination of the parameter values you specify to find the best one.

Let's use GridSearchCV to find the best K value for a KNN model on the Spotify data.

In [None]:
#Initialize parameter grid

#Range of neighbors to test
neighbors_range = 

#Dictionary of parameter values 
param_grid_knn = 


param_grid_knn

In [None]:
#Initialize Grid

grid_knn =

#Fit grid on data



In [None]:
#Scores


In [None]:
#What's the best cross-validated accuracy score?



In [None]:
#Find the best parameters



This simple technique gives us the best K value.

We can use the best model from grid_knn to make predictions.

In [None]:
#Input 
x = [[0.2, .15, 0.68, 0.05, 0.328]]

#Make prediction 


In [None]:
#Get probability


Quick exercise:
<br>
Use grid search to determine the best depth value in a decision tree. Use depths from 2 - 20.

In [None]:


#Range of depths to test
depths_range = 

#Dictionary of parameter values 
param_grid_dt = {}


param_grid_dt

The CV in GridSearchCV stands for cross-validation, which means we should set a cv value (the number of folds) and a scoring metric.

In [None]:
#Initialize Grid

grid_dt = 

#Fit grid on data



In [None]:
#Best score for DT model


In [None]:
#Best parameter for DT model


In [None]:
#Make prediction



So far our grids have been one-dimensional. Now let's try using multiple dimensions.

In [None]:
#Param grid that tests different split criteria as well.
param_grid_dt = {"criterion": ["gini", "entropy"], "max_depth": depths_range}

param_grid_dt

It's going to cross-validate every combination of the criterion values and the depth values.

In [None]:
#Initialize Grid
grid_dt = GridSearchCV(estimator = DecisionTreeClassifier(), 
                        param_grid = param_grid_dt, cv = 5, scoring = "accuracy")
#Fit grid on data
grid_dt.fit(X, y)

In [None]:
#Best parameter

grid_dt.best_params_

In [None]:
#Best score

grid_dt.best_score_

How many models did this grid search fit?
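
One way to check your answer is to count the combinations with sklearn's `ParameterGrid`. This is a sketch that assumes `depths_range` covers depths 2 through 20, as in the quick exercise; the names `n_combinations` and `n_cv_fits` are introduced here for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Assumption: depths 2 through 20, as suggested in the quick exercise
depths_range = list(range(2, 21))
param_grid_dt = {"criterion": ["gini", "entropy"], "max_depth": depths_range}

# Number of distinct parameter combinations in the grid
n_combinations = len(ParameterGrid(param_grid_dt))

# Each combination is fit once per fold (cv = 5 in the grid above)
n_cv_fits = n_combinations * 5

print(n_combinations, n_cv_fits)
```

With the default `refit=True`, GridSearchCV also fits the best combination one more time on the full data, on top of the cross-validation fits counted here.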

Let's add in some parameters

In [None]:

param_grid_dt["min_samples_split"] =[2, 10, 20]
param_grid_dt["max_features"] = [1, 2, 3, 4, 5]

In [None]:
#Initialize Grid
grid_dt = 

#Time the code 

t = time()

#Fit grid on data


#Print time difference

print (time() - t)

In [None]:
#Best parameter



In [None]:
#Best score



In [None]:
#Make prediction



Grid search can take a long time and in some cases can cause memory errors. This is where RandomizedSearchCV comes in.

In [None]:
#Import 
from sklearn.model_selection import RandomizedSearchCV

It functions just like GridSearchCV, except that we have to choose a value for n_iter, which is the number of random parameter combinations to test, and set param_distributions instead of param_grid.
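
Here is a minimal sketch of that interface on synthetic data rather than the Spotify data. The dataset from `make_classification`, the names `X_demo`, `y_demo`, `param_dist`, and `search`, and the fixed `random_state` values are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 5 features
X_demo, y_demo = make_classification(n_samples=300, n_features=5,
                                     n_informative=3, n_redundant=0,
                                     random_state=8)

# Same kind of dictionary as param_grid, passed as param_distributions
param_dist = {"criterion": ["gini", "entropy"],
              "max_depth": list(range(2, 21)),
              "min_samples_split": [2, 10, 20],
              "max_features": [1, 2, 3, 4, 5]}

# n_iter = 20 samples 20 of the 570 possible combinations
search = RandomizedSearchCV(DecisionTreeClassifier(random_state=0),
                            param_distributions=param_dist,
                            n_iter=20, cv=5, scoring="accuracy",
                            random_state=0)
search.fit(X_demo, y_demo)

# Number of combinations actually tried, plus the best one found
print(len(search.cv_results_["params"]))
print(search.best_params_, search.best_score_)
```

Because only a sample of the grid is evaluated, the best combination found this way is not guaranteed to be the best one in the full grid.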

In [None]:
#Initialize RandomizedSearchCV grid with n_iter = 20
grid_dt = 

#Time the code 

t = time()

#Fit grid on data


#Print time difference

print (time() - t)

Reduced run time by a huge percentage!

But now let's see if we sacrificed performance.

In [None]:
#Check best score

grid_dt.best_score_

### Pipelines

Let's go back to using the KNN model.


We know that we need to scale our data for the KNN algorithm, right?

In [None]:
#Scale the data, then fit a grid search on it.

#Initialize scaler
scale = 

#Fit and transform scaler on the data
Xs = 

In [None]:
#Initialize Grid

grid_knn_s = GridSearchCV(estimator = KNeighborsClassifier(), 
                        param_grid = param_grid_knn, cv = 5, scoring = "accuracy")

#Fit grid on scaled data

grid_knn_s.fit(Xs, y)

In [None]:
#Best score
grid_knn_s.best_score_

In [None]:
#Best K value
grid_knn_s.best_params_

In [None]:
#Make prediction

#First transform x using the scaler

xs = 

#Pass in xs to grid model



Time to make a pipeline.

In [None]:
#Pass scaler and knn classifier objects into make_pipeline function
pipe = 

In [None]:
#Create new param_grid
neighbors_range = range(2, 21)
param_grid_knn = {}
param_grid_knn["kneighborsclassifier__n_neighbors"] = neighbors_range
param_grid_knn
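
The `kneighborsclassifier__n_neighbors` key follows the `<step name>__<parameter>` pattern, where `make_pipeline` names each step after its lowercased class name. As a small sketch, you can list the step names and the tunable keys directly; `pipe_demo` is a throwaway name used here so it does not overwrite the lesson's `pipe`.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Throwaway pipeline built only to inspect the generated names
pipe_demo = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Step names generated by make_pipeline
step_names = [name for name, _ in pipe_demo.steps]
print(step_names)

# Parameter keys that can go into a param_grid for the KNN step
knn_keys = sorted(k for k in pipe_demo.get_params()
                  if k.startswith("kneighborsclassifier__"))
print(knn_keys)
```

If a key in the parameter grid does not match one of these names, the grid search raises an error when it is fit.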

In [None]:

#Pass the pipe into the GridSearchCV function
grid_knn_pipe = GridSearchCV(pipe, param_grid=param_grid_knn, cv=5, scoring='accuracy')

#Fit on original versions of data
grid_knn_pipe.fit(X, y)

#Best scores and params
print (grid_knn_pipe.best_score_, grid_knn_pipe.best_params_)

You can also pass the pipe object into the cross_val_score function.

In [None]:
# Use the cross-validation process using Pipeline
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
cross_val_score(pipe, X, y, cv=5, scoring='accuracy').mean()

Class exercise: Use grid search to model the 2016 Democratic primary data with the person who is next to you at your table. 

In [None]:
#Load in data files
primary = pd.read_csv("../data/primary_data/primary_results.csv")
county = pd.read_csv("../data/primary_data/county_facts.csv")
county_dict = pd.read_csv("../data/primary_data/county_facts_dictionary.csv")

subset_col_index = [0,3,5,9,10,12,18,20,23,25,33,34,53]

county = county.iloc[:,subset_col_index].copy()

subset_cols = ["fips","population", "pop_change", "senior_pop_per", "female_pop_per", "black_pop_per",
               "white_pop_per", "foreign_pop_per", "college_degree_pop_", "commute_time", "median_income",
               "poverty_rate", "pop_density"]

col_dict = dict(zip(county.columns, subset_cols))
#Use dictionary to rename the columns
county.rename(columns=col_dict, inplace=True)
primary.dropna(inplace=True)
bern = primary[primary.candidate== "Bernie Sanders"]
hill = primary[primary.candidate== "Hillary Clinton"]
bern = bern[["fips", "candidate", "votes"]]
dem = pd.merge(hill, bern, on="fips")
dem.rename(columns={"votes_x":"clinton_votes", "votes_y":"sanders_votes"}, inplace=True)
dem["winner"] = dem.clinton_votes - dem.sanders_votes
def vote_winner(x):
    if x >0:
        return "H"
    elif x == 0:
        return "TIE"
    else:
        return "B"
    
dem["winner"] = dem.winner.apply(vote_winner)

dem = dem[dem.winner!= "TIE"]
dem = dem[["fips", "winner"]]
df = pd.merge(county, dem, on="fips")
df.set_index("fips", inplace=True)
df.head()

In [None]:
#Answer



Now let's make a pipeline using regression

In [None]:
#Imports
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

In [None]:
#Load in boston dataset
#Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version
from sklearn.datasets import load_boston
boston = load_boston()
df = pd.DataFrame(boston["data"])
df.columns = boston["feature_names"]
df["MEDV"] = boston["target"]
df.head()

In [None]:
boston["DESCR"].split("\n")

In [None]:
#Assign X and y

X = df.drop("MEDV", axis =1)
y = df.MEDV

In [None]:
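#Use the Pipeline class instead of the make_pipeline function to establish the pipeline
#(name the first step "polynomialfeatures" so it matches the param grid key used later)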
pipe_poly = 

In [None]:
#Select a few features from X
XX = X[["RM", "DIS", "NOX", "CRIM"]].copy()

In [None]:
#Initialize range values for poly
poly_range = [1, 2, 3, 4, 5, 6, 7]

#Initialize grid dictionary
param_grid_poly = {}

#Input grid values
param_grid_poly["polynomialfeatures__degree"] = poly_range

#Establish the grid
poly_grid = GridSearchCV(pipe_poly, 
                         param_grid = param_grid_poly, cv=5, 
                         scoring='neg_mean_squared_error')

In [None]:
# Fit on data

poly_grid.fit(XX, y)

Randomized Grid Search with Ridge regression

In [None]:
from sklearn.linear_model import Ridge

In [None]:
pipe_poly = Pipeline(steps=[('polynomialfeatures', PolynomialFeatures()),
                            ('ridgeregression', Ridge())]) 

param_grid_ridge = {'polynomialfeatures__degree': [1, 2, 3, 4, 5],
              'ridgeregression__alpha': [0.001, 0.01, 0.1, 1, 10, 100]}

grid_ridge = RandomizedSearchCV(pipe_poly, param_distributions=param_grid_ridge, 
                                n_iter = 5 , cv = 5, scoring='neg_mean_squared_error')
grid_ridge.fit(XX, y)

In [None]:
print (grid_ridge.best_params_, grid_ridge.best_score_)

## Resources

<b> Boosting: </b>

- https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/

- https://www.analyticsvidhya.com/blog/2015/11/quick-introduction-boosting-algorithms-machine-learning/

- http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/

- https://www.youtube.com/watch?v=Rm6s6gmLTdg&list=PLaslQpv_LmSKxSCBPdKWEI7lLHrTCeewl

- https://www.coursera.org/learn/practical-machine-learning/lecture/9mGzA/boosting

<b> Grid Search and Pipelines </b>

- https://chrisalbon.com/machine-learning/cross_validation_parameter_tuning_grid_search.html
- https://machinelearningmastery.com/how-to-tune-algorithm-parameters-with-scikit-learn/
- https://www.youtube.com/watch?v=Gol_qOgRqfA
- https://chrisalbon.com/machine-learning/pipelines_with_parameter_optimization.html
- https://chrisalbon.com/machine-learning/hyperparameter_tuning_using_random_search.html
- https://machinelearningmastery.com/automate-machine-learning-workflows-pipelines-python-scikit-learn/
- https://www.civisanalytics.com/blog/workflows-in-python-using-pipeline-and-gridsearchcv-for-more-compact-and-comprehensive-code/


<b> Evaluation: </b>

- http://www.dataschool.io/roc-curves-and-auc-explained/
- http://people.inf.elte.hu/kiss/13dwhdm/roc.pdf




# In-class work

Choose one of the following datasets to work on modeling for the rest of class using the new models and tools we've learned in the past couple of weeks.

<br>
Spotify, Dem Primary, KC Housing, Movie Metadata, HR Employee, Breast Cancer, Default, Mushrooms, Red wine quality, Zillow starter, or Pokemon.