<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Ensembles & Voting

_By: Jeff Hale - Penelope adapted from other materials_
___

### Learning Objectives
After this lesson students will be able to:
- Explain the difference between hard and soft voting
- Use a scikit-learn VotingClassifier and VotingRegressor 
- Describe calibration


### Prior Knowledge Required:
- Python basics
- Pandas basics
- Scikit-learn basics

## Ensemble Methods

Ensembling is building multiple models and then combining their results in some way to create predictions.

## Why would we build an "ensemble model"?

We can summarize this as the **wisdom of the crowd**.

## Wisdom of the Crowd: Guess the weight of Penelope

![](./images/penelope.jpg)

[Image source: https://www.npr.org](https://www.npr.org/sections/money/2015/07/17/422881071/how-much-does-this-cow-weigh)

In [1]:
first_guess = 1000

In [2]:
guesses = [1000, 1100, 1650, 1600, 1500, 1200, 1100, 1300, 1250]

In [4]:
np.mean(guesses)

1300.0

#### Imports

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import mean_squared_error

### Regression

Carvana car price prediction

In [5]:
df_cars = pd.DataFrame(dict(
    price=
    [34990, 32590, 25990, 32590, 30990, 36990, 44990, 28990, 39990, 
     30990, 31990, 28590, 15990, 21990, 35590, 27990, 21990],
    miles=
    [11791, 14893, 13256, 37654, 38127, 42904, 1358, 10659, 
    9255, 32743, 15990, 17428, 14833, 25848, 12505, 6877, 82197],
    year=
    [2019, 2018, 2019, 2015, 2018, 2017, 2020, 2019, 2019, 
    2014, 2019, 2019, 2010, 2018, 2018, 2019, 2014]
))

In [6]:
df_cars

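| | price | miles | year |
|---|---|---|---|
| 0 | 34990 | 11791 | 2019 |
| 1 | 32590 | 14893 | 2018 |
| 2 | 25990 | 13256 | 2019 |
| 3 | 32590 | 37654 | 2015 |
| 4 | 30990 | 38127 | 2018 |
| 5 | 36990 | 42904 | 2017 |
| 6 | 44990 | 1358 | 2020 |
| 7 | 28990 | 10659 | 2019 |
| 8 | 39990 | 9255 | 2019 |
| 9 | 30990 | 32743 | 2014 |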


### Set up X and y, train-test split, standardize

Get the RMSE for a LinearRegression model, a KNN model, and a baseline model.

In [7]:
X = df_cars.drop('price', axis=1)

In [8]:
y = df_cars['price']

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=23)

In [10]:
X_train

| | miles | year |
|---|---|---|
| 5 | 42904 | 2017 |
| 12 | 14833 | 2010 |
| 13 | 25848 | 2018 |
| 3 | 37654 | 2015 |
| 14 | 12505 | 2018 |
| 7 | 10659 | 2019 |
| 11 | 17428 | 2019 |
| 15 | 6877 | 2019 |
| 16 | 82197 | 2014 |
| 9 | 32743 | 2014 |


In [11]:
y_test

0     34990
10    31990
2     25990
1     32590
4     30990
Name: price, dtype: int64

### Baseline null model

In [None]:
#baseline: always guess the mean price of the training data

In [18]:
baseline_test_predictions = np.ones(y_test.shape)*np.mean(y_train)
baseline_test_predictions

array([30556.66666667, 30556.66666667, 30556.66666667, 30556.66666667,
       30556.66666667])

In [17]:
mean_squared_error(y_test, baseline_test_predictions, squared = False) #root mean squared error (RMSE)


#the baseline is typically off by about $3,000 - we want to do better than this

3062.206902074239
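Newer scikit-learn releases deprecate the `squared` argument of `mean_squared_error`, so the calls in this lesson may warn or fail depending on the installed version. A minimal sketch of a version-independent alternative follows. The helper name `rmse` is hypothetical, and the numbers are copied from the `y_test` and baseline outputs shown above.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, computed with NumPy only."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Test-set prices, copied from the y_test output above
y_test_values = [34990, 31990, 25990, 32590, 30990]

# Mean of the training prices, copied from the baseline output above
baseline_value = 30556.66666667

baseline_rmse = rmse(y_test_values, np.full(len(y_test_values), baseline_value))
print(baseline_rmse)
```

The printed value should agree with the baseline RMSE reported above, up to rounding of the copied mean.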

### Standardize with 0 mean and unit variance

In [None]:
# KNN is distance-based, so we want the features on the same scale

In [19]:
#instantiate a standard scaler 

scaler = StandardScaler()

#fit transform the train data 

X_train_scaled = scaler.fit_transform(X_train)

#transform the test data 

X_test_scaled = scaler.transform(X_test)

### Linear Regression model RMSE

In [20]:
#linear regression model 

lr = LinearRegression()

In [21]:
lr.fit(X_train_scaled, y_train)

LinearRegression()

In [24]:
#make predictions 

lr_preds = lr.predict(X_test_scaled)

#use predictions to evaluate RMSE
lr_rmse = mean_squared_error(y_test, lr_preds, squared = False)
print(lr_rmse) #worse than the baseline by about $875 

3937.6798105946023


### KNN model RMSE

In [25]:
#knn instantiate (leave at default - 5 neighbors) 

knn_reg = KNeighborsRegressor()

#fit 

knn_reg.fit(X_train_scaled, y_train)

#make predictions 
knn_preds = knn_reg.predict(X_test_scaled)



In [54]:
#RMSE 

mean_squared_error(y_test, knn_preds, squared = False)

#better than linear regressor but still not better than just guessing the mean 

3132.794279872204

# Ensemble! 🎻🎺

**ensemble:** "a group of items viewed as a whole rather than individually." [Source](https://languages.oup.com/google-dictionary-en/)

In machine learning, combining several models produces an _ensemble_ model.

![](./images/Ensemble.png)

Let's combine predictions from our KNN and Linear Regression models and weight them equally.

In [27]:
np.mean([knn_preds, lr_preds], axis = 0)

array([33312.42573053, 33273.32282743, 33298.78301726, 31852.52095791,
       31696.15590652])

In [28]:
pd.DataFrame([knn_preds, lr_preds]).T

| | 0 | 1 |
|---|---|---|
| 0 | 32230.0 | 34394.851461 |
| 1 | 32230.0 | 34316.645655 |
| 2 | 32230.0 | 34367.566035 |
| 3 | 31030.0 | 32675.041916 |
| 4 | 31150.0 | 32242.311813 |


In [29]:
ensemble_preds = np.mean([knn_preds, lr_preds], axis = 0)

In [30]:
mean_squared_error(y_test, ensemble_preds, squared = False)

3432.8417849984708

In this case, we'd be better off sticking with the KNN model alone (RMSE of about 3,133 versus about 3,433 for the ensemble). In general, though, different models make their mistakes on different data points, so combining them can beat every individual model. Caveat: the sample here is very small.
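To illustrate when averaging helps, here is a small sketch with made-up numbers (not the car data). The two hypothetical models are equally inaccurate, but their errors mostly point in opposite directions, so they largely cancel in the average.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, computed with NumPy only."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical true values and predictions, chosen for illustration
y_true = np.array([10.0, 20.0, 30.0, 40.0])
model_a = y_true + np.array([3.0, -2.0, 4.0, -3.0])
model_b = y_true + np.array([-2.0, 3.0, -3.0, 4.0])  # errors mostly opposite to model_a

# Equal-weight ensemble: average the two prediction vectors
ensemble = (model_a + model_b) / 2

print("model A RMSE: ", rmse(y_true, model_a))
print("model B RMSE: ", rmse(y_true, model_b))
print("ensemble RMSE:", rmse(y_true, ensemble))
```

When the models' errors are strongly correlated instead, averaging gives little or no gain.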

## Weights

We can also give more weight to one algorithm.

![Weights](./images/weights.jpg)

Let's weight the model predictions 80% KNN and 20% Linear Regression.

In [31]:
weighted_preds = .8*knn_preds + .2*lr_preds

In [33]:
mean_squared_error(weighted_preds, y_test, squared = False)

3223.707658340071

In [34]:
#do it with sklearn - built in estimator to do this 

from sklearn.ensemble import VotingRegressor

In [37]:
voter_1 = VotingRegressor([
    ('knn', KNeighborsRegressor()), 
    ('lr', LinearRegression())
])

In [38]:
voter_1.fit(X_train_scaled, y_train)

VotingRegressor(estimators=[('knn', KNeighborsRegressor()),
                            ('lr', LinearRegression())])

In [39]:
voter_1_preds = voter_1.predict(X_test_scaled)

In [40]:
mean_squared_error(voter_1_preds, y_test, squared = False)

#same as what we got when we did it by hand above 

3432.8417849984708

#### Add a decision tree

In [42]:
from sklearn.tree import DecisionTreeRegressor

In [43]:
voter_2 = VotingRegressor([
    ('knn', KNeighborsRegressor()), 
    ('lr', LinearRegression()),
    ('dtree', DecisionTreeRegressor(max_depth=2))
])

In [47]:
voter_2.fit(X_train_scaled, y_train)

VotingRegressor(estimators=[('knn', KNeighborsRegressor()),
                            ('lr', LinearRegression()),
                            ('dtree', DecisionTreeRegressor(max_depth=2))])

In [48]:
voter_2_preds = voter_2.predict(X_test_scaled)

In [49]:
mean_squared_error(voter_2_preds, y_test, squared = False)

#Looks like decision tree added some more value, let's give it some more weight

3139.404357709738

#### The voting regressor can take a list of weights for each model

In [50]:
voter_3 = VotingRegressor([
    ('knn', KNeighborsRegressor()), 
    ('lr', LinearRegression()),
    ('dtree', DecisionTreeRegressor(max_depth=2))
], weights = [.3, .1, .6]) #could use a regressor to fine tune weights 

In [51]:
voter_3.fit(X_train_scaled, y_train)

VotingRegressor(estimators=[('knn', KNeighborsRegressor()),
                            ('lr', LinearRegression()),
                            ('dtree', DecisionTreeRegressor(max_depth=2))],
                weights=[0.3, 0.1, 0.6])

In [52]:
voter_3_preds = voter_3.predict(X_test_scaled)

In [53]:
mean_squared_error(voter_3_preds, y_test, squared = False)

#better than equally weighted! 

3013.7999729803996

## Takeaways

- Ensembling can lead to better predictions
- You can weight model predictions to give more importance to one model
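
To make the second takeaway concrete, the sketch below checks that a weighted `VotingRegressor` returns the weighted average of its fitted members' predictions. The data is synthetic (an assumption made so the block runs on its own; it is not the car data), and the names `X_demo`, `y_demo`, and `voter` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 2))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(scale=0.5, size=40)

weights = [0.3, 0.1, 0.6]
voter = VotingRegressor(
    [
        ("knn", KNeighborsRegressor()),
        ("lr", LinearRegression()),
        ("dtree", DecisionTreeRegressor(max_depth=2, random_state=0)),
    ],
    weights=weights,
)
voter.fit(X_demo, y_demo)

# estimators_ holds the fitted models in the order they were passed in
individual_preds = np.array([est.predict(X_demo) for est in voter.estimators_])

# Weighted average across models, computed by hand
manual_preds = np.average(individual_preds, axis=0, weights=weights)

print(np.allclose(manual_preds, voter.predict(X_demo)))
```

Because the ensemble prediction is a weighted average, only the relative sizes of the weights matter.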

## Classification Ensemble
#### Let's use the penguins dataset 🐧

![penguin parent and child](./images/penguins.jpg)

In [None]:
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv"

In [None]:
df_pens = pd.read_csv(url)
df_pens

In [None]:
df_pens.info()

### Quick drop

The problem is too easy with all the columns. Let's make it harder by just using bill length.

In [None]:
df_pens = df_pens.loc[:, ['species', 'bill_length_mm']]

Drop missing values

In [None]:
df_pens = df_pens.dropna()

In [None]:
df_pens.info()

#### Target

In [None]:
df_pens['species'].value_counts()

In [None]:
df_pens['species'].value_counts(normalize=True)

### Split into X and y, then training and test

In [None]:
X = df_pens.drop('species', axis=1)

In [None]:
X

In [None]:
y = df_pens['species']

In [None]:
y

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=111)

### Null baseline

#### What is our null prediction for each observation?

#### How does that prediction perform?

If only looking at accuracy, you can shortcut to your answer:
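
One way to read that shortcut, sketched with a small hypothetical label set (the counts below are made up, not the real penguin data): the null model always predicts the most common training class, so its test accuracy is simply that class's share of the test labels.

```python
import pandas as pd

# Hypothetical labels, for illustration only
y_train_demo = pd.Series(["Adelie"] * 6 + ["Gentoo"] * 4 + ["Chinstrap"] * 2)
y_test_demo = pd.Series(["Adelie"] * 2 + ["Gentoo"] * 1 + ["Chinstrap"] * 1)

# The null model always predicts the most frequent class in the training data
majority_class = y_train_demo.value_counts().idxmax()

# Long way: compare every test label with the constant prediction
baseline_accuracy = (y_test_demo == majority_class).mean()

# Shortcut: the majority class's proportion in the test labels
shortcut_accuracy = y_test_demo.value_counts(normalize=True)[majority_class]

print(majority_class, baseline_accuracy, shortcut_accuracy)
```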

### KNN Model

#### Plot confusion matrix and score on accuracy

#### Make a function to show scores

In [None]:
def model_score(classifier, X, y):
    """Fit and score a model; print and return accuracy and predict_proba.
    
    Args:
        classifier: an instance of a scikit-learn classification estimator
        X (2d pd.DataFrame or np.ndarray): features 
        y (1d pd.Series or np.ndarray): outcome variable
    
    Returns: 
        accuracy score (float): accuracy on X_test
        predict_proba (array of floats): predicted probabilities for each class for each sample
    """


#### Pass our new function a LogisticRegression algorithm and data
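
A minimal sketch of what this step could look like. The helper `fit_and_score` is a hypothetical variant of `model_score` that takes the train and test splits explicitly, and the data comes from `make_classification` rather than the penguins file (an assumption, so the block runs without a download).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def fit_and_score(classifier, X_train, y_train, X_test, y_test):
    """Fit on the training split, then print and return test accuracy and predict_proba."""
    classifier.fit(X_train, y_train)
    accuracy = classifier.score(X_test, y_test)
    probabilities = classifier.predict_proba(X_test)
    print(f"Accuracy: {accuracy:.3f}")
    return accuracy, probabilities


# Synthetic three-class data standing in for the penguins
X_demo, y_demo = make_classification(
    n_samples=200,
    n_features=4,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=111
)

logreg_accuracy, logreg_probas = fit_and_score(
    LogisticRegression(max_iter=1000), X_tr, y_tr, X_te, y_te
)
```

Returning the probabilities alongside the accuracy makes it easy to compare classifiers by hand before putting them in a voting ensemble.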

## Voting classifier ensemble

---
## Hard vs soft voting for classifiers

### Hard voting 
Each classifier predicts a class (0, 1, or 2). The ensemble predicts the class that receives the most votes.

### Soft voting
Each classifier predicts the probabilities of each class. Sum the probabilities for each class. The class with the highest total is the prediction. 
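
The two rules can disagree. The sketch below works through one hypothetical penguin by hand with NumPy. The probabilities are made up for illustration: two classifiers lean slightly toward Adelie, while a third is very confident in Chinstrap.

```python
import numpy as np

classes = np.array(["Adelie", "Chinstrap", "Gentoo"])

# Rows are classifiers, columns are class probabilities for one penguin (hypothetical)
probas = np.array([
    [0.55, 0.45, 0.00],
    [0.51, 0.49, 0.00],
    [0.05, 0.90, 0.05],
])

# Hard voting: each classifier votes for its most likely class, majority wins
votes = probas.argmax(axis=1)
vote_counts = np.bincount(votes, minlength=len(classes))
hard_winner = classes[vote_counts.argmax()]

# Soft voting: sum the probabilities per class, highest total wins
soft_totals = probas.sum(axis=0)
soft_winner = classes[soft_totals.argmax()]

print("hard voting:", hard_winner)  # Adelie (2 votes to 1)
print("soft voting:", soft_winner)  # Chinstrap (totals 1.11, 1.84, 0.05)
```

Soft voting lets a confident classifier outweigh two lukewarm ones, which is why it depends on the predicted probabilities being trustworthy.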

### Ensemble classifier with soft voting
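
A minimal sketch of a soft-voting ensemble with scikit-learn's `VotingClassifier`. As an assumption, it uses the bundled iris data with a single feature as a stand-in for the one-feature penguin problem, so it runs without a download; the variable names are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: keep one feature to mirror the "bill length only" setup
X_demo, y_demo = load_iris(return_X_y=True)
X_demo = X_demo[:, [0]]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=111
)

# Each member scales its input, then classifies
soft_voter = VotingClassifier(
    estimators=[
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    voting="soft",  # combine predicted probabilities instead of class votes
)
soft_voter.fit(X_tr, y_tr)

soft_accuracy = soft_voter.score(X_te, y_te)
soft_probas = soft_voter.predict_proba(X_te)  # available because voting="soft"
print(f"Accuracy: {soft_accuracy:.3f}")
```

Soft voting requires every member to implement `predict_proba`; switching to `voting="hard"` drops that requirement but also makes `predict_proba` unavailable on the ensemble.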

---
## Summary

You've seen how to put models you create into a voting regressor or voting classifier.

Ensembles give you the wisdom of the crowds.

You're about to see ensembles of decision trees that are among the most powerful algorithms available.

### Check for understanding

- What's the difference between hard voting and soft voting? 
- What type of machine learning problems do hard and soft voting apply to?
