# **DSFM Illustration**: Weighted-average ensemble

Creator: [Data Science for Managers - EPFL Program](https://www.dsfm.ch)  
Source:  [https://github.com/dsfm-org/code-bank.git](https://github.com/dsfm-org/code-bank.git)  
License: [MIT License](https://opensource.org/licenses/MIT). See open source [license](LICENSE) in the Code Bank repository. 

Example adapted from: https://machinelearningmastery.com/weighted-average-ensemble-for-deep-learning-neural-networks/

-------------

## Overview

Build a simple-average and then a weighted-average ensemble of multiple random forests, and finally a weighted ensemble that also includes decision trees.

-------------

## **Part 0**: Setup

In [None]:
# import all packages 
import numpy as np
import pandas as pd
import warnings
warnings.simplefilter("ignore")

# scikit-learn and keras (with tensorflow)
from sklearn.datasets                   import make_blobs
from sklearn.metrics                    import accuracy_score
from sklearn.linear_model               import LogisticRegression
from sklearn.tree                       import DecisionTreeClassifier
from sklearn.ensemble                   import RandomForestRegressor, RandomForestClassifier, GradientBoostingClassifier
from scipy.optimize                     import differential_evolution
from tensorflow.keras.utils             import to_categorical

# plotting 
import matplotlib.pyplot as plt
%matplotlib inline


In [None]:
# plotting constants
FIGSIZE   = (12, 8)

# modeling constants
MAXDEPTH  = 3


## **Part 1**: Generate toy data

In [None]:
# generate 2d classification dataset
X, y = make_blobs(n_samples=1100, centers=7, n_features=2, cluster_std=2, random_state=7)

# split into train and test
n_train = 100
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
print('TrainX: {}, TestX: {}'.format(trainX.shape, testX.shape))

# plot data
df = pd.DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
colors = {0:'red', 1:'blue', 2:'green', 3:'lightgreen', 4:'orange', 5:'purple', 6:'pink'}
fig, ax = plt.subplots()
grouped = df.groupby('label')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key], figsize=FIGSIZE)
plt.show()

## **Part 2**: Fit a single random forest model

In [None]:
# fit model on dataset
def fit_model_rf(trainX, trainy):
    
    # convert list of targets to one-hot-encoded matrix
    trainy_enc = to_categorical(trainy)
    
    # fit model
    m = RandomForestRegressor(max_depth = MAXDEPTH)
    m.fit(trainX, trainy_enc)
    
    return m

In [None]:
# make predictions and evaluate 
m = fit_model_rf(trainX, trainy)
yhat = np.argmax(m.predict(testX), axis = 1)

print('Single model accuracy: {}'.format(accuracy_score(testy, yhat)))


## **Part 3**: Fit many models and aggregate with a simple average

In [None]:
# fit all models
n_members = 20
members = [fit_model_rf(trainX, trainy) for _ in range(n_members)]

# make an ensemble prediction for multi-class classification
def ensemble_predictions(members, testX, weights = None):
        
    # make predictions
    yhats = [model.predict(testX) for model in members]
    yhats = np.array(yhats)
    
    # weighted sum across ensemble members
    if isinstance(weights, np.ndarray):
        summed = np.tensordot(yhats, weights, axes=((0),(0)))
    else:
        summed = np.sum(yhats, axis=0)
    
    # argmax across classes
    result = np.argmax(summed, axis=1)
    
    return result

def evaluate_n_members(members, n_members, testX, testy):
    
    # select a subset of members
    subset = members[:n_members]
    
    # make prediction
    yhat = ensemble_predictions(subset, testX)
    
    return accuracy_score(testy, yhat)

# evaluate ensembles of increasing size on the hold-out set
single_scores, ensemble_scores = list(), list()
for i in range(1, len(members)+1):
    
    # evaluate model with up to i members
    ensemble_score = evaluate_n_members(members, i, testX, testy)
    
    # evaluate the i'th model standalone
    yhat = np.argmax(members[i-1].predict(testX), axis=1)
    single_score = accuracy_score(testy, yhat)

    # summarize this step
#     print('Models <= {}: \tsingle={}, \tensemble={}'.format(i, round(single_score, 3), ensemble_score))
    ensemble_scores.append(ensemble_score)
    single_scores.append(single_score)
    
# View results 
print(' Models'.ljust(5), '\t', 'Single'.center(8), '  ', 'Ensemble'.center(11), '\n', '=' * 40)

for i, j in enumerate(zip(ensemble_scores, single_scores)):
    ensemble, single = j
    print(' 1 to {0}'.format(i+1).ljust(5), '\t',  '{0:.4f}'.format(single).center(8), '  ', '{0:.4f}'.format(ensemble).center(11))   
    
# summarize average accuracy of a single final model
print('\nAvg. accuracy single model: {} (std. {})'.format(round(np.mean(single_scores), 3), round(np.std(single_scores), 3)))
print('Avg. accuracy ensemble model: {} (std. {})'.format(round(np.mean(ensemble_scores), 3), round(np.std(ensemble_scores), 3)))

## **Part 4**: Fit many models and aggregate with a weighted average

One alternative to simple averaging is weighted averaging, where each model is assigned a different weight. Each weight represents the "confidence" we have in that model's predictions. However, it is difficult to judge confidence in individual models a priori: a model might achieve high accuracy just by being lucky.

A more principled approach is to learn the weights. Instead of exhaustively searching the space of possible weight combinations, we use an optimizer (here, SciPy's `differential_evolution`) that uses the error of the weights tried so far to steer the search towards weights with lower error.
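
To make the effect of weighting concrete, here is a minimal sketch with made-up numbers. The score array, the two weight vectors, and the helper `weighted_vote` are illustrative assumptions and not part of the notebook; only the `np.tensordot` contraction mirrors what `ensemble_predictions` does.

```python
import numpy as np

# Hypothetical scores from 3 ensemble members for 2 samples and 3 classes,
# shaped (n_members, n_samples, n_classes) like `yhats` in ensemble_predictions.
yhats = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],   # member 0
    [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]],   # member 1
    [[0.3, 0.6, 0.1], [0.2, 0.2, 0.6]],   # member 2
])

# Equal weights reproduce a simple average
equal = np.full(3, 1 / 3)

# Illustrative unequal weights that trust member 0 most (they sum to 1)
trusting = np.array([0.8, 0.1, 0.1])

def weighted_vote(yhats, weights):
    # contract the member axis; the result has shape (n_samples, n_classes)
    summed = np.tensordot(yhats, weights, axes=([0], [0]))
    # pick the class with the highest weighted score for each sample
    return np.argmax(summed, axis=1)

print(weighted_vote(yhats, equal))     # [1 2]
print(weighted_vote(yhats, trusting))  # [0 1]
```

With equal weights the two members that agree outvote member 0; once member 0 carries most of the weight, its preferred classes win instead. The optimizer's job is to find the weights for which this vote is most often correct.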

In [None]:
# evaluate a weighted ensemble of all members on the given data
def evaluate_ensemble(members, weights, testX, testy):
    
    # make prediction
    yhat = ensemble_predictions(members, testX, weights)
    
    # calculate accuracy
    acc = accuracy_score(testy, yhat)
    
    return acc

# normalize a vector to have unit norm
def normalize(weights):
    
    # calculate l1 vector norm
    result = np.linalg.norm(weights, 1)

    # check for a vector of all zeros
    if result == 0.0:
        return weights
    
    # return normalized vector (unit norm)
    return weights / result

# loss function for optimization process, designed to be minimized
def loss_function(weights, members, testX, testy):

    # normalize weights
    normalized = normalize(weights)
    
    # calculate error rate
    error = 1.0 - evaluate_ensemble(members, normalized, testX, testy)
    
    return error

# fit all models
n_members = 5
members = [fit_model_rf(trainX, trainy) for _ in range(n_members)]

# define bounds on each weight
bound_w = [(0.0, 1.0) for _ in range(n_members)]

# arguments to the loss function
search_arg = (members, testX, testy)

# global optimization of ensemble weights
result = differential_evolution(loss_function, bound_w, search_arg, maxiter=1000, tol=1e-7, workers=-1)

# get the chosen weights
weights = normalize(result['x'])

# View results 
print(' Model'.rjust(5), '   ', 'Weight'.center(8), '\n', '=' * 20)
for model, weight in enumerate(weights):
    print( '{0}'.format(model).center(5), '   ',  '{0:.4f}'.format(weight).center(8))   

# evaluate chosen weights
score = evaluate_ensemble(members, weights, testX, testy)
print('\nOptimized Weights Score: {}'.format(score))

Depending on the initializations, optimizing the weights can increase the accuracy by 1-5%. The final ensemble model assigns the most weight to _______ (find the highest-weighted model above). 

Note that, for simplicity, we have treated the test set as though it were a validation set. In practice, we would choose and tune the weights on a validation set and then compare models on a separate test set, because weights tuned on the test set make the reported accuracy optimistic.
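
One way to set this up is sketched below, under stated assumptions: the arrays `heldX` and `heldy` are random stand-ins for the 1000 held-out rows, the 50/50 split is an arbitrary choice, and the names `valX`, `valy`, `finalX`, `finaly` are hypothetical.

```python
import numpy as np

# Stand-ins for the 1000 held-out rows (testX, testy) used in this notebook
rng = np.random.default_rng(0)
heldX = rng.normal(size=(1000, 2))
heldy = rng.integers(0, 7, size=1000)

# Shuffle the row indices once, then cut them in half (assumed 50/50 split)
idx = rng.permutation(len(heldX))
n_val = len(idx) // 2
val_idx, test_idx = idx[:n_val], idx[n_val:]

# Validation half: used only to tune the ensemble weights
valX, valy = heldX[val_idx], heldy[val_idx]

# Final test half: used only to report the score of the tuned ensemble
finalX, finaly = heldX[test_idx], heldy[test_idx]

# The weight search would then see only the validation half, e.g.
#   search_arg = (members, valX, valy)
# and the reported score would use only the untouched half, e.g.
#   evaluate_ensemble(members, weights, finalX, finaly)
print(valX.shape, finalX.shape)
```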

## **Part 5**: Fit different models and aggregate with weighted average 

Instead of re-fitting the same random forest model, we can aggregate the predictions of different types of models. To do so, we also fit a decision tree classifier. The final prediction is then a weighted average across two decision tree models and two random forest models. 
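
One thing to keep in mind when mixing these model types: a tree classifier fit on one-hot targets predicts 0/1 indicators, whereas a forest regressor predicts fractional scores. The sketch below uses made-up predictions for a single sample to show how the two combine; the arrays and the helper `combine` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Hypothetical outputs for one sample with 3 classes.
# The tree casts a hard vote for class 1 ...
tree_pred = np.array([[0.0, 1.0, 0.0]])
# ... while the forest spreads fractional scores and prefers class 0.
forest_pred = np.array([[0.5, 0.3, 0.2]])

# Stack to (n_members, n_samples, n_classes), as in ensemble_predictions
yhats = np.array([tree_pred, forest_pred])

def combine(yhats, weights):
    weights = np.asarray(weights, dtype=float)
    # L1-normalize the non-negative weights so they sum to 1
    weights = weights / weights.sum()
    # weighted sum over members, then argmax over classes
    summed = np.tensordot(yhats, weights, axes=([0], [0]))
    return int(np.argmax(summed, axis=1)[0])

print(combine(yhats, [0.5, 0.5]))  # 1: the tree's hard vote wins
print(combine(yhats, [0.1, 0.9]))  # 0: the forest's scores win
```

Because a hard vote puts all of its mass on one class, the tree needs a fairly small weight before the forest's softer scores can override it. This is one reason the learned weights can differ noticeably between model types.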

In [None]:
# fit decision tree model on dataset
def fit_model_dt(trainX, trainy):
    
    # convert list of targets to one-hot-encoded matrix
    trainy_enc = to_categorical(trainy)
    
    # fit model
    m = DecisionTreeClassifier()
    m.fit(trainX, trainy_enc)
    
    return m

# fit all models: 2 decision tree models and 2 random forest models
n_members = 4
members = [fit_model_dt(trainX, trainy), fit_model_dt(trainX, trainy), fit_model_rf(trainX, trainy), fit_model_rf(trainX, trainy)]

# define bounds on each weight
bound_w = [(0.0, 1.0) for _ in range(n_members)]

# arguments to the loss function
search_arg = (members, testX, testy)

# global optimization of ensemble weights
result = differential_evolution(loss_function, bound_w, search_arg, maxiter=1000, tol=1e-7, workers=-1)

# get the chosen weights
weights = normalize(result['x'])

# View results 
print(' Model'.rjust(5), '   ', 'Weight'.center(8), '\n', '=' * 20)
for model, weight in enumerate(weights):
    print( '{0}'.format(model).center(5), '   ',  '{0:.4f}'.format(weight).center(8))   

# evaluate chosen weights
score = evaluate_ensemble(members, weights, testX, testy)
print('\nOptimized Weights Score: {}'.format(score))


Depending on the initializations, using a different type of model can increase the accuracy by another 1-3%. The final ensemble model assigns the most weight to _______ (find the highest-weighted model above). 

## **Bonus**: Further Reading



- Ensemble tutorial in sklearn: https://sebastianraschka.com/Articles/2014_ensemble_classifier.html
- Finding ensemble weights for aggregating different models: https://www.kaggle.com/hsperr/finding-ensamble-weights