# **DSFM Workshop**: Model Interpretation

---

## **Section 1**: Evaluation metrics and interpretable models

Creator: [Data Science for Managers - EPFL Program](https://www.dsfm.ch)  
Source:  [https://github.com/dsfm-org/code-bank.git](https://github.com/dsfm-org/code-bank.git)  
License: [MIT License](https://opensource.org/licenses/MIT). See open source [license](LICENSE) in the Code Bank repository. 

---

## **Workshop introduction**

Algorithms play an increasingly prominent role in our personal and professional lives. To ensure that the decisions they inform rest on fair, trustworthy, and compliant predictions, we have to interpret prediction models. In other words, we open the black box.

Model interpretation can mean different things to different people. We will focus on methods and tools that help us better understand how a model makes a prediction and why. Interpreting models has many benefits, for example:

- **Feature selection**: understand which features are most predictive; focus your resources accordingly
- **Debugging**: understand why the model makes particular prediction errors
- **Fairness**: detect whether the model systematically discriminates in an undesirable way
- **Regulatory compliance**: ensure that the model satisfies legal requirements
- **Trust**: increase stakeholders' trust in the model's predictions

This workshop is split into four sections; we are currently in Section 1. Each section contains one fully worked Jupyter Notebook that uses a different dataset to illustrate the concepts.

## **Learning goals**

- Review common evaluation metrics for regression: RMSE, MAE, MAPE
- Learn about interpretable models: regression coefficients, lasso, decision trees, and explainable boosting regression
- Experiment with `InterpretML`, a powerful package for explainable data science
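
For reference, the three metrics are usually defined as follows. The notation is introduced here and is not taken from the workshop materials: $y_i$ is the observed value, $\hat{y}_i$ the prediction, and $n$ the number of observations.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i\right\rvert, \qquad
\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left\lvert\frac{y_i - \hat{y}_i}{y_i}\right\rvert
$$

RMSE and MAE are expressed in the units of the target, while MAPE is a relative measure and is undefined whenever some $y_i = 0$.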

## **Useful resources**

- Chapters 2 and 4 of Molnar, Christoph (2019). "Interpretable Machine Learning: A Guide for Making Black Box Models Explainable". https://christophm.github.io/interpretable-ml-book/

- A brief overview of the [main model explainability methods](https://everdark.github.io/k9/notebooks/ml/model_explain/model_explain.nb.html)
- Microsoft's `InterpretML` GitHub repository containing many [example notebooks](https://github.com/interpretml/interpret)

---

<center><img src="https://images.unsplash.com/photo-1458086294493-3a5a041289ff?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=2250&q=80" width=600></center>

A red brick house in Boston. [Image source](https://images.unsplash.com/photo-1458086294493-3a5a041289ff?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=2250&q=80)



In [None]:
# Ensure that all packages are installed 
# import sys
# !{sys.executable} -m pip install interpret

## **Part 1:** Data and EDA

We will use a well-known dataset for this notebook, the "Boston Housing" dataset. It contains information collected by the U.S. Census Service on housing in the Boston area. The target variable is `MEDV`, the median value of owner-occupied homes in $1000s.

In [None]:
import pandas as pd
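# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2
from sklearn.datasets import load_boston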
from sklearn.model_selection import train_test_split

boston = load_boston()
feature_names = list(boston.feature_names)
df = pd.DataFrame(boston.data, columns=feature_names)
df["target"] = boston.target

train_cols = df.columns[0:-1]
label = df.columns[-1]
X = df[train_cols]
y = df[label]

SEED = 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = SEED)
X_train.head()

In [None]:
print('Original shape: {}'.format(X.shape))
print('Train shapes: {} {} '.format(X_train.shape, y_train.shape) + 'Test shapes: {} {}'.format(X_test.shape, y_test.shape))

In [None]:
from interpret import show, preserve
from interpret.data import Marginal

marginal = Marginal().explain_data(X_train, y_train, name = 'Train Data')

# Use show(.) on your local machine
# We use preserve(.) because the Google VM does not allow background server access
# show(marginal)
preserve(marginal, file_name='outputs/1-marginal.html')

## **Part 2:** Linear regression and lasso

Linear regression and lasso regression are useful, interpretable baseline models.

Do we need to standardize data? No, if we just care about making predictions. Yes, if we also care about comparing coefficient magnitudes.
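
To see why standardization matters for comparing coefficients but not for predictions, here is a minimal sketch on synthetic data. The two features, `rooms` and `tax`, are hypothetical and only loosely inspired by the housing data. They are constructed to matter equally, yet they live on very different scales.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two synthetic (hypothetical) features with equal true importance but different scales
rooms = rng.normal(6, 1, n)      # values around 6, spread of about 1
tax = rng.normal(400, 100, n)    # values around 400, spread of about 100
y = 2.0 * (rooms - 6) + 2.0 * (tax - 400) / 100 + rng.normal(0, 0.1, n)

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients and fitted values."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, A @ beta

X_raw = np.column_stack([rooms, tax])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

beta_raw, fitted_raw = fit_ols(X_raw, y)
beta_std, fitted_std = fit_ols(X_std, y)

print('Raw slopes:         ', np.round(beta_raw[1:], 4))   # very different magnitudes
print('Standardized slopes:', np.round(beta_std[1:], 4))   # similar magnitudes
print('Same predictions:   ', np.allclose(fitted_raw, fitted_std))
```

On the raw scale the `tax` slope looks tiny only because the feature is measured in large units. After standardization both slopes describe the effect of a one-standard-deviation change and become comparable.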

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import explained_variance_score, mean_absolute_error, mean_squared_error
from math import sqrt

# Build pipeline
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('lr', LinearRegression()))
pipeline_lr = Pipeline(estimators)

lr = pipeline_lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)

lr_r2   = explained_variance_score(y_test, lr_pred)
lr_mae  = mean_absolute_error(y_test, lr_pred)
lr_rmse = sqrt(mean_squared_error(y_test, lr_pred))

print('EV:   {}'.format(round(lr_r2, 4)))
print('MAE:  {}'.format(round(lr_mae, 4)))
print('RMSE: {}'.format(round(lr_rmse, 4)))
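
Note that the variables ending in `_r2` in this notebook hold the explained variance (EV), not R². The two scores agree when the residuals average to zero, but EV ignores a constant bias in the predictions while R² penalizes it. The sketch below uses made-up numbers and the textbook definitions; the helper functions are written here for illustration.

```python
import numpy as np

def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y): blind to a constant offset in the residuals."""
    resid = y_true - y_pred
    return 1.0 - np.var(resid) / np.var(y_true)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: a constant offset increases SS_res and lowers the score."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_biased = y_true + 5.0   # every prediction is 5 units too high

ev = explained_variance(y_true, y_biased)
r2 = r_squared(y_true, y_biased)
print('EV: {:.2f}  R2: {:.2f}'.format(ev, r2))   # EV: 1.00  R2: 0.80
```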

In [None]:
from sklearn.linear_model import LassoCV, Lasso

# Build pipeline
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('lasso', Lasso(alpha = 0.03)))
pipeline_lasso = Pipeline(estimators)

lasso = pipeline_lasso.fit(X_train, y_train)
lasso_pred = lasso.predict(X_test)

lasso_r2   = explained_variance_score(y_test, lasso_pred)
lasso_mae  = mean_absolute_error(y_test, lasso_pred)
lasso_rmse = sqrt(mean_squared_error(y_test, lasso_pred))

print('EV:   {}'.format(round(lasso_r2, 4)))
print('MAE:  {}'.format(round(lasso_mae, 4)))
print('RMSE: {}'.format(round(lasso_rmse, 4)))

In [None]:
# Compare standardized coefficients; lasso coefficients that are (near) zero are left blank
print('REGULARIZATION:'.center(22), 'NONE'.center(10), 'LASSO'.center(10))
print('=' * 50)
for varname, lr_coef, lasso_coef in zip(feature_names, pipeline_lr['lr'].coef_, pipeline_lasso['lasso'].coef_):
    lr_str    = '{0:.4f}'.format(lr_coef).rjust(10)
    lasso_str = '{0:.4f}'.format(lasso_coef).rjust(10) if abs(lasso_coef) > 0.0001 else ''
    print(str(varname).center(20), lr_str, lasso_str)
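
Any blank entries in the LASSO column arise because the L1 penalty can set coefficients exactly to zero. The following is a minimal coordinate-descent sketch of that mechanism on synthetic data. It is not scikit-learn's implementation, it assumes standardized features and a centred target, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; anything within [-alpha, alpha] becomes 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1 / (2n)) * ||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back
            partial_resid = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial_resid / n
            b[j] = soft_threshold(rho, alpha) / (X[:, j] @ X[:, j] / n)
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, 200)
y = y - y.mean()                                  # centre target

b_small = lasso_cd(X, y, alpha=0.001)   # almost no penalty
b_large = lasso_cd(X, y, alpha=0.5)     # strong penalty

print('alpha = 0.001:', np.round(b_small, 3))
print('alpha = 0.5:  ', np.round(b_large, 3))
```

With the stronger penalty, the three irrelevant features drop out entirely and the two informative coefficients are shrunk towards zero. This is why lasso doubles as a feature selection tool.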

## **Part 3:** Decision tree

Tree-based models are another example of interpretable models. A tree makes its first split on the feature that it finds most useful in predicting the target variable and continues splitting recursively.

Do we need to standardize data? No, each tree node can find an appropriate splitting point across the entire range of values for each feature.
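
As a rough illustration of both points, the sketch below searches one feature for the threshold that minimizes the squared error of the two resulting groups, then repeats the search after rescaling the feature. The data and the helper `best_split` are made up for this example, and real implementations are considerably more efficient.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) of the single split on x that minimizes squared error."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # cannot split between identical values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2  # midpoint between neighbours
            best_sse = sse
    return best_threshold, best_sse

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.5])

threshold, sse = best_split(x, y)
threshold_scaled, sse_scaled = best_split(1000 * x, y)

print(threshold, round(sse, 4))                 # split between 3 and 10
print(threshold_scaled, round(sse_scaled, 4))   # same partition, rescaled threshold
```

Rescaling the feature moves the threshold but leaves the partition of the observations, and therefore the predictions, unchanged.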

In [None]:
from sklearn.tree import DecisionTreeRegressor

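# Fit decision tree (a single model with no scaling step, so no pipeline is needed)
# Note: criterion='mse' was renamed to 'squared_error' in scikit-learn 1.0 and removed in 1.2
dt = DecisionTreeRegressor(criterion='mse', max_depth = 4)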
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)

dt_r2   = explained_variance_score(y_test, dt_pred)
dt_mae  = mean_absolute_error(y_test, dt_pred)
dt_rmse = sqrt(mean_squared_error(y_test, dt_pred))

print('EV:   {}'.format(round(dt_r2, 4)))
print('MAE:  {}'.format(round(dt_mae, 4)))
print('RMSE: {}'.format(round(dt_rmse, 4)))

In [None]:
import numpy as np

print('Feature'.center(12), '   ',  'Importance')
print('=' * 30)
for index in reversed(np.argsort(dt.feature_importances_)):
    print(str(feature_names[index]).center(12) , '   ', '{0:.4f}'.format(dt.feature_importances_[index]).center(8))

In [None]:
from graphviz import Source
from sklearn.tree import export_graphviz

Source(export_graphviz(dt, out_file=None, feature_names=feature_names))

In [None]:
# Example: feature values and tree prediction for the first test observation
print(X_test.iloc[0])
print()
print('Prediction: {}'.format(dt_pred[0]))

## **Part 4:** Explainable boosting machine (EBM)

Microsoft Research has developed an open-source package called [InterpretML](https://github.com/interpretml/interpret). Its explainable boosting machine (EBM) combines data science techniques like bagging, gradient boosting, and automatic interaction detection. Because an EBM is an additive model, each feature's contribution to a prediction can be read off exactly, so the explanations are lossless. According to its developers, EBMs often perform comparably to gradient boosting and random forest methods. For more details on the EBM algorithm, watch [this video](https://www.youtube.com/watch?v=MREiHgHgl0k).
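
To build intuition for why an additive model can be explained without loss, here is a toy sketch that boosts one-split "stumps" feature by feature on synthetic data. It is a deliberately simplified illustration and not the InterpretML algorithm: bagging, binning, and interaction terms are left out, and all function names are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Best one-split step function on a single feature (squared error).

    Assumes x has at least two distinct values.
    """
    best = (np.inf, None, 0.0, 0.0)
    for t in np.unique(x)[:-1]:
        left = x <= t
        l_mean, r_mean = residual[left].mean(), residual[~left].mean()
        sse = np.sum((residual[left] - l_mean) ** 2) + np.sum((residual[~left] - r_mean) ** 2)
        if sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    return best[1:]

def fit_additive(X, y, rounds=50, learning_rate=0.2):
    """Boost small per-feature stumps, visiting the features in round-robin order."""
    n, p = X.shape
    intercept = y.mean()
    stumps = [[] for _ in range(p)]
    pred = np.full(n, intercept)
    for _ in range(rounds):
        for j in range(p):
            t, l_val, r_val = fit_stump(X[:, j], y - pred)
            step = (t, learning_rate * l_val, learning_rate * r_val)
            stumps[j].append(step)
            pred += np.where(X[:, j] <= t, step[1], step[2])
    return intercept, stumps

def contributions(X, stumps):
    """Per-feature contribution to each prediction (one column per feature)."""
    out = np.zeros(X.shape)
    for j, feature_stumps in enumerate(stumps):
        for t, l_val, r_val in feature_stumps:
            out[:, j] += np.where(X[:, j] <= t, l_val, r_val)
    return out

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(100, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.05, 100)

intercept, stumps = fit_additive(X, y)
contrib = contributions(X, stumps)
pred = intercept + contrib.sum(axis=1)   # prediction = intercept + sum of feature terms

baseline_mse = np.mean((y - y.mean()) ** 2)
model_mse = np.mean((y - pred) ** 2)
print('Baseline MSE: {:.4f}  Additive model MSE: {:.4f}'.format(baseline_mse, model_mse))
print('First prediction = {:.3f} = {:.3f} + {:.3f} + {:.3f}'.format(
    pred[0], intercept, contrib[0, 0], contrib[0, 1]))
```

Each prediction is exactly the intercept plus one term per feature, so a per-feature explanation involves no approximation. This is the sense in which such explanations are lossless.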

### EBM performance

In [None]:
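# Import only what this section uses; importing interpret's LinearRegression here
# would shadow the scikit-learn class of the same name imported earlier
from interpret.glassbox import ExplainableBoostingRegressor
from interpret import show, preserve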
from interpret.perf import RegressionPerf

ebm = ExplainableBoostingRegressor()
ebm.fit(X_train, y_train)
ebm_pred = ebm.predict(X_test)

ebm_perf = RegressionPerf(ebm.predict).explain_perf(X_test, y_test, name='EBM')

# show(ebm_perf)
preserve(ebm_perf, file_name='outputs/1-ebm_perf.html')

In [None]:
ebm_r2   = explained_variance_score(y_test, ebm_pred)
ebm_mae  = mean_absolute_error(y_test, ebm_pred)
ebm_rmse = sqrt(mean_squared_error(y_test, ebm_pred))

print('EV:   {}'.format(round(ebm_r2, 4)))
print('MAE:  {}'.format(round(ebm_mae, 4)))
print('RMSE: {}'.format(round(ebm_rmse, 4)))

### Global Explanations: What the model learned overall

In [None]:
ebm_global = ebm.explain_global(name='EBM')

# show(ebm_global)
preserve(ebm_global, file_name='outputs/1-ebm_global.html')

In [None]:
from interpret.glassbox import RegressionTree

rt = RegressionTree()
rt.fit(X_train, y_train)

rt_global = rt.explain_global(name='Regression Tree')
rt_perf = RegressionPerf(rt.predict).explain_perf(X_test, y_test, name='Regression Tree')

# Unfortunately, visualizing the decision tree only works on a local machine 
# show(rt_global)


### Local Explanations: How an individual prediction was made

In [None]:
ebm_local = ebm.explain_local(X_test[:5], y_test[:5], name='EBM')

# show(ebm_local)
preserve(ebm_local, selector_key = ebm_local.selector[ebm_local.selector.columns[0]], file_name='outputs/1-ebm_local.html')

### Dashboard: Look at everything at once

In [None]:
# Do everything in one shot with the InterpretML Dashboard by passing a list into show

# Unfortunately, visualizing the dashboard only works on a local machine 
# show([marginal, rt_global, rt_perf, ebm_global, ebm_perf])

## **Summary of RMSE scores**

In [None]:
# Print summary of RMSE scores 
width     = 30
width_box = 100
models    = ['Linear Regression', 'Lasso', 'Decision Tree', 'EBM']
results   = [lr_rmse, lasso_rmse, dt_rmse, ebm_rmse]

print(str('=' * width).center(width_box))
print('Summary of RMSE Scores'.center(width_box))
print(str('=' * width).center(width_box))
for i in range(len(models)):
    line = models[i].center(width - 8) + '{0:.4f}'.format(results[i])
    print(line.center(width_box))
print()

## **Bonus questions**:

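1. What is a general advantage of the way RMSE calculates the prediction error?
2. Can you think of scenarios in which using the RMSE might not be appropriate?

- Answer 1: Large residuals are penalized more heavily, because errors are squared before they are averaged.
- Answer 2: When the cost of an error depends on the size of the target value and not only on the size of the error, e.g. reputation costs when making valuation errors for expensive houses. RMSE weights an error of a given size the same regardless of the house's value.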
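
A small worked example makes Answer 1 concrete. The two error vectors below are made up; they have the same total absolute error, but in the second one it is concentrated in a single large miss.

```python
from math import sqrt

def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return sqrt(sum(e ** 2 for e in errors) / len(errors))

def mae(errors):
    """Mean absolute error of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

even_errors = [5.0, 5.0, 5.0, 5.0]      # four moderate misses
one_big_miss = [0.0, 0.0, 0.0, 20.0]    # same total absolute error, one large miss

print('MAE:  {} vs {}'.format(mae(even_errors), mae(one_big_miss)))     # MAE:  5.0 vs 5.0
print('RMSE: {} vs {}'.format(rmse(even_errors), rmse(one_big_miss)))   # RMSE: 5.0 vs 10.0
```

MAE cannot tell the two cases apart, whereas RMSE doubles for the single large miss.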