<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Feature-Importances-in-scikit-learn" data-toc-modified-id="Feature-Importances-in-scikit-learn-1">Feature Importances in scikit-learn</a></span></li><li><span><a href="#Learning-Outcomes" data-toc-modified-id="Learning-Outcomes-2">Learning Outcomes</a></span></li><li><span><a href="#What-is-the-primary-goal-of-machine-learning?" data-toc-modified-id="What-is-the-primary-goal-of-machine-learning?-3">What is the primary goal of machine learning?</a></span></li><li><span><a href="#Prediction-is-sometimes-not-enough" data-toc-modified-id="Prediction-is-sometimes-not-enough-4">Prediction is sometimes not enough</a></span></li><li><span><a href="#Permutation-feature-importance" data-toc-modified-id="Permutation-feature-importance-5">Permutation feature importance</a></span></li><li><span><a href="#Automatically-select-features-based-on-feature-importance" data-toc-modified-id="Automatically-select-features-based-on-feature-importance-6">Automatically select features based on feature importance</a></span></li><li><span><a href="#Sources-of-Inspiration" data-toc-modified-id="Sources-of-Inspiration-7">Sources of Inspiration</a></span></li><li><span><a href="#Bonus-Material" data-toc-modified-id="Bonus-Material-8">Bonus Material</a></span></li><li><span><a href="#Feature-Importances-in-Pipeline" data-toc-modified-id="Feature-Importances-in-Pipeline-9">Feature Importances in Pipeline</a></span></li><li><span><a href="#Permutation-Importance-with-Multicollinear-or-Correlated-Features" data-toc-modified-id="Permutation-Importance-with-Multicollinear-or-Correlated-Features-10">Permutation Importance with Multicollinear or Correlated Features</a></span></li><li><span><a href="#Permutation-Importance-vs-Random-Forest-Feature-Importance-(MDI)" data-toc-modified-id="Permutation-Importance-vs-Random-Forest-Feature-Importance-(MDI)-11">Permutation Importance vs Random Forest Feature Importance (MDI)</a></span></li></ul></div>

<center><h2>Feature Importances in scikit-learn</h2></center>

<center><h2>Learning Outcomes</h2></center>

__By the end of this session, you should be able to__:

- Explain, in your own words, why feature importance is useful.
- Get permutation feature importances for __any__ estimator in scikit-learn.
- Automatically select the most important features based on their importance.

<center><h2>What is the primary goal of machine learning?</h2></center>

Learn a function from data that can generalize to predict unseen data.

<center><h2>Prediction is sometimes not enough</h2></center>

Predictive performance is often the main goal of developing machine learning models. 

Yet summarising performance with an evaluation metric is often insufficient:   
It assumes that the evaluation metric and test dataset perfectly reflect the target domain, which is rarely true. 

In certain domains, a model needs a certain level of interpretability before it can be deployed. A model that is exhibiting performance issues needs to be debugged for one to understand the model’s underlying issue. 

The [sklearn.inspection](https://scikit-learn.org/stable/inspection.html) module provides tools to help understand the predictions from a model and what affects them. 

This can be used to:

- Evaluate assumptions and biases of a model
- Design a better model
- Diagnose issues with model performance



<center><h2>Permutation feature importance</h2></center>

> Permutation feature importance is a model inspection technique that can be used for __any__ fitted estimator when the data is tabular. 


> The permutation feature importance is defined to be the decrease in a model score when a single feature value is randomly shuffled. This procedure breaks the relationship between the feature and the target, thus the drop in the model score is indicative of how much the model depends on the feature. 

> This technique benefits from being model agnostic and can be calculated many times with different permutations of the feature.

Source: https://scikit-learn.org/stable/modules/permutation_importance.html
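
The definition above can be tried out with `sklearn.inspection.permutation_importance`. This is a minimal sketch, not the only way to do it: the `Ridge` model, its `alpha`, and `n_repeats=30` are arbitrary choices for illustration. Importances are computed on the validation split, so they reflect what the model relies on for unseen data.

```python
from sklearn.datasets import load_diabetes
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_val, y_train, y_val = train_test_split(
    diabetes.data, diabetes.target, random_state=42
)

# Any fitted estimator works; Ridge is an arbitrary choice for illustration
model = Ridge(alpha=1e-2).fit(X_train, y_train)

# Shuffle each column n_repeats times and record the drop in the model score
# (R^2, the default score for regressors)
result = permutation_importance(
    model, X_val, y_val, n_repeats=30, random_state=0
)

# Print features from most to least important (mean drop +/- std)
for i in result.importances_mean.argsort()[::-1]:
    print(
        f"{diabetes.feature_names[i]:<5}"
        f"{result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}"
    )
```

A feature whose mean drop is close to zero (or within one or two standard deviations of zero) contributes little to this model's validation score.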

Now try it on your own in the student activity.

<center><h2>Automatically select features based on feature importance</h2></center>

In [9]:
%reset -fs

In [10]:
from sklearn.datasets        import load_diabetes
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_val, y_train, y_val = train_test_split(diabetes.data, diabetes.target, random_state=42)

In [11]:
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble          import RandomForestRegressor

fs = SelectFromModel(RandomForestRegressor(), 
                     max_features=3)

fs.fit_transform(X_train, y_train) # Note: at most 3 features are kept (exactly 3 here)

array([[ 4.22955892e-02,  4.94153205e-02,  5.22799998e-02],
       [-5.03962492e-02,  1.07944122e-01,  5.80391277e-02],
       [ 5.52293341e-02, -5.67061055e-03,  5.56835477e-02],
       [ 1.42724753e-02,  1.21513083e-03,  7.49683360e-02],
       [-1.15950145e-02, -3.66564468e-02,  2.26920226e-02],
       [-1.05172024e-02, -3.66564468e-02, -1.81182673e-02],
       [-9.43939036e-03,  1.49866136e-02, -3.32487872e-02],
       [-4.05032999e-03, -4.00993175e-02, -5.14005353e-02],
       [ 1.53502873e-02, -3.32135761e-02,  4.50661683e-02],
       [ 2.39727839e-02,  8.10087222e-03, -1.59982678e-02],
       [-3.74625043e-02, -6.07565417e-02, -3.07512099e-02],
       [-3.20734439e-02, -2.28849640e-02, -1.26097386e-01],
       [ 1.85837236e-02,  3.90867085e-02,  1.63049528e-02],
       [-1.59062628e-02,  1.72818607e-02, -4.68794828e-02],
       [-6.20595414e-03,  6.31868033e-02,  5.94238004e-02],
       [ 8.88341490e-03, -5.04279296e-02,  1.48227108e-02],
       [-3.74625043e-02, -4.69850589e-02

In [12]:
# Find which columns are selected
fs.get_support()

array([False, False,  True,  True, False, False, False, False,  True,
       False])
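
With a random forest, `SelectFromModel` also applies its default importance threshold (the mean importance), so `max_features` acts as an upper bound rather than an exact count. The scikit-learn documentation suggests `threshold=-np.inf` to select on `max_features` alone. A minimal sketch, where `fs_top3` and `random_state=0` are names and values chosen here for illustration:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_val, y_train, y_val = train_test_split(
    diabetes.data, diabetes.target, random_state=42
)

# threshold=-np.inf disables the importance cutoff,
# so max_features alone decides how many columns are kept
fs_top3 = SelectFromModel(
    RandomForestRegressor(random_state=0),
    max_features=3,
    threshold=-np.inf,
)
X_train_top3 = fs_top3.fit_transform(X_train, y_train)

# Map the boolean mask back to column names
selected = np.array(diabetes.feature_names)[fs_top3.get_support()]
print(X_train_top3.shape, selected)
```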

In [13]:
# Let's do the feature selection based on importance with a linear model and automatic CV
import numpy as np
from sklearn.linear_model import LassoCV

# Conduct automatic CV to find best regularization term
lasso = LassoCV().fit(X_train, y_train) 

# Set the threshold just above the third-largest |coefficient|, so only the top 2 features pass
importance = np.abs(lasso.coef_)
threshold = np.sort(importance)[-3] + 0.01

# Select best features 
fs = SelectFromModel(lasso, 
                     threshold=threshold)
fs.fit_transform(X_train, y_train)
fs # Note: only 2 features are selected

SelectFromModel(estimator=LassoCV(), threshold=536.6498801937067)

In [14]:
# Find which columns are selected
fs.get_support()

array([False, False, False, False,  True, False, False, False,  True,
       False])

Source: https://scikit-learn.org/stable/auto_examples/feature_selection/plot_select_from_model_diabetes.html

<center><h2>Sources of Inspiration</h2></center>

- https://machinelearningmastery.com/calculate-feature-importance-with-python/
- https://christophm.github.io/interpretable-ml-book/feature-importance.html#software-and-alternatives-3

<center><h2>Bonus Material</h2></center>

[Common pitfalls in interpretation of coefficients of linear models](https://scikit-learn.org/stable/auto_examples/inspection/plot_linear_model_coefficient_interpretation.html)

<center><h2>Feature Importances in Pipeline</h2></center>


1. [Blogpost](https://towardsdatascience.com/extracting-plotting-feature-names-importance-from-scikit-learn-pipelines-eb5bfa6a31f4)
2. https://www.kaggle.com/kylegilde/extracting-scikit-feature-names-importances
3. https://www.kaggle.com/kylegilde/feature-importance

<center><h2>Permutation Importance with Multicollinear or Correlated Features</h2></center>

> [In data sets with] multicollinear features, the permutation importance will show that none of the features are important.

> One approach to handling multicollinearity is by performing hierarchical clustering on the features’ Spearman rank-order correlations, picking a threshold, and keeping a single feature from each cluster.

https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance_multicollinear.html
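
The clustering approach quoted above can be sketched as follows. It follows the same steps as the linked scikit-learn example but uses the diabetes data from this notebook; the cut-off of `0.8` and the choice to keep the first feature of each cluster are assumptions made here for illustration and should be tuned per dataset.

```python
from collections import defaultdict

import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X = diabetes.data
feature_names = diabetes.feature_names

# Spearman rank-order correlation between every pair of features
corr = spearmanr(X).correlation
corr = (corr + corr.T) / 2  # enforce exact symmetry
np.fill_diagonal(corr, 1)

# Turn correlation into a distance: strongly correlated features are close
distance = 1 - np.abs(corr)
linkage = hierarchy.ward(squareform(distance))

# Cut the tree at a hand-picked threshold (0.8 is an arbitrary assumption)
cluster_ids = hierarchy.fcluster(linkage, t=0.8, criterion="distance")

# Keep the first feature of each cluster
clusters = defaultdict(list)
for idx, cluster_id in enumerate(cluster_ids):
    clusters[cluster_id].append(idx)
kept = sorted(members[0] for members in clusters.values())

print([feature_names[i] for i in kept])
```

Refitting the model on `X[:, kept]` and recomputing permutation importance then gives scores that are no longer diluted across correlated columns.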

<center><h2>Permutation Importance vs Random Forest Feature Importance (MDI)</h2></center>

`sklearn.ensemble.RandomForestClassifier.feature_importances_`

> The impurity-based feature importances.

> The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

> Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance as an alternative.
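
To see the two measures side by side, here is a minimal sketch on the diabetes data. Since the target is continuous it uses `RandomForestRegressor` rather than the classifier quoted above; its `feature_importances_` attribute is the same impurity-based measure, computed with the regression criterion instead of Gini. `n_repeats=10` and `random_state=0` are arbitrary choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_val, y_train, y_val = train_test_split(
    diabetes.data, diabetes.target, random_state=42
)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# MDI: derived from the training data while the trees were built; sums to 1
mdi = forest.feature_importances_

# Permutation importance: measured on held-out data after fitting
perm = permutation_importance(forest, X_val, y_val, n_repeats=10, random_state=0)

print(f"{'feature':<8}{'MDI':>8}{'permutation':>14}")
for name, m, p in zip(diabetes.feature_names, mdi, perm.importances_mean):
    print(f"{name:<8}{m:>8.3f}{p:>14.3f}")
```

MDI is always non-negative and is computed from training statistics, whereas permutation importance can be zero or negative on held-out data when a feature does not help generalization.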

<br>
<br> 
<br>

----