# Model Tuning, Interpretation and Deployment

## Overview

This tutorial covers essential aspects of the machine learning workflow after initial model building, focusing on model interpretation techniques, tuning approaches, and deployment considerations. We'll explore how to make machine learning models more interpretable, optimize their performance through tuning, and prepare them for real-world deployment.

## Learning Objectives

- Understand the importance of model interpretation in machine learning
    - Learn how interpretability benefits analytics teams and stakeholders
    - Recognize how interpretability bridges technical and business understanding
- Master techniques for model tuning and optimization
- Learn best practices for model deployment
- Develop skills to explain model decisions to non-technical stakeholders

### Tasks to complete

- Implement model interpretation techniques
- Perform model tuning exercises
- Practice model deployment steps
- Create interpretability visualizations

## Prerequisites

- A working Python environment and familiarity with Python
- Basic understanding of machine learning concepts
- Familiarity with pandas and numpy libraries
- Knowledge of basic statistical concepts



## Get Started

To start, we install required packages and import the necessary libraries.

### Install packages

If you don't already have skater installed, this step can take a few minutes (five minutes on my machine).

In [None]:
# TODO

### Import libraries

In [None]:
import warnings

import joblib
import numpy as np
import pandas as pd
import scipy.stats  # scipy.stats.expon is used later for randomized search
from skater.core.explanations import Interpretation
from skater.core.local_interpretation.lime.lime_tabular import LimeTabularExplainer
from skater.model import InMemoryModel
from sklearn import linear_model, metrics
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

# Model Tuning, Interpretation and Deployment

Adapted from Dipanjan Sarkar et al. 2018. [Practical Machine Learning with Python](https://link.springer.com/book/10.1007/978-1-4842-3207-1).

In this tutorial, we will learn:
* How to tune the hyperparameters of Machine Learning algorithms
* How to interpret models using open source frameworks
* How to persist and deploy the developed models 

## Model tuning

Model tuning is one of the most important concepts in Machine Learning, and it requires some knowledge of the underlying math and logic of the algorithm in focus. In this tutorial, we will delve deeper into the models we are targeting and look at the knobs that can be tuned to extract the best performance from a given model. This process of iterative experimentation with the dataset, model parameters, and features is the very core of model tuning.

### Build and Evaluate Default Model

We will use the Wisconsin Breast Cancer Dataset as an example. We first split the breast cancer dataset variables X and y into train and test datasets and build an SVM model with default parameters. Then we evaluate its performance on the test dataset.

In [None]:
# Load Wisconsin Breast Cancer Dataset

# load data
bc = load_breast_cancer()

X = bc.data
y = bc.target

print(X.shape, bc.feature_names)

In [None]:
# Utility functions for model evaluation

# Get model performance evaluation metrics
def get_metrics(true_labels, predicted_labels):
    print(
        "Accuracy:", np.round(metrics.accuracy_score(true_labels, predicted_labels), 4)
    )
    print(
        "Precision:",
        np.round(
            metrics.precision_score(true_labels, predicted_labels, average="weighted"),
            4,
        ),
    )
    print(
        "Recall:",
        np.round(
            metrics.recall_score(true_labels, predicted_labels, average="weighted"), 4
        ),
    )
    print(
        "F1 Score:",
        np.round(
            metrics.f1_score(true_labels, predicted_labels, average="weighted"), 4
        ),
    )


# Show the classification report
def display_classification_report(true_labels, predicted_labels, classes=[1, 0]):
    # Build a text report showing the main classification metrics
    # The reported averages include macro average (averaging the unweighted
    # mean per label), weighted average (averaging the support-weighted mean
    # per label), and sample average (only for multilabel classification).
    # Micro average (averaging the total true positives, false negatives and
    # false positives) is only shown for multi-label or multi-class
    # with a subset of classes, because it corresponds to accuracy
    # otherwise and would be the same for all metrics.
    report = metrics.classification_report(
        y_true=true_labels, y_pred=predicted_labels, labels=classes
    )
    print(report)


# Show the confusion matrix
def display_confusion_matrix(true_labels, predicted_labels, classes=[1, 0]):
    total_classes = len(classes)
    level_labels = [total_classes * [0], list(range(total_classes))]
    # Compute confusion matrix to evaluate the accuracy of a classification.
    cm = metrics.confusion_matrix(
        y_true=true_labels, y_pred=predicted_labels, labels=classes
    )
    cm_frame = pd.DataFrame(
        data=cm,
        columns=pd.MultiIndex(levels=[["Predicted:"], classes], codes=level_labels),
        index=pd.MultiIndex(levels=[["Actual:"], classes], codes=level_labels),
    )
    print(cm_frame)


# Show the model performance metrics
def display_model_performance_metrics(true_labels, predicted_labels, classes=[1, 0]):
    print("Model Performance metrics:")
    print("-" * 30)
    get_metrics(true_labels=true_labels, predicted_labels=predicted_labels)
    print("\nModel Classification report:")
    print("-" * 30)
    display_classification_report(
        true_labels=true_labels, predicted_labels=predicted_labels, classes=classes
    )
    print("\nPrediction Confusion Matrix:")
    print("-" * 30)
    display_confusion_matrix(
        true_labels=true_labels, predicted_labels=predicted_labels, classes=classes
    )
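
The helpers above pass a `classes` list through to scikit-learn's `labels` argument, and that list fixes the row and column order of the confusion matrix. A minimal sketch of the effect, using toy labels invented purely for illustration (the names `cm_pos_first` and `cm_neg_first` are hypothetical):

```python
import numpy as np
from sklearn import metrics

# Toy labels, invented for illustration only.
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1])

# Rows are actual classes and columns are predicted classes,
# both in the order given by `labels`.
cm_pos_first = metrics.confusion_matrix(y_true, y_pred, labels=[1, 0])
cm_neg_first = metrics.confusion_matrix(y_true, y_pred, labels=[0, 1])

print(cm_pos_first)  # [[2 1], [1 1]] printed as a 2x2 array
print(cm_neg_first)  # [[1 1], [1 2]] printed as a 2x2 array
```

The counts are the same in both cases; only their position changes, so it is worth checking which label order a report was built with before reading it.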

In [None]:
# prepare datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# build default SVM model
# C-Support Vector Classification
def_svc = SVC(random_state=42)
def_svc.fit(X_train, y_train)

# predict and evaluate performance
def_y_pred = def_svc.predict(X_test)
print("Default Model Stats:")
display_model_performance_metrics(
    true_labels=y_test, predicted_labels=def_y_pred, classes=[0, 1]
)
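
The default model above is fit on the raw feature values. SVMs are sensitive to feature scale, so one variation that may be worth trying (it is not part of the original tutorial) is to standardize the features inside a pipeline. The sketch below reloads the data so that it runs on its own; `scaled_svc` and `scaled_accuracy` are hypothetical names.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

bc = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    bc.data, bc.target, test_size=0.3, random_state=42
)

# Standardize each feature using training-set statistics, then fit a default SVC.
scaled_svc = make_pipeline(StandardScaler(), SVC(random_state=42))
scaled_svc.fit(X_train, y_train)

scaled_accuracy = scaled_svc.score(X_test, y_test)
print("Scaled default SVC test accuracy:", round(scaled_accuracy, 4))
```

Putting the scaler inside the pipeline means it is fit only on training data, which also keeps cross-validation honest if the pipeline is later passed to a search.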

### Tune Model with Grid Search

Since we have chosen an SVM model, we specify some hyperparameters specific to it: the regularization parameter C (which controls the margin of the SVM), the kernel function (used for transforming data into a higher dimensional feature space), and gamma (which determines the influence a single training data point has). There are many other hyperparameters to tune, which you can check out [here](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) for further details.
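
To make the role of gamma concrete, here is a small sketch of the RBF kernel value, exp(-gamma * ||x - z||^2), for two hypothetical points. The helper `rbf_kernel_value` is written for this illustration and is not a scikit-learn function.

```python
import numpy as np


def rbf_kernel_value(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


# Two hypothetical points whose squared distance is 3^2 + 4^2 = 25.
x = [0.0, 0.0]
z = [3.0, 4.0]

for gamma in (1e-4, 1e-3, 1e-1):
    print(gamma, rbf_kernel_value(x, z, gamma))
```

A larger gamma makes the similarity fall off faster with distance, so each training point influences a smaller neighborhood; a smaller gamma spreads that influence more widely.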


We will build a grid by supplying some preset values. The next choice is the score or metric we want to maximize; here we have chosen to maximize the accuracy of the model. Once that is done, we will use five-fold cross-validation to build multiple models over this grid and evaluate them to find the best model.

(This step may take a few minutes to complete.)

In [None]:
# setting the parameter grid
grid_parameters = {
    "kernel": ["linear", "rbf"],
    "gamma": [1e-3, 1e-4],
    "C": [1, 10, 50, 100],
}

# perform hyperparameter tuning
print("# Tuning hyper-parameters for accuracy\n")

# Exhaustive search over specified parameter values for an estimator.
clf = GridSearchCV(SVC(random_state=42), grid_parameters, cv=5, scoring="accuracy")
clf.fit(X_train, y_train)

# view accuracy scores for all the models
print("Grid scores for all the models based on CV:\n")
means = clf.cv_results_["mean_test_score"]
stds = clf.cv_results_["std_test_score"]
for mean, std, params in zip(means, stds, clf.cv_results_["params"]):
    print("%0.5f (+/-%0.05f) for %r" % (mean, std * 2, params))

# check out best model performance
print("\nBest parameters set found on development set:", clf.best_params_)
print("Best model validation accuracy:", clf.best_score_)

We can see that the best model parameters were selected based on cross-validation accuracy, and we get a pretty awesome validation accuracy of 96%.
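
The grid search above fits one model per parameter combination per fold. As a rough sketch of how much work that is, `ParameterGrid` from scikit-learn can enumerate the same combinations; the names `candidates`, `n_candidates`, and `n_cv_fits` are hypothetical.

```python
from sklearn.model_selection import ParameterGrid

grid_parameters = {
    "kernel": ["linear", "rbf"],
    "gamma": [1e-3, 1e-4],
    "C": [1, 10, 50, 100],
}
cv_folds = 5

candidates = list(ParameterGrid(grid_parameters))
n_candidates = len(candidates)        # 2 kernels * 2 gammas * 4 C values = 16
n_cv_fits = n_candidates * cv_folds   # one fit per candidate per fold = 80

print(n_candidates, "candidates,", n_cv_fits, "cross-validation fits")
```

Note that gamma has no effect with the linear kernel, so the linear candidates that differ only in gamma are effectively duplicates. Splitting the grid into a list of two dictionaries, one per kernel, would avoid that repeated work.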

### Evaluate Grid Search Tuned Model

Let’s take this
optimized and tuned model and put it to the test on our test data!

In [None]:
gs_best = clf.best_estimator_
tuned_y_pred = gs_best.predict(X_test)

print("\n\nTuned Model Stats:")
display_model_performance_metrics(
    true_labels=y_test, predicted_labels=tuned_y_pred, classes=[0, 1]
)

Our tuned model reaches an overall F1 score and accuracy of 97% on the test dataset too. This gives a clear indication of the power of hyperparameter tuning! The same scheme can be extended to other models and their respective hyperparameters. We can also change the evaluation measure we want to optimize. The scikit-learn framework provides many scoring options, including adjusted_rand_score, average_precision, f1, recall, and so on.
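
Switching the metric only requires changing the `scoring` argument. The sketch below optimizes F1 instead of accuracy on a deliberately small, RBF-only grid so that it runs quickly; `small_grid` and `f1_search` are hypothetical names, and the grid values are chosen for illustration rather than taken from the tutorial's search.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

bc = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    bc.data, bc.target, test_size=0.3, random_state=42
)

# A small illustrative grid (assumption: not the tutorial's full grid).
small_grid = {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10]}

# scoring="f1" ranks candidates by F1 score for the positive class (label 1).
f1_search = GridSearchCV(SVC(random_state=42), small_grid, cv=5, scoring="f1")
f1_search.fit(X_train, y_train)

print("Best parameters by F1:", f1_search.best_params_)
print("Best cross-validated F1:", round(f1_search.best_score_, 4))
```

In this dataset label 1 is the benign class, so plain `"f1"` measures performance on benign cases; a different scorer may be more appropriate if the malignant class is the one of interest.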

### Tune Model with Randomized Search

Grid search suffers from some major shortcomings, the most important one being
the limitation of manually specifying the grid. This brings a human element into a process that could benefit
from a purely automatic mechanism.

Randomized parameter search is a modification of traditional grid search. It accepts lists of values for grid elements, as in normal grid search, but it can also take distributions as input. For example, consider the parameter gamma, whose values we supplied explicitly in the last section: instead, we can supply a distribution from which to sample gamma.
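
To see what sampling from distributions looks like, the sketch below uses scikit-learn's `ParameterSampler` to draw a handful of candidate settings. The scale values are illustrative assumptions, `n_iter` here is simply the number of draws, and `sampled` is a hypothetical name.

```python
import scipy.stats
from sklearn.model_selection import ParameterSampler

param_distributions = {
    "C": scipy.stats.expon(scale=10),       # continuous, mean 10
    "gamma": scipy.stats.expon(scale=0.1),  # continuous, mean 0.1
    "kernel": ["rbf", "linear"],            # lists are sampled uniformly
}

# Draw five candidate settings; random_state makes the draw repeatable.
sampled = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))

for params in sampled:
    print(params)
```

Each draw is an independent combination, so no two candidates need to share a value of C or gamma, unlike the fixed rows and columns of a grid.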

The efficacy of randomized parameter search rests on the result, supported both empirically and mathematically, that hyperparameter optimization problems usually have low effective dimensionality: some parameters affect the outcome much more than others.

We control how many times the parameters are randomly sampled by specifying the number of iterations to run (n_iter). Normally, a higher number of iterations means a more granular parameter search but a longer computation time.

(This step may take a few minutes to complete.)

In [None]:
# replace the gamma and C values with a distribution (exponential distribution)
param_grid = {
    "C": scipy.stats.expon(scale=10),
    "gamma": scipy.stats.expon(scale=0.1),
    "kernel": ["rbf", "linear"],
}

# Randomized search on hyperparameters.
random_search = RandomizedSearchCV(
    SVC(random_state=42), param_distributions=param_grid, n_iter=50, cv=5
)
random_search.fit(X_train, y_train)
print("Scores for all the sampled models based on CV:\n")
means = random_search.cv_results_["mean_test_score"]
stds = random_search.cv_results_["std_test_score"]
for mean, std, params in zip(means, stds, random_search.cv_results_["params"]):
    print("%0.5f (+/-%0.05f) for %r" % (mean, std * 2, params))
print("\nBest parameters set found on development set:", random_search.best_params_)
print("Best model validation accuracy:", random_search.best_score_)

### Evaluate Randomized Search Tuned Model

Get the best model, predict on the test data, and evaluate performance.

In [None]:
rs_best = random_search.best_estimator_
rs_y_pred = rs_best.predict(X_test)
get_metrics(true_labels=y_test, predicted_labels=rs_y_pred)

We draw the values of the parameters C and gamma from exponential distributions, and we control the number of iterations of the model search with the parameter n_iter. While the overall model performance is similar to grid search, the intent is to be aware of the different strategies in model tuning.

## Model Interpretation

The ability to interpret Machine Learning models in an easy to understand way will benefit not only analytics teams but also key stakeholders in trying to explain how models really work.

Some Machine Learning models use interpretable algorithms; for example, a decision tree gives you the importance of all the variables as an output. Unfortunately, this can't be said for many models, especially those that have no notion of variable importance.

Because the decision policies learned by complex models are hard to understand, predictive models are still often viewed as black boxes. Model interpretation can help both data scientists and end users in several ways:
* It helps bridge the gap that often exists between technology teams and the business. For example, it can identify why a particular prediction was made, and the end user can verify that reason against their domain knowledge using an easy-to-understand interpretation.
* It helps data scientists understand the interactions among features, which can lead to better feature engineering and improved performance.
* It helps with model comparison and with explaining results to business stakeholders.

### Understanding Skater

Skater is an open source Python library designed to demystify the inner workings of predictive models. Skater defines the scope of model interpretation in two ways: (1) globally, on the basis of a complete dataset, and (2) locally, on the basis of an individual prediction. For global explanations, Skater uses model-agnostic variable importance and partial dependence plots to judge the bias of a model and understand its general behavior. To validate a model's decision policies for a single prediction, the library uses local interpretable model-agnostic explanations (LIME, Ribeiro et al., 2016), which fit local surrogate models around the prediction being explained.

In [None]:
# we will use a logistic regression model to do classification.

warnings.filterwarnings("ignore")

logistic = linear_model.LogisticRegression()
logistic.fit(X_train, y_train)

In [None]:
# create a skater interpretation and in memory model object.

interpreter = Interpretation(X_test, feature_names=bc.feature_names)
# Probability estimates.
model = InMemoryModel(
    logistic.predict_proba, examples=X_train, target_names=logistic.classes_
)

### Visualize Feature Importance

Skater's feature importance implementation is based on an information-theoretic criterion: it measures the entropy in the change of predictions, given a perturbation of a specific feature. The idea is that the more a model's decision-making criteria depend on a feature, the more the predictions will change as a function of perturbing that feature.
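
The same perturbation idea can be sketched without Skater. The example below uses scikit-learn's `permutation_importance`, which is a different technique from Skater's entropy-based criterion: it shuffles one feature at a time and measures the drop in the model's score. The scaled pipeline is an assumption added so the solver converges cleanly, so the ranking may differ from the plot produced by Skater.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

bc = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    bc.data, bc.target, test_size=0.3, random_state=42
)

# Scaling is added here only so the solver converges cleanly (assumption).
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Shuffle one feature at a time and record the drop in test accuracy.
result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=0)

# Indices of the five features with the largest mean importance.
top = result.importances_mean.argsort()[::-1][:5]
for i in top:
    print(f"{bc.feature_names[i]}: {result.importances_mean[i]:.4f}")
```

Because many features in this dataset are strongly correlated, shuffling a single one can understate its importance; the other correlated features still carry much of the same information.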

In [None]:
plots = interpreter.feature_importance.plot_feature_importance(
    model, ascending=True, progressbar=False
)

The most important feature in our model is worst perimeter, followed by mean area and worst area. Let’s now consider the most important feature,
worst perimeter, and think about ways it might influence the model decision making process during
predictions.

### One-way partial dependence plot

Partial dependence plots are an excellent tool for visualizing this. In general, a partial dependence plot describes the marginal impact of a specific feature on the model prediction while holding the other features in the model constant. The derivative of the partial dependence describes how strongly the prediction responds to changes in that feature.
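
A one-way partial dependence curve can also be computed by hand, which may help clarify what the plot shows. This is a minimal sketch and not Skater's implementation: for each grid value, the chosen feature is overwritten in every row, the other features are left as they are, and the predicted probabilities are averaged. The model, grid size, and names such as `pd_curve` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

bc = load_breast_cancer()
X, y = bc.data, bc.target
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

feature = list(bc.feature_names).index("worst perimeter")
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 20)

pd_curve = []
for value in grid:
    X_mod = X.copy()
    X_mod[:, feature] = value  # force the feature to one value in every row
    # Average predicted probability of class 1 (benign) over all rows.
    pd_curve.append(clf.predict_proba(X_mod)[:, 1].mean())
pd_curve = np.array(pd_curve)

for value, avg in zip(grid[::5], pd_curve[::5]):
    print(f"worst perimeter = {value:7.2f} -> mean P(benign) = {avg:.3f}")
```

Plotting `pd_curve` against `grid` gives the one-way partial dependence curve for this sketch model.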

In [None]:
import matplotlib.axes

# Compute the partial dependence of a set of variables, i.e. approximate the
# partial dependence of the predict_fn with respect to the variables passed.
p = interpreter.partial_dependence.plot_partial_dependence(
    ["worst perimeter"],
    model,
    grid_resolution=50,
    with_variance=True,
    figsize=(6, 4),
    progressbar=False,
)

We can see that the worst perimeter feature has a strong influence on the model's decision-making process. Based on the plot, as the worst perimeter value decreases from 110, the model becomes more likely to classify the data point as benign (label 1), which indicates no cancer.

### Explaining Predictions

Let's try to interpret some actual predictions now. We will explain the predictions for two data points, one not having cancer (label 1) and one having cancer (label 0), and try to interpret how each prediction was made.
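
Before using the LIME explainer, it may help to see the local surrogate idea in a stripped-down form. The sketch below is a simplification and not LIME's implementation: it perturbs one instance with Gaussian noise, weights the perturbed points by proximity, and fits a weighted linear model to the black-box probabilities. The kernel width, sample count, Ridge surrogate, and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

bc = load_breast_cancer()
X, y = bc.data, bc.target
black_box = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

rng = np.random.default_rng(0)
instance = X[0]
scale = X.std(axis=0)

# 1. Perturb the instance with Gaussian noise scaled to each feature's spread.
noise = rng.normal(size=(500, X.shape[1]))
neighbors = instance + noise * scale

# 2. Weight neighbors by proximity (exponential kernel on standardized distance).
distances = np.sqrt((noise ** 2).sum(axis=1))
kernel_width = 0.75 * np.sqrt(X.shape[1])  # heuristic choice for this sketch
weights = np.exp(-(distances ** 2) / kernel_width ** 2)

# 3. Fit a weighted linear surrogate to the black-box probability of class 1.
target = black_box.predict_proba(neighbors)[:, 1]
surrogate = Ridge(alpha=1.0).fit(noise, target, sample_weight=weights)

# Features with the largest absolute local coefficients.
top = np.argsort(np.abs(surrogate.coef_))[::-1][:5]
for i in top:
    print(f"{bc.feature_names[i]}: {surrogate.coef_[i]:+.4f}")
```

The sign of each coefficient indicates whether increasing that feature near this instance pushes the prediction toward class 1 (benign) or away from it. The explanation is only valid locally, around the chosen instance.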

In [None]:
# Explains predictions on tabular (i.e. matrix) data.
# For numerical features, perturb them by sampling from a Normal(0,1) and
# doing the inverse operation of mean-centering and scaling, according to the
# means and stds in the training data. For categorical features, perturb by
# sampling according to the training distribution, and making a binary
# feature that is 1 when the value is the same as the instance being
# explained.
exp = LimeTabularExplainer(
    X_train,
    feature_names=bc.feature_names,
    discretize_continuous=True,
    class_names=["0", "1"],
)

In [None]:
# Generates explanations for a prediction.
# First, we generate neighborhood data by randomly perturbing features
# from the instance (see __data_inverse). We then learn locally weighted
# linear models on this neighborhood data to explain each of the classes
# in an interpretable way (see lime_base.py).
exp.explain_instance(X_test[0], logistic.predict_proba).show_in_notebook()

We can see the features that were primarily responsible for the model
to predict the data point as label 1, i.e. having no cancer. We can also see the feature that was the most
influential in this decision was worst perimeter!

Let’s run a similar interpretation on a data point with malignant
cancer.

In [None]:
exp.explain_instance(X_test[1], logistic.predict_proba).show_in_notebook()

We can also see the features that were primarily responsible
for the model to predict the data point as label 0, i.e. having malignant cancer. The feature worst perimeter
was again the most influential one and you can notice the stark difference in its value as compared to the
previous data point.

## Model Deployment

The final piece of the Machine Learning modeling puzzle is that of
deploying the model in production so that we actually start using it.

### Persist model to disk

To persist our model to disk, we can use libraries like pickle or joblib (scikit-learn itself depends on joblib, so it is installed alongside it). This lets us deploy and use the model later without retraining it each time we want to use it.

In [None]:
joblib.dump(logistic, "lr_model.pkl")

### Load model from disk

Whenever we load this file back into memory, we get the logistic regression model object again.

In [None]:
lr = joblib.load("lr_model.pkl")
lr

### Predict with loaded model

We can now use this lr object, which is our model loaded from the disk, and make predictions.

In [None]:
print("True value: ", y_test[10:11])
print("Predicted value: ", lr.predict(X_test[10:11]))
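
A quick sanity check after persisting a model is to confirm that the reloaded object reproduces the original predictions. The sketch below does the full round trip in a temporary directory so that it leaves no files behind; the pipeline and the names `restored` and `same_predictions` are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

bc = load_breast_cancer()
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(bc.data, bc.target)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lr_model.pkl")  # temporary path for this sketch
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(bc.data), restored.predict(bc.data))
print("Predictions identical after reload:", same_predictions)
```

Two cautions apply to any pickle-based format: loading a file can execute arbitrary code, so only load models from sources you trust, and a model should be loaded with the same scikit-learn version that saved it.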

## Conclusion

Through this tutorial, you have gained practical experience in making machine learning models more transparent and interpretable, optimizing their performance through proper tuning, and preparing them for real-world deployment. These skills are essential for bridging the gap between technical implementation and business understanding in machine learning projects.

## Clean up

Remember to shut down your Jupyter Notebook environment and delete any unnecessary files or resources once you've completed the tutorial.