# Model Tuning, Interpretation and Deployment

## Overview

This tutorial delves into the critical stages of the machine learning workflow that follow the initial model development. It focuses on three key areas: **model interpretation**, **performance tuning**, and **deployment strategies**. These steps are essential for transforming a raw machine learning model into a robust, interpretable, and production-ready solution.

We will explore:

- **Model Interpretation**: Techniques to make machine learning models more transparent and understandable, enabling stakeholders to trust and effectively utilize their predictions.
- **Model Tuning**: Methods to optimize model performance by fine-tuning hyperparameters, addressing overfitting or underfitting, and improving generalization to unseen data.
- **Deployment Considerations**: Best practices for preparing and deploying machine learning models into real-world environments, ensuring scalability, reliability, and maintainability.

By the end of this tutorial, you will gain practical insights into how to refine your models, interpret their decisions, and successfully deploy them to solve real-world problems. Whether you're working on a small-scale project or a large-scale enterprise application, these skills are crucial for delivering impactful machine learning solutions.

## Learning Objectives

- Understand the importance of model interpretation in machine learning
  - Learn how interpretability benefits analytics teams and stakeholders
  - Recognize how interpretability bridges technical and business understanding
- Master techniques for model tuning and optimization
- Learn best practices for model deployment
- Develop skills to explain model decisions to non-technical stakeholders

### Tasks to complete

- Implement model interpretation techniques
- Perform model tuning exercises
- Practice model deployment steps
- Create interpretability visualizations

## Prerequisites

- A working Python environment and familiarity with Python
- Basic understanding of machine learning concepts
- Familiarity with pandas and numpy libraries
- Knowledge of basic statistical concepts


## Get Started

- Select the `conda_tensorflow2_p310` kernel in your SageMaker notebook instance.


### Import libraries


In [None]:
# Install the 'lime' and 'shap' Python packages using pip.
# These packages are commonly used for model interpretability and explainability in machine learning.
#%pip install setuptools==57.5.0  # Downgrade setuptools
%pip install git+https://github.com/marcotcr/lime.git
%pip install shap

In [None]:
# Import the warnings module to handle warnings
import warnings

# Import joblib for efficient saving and loading of Python objects
import joblib

# Import NumPy for numerical operations, especially for handling arrays
import numpy as np

# Import pandas for data manipulation and analysis, particularly for working with DataFrames
import pandas as pd

# Import scipy.stats for the probability distributions used later in randomized search.
# Importing the submodule explicitly guarantees that scipy.stats.expon is available.
import scipy.stats

# Import SHAP library for explaining the output of machine learning models
import shap

# Import LimeTabularExplainer from the lime library for explaining tabular data predictions
from lime.lime_tabular import LimeTabularExplainer

# Import linear_model (e.g., Logistic Regression) and metrics (evaluation scores) from scikit-learn
from sklearn import linear_model, metrics

# Import the load_breast_cancer dataset from scikit-learn for demonstration purposes
from sklearn.datasets import load_breast_cancer

# Import GridSearchCV and RandomizedSearchCV for hyperparameter tuning
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# Import SVC (Support Vector Classifier) from scikit-learn for classification tasks
from sklearn.svm import SVC

# Model Tuning, Interpretation and Deployment

Adapted from Dipanjan Sarkar et al. 2018. [Practical Machine Learning with Python](https://link.springer.com/book/10.1007/978-1-4842-3207-1).


In this tutorial, we will learn:
- **Hyperparameter Tuning**
  - Techniques for optimizing ML algorithm parameters
  - Best practices for efficient parameter search

- **Model Interpretation**
  - Using open-source frameworks (e.g., SHAP, LIME)
  - Understanding model decisions and feature importance

- **Model Operations**
  - Persistence strategies for trained models
  - Deployment approaches for production environments

## Model Tuning

Model tuning is a cornerstone of machine learning, requiring both a strong grasp of mathematical foundations and an intuitive understanding of algorithmic behavior. In this tutorial, we’ll dissect target models to pinpoint tunable parameters (the "knobs" of the system), then methodically adjust them to maximize performance. This iterative process—spanning dataset experimentation, hyperparameter optimization, and feature engineering—forms the essence of model refinement. By rigorously testing and calibrating these components, we unlock a model’s full potential, ensuring robust results tailored to specific use cases.  

### Build and Evaluate Default Model

In this section, we will use the Wisconsin Breast Cancer Dataset as an example to demonstrate the process of building and evaluating a machine learning model. Here's how we'll proceed:

- **Dataset Preparation**:
    - We will start by splitting the dataset into features (X) and target labels (y).
    - The data will then be divided into training and testing sets to ensure proper evaluation.
- **Model Building**:
    - Using the training data, we will construct a Support Vector Machine (SVM) model with its default parameters.

- **Model Evaluation**:
    - The trained model will be applied to the test dataset to assess its performance.
    - We will analyze key metrics to understand how well the default model performs on unseen data.

This step-by-step approach will provide a baseline understanding of the model's capabilities before we move on to tuning and optimization.

In [None]:
# Load Wisconsin Breast Cancer Dataset
from sklearn.datasets import load_breast_cancer

# load data
bc = load_breast_cancer()

# Extract the feature data (input features) from the dataset
X = bc.data

# Extract the target data (labels or output) from the dataset
y = bc.target

# Print the shape of the feature data (number of samples and features) and the names of the features
print(X.shape, bc.feature_names)

In [None]:
# Utility functions for model evaluation

# Get model performance evaluation metrics
def get_metrics(true_labels, predicted_labels):
    # Print the accuracy score, rounded to 4 decimal places, by comparing true labels to predicted labels.
    print(
        "Accuracy:", np.round(metrics.accuracy_score(true_labels, predicted_labels), 4)
    )
    # Print the precision score, rounded to 4 decimal places, calculated with weighted averaging for multi-class, by comparing true labels to predicted labels.
    print(
        "Precision:",
        np.round(
            metrics.precision_score(true_labels, predicted_labels, average="weighted"),
            4,
        ),
    )
    # Print the recall score, rounded to 4 decimal places, calculated with weighted averaging for multi-class, by comparing true labels to predicted labels.
    print(
        "Recall:",
        np.round(
            metrics.recall_score(true_labels, predicted_labels, average="weighted"), 4
        ),
    )
    # Print the F1 score, rounded to 4 decimal places, calculated with weighted averaging for multi-class, by comparing true labels to predicted labels.
    print(
        "F1 Score:",
        np.round(
            metrics.f1_score(true_labels, predicted_labels, average="weighted"), 4
        ),
    )


# Show the classification report
def display_classification_report(true_labels, predicted_labels, classes=[1, 0]):
    # Build a text report showing the main classification metrics
    # Compute the classification report as a string with scikit-learn's `metrics.classification_report`.
    # `y_true=true_labels`:  Specifies the true class labels.
    # `y_pred=predicted_labels`: Specifies the predicted class labels from the model.
    # `labels=classes`:  Specifies the classes to be included in the report, here defaulting to [1, 0].
    report = metrics.classification_report(
        y_true=true_labels, y_pred=predicted_labels, labels=classes
    )
    
    # Print the classification report to the console.
    # This will display the precision, recall, f1-score, and support for each class,
    # as well as overall accuracy and macro/weighted averages.
    print(report)


# Show the confusion matrix
def display_confusion_matrix(true_labels, predicted_labels, classes=[1, 0]):
    # Determine the total number of classes from the classes list.
    total_classes = len(classes)
    
    # Define levels for MultiIndex labels in the DataFrame, used for formatting the confusion matrix.
    level_labels = [total_classes * [0], list(range(total_classes))]
    
    # Compute the confusion matrix using scikit-learn's metrics.confusion_matrix function.
    cm = metrics.confusion_matrix(
        y_true=true_labels, y_pred=predicted_labels, labels=classes
    )
    # Create a Pandas DataFrame to display the confusion matrix in a structured format.
    cm_frame = pd.DataFrame(
        data=cm,
        # Set column names for the DataFrame using MultiIndex to represent 'Predicted' and class labels.
        columns=pd.MultiIndex(levels=[["Predicted:"], classes], codes=level_labels),
        
        # Set index names for the DataFrame using MultiIndex to represent 'Actual' and class labels.
        index=pd.MultiIndex(levels=[["Actual:"], classes], codes=level_labels),
    )
    # Print the confusion matrix DataFrame to the console.
    print(cm_frame)


# Show the model performance metrics
def display_model_performance_metrics(true_labels, predicted_labels, classes=[1, 0]):
    # Prints a header for model performance metrics
    print("Model Performance metrics:")
    
    # Prints a separator line for visual clarity
    print("-" * 30)
    
    # Calls the function to calculate and print performance metrics
    get_metrics(true_labels=true_labels, predicted_labels=predicted_labels)
    
    # Prints a newline and header for the classification report
    print("\nModel Classification report:")
    
    # Prints a separator line for visual clarity
    print("-" * 30)
    
    # Calls the function to display the classification report
    display_classification_report(
        true_labels=true_labels, predicted_labels=predicted_labels, classes=classes
    )
    # Prints a newline and header for the confusion matrix
    print("\nPrediction Confusion Matrix:")
    
    # Prints a separator line for visual clarity
    print("-" * 30)
    
    # Calls the function to display the confusion matrix
    display_confusion_matrix(
        true_labels=true_labels, predicted_labels=predicted_labels, classes=classes
    )

In [None]:
# Prepare datasets for training and testing, splitting the data into training (70%) and test (30%) sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Build a default Support Vector Machine (SVM) model.
# Initialize a C-Support Vector Classification model with a fixed random state for reproducibility.
def_svc = SVC(random_state=42)

# Train the default SVM model using the training data (features X_train and labels y_train).
def_svc.fit(X_train, y_train)

# Predict labels for the test dataset using the trained default SVM model.
def_y_pred = def_svc.predict(X_test)

# Print a header to indicate the performance metrics for the default model.
print("Default Model Stats:")

# Display and print the performance metrics of the default model using the test labels (y_test) and the predicted labels (def_y_pred).
# The metrics will be displayed for classes 0 and 1.
display_model_performance_metrics(
    true_labels=y_test, predicted_labels=def_y_pred, classes=[0, 1]
)

### Tune Model with Grid Search

Since we are working with an SVM (Support Vector Machine) model, we will focus on tuning several key hyperparameters specific to this algorithm. These include:

- **Regularization Parameter (C)**: Controls the trade-off between achieving a wide margin and correctly classifying training points. A smaller C creates a larger margin but may allow some misclassifications, while a larger C aims for fewer misclassifications at the cost of a narrower margin.
- **Kernel Function**: Determines how data is transformed into a higher-dimensional feature space. Common choices include linear, polynomial, and radial basis function (RBF) kernels.
- **Gamma**: Defines the influence of a single training example. A low gamma means a training point has influence over a larger area, while a high gamma restricts its influence to a smaller region.

There are additional hyperparameters that can be tuned, and you can explore them in detail [here](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).
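
To make the role of gamma concrete, the sketch below evaluates the RBF kernel, K(x, x') = exp(-gamma * ||x - x'||^2), by hand for two points at a fixed distance and cross-checks the result against scikit-learn. The helper name `rbf_kernel_value` and the two sample points are illustrative assumptions, not part of the tutorial's code.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf_kernel_value(x, x_prime, gamma):
    """RBF kernel between two vectors: exp(-gamma * squared Euclidean distance)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


# Two illustrative points whose squared distance is 3^2 + 4^2 = 25.
a = [0.0, 0.0]
b = [3.0, 4.0]

for gamma in (1e-4, 1e-3, 1e-1):
    manual = rbf_kernel_value(a, b, gamma)
    # Cross-check against scikit-learn's pairwise implementation.
    library = rbf_kernel([a], [b], gamma=gamma)[0, 0]
    print(f"gamma={gamma:g}  similarity={manual:.4f}  sklearn={library:.4f}")
```

With a small gamma the two points still look similar to the kernel, so each training point influences a wide region. As gamma grows, the similarity decays toward 0 and each point's influence becomes more local.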

To systematically search for the best combination of hyperparameters, we will create a grid of predefined values for C, kernel, and gamma. Next, we need to select a scoring metric to evaluate model performance—in this case, we aim to maximize accuracy.

Once the grid and scoring metric are defined, we will use five-fold cross-validation to train and evaluate multiple models across the grid. This process involves splitting the training data into five subsets, training the model on four subsets, and validating it on the fifth. This is repeated five times, ensuring each subset is used for validation once. The result is a robust evaluation of each hyperparameter combination, allowing us to identify the best-performing model.
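
The five-fold procedure described above can be written out by hand for a single hyperparameter combination. This is a minimal sketch, assuming one candidate picked from the grid purely for illustration; `GridSearchCV` repeats the same loop for every combination. For classifiers, an integer `cv` in scikit-learn means stratified folds, so the sketch uses `StratifiedKFold`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# One candidate from the hyperparameter grid (chosen here only for illustration).
candidate = {"kernel": "rbf", "gamma": 1e-4, "C": 10}

folds = StratifiedKFold(n_splits=5)
fold_scores = []
for fit_idx, val_idx in folds.split(X_train, y_train):
    model = SVC(random_state=42, **candidate)
    # Train on four folds.
    model.fit(X_train[fit_idx], y_train[fit_idx])
    # Validate on the held-out fifth fold (accuracy).
    fold_scores.append(model.score(X_train[val_idx], y_train[val_idx]))

mean_cv_accuracy = float(np.mean(fold_scores))
print("Per-fold accuracy:", np.round(fold_scores, 4))
print("Mean CV accuracy:", round(mean_cv_accuracy, 4))

# The same evaluation through scikit-learn's helper, for comparison.
library_scores = cross_val_score(
    SVC(random_state=42, **candidate), X_train, y_train, cv=5, scoring="accuracy"
)
print("cross_val_score:  ", np.round(library_scores, 4))
```

The mean of the five fold scores is what grid search reports as `mean_test_score` for that combination. The test set is never touched during this loop.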

(This step should take about three minutes to complete.)


In [None]:
# Define the grid of hyperparameters to search over for the SVM model.
grid_parameters = {
    "kernel": [
        "linear",
        "rbf",
    ],  # Kernels to try: linear and radial basis function (rbf).
    "gamma": [1e-3, 1e-4],  # Gamma values to try for rbf kernel (kernel coefficient).
    "C": [1, 10, 50, 100],  # C values to try (regularization parameter).
}

# Indicate that hyperparameter tuning is starting, specifically for optimizing accuracy.
print("# Tuning hyper-parameters for accuracy\n")

# Initialize GridSearchCV for hyperparameter tuning of an SVC classifier.
# SVC(random_state=42): Creates an SVC classifier with a fixed random state for reproducibility.
# grid_parameters: The parameter grid defined above to search through.
# cv=5: Perform 5-fold cross-validation.
# scoring="accuracy": Evaluate models based on accuracy.
clf = GridSearchCV(SVC(random_state=42), grid_parameters, cv=5, scoring="accuracy")

# Fit the GridSearchCV object to the training data (X_train, y_train) to find the best hyperparameter combination.
clf.fit(X_train, y_train)

# Display the accuracy scores obtained for each hyperparameter combination during cross-validation.
print("Grid scores for all the models based on CV:\n")

# Extract the mean test scores from the GridSearchCV results. These are the average accuracy scores across the cross-validation folds for each parameter combination.
means = clf.cv_results_["mean_test_score"]

# Extract the standard deviation of the test scores from the GridSearchCV results. This indicates the variability of the accuracy scores across the cross-validation folds for each parameter combination.
stds = clf.cv_results_["std_test_score"]

# Iterate through the mean scores, standard deviations, and parameter combinations to print the results for each model.
for mean, std, params in zip(means, stds, clf.cv_results_["params"]):
    # Print the mean accuracy and a +/- band of two standard deviations of the fold scores (a rough indicator of spread, not a formal confidence interval) for each parameter setting.
    print("%0.5f (+/-%0.05f) for %r" % (mean, std * 2, params))

# Output the best hyperparameter combination found by GridSearchCV on the development set (which is the training set in this case, due to cross-validation).
print("\nBest parameters set found on development set:", clf.best_params_)

# Output the best mean cross-validation score (accuracy) achieved with the best hyperparameter combination. This is an estimate of the model's performance on unseen data.
print("Best model validation accuracy:", clf.best_score_)

Grid search selects the best parameters by cross-validation accuracy, and the best model reaches a validation accuracy of about 96%.


### Evaluate Grid Search Tuned Model

Now that the model is tuned, we evaluate it on the test dataset. This shows how well the model generalizes to unseen data and whether tuning actually improved its predictive power.

In [None]:
# Retrieves the best estimator found by GridSearchCV (refit on the full training set with the best parameters).
gs_best = clf.best_estimator_

# Uses the best estimator to make predictions on the test set (X_test).
tuned_y_pred = gs_best.predict(X_test)

# Prints a header to indicate the performance of the tuned model.
print("\n\nTuned Model Stats:")

# Calls a function to display performance metrics for the tuned model.
display_model_performance_metrics(
    true_labels=y_test, predicted_labels=tuned_y_pred, classes=[0, 1]
)

Our optimized model achieves an impressive **F1 Score** and **model accuracy** of 97% on the test dataset, demonstrating the significant impact of hyperparameter tuning. This result highlights how fine-tuning can dramatically enhance a model's performance and its ability to generalize to new data.

This approach isn’t limited to just one type of model—it can be extended to various machine learning algorithms and their respective hyperparameters. Additionally, the evaluation metric you choose to optimize can be tailored to your specific use case. The scikit-learn framework offers a wide range of scoring metrics to suit different needs, such as:

- **adjusted_rand_score**: Useful for clustering tasks.
- **average_precision**: Ideal for imbalanced classification problems.
- **recall**: Focuses on the model’s ability to capture positive instances.

By experimenting with different models, hyperparameters, and evaluation metrics, you can unlock the full potential of your machine learning workflows. The flexibility provided by scikit-learn makes it easier to adapt these techniques to diverse problems and datasets.
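
As a small illustration of swapping the evaluation metric, the sketch below scores one SVM configuration under several of scikit-learn's built-in scoring strings. The hyperparameter values are assumptions chosen only for the example, and the same strings can be passed as `scoring=` to `GridSearchCV`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Illustrative configuration (an assumption, not the tuned model from this tutorial).
model = SVC(kernel="rbf", gamma=1e-4, C=10, random_state=42)

scores_by_metric = {}
for scoring in ("accuracy", "f1", "recall", "average_precision"):
    # Five-fold cross-validation on the training set under the chosen metric.
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring=scoring)
    scores_by_metric[scoring] = float(scores.mean())
    print(f"{scoring:>18}: {scores.mean():.4f}")
```

Which metric to optimize depends on the cost of each kind of error. In a screening setting, for instance, recall on the class of interest may matter more than overall accuracy.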


### Tune Model with Randomized Search

While **grid search** is a common approach for hyperparameter tuning, it comes with significant limitations. One of the most notable drawbacks is the need to manually specify the grid of hyperparameter values, which introduces a human element into what could otherwise be a fully automated process.

**Randomized Search** addresses this limitation by offering a more flexible and efficient alternative. Unlike grid search, which requires predefined values for each hyperparameter, randomized search can accept distributions as input. For example, instead of explicitly listing values for the gamma parameter (as we did in the previous section), we can provide a statistical distribution from which values are randomly sampled.

The effectiveness of randomized search is supported by both empirical evidence and mathematical theory. Hyperparameter optimization problems often have low effective dimensionality, meaning that only a few hyperparameters significantly impact model performance. For the same number of trials, random sampling tries more distinct values of each influential hyperparameter than a grid does, so randomized search can achieve comparable or even better results than grid search, often with far fewer iterations.

To control the extent of the search, we specify the number of iterations (n_iter). A higher number of iterations allows for a more thorough exploration of the hyperparameter space, potentially leading to better results. However, this comes at the cost of increased computational time. Balancing the number of iterations is key to achieving an efficient and effective tuning process.
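
To see what sampling from distributions looks like in practice, the sketch below draws a handful of candidates with scikit-learn's `ParameterSampler`, using the same exponential distributions for C and gamma as the search in this section. The choice of five draws and the seed are assumptions made for the illustration.

```python
import scipy.stats
from sklearn.model_selection import ParameterSampler

param_distributions = {
    "C": scipy.stats.expon(scale=10),       # continuous distribution, mean 10
    "gamma": scipy.stats.expon(scale=0.1),  # continuous distribution, mean 0.1
    "kernel": ["rbf", "linear"],            # lists are sampled uniformly
}

# n_iter plays the same role as in RandomizedSearchCV: the number of candidates drawn.
sampled = list(ParameterSampler(param_distributions, n_iter=5, random_state=42))

for params in sampled:
    # Round floats only for display; the sampled values themselves are unchanged.
    print({k: (round(v, 4) if isinstance(v, float) else v) for k, v in params.items()})
```

Each draw is one complete hyperparameter combination. Raising `n_iter` adds more draws, and therefore more models to train under cross-validation.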

(This step should take about five minutes to complete.)


In [None]:
# Define the parameter grid for RandomizedSearchCV.
param_grid = {
    # Define 'C' hyperparameter to be sampled from an exponential distribution with scale=10.
    "C": scipy.stats.expon(scale=10),
    
    # Define 'gamma' hyperparameter to be sampled from an exponential distribution with scale=0.1.
    "gamma": scipy.stats.expon(scale=0.1),
    
    # Define 'kernel' hyperparameter to choose from 'rbf' or 'linear'.
    "kernel": ["rbf", "linear"],
}

# Initialize RandomizedSearchCV for hyperparameter tuning of SVC.
random_search = RandomizedSearchCV(
    # Use SVC classifier with a fixed random state for reproducibility.
    SVC(random_state=42),
    
    # Specify the parameter distributions to sample from.
    param_distributions=param_grid,
    
    # Set the number of iterations for random parameter combinations to 50.
    n_iter=50,
    
    # Use 5-fold cross-validation.
    cv=5,

    # Fix the seed used to sample parameter combinations so the search is repeatable.
    random_state=42,
)
# Fit the RandomizedSearchCV model to the training data (X_train, y_train).
random_search.fit(X_train, y_train)

# Print a header for the grid scores from cross-validation.
print("Grid scores for all the models based on CV:\n")

# Extract the mean test scores from the RandomizedSearchCV results.
means = random_search.cv_results_["mean_test_score"]

# Extract the standard deviation of the test scores from the RandomizedSearchCV results.
stds = random_search.cv_results_["std_test_score"]

# Iterate through the mean scores, standard deviations, and parameter sets from the cross-validation.
for mean, std, params in zip(means, stds, random_search.cv_results_["params"]):
    # Print the mean score, a +/- band of two standard deviations of the fold scores (a rough indicator of spread), and the corresponding parameter set for each model.
    print("%0.5f (+/-%0.05f) for %r" % (mean, std * 2, params))
    
# Print a header for the best parameter set found by RandomizedSearchCV.
print("\nBest parameters set found on development set:", random_search.best_params_)

# Print the best model's validation accuracy (mean cross-validation score for the best parameter set).
print("Best model validation accuracy:", random_search.best_score_)

### Evaluate Randomized Search Tuned Model

After completing the randomized search process, the next step is to evaluate the performance of the best model identified during the tuning phase. Here's how we’ll proceed:

- **Retrieve the Best Model**:
    - Randomized search returns the best combination of hyperparameters based on the specified scoring metric. We will extract this optimal model for further evaluation.

- **Make Predictions**:
    - Using the best model, we will generate predictions on the test dataset. This allows us to assess how well the model generalizes to unseen data.

- **Evaluate Performance**:
    - We will evaluate the model's performance using relevant metrics, such as accuracy, precision, recall, F1 score, or any other metric appropriate for the task. These metrics provide a comprehensive understanding of the model's effectiveness.

- **Compare Results**:
    - To gauge the impact of randomized search, we will compare the performance of the tuned model with the baseline model (e.g., the model trained with default hyperparameters). This comparison highlights the improvements achieved through hyperparameter optimization.

By systematically evaluating the tuned model, we can ensure that the randomized search process has successfully enhanced the model's performance and that it is ready for deployment or further refinement. This step is crucial for validating the effectiveness of the tuning process and building confidence in the model's predictive capabilities.


In [None]:
# Retrieve the best estimator found by RandomizedSearchCV.
rs_best = random_search.best_estimator_

# Use the best estimator to predict labels for the test dataset (X_test).
rs_y_pred = rs_best.predict(X_test)

# Evaluate the performance of the best estimator using a function called get_metrics,
# comparing the true labels (y_test) with the predicted labels (rs_y_pred).
get_metrics(true_labels=y_test, predicted_labels=rs_y_pred)

In this approach, we are sampling the values for the hyperparameters C (regularization parameter) and gamma from an exponential distribution. This allows for a more dynamic and flexible exploration of the hyperparameter space compared to manually specifying fixed values. Additionally, we control the extent of the search by setting the n_iter parameter, which determines the number of iterations or random combinations of hyperparameters to evaluate.

While the overall performance of the model tuned using randomized search is often comparable to that achieved with grid search, the primary goal here is to highlight the different strategies available for model tuning. Randomized search offers a more efficient and scalable alternative, especially when dealing with a large hyperparameter space or limited computational resources. By understanding and leveraging these diverse tuning strategies, you can choose the most suitable approach for your specific use case and constraints.

## Model Interpretation

The ability to interpret machine learning models in a clear and understandable way is invaluable—not only for analytics teams but also for key stakeholders who need to understand how these models make decisions. While some machine learning algorithms, like decision trees, inherently provide interpretable outputs (such as variable importance), many others, especially complex models, lack this transparency. As a result, predictive models are often perceived as black boxes, making it challenging to trust or explain their decision-making processes.

Model interpretation addresses this challenge by shedding light on how models operate, offering several key benefits:

- **Bridging the Gap Between Technology and Business**:
    - Interpretable models help align technical teams with business stakeholders by providing clear explanations for predictions. For instance, if a model predicts a specific outcome, interpretation techniques can reveal the underlying reasons, which can then be validated using domain expertise. This fosters trust and collaboration between teams.

- **Enhancing Feature Engineering**:
    - By understanding how features interact and contribute to predictions, data scientists can refine their feature engineering processes. This deeper insight often leads to improved model performance and more robust solutions.

- **Facilitating Model Comparisons**:
    - Interpretation tools enable data scientists to compare different models effectively, identifying which one aligns best with business objectives. This makes it easier to justify model choices to stakeholders.

- **Improving Stakeholder Communication**:
    - Clear interpretations make it easier to explain model results to non-technical audiences, ensuring that business decisions are informed and actionable.

Model interpretation is a critical step in the machine learning workflow. It not only demystifies complex models but also empowers both data scientists and business stakeholders to make better, more informed decisions. By leveraging interpretation techniques, we can transform black-box models into transparent, trustworthy tools that drive meaningful outcomes.

### Understanding LIME and SHAP

**LIME (Local Interpretable Model-agnostic Explanations)** and **SHAP (SHapley Additive exPlanations)** are two advanced techniques designed to make machine learning models more interpretable and transparent. Both tools aim to explain how models arrive at their predictions, but they do so in different ways, each with its own strengths.

**LIME** focuses on providing local explanations for individual predictions. It works by approximating the behavior of a complex machine learning model using a simpler, interpretable model (such as linear regression) in the vicinity of a specific data point. Here’s how it works:
- **Perturbation**: LIME generates slight variations of the input data by perturbing the features.
- **Prediction**: It uses the original model to make predictions for these perturbed instances.
- **Interpretable Model**: A simple model is then trained on the perturbed data and predictions, focusing on the local region around the instance of interest.
- **Explanation**: The simple model provides insights into how the original model made its prediction for that specific instance.

LIME is particularly useful for understanding individual predictions and is model-agnostic, meaning it can be applied to any machine learning model. However, its explanations are limited to local regions and may not capture the global behavior of the model.
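
The four steps above can be sketched from scratch in a few lines. This is a simplified illustration of the idea, not the `lime` library's implementation: the Gaussian perturbation scheme, the kernel width, the Ridge surrogate, and the scaled logistic regression used as the "black box" are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

# Black-box model to explain (a scaled logistic regression, chosen for speed).
black_box = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

rng = np.random.default_rng(0)
instance = X[0]
feature_std = X.std(axis=0)

# 1. Perturbation: sample points around the instance, in units of each feature's std.
noise = rng.normal(size=(1000, X.shape[1]))
perturbed = instance + noise * feature_std

# 2. Prediction: query the black-box model on the perturbed points.
target = black_box.predict_proba(perturbed)[:, 1]

# 3. Interpretable model: weight samples by proximity, then fit a linear surrogate.
distance = np.sqrt((noise ** 2).sum(axis=1))
kernel_width = 0.75 * np.sqrt(X.shape[1])  # assumed width for this sketch
weights = np.exp(-(distance ** 2) / kernel_width ** 2)
surrogate = Ridge(alpha=1.0).fit(noise, target, sample_weight=weights)

# 4. Explanation: the surrogate's coefficients rank features by local influence.
top = np.argsort(np.abs(surrogate.coef_))[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]:<25} {surrogate.coef_[i]:+.4f}")
```

A positive coefficient means that increasing the feature near this instance raises the predicted probability of class 1. The ranking is only valid locally, which is the limitation noted above.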


**SHAP**, on the other hand, is rooted in game theory and provides a more comprehensive framework for model interpretation. It calculates the contribution of each feature to the model’s prediction by fairly distributing the prediction’s impact across all features. Key aspects of SHAP include:

- **Shapley Values**: SHAP uses Shapley values, a concept from cooperative game theory, to quantify the contribution of each feature.
- **Additivity**: SHAP values are additive, meaning the sum of all feature contributions equals the difference between the model’s prediction and the baseline (average) prediction.
- **Global and Local Interpretability**: SHAP provides both local explanations (for individual predictions) and global explanations (for overall feature importance), making it a versatile tool.

SHAP is particularly powerful because it ensures consistency and fairness in feature attribution, making it highly effective for understanding the overall impact of features on model predictions.
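
The additivity property can be checked on a toy example by computing exact Shapley values with brute-force enumeration of feature coalitions. This is a sketch for intuition only: the function `shapley_values`, the toy model, and the all-zeros baseline are hypothetical, and the SHAP library uses far more efficient approximations than this exponential-time loop.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = np.array(baseline, dtype=float)
        for j in coalition:
            z[j] = x[j]
        return predict(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight: |S|! * (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


# Hypothetical toy model with an interaction between features 1 and 2.
def toy_model(z):
    return 2.0 * z[0] + z[1] * z[2] + 1.0


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)

phi = shapley_values(toy_model, x, baseline)
print("Shapley values:", phi)
print("Sum of contributions:", phi.sum())
print("Prediction - baseline:", toy_model(x) - toy_model(baseline))
```

The contributions sum exactly to the gap between the prediction and the baseline output. The interaction term `z[1] * z[2]` is split evenly between the two features that produce it, which is the "fair distribution" that Shapley values formalize.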

**Complementary Strengths**:
- LIME is ideal for explaining individual predictions in a simple, intuitive way.
- SHAP excels in providing consistent and fair feature importance, both locally and globally.

Together, LIME and SHAP offer a robust toolkit for enhancing the transparency and trustworthiness of machine learning models, enabling data scientists and stakeholders to better understand and trust complex decision-making processes.


In [None]:
# Import the warnings module to handle and filter warnings.
import warnings

# Filter all warnings to be ignored. This is often used to suppress less important warning messages for cleaner output.
warnings.filterwarnings("ignore")

# Import the linear_model module from scikit-learn.
# Its LogisticRegression class will be used to create a logistic regression model for classification.
from sklearn import linear_model

# Initialize a Logistic Regression model object.
# This creates an instance of the LogisticRegression classifier with default parameters.
logistic = linear_model.LogisticRegression()

# Train the Logistic Regression model using the training data.
# X_train is the feature matrix of the training data, and y_train is the target variable (labels) for the training data.
# The fit() method learns the relationship between features and target variable from the training data.
logistic.fit(X_train, y_train)

In [None]:
# SHAP Explanation
# Uses kmeans clustering to create a background dataset from the training data (X_train) for KernelExplainer.
background = shap.kmeans(X_train, 50)  # Summarize background using 50 clusters

# Initializes a KernelExplainer object. KernelExplainer is model-agnostic and approximates SHAP values for any prediction function.
# Here it explains logistic.predict_proba, the probability output of the logistic regression model trained above.
# The 'background' dataset is used to estimate expected values in the SHAP calculation; summarizing it with k-means keeps the computation fast.
explainer = shap.KernelExplainer(
    logistic.predict_proba, background
)  # Use KernelExplainer

# Calculates SHAP values for the test dataset (X_test).
# SHAP values quantify the contribution of each feature to the prediction for each instance in X_test, based on the KernelExplainer and the model's predict_proba function.
shap_values = explainer.shap_values(X_test)

In [None]:
# Plot SHAP summary plot
# This line generates a SHAP summary plot, which is a visualization to understand feature importance and their impact on the model output.
# shap_values: The SHAP values calculated for the test dataset (X_test). These values represent the contribution of each feature to each individual prediction.
# X_test: The test dataset used for prediction. This is needed to show the actual feature values in the summary plot.
# feature_names=bc.feature_names: The feature names for the columns of X_test, taken from the dataset object 'bc'.
shap.summary_plot(shap_values, X_test, feature_names=bc.feature_names)

### Key Takeaways from the Plot

- The SHAP interaction values are centered around 0, meaning no strong interaction effects dominate.
- Mean radius and mean texture seem to have moderate interactions, with red and blue points somewhat spread out.
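- Larger deviations from 0 in any interaction panel would indicate that the combination of those features shifts predictions more than their individual contributions do (either increasing or decreasing the model output).
- Caveat: `KernelExplainer` does not compute true SHAP interaction values. Because `predict_proba` returns one column per class, `shap_values` carries a class dimension, and some `shap` versions render such input in this interaction-style layout. For a standard per-feature summary, pass the SHAP values for a single class (for example, class 1, benign) to `shap.summary_plot`.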


### Explaining Predictions

Let’s dive into interpreting actual predictions from our model. We’ll select two data points from the dataset:
- A data point classified as benign, that is, not having cancer (label 1).
- A data point classified as malignant, that is, having cancer (label 0).

For each of these instances, we’ll use interpretation techniques (such as LIME or SHAP) to understand how the model arrived at its predictions. This process will help us uncover the key features influencing the model’s decisions and provide insights into the prediction-making process. By doing so, we can better explain the model’s behavior to stakeholders and ensure its predictions are both accurate and interpretable. Let’s get started!
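
As a minimal sketch of how the two instances could be chosen programmatically rather than by position, the code below finds the first test row predicted as each class. The split parameters (`test_size=0.3`, `random_state=42`) and `max_iter=10000` are assumptions made so the example runs on its own; the tutorial's earlier cells may use different settings.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

bc = load_breast_cancer()

# Assumed split; the tutorial's actual split may differ.
X_train, X_test, y_train, y_test = train_test_split(
    bc.data, bc.target, test_size=0.3, random_state=42
)

# max_iter is raised (an assumption) so the solver converges on unscaled features.
logistic = LogisticRegression(max_iter=10000).fit(X_train, y_train)

# In this dataset, label 0 is malignant and label 1 is benign.
pred = logistic.predict(X_test)
benign_idx = int(np.flatnonzero(pred == 1)[0])
malignant_idx = int(np.flatnonzero(pred == 0)[0])

print("First test row predicted benign:", benign_idx)
print("First test row predicted malignant:", malignant_idx)
```

The resulting indices can then be passed to an explainer in place of hard-coded positions, for example `X_test[benign_idx]`.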


In [None]:
# Initialize LimeTabularExplainer for explaining tabular data predictions.
from lime.lime_tabular import LimeTabularExplainer

lime_explainer = LimeTabularExplainer(
    X_train,  # Training data as a 2D numpy array, used to compute feature statistics for sampling and discretization.
    feature_names=bc.feature_names,  # List of feature names corresponding to the columns in X_train.
    discretize_continuous=True,  # Bin continuous features (by quartiles by default) so explanations read as value ranges.
    class_names=["0", "1"],  # Class names in label order: 0 = malignant, 1 = benign.
)

In [None]:
# Generate explanation for an individual prediction from the test set using LIME.
lime_exp = lime_explainer.explain_instance(X_test[0], logistic.predict_proba)

# Display the LIME explanation in the notebook for visual interpretation.
lime_exp.show_in_notebook()


### Key Takeaways from the LIME Explanation above

1. Prediction Probabilities

   - The model predicts 87% (0.87) probability for Benign (1).
   - The probability for Malignant (0) is only 13% (0.13).
   - This means the model strongly believes the tumor is benign.

2. Feature Contributions

   - Features supporting Benign (1) are in orange (positive contribution to benign).
   - Features supporting Malignant (0) are in blue (negative contribution to benign, pushing toward malignancy).
   - The most important benign indicators are:
     - Worst perimeter (96.05)
     - Worst area (677.90)
     - Mean area (481.90)
     - Area error (30.29)
   - The most important malignancy indicators (blue) are:
     - Mean perimeter (81.09)
     - Worst radius (14.97)
     - Mean radius (12.47)

3. Feature Value Ranges
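   - The middle section lists the discretized value range each feature falls into, not decision splits learned by the model. Because `discretize_continuous=True`, LIME bins each continuous feature (by training-data quartiles by default), so “84.54 < worst perimeter <= 97.75” means this instance's worst perimeter lies in that bin, and LIME's local surrogate gives that bin a positive weight toward benign.
   - For this instance, the bins containing worst perimeter, worst area, and mean area push the prediction most strongly toward benign.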
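
The ranges shown in the middle section of a LIME explanation can be reproduced with simple quartile binning. The sketch below assumes LIME's default quartile discretizer and uses a hypothetical toy column rather than the real training data; `quartile_bin_label` is an illustrative helper, not a LIME function.

```python
import numpy as np


def quartile_bin_label(train_column, value, name):
    """Return the quartile-range label that `value` falls into."""
    q1, q2, q3 = np.percentile(train_column, [25, 50, 75])
    labels = [
        f"{name} <= {q1:.2f}",
        f"{q1:.2f} < {name} <= {q2:.2f}",
        f"{q2:.2f} < {name} <= {q3:.2f}",
        f"{name} > {q3:.2f}",
    ]
    # searchsorted returns the index of the first quartile edge >= value.
    return labels[int(np.searchsorted([q1, q2, q3], value))]


# Hypothetical training column (illustration only).
toy_column = np.arange(10, 100, 10)  # 10, 20, ..., 90

print(quartile_bin_label(toy_column, 45, "worst perimeter"))
print(quartile_bin_label(toy_column, 95, "worst perimeter"))
```

Each bin becomes a binary feature in LIME's local surrogate model, which is why the explanation reports a weight per range rather than per raw value.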

#### Final Interpretation

Even though some features (blue) support the malignant classification, the dominant benign-supporting features outweigh them. Therefore, the model predicts Benign (1) with high confidence (87%).


Let’s apply a similar analytical approach to interpret a data point associated with malignant cancer. 

In [None]:
# Explain the prediction for a single instance (the second instance) from the test set (X_test[1])
lime_exp = lime_explainer.explain_instance(X_test[1], logistic.predict_proba)

# Display the Lime explanation in the notebook for visual interpretation.
lime_exp.show_in_notebook()

### Key Takeaways from the LIME Explanation above

1. Prediction Probabilities
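   - The model assigns a probability of 1.00 (as displayed, after rounding) to malignant (0) and 0.00 to benign (1).
   - The blue bar (0) is fully filled, which reflects that predicted probability; it does not mean every feature pushes toward malignant, as the contributions below show.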
2. Feature Contributions
   - Blue bars (supporting malignant classification):
     - Worst perimeter (165.90)
     - Worst area (1866.00)
     - Mean area (1130.00)
     - Worst texture (26.58)
   - Orange bars (supporting benign classification):
     - Mean perimeter (123.60)
     - Worst radius (24.86)
     - Mean radius (18.94)
   - Since the blue features dominate, the model predicts malignant (0) with full confidence.
3. Feature Value Ranges & Feature Importance
   - The middle section shows the discretized value ranges (LIME's bins) that each feature falls into, not decision splits used by the model.
     - For example, “mean perimeter > 105.62” slightly supports benign (orange).
     - However, ranges like “worst perimeter > 125.30” strongly push toward malignant.

#### Final Interpretation

Even though a few features (like mean perimeter and worst radius) slightly support benign, the strong malignant indicators (worst perimeter, mean area, worst area) outweigh them completely.
The model is highly confident this tumor is malignant (100% displayed probability).


## Model Deployment

The final and crucial step in the machine learning workflow is model deployment, where we transition from a trained model to a fully operational system that can be used in real-world applications. Deployment is the process of integrating the model into a production environment, making it accessible to end-users or other systems for generating predictions on new data.

**Why is Model Deployment Important?**
- *Real-World Impact*: A model’s true value is realized only when it is deployed and used to solve real-world problems.
- *Automation*: Deployment allows for automated decision-making, reducing the need for manual intervention.
- *Scalability*: A deployed model can handle large volumes of data and serve multiple users simultaneously.
- *Continuous Improvement*: Once deployed, the model can be monitored and updated to maintain or improve its performance over time.

**Key Considerations for Model Deployment**
- Choosing the Right Deployment Strategy:
    - *Batch Processing*: The model processes data in batches at scheduled intervals (e.g., daily or weekly).
    - *Real-Time Inference*: The model provides instant predictions as new data arrives (e.g., fraud detection or recommendation systems).
- Infrastructure and Tools:
    - *Cloud Platforms*: Services like AWS SageMaker, Google Vertex AI, or Azure ML simplify deployment.
    - *Containerization*: Tools like Docker and Kubernetes help package and manage models for scalable deployment.
    - *APIs*: Exposing the model as a REST API or GraphQL endpoint allows easy integration with other systems.
- Monitoring and Maintenance:
    - *Performance Monitoring*: Track metrics like prediction accuracy, latency, and throughput to ensure the model is functioning as expected.
    - *Data Drift Detection*: Monitor for changes in input data distribution that may degrade model performance.
    - *Model Retraining*: Periodically retrain the model with new data to maintain its relevance and accuracy.
- Security and Compliance:
    - Ensure the deployed model adheres to data privacy regulations (e.g., GDPR, HIPAA).
    - Implement security measures to protect the model and data from unauthorized access or attacks.
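
As one possible way to approach data drift detection, the sketch below computes a Population Stability Index (PSI) for a single feature by comparing a new batch against the training distribution. The function name, bin count, and synthetic data are assumptions for illustration; the commonly quoted thresholds (below 0.1 as stable, above 0.25 as a significant shift) are rules of thumb rather than fixed standards.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Compare two samples of one feature; larger values mean a bigger shift."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges come from quantiles of the reference (training) sample.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_counts = np.bincount(np.searchsorted(edges, reference), minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(edges, current), minlength=n_bins)
    # Clip the bin fractions to avoid log(0) when a bin is empty.
    ref_frac = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_frac = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


# Synthetic data (illustration only): one similar batch and one shifted batch.
rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
similar_batch = rng.normal(loc=0.0, scale=1.0, size=5000)
shifted_batch = rng.normal(loc=1.0, scale=1.0, size=5000)

psi_similar = population_stability_index(training_feature, similar_batch)
psi_shifted = population_stability_index(training_feature, shifted_batch)
print(f"PSI, similar batch: {psi_similar:.4f}")
print(f"PSI, shifted batch: {psi_shifted:.4f}")
```

In a monitoring job, a check like this could run per feature on each incoming batch, with a retraining review triggered when the score stays above the chosen threshold.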

**Steps to Deploy a Model**
- *Export the Model*: Save the trained model in a format suitable for deployment (e.g., .pkl for scikit-learn models, .pt for PyTorch models).
- *Set Up the Environment*: Create a production environment with the necessary dependencies and infrastructure.
- *Build an API*: Use frameworks like Flask, FastAPI, or Django to create an API for serving predictions.
- *Deploy to Production*: Use cloud platforms, containers, or serverless architectures to deploy the model.
- *Test and Monitor*: Thoroughly test the deployed model and set up monitoring tools to track its performance.
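
The "Build an API" step usually comes down to a function that validates a request and returns a JSON-serializable response. The sketch below shows such a function without committing to a web framework; `predict_handler` and the `{"features": [...]}` payload shape are hypothetical, and the model is trained on the full dataset only to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

bc = load_breast_cancer()

# Stand-in model for illustration; in practice it would be loaded from disk.
model = LogisticRegression(max_iter=10000).fit(bc.data, bc.target)
N_FEATURES = bc.data.shape[1]


def predict_handler(payload):
    """Validate a JSON-like payload and return a JSON-serializable response."""
    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or len(features) != N_FEATURES:
        return {"error": f"expected 'features' to be a list of {N_FEATURES} numbers"}
    try:
        row = np.asarray(features, dtype=float).reshape(1, -1)
    except (TypeError, ValueError):
        return {"error": "all feature values must be numeric"}
    proba = model.predict_proba(row)[0]
    label = int(model.classes_[int(np.argmax(proba))])
    return {"label": label, "probabilities": [float(p) for p in proba]}


response = predict_handler({"features": bc.data[0].tolist()})
print(response)
print(predict_handler({"features": [1.0, 2.0]}))
```

A Flask or FastAPI route would then only need to parse the request body, call a function like this, and return the resulting dictionary as JSON.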

### Persist model to disk

To persist our model to disk, we can use serialization libraries such as `pickle` or `joblib` (the latter is a standalone package that is installed as a dependency of scikit-learn). This approach enables model deployment and future reuse without requiring retraining every time we need to make predictions. The serialized model file maintains all learned parameters and can be reloaded when needed.

In [None]:
# Save the trained logistic regression model to a file named 'lr_model.pkl' using joblib for later use.
import joblib

joblib.dump(logistic, "lr_model.pkl")

### Load model from disk

Whenever we load this object into memory again, we will retrieve the logistic regression model object with its trained parameters intact. This allows us to reuse the model for predictions or further analysis without needing to retrain it each time. The serialized object preserves all the learned weights, feature coefficients, and model configuration, ensuring consistent behavior when reloaded.

In [None]:
# Load a pre-trained Logistic Regression model from a pickle file named "lr_model.pkl".
lr = joblib.load("lr_model.pkl")

# Display the loaded Logistic Regression model object. This will show the model's parameters and structure.
lr

### Predict with loaded model
Now that the logistic regression model (`lr`) has been successfully loaded from disk, we can use this trained model object to generate predictions on new data. The `lr` object contains all the learned coefficients and model parameters from the training process, allowing us to call standard methods like `.predict()` or `.predict_proba()` on fresh input samples. This enables us to apply the previously trained model to make classifications or probability estimates without needing to retrain.

In [None]:
# Print the true value from the y_test dataset for the index range 10 to 11 (exclusive of 11, so effectively index 10).
print("True value: ", y_test[10:11])

# Print the predicted value for the corresponding input data from X_test dataset at index range 10 to 11, using the loaded logistic regression model 'lr'.
print("Predicted value: ", lr.predict(X_test[10:11]))
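
Before relying on a reloaded model, it can be worth confirming that it behaves exactly like the original. The sketch below saves a model to a temporary directory, reloads it, and compares predictions; the split settings and `max_iter=10000` are assumptions made so the example runs on its own.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

bc = load_breast_cancer()

# Assumed split; the tutorial's actual split may differ.
X_train, X_test, y_train, y_test = train_test_split(
    bc.data, bc.target, random_state=0
)
logistic = LogisticRegression(max_iter=10000).fit(X_train, y_train)

# Round trip through a temporary file so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lr_model.pkl")
    joblib.dump(logistic, path)
    lr = joblib.load(path)

same_labels = np.array_equal(lr.predict(X_test), logistic.predict(X_test))
same_probs = np.allclose(lr.predict_proba(X_test), logistic.predict_proba(X_test))
print("Identical labels:", same_labels)
print("Identical probabilities:", same_probs)

# predict_proba returns one column per class: [P(malignant), P(benign)].
proba = lr.predict_proba(X_test[10:11])
print("Class probabilities for one sample:", proba)
```

A check like this fits naturally into a deployment test suite, where it can catch mismatched library versions between the training and serving environments.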

## Conclusion

Through this tutorial, you have gained practical experience in making machine learning models more transparent and interpretable, optimizing their performance through proper tuning, and preparing them for real-world deployment. These skills are essential for bridging the gap between technical implementation and business understanding in machine learning projects.

## Clean up

Remember to shut down your Jupyter Notebook environment and delete any unnecessary files or resources once you've completed the tutorial.
