# Hands-On: Counterfactual Explanations in Python

Popular Python packages for computing counterfactual explanations (CFs):
- [DiCE](https://github.com/interpretml/DiCE) -- general algorithms (incl. diversity)
- [CARLA](https://github.com/carla-recourse/CARLA) -- many algorithms (incl. causality)
- [Alibi](https://github.com/SeldonIO/alibi) -- Counterfactual Guided by Prototypes (outdated tensorflow version!)
- [CEML](https://github.com/andreArtelt/ceml) -- model-specific algorithms (incl. plausibility)
- ...

We consider the (toy) problem of explaining breast cancer predictions made by a random forest classifier.
In this context, we demonstrate how to
1. Implement a nearest unlike neighbor (NUN) CF baseline
2. Use DiCE for computing diverse CFs

In [None]:
%pip install dice-ml scikit-learn matplotlib

In [None]:
%pip install "numpy<2"

In [None]:
from typing import Callable
import dice_ml
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.ensemble import RandomForestClassifier

In [None]:
# Helper function for plotting CFs
def plot_barplot(x: np.ndarray, labels: list):
    """
    Creates a labeled bar plot.

    Parameters
    ----------
    x : `numpy.ndarray`
        Bar height of each item.
    labels : `list`
        Labels for each entry in x
    """
    fig, ax = plt.subplots()
    ax.barh(labels, x, align='center', height=0.5)
    plt.show()

### Problem Setup

Breast cancer prediction using a random forest classifier.

1. Load data set
2. Split into train and test set
3. Train random forest classifier
4. Evaluate classifier

#### 1. Load data set

In [None]:
# Get feature names of breast cancer data set
df_data = load_breast_cancer(as_frame=True)
feature_names, target_names = list(df_data.feature_names), list(df_data.target_names)

feature_names, target_names

In [None]:
# Load breast cancer data set
X, y = load_breast_cancer(return_X_y=True)

X.shape, y.shape   # Show dimensions of data

In [None]:
# For illustrative purposes: Merge feature names and target names with Numpy arrays
X_data = pd.DataFrame(X, columns=feature_names)
y_data = pd.DataFrame([target_names[y_i] for y_i in y], columns=["Label"])

X_data.head()

In [None]:
y_data.head()

#### 2. Split into train and test set

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=.3,  # 30% test data
                                                    shuffle=True)

X_train.shape, y_train.shape, X_test.shape, y_test.shape

#### 3. Train random forest classifier

In [None]:
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

In [None]:
y_train_pred = clf.predict(X_train)

#### 4. Evaluate classifier

In [None]:
y_test_pred = clf.predict(X_test)
f1_score(y_test, y_test_pred)
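
Note that `f1_score` uses `pos_label=1` by default, so the score above refers to the benign class, while the samples explained later are the malignant ones (class 0). The following is a minimal sketch, not part of the original notebook, that fixes the random seed and reports per-class metrics. The names `X_demo`, `clf_demo`, etc. and the choice `random_state=0` are assumptions made for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Load the data and create a reproducible, stratified 70/30 split
X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.3,
                                          stratify=y_demo,   # Keep class ratios in both splits
                                          random_state=0)    # Illustrative seed

# Train a random forest with a fixed seed
clf_demo = RandomForestClassifier(random_state=0)
clf_demo.fit(X_tr, y_tr)
y_te_pred = clf_demo.predict(X_te)

# Rows = true class, columns = predicted class (0 = malignant, 1 = benign)
cm = confusion_matrix(y_te, y_te_pred)
print(cm)

# Precision/recall/F1 for each class separately
print(classification_report(y_te, y_te_pred, target_names=["malignant", "benign"]))
```

The first row of the confusion matrix shows how many malignant samples were predicted as malignant or benign, which is the class of interest for the explanations below.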

#### Pick samples to be explained

We want to explain all malignant predictions!

In [None]:
# Negative classified samples -- i.e. y = 0 => malignant
idx = y_test_pred == 0
X_test_neg = X_test[idx, :]

In [None]:
X_test_neg.shape

## A simple baseline: Nearest Unlike Neighbor Counterfactuals

Assume a set $D=\{(\vec{x}_i, y_i)\}$ of (labeled) samples is available -- e.g. a training/validation/test data set. The *Nearest Unlike Neighbor* (NUN) counterfactual of a sample $\vec{x}_{orig}$ is the sample from this set that is closest to $\vec{x}_{orig}$ and labeled with the requested prediction $y_{cf}$:

$$\vec{x}_{cf} = \underset{\vec{x}_i\in D \mid y_i = y_{cf}}{\arg\min} d(\vec{x}_{orig}, \vec{x}_i)$$

This simple method often constitutes a surprisingly good/strong baseline for plausible counterfactual explanations.
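
To make the definition concrete, here is a minimal sketch of the NUN search on a hand-made toy data set. The function `nearest_unlike_neighbor` and the toy arrays are illustrative assumptions; the l1 distance is used as $d$.

```python
import numpy as np

def nearest_unlike_neighbor(x_orig, X, y, y_target):
    """Return the sample in X labeled y_target that is closest (l1) to x_orig."""
    candidates = X[y == y_target]           # Samples with the requested label
    if candidates.shape[0] == 0:
        raise ValueError("No sample with the requested label available")
    dists = np.sum(np.abs(candidates - x_orig), axis=1)   # l1 distance to each candidate
    return candidates[np.argmin(dists)]

# Toy data: one sample of class 0 and three samples of class 1
X_toy = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [0.5, 2.0]])
y_toy = np.array([0, 1, 1, 1])

x_query = np.array([0.0, 0.5])
x_cf = nearest_unlike_neighbor(x_query, X_toy, y_toy, y_target=1)
print(x_cf)             # Counterfactual sample
print(x_cf - x_query)   # Change that has to be applied to x_query
```

The candidate distances are 1.5, 5.5, and 2.0, so the sample `[1.0, 1.0]` is selected.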

In [None]:
# Different distance functions for measuring closeness/proximity to the original sample
def get_dist_func(desc: str, epsilon: float = 1e-3) -> Callable[[np.ndarray, np.ndarray], float]:
    """
    Creates and returns a distance function for comparing the
    similarity of two counterfactuals.

    Parameters
    ----------
    desc : `str`
        Name of the distance function.

        Must be one of the following:

            - l0
            - l1
            - l2
    epsilon : `float`, optional
        Threshold at which the difference between two values is considered zero.
        Only relevant in the case of "l0".

        The default is 1e-3

    Returns
    -------
    `Callable[[np.ndarray, np.ndarray], float]`
        Distance function.
    """
    if desc == "l0":
        return lambda x_orig, x_cf: np.sum(np.abs(x_orig - x_cf) > epsilon)  # Number of changed features
    elif desc == "l1":
        return lambda x_orig, x_cf: np.sum(np.abs(x_orig - x_cf))
    elif desc == "l2":
        return lambda x_orig, x_cf: np.sqrt(np.sum(np.square(x_orig - x_cf)))
    else:
        raise ValueError(f"Unknown distance function '{desc}'")
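
As a quick sanity check, the three distances can be computed by hand on a small example. This is a sketch with made-up vectors; the threshold `1e-3` mirrors the default `epsilon` above.

```python
import numpy as np

x_orig_toy = np.array([1.0, 2.0, 3.0, 4.0])
x_cf_toy = np.array([1.0, 2.5, 3.0, 2.0])
diff = x_cf_toy - x_orig_toy   # [0.0, 0.5, 0.0, -2.0]

l0 = int(np.sum(np.abs(diff) > 1e-3))            # Number of changed features
l1 = float(np.sum(np.abs(diff)))                 # Total absolute change
l2 = float(np.sqrt(np.sum(np.square(diff))))     # Euclidean length of the change

print(l0, l1, l2)
```

Here two features change (l0 = 2), the total absolute change is 2.5 (l1), and the Euclidean distance is the square root of 4.25 (l2). The l0 distance favors sparse changes, whereas l1 and l2 favor small overall changes.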

In [None]:
# Nearest Unlike Neighbor Counterfactual
class NearestUnlikeNeighborCF():
    """
    Implementation of the Nearest Unlike Neighbor counterfactual explanation method.

    Parameters
    ----------
    clf : `sklearn.base.ClassifierMixin`
        Classifier that is going to be explained.
    X_train : `numpy.ndarray`
        Input training data.
    y_train : `numpy.ndarray`
        Labels of the corresponding input training data.
    dist : `str`, optional
        Description/Name of the distance function for comparing
        the similarity of two counterfactuals.

        The default is "l1".
    """
    def __init__(self, clf: sklearn.base.ClassifierMixin, X_train: np.ndarray, y_train: np.ndarray, dist="l1"):
        self.clf = clf

        y_pred = clf.predict(X_train)
        mask = y_pred == y_train    # Limit the feasible set to correctly classified samples, assuming the label is available
        self.X = X_train[mask, :]
        self.y = y_train[mask]

        self.dist = dist
        if not callable(self.dist):
            self.dist = get_dist_func(dist)

    def compute_counterfactual(self, x_orig: np.ndarray, y_target: int) -> np.ndarray:
        """
        Computes a counterfactual explanation.

        Note that this function returns the counterfactual sample -- i.e.
        the final data point after applying the counterfactual change.

        Parameters
        ----------
        x_orig : `numpy.ndarray`
            Original data point.
        y_target : `int`
            Requested target class -- the final counterfactual sample should
            be classified as specified in y_target.

        Returns
        -------
        `numpy.ndarray`
            The counterfactual sample -- i.e. the final data point after
            applying the counterfactual change.
        """
        # Identify potential target samples
        mask = self.y == y_target
        X_ = self.X[mask, :]

        # Find the closest sample with the requested prediction
        dist = [self.dist(x_orig, x_i) for x_i in X_]
        idx = np.argmin(dist)

        return X_[idx, :]

    def compute_delta_cf(self, x_orig: np.ndarray, y_target: int) -> np.ndarray:
        """
        Computes a counterfactual explanation -- i.e. the change that has to be
        applied to the original sample in order to change its classification.

        Parameters
        ----------
        x_orig : `numpy.ndarray`
            Original data point.
        y_target : `int`
            Requested target class -- the final counterfactual sample should be
            classified as specified in y_target.

        Returns
        -------
        `numpy.ndarray`
            The change that, if added to the original sample x_orig,
            would change the classification as requested.
        """
        return self.compute_counterfactual(x_orig, y_target) - x_orig
    
    def compute_counterfactual_batch(self, X_orig: np.ndarray, y_target: np.ndarray) -> np.ndarray:
        """
        Computes a batch of counterfactual explanations.

        Note that this function returns the counterfactual samples -- i.e.
        the final data points after applying the counterfactual changes.

        Parameters
        ----------
        X_orig : `numpy.ndarray`
            Batch of original data points.
        y_target : `numpy.ndarray`
            Batch of requested target classes -- the final counterfactual samples should
            be classified as specified in y_target.

        Returns
        -------
        `numpy.ndarray`
            Batch of counterfactual samples -- i.e. the final data points after
            applying the counterfactual changes.
        """
        X_cf = []

        for i in range(X_orig.shape[0]):
            X_cf.append(self.compute_counterfactual(X_orig[i, :], y_target[i]))

        return np.array(X_cf)
    
    def compute_delta_cf_batch(self, X_orig: np.ndarray, y_target: np.ndarray) -> np.ndarray:
        """
        Computes a batch of counterfactual explanations -- i.e. the changes that have to be
        applied to a batch of original samples in order to change their classification.

        Parameters
        ----------
        X_orig : `numpy.ndarray`
            Batch of original data points.
        y_target : `numpy.ndarray`
            Batch of requested target classes -- the final counterfactual samples
            should be classified as specified in y_target.

        Returns
        -------
        `numpy.ndarray`
            Batch of changes that, if added to the original samples X_orig,
            would change their classification as requested.
        """
        Delta_cf = []

        for i in range(X_orig.shape[0]):
            Delta_cf.append(self.compute_counterfactual(X_orig[i, :], y_target[i]) - X_orig[i, :])
        
        return np.array(Delta_cf)

#### Compute Counterfactual Explanations

In [None]:
cf_baseline = NearestUnlikeNeighborCF(clf, X_train, y_train)

In [None]:
X_cf = cf_baseline.compute_counterfactual_batch(X_orig=X_test_neg,
                                                y_target=np.array([1] * X_test_neg.shape[0]))  # Flip prediction from 0 -> 1 for every query sample

Inspect first counterfactual

In [None]:
X_cf[0, :]
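
Before interpreting counterfactuals, it can help to check a few simple quality measures. The sketch below is an illustration and not part of the original notebook: the helper `evaluate_counterfactuals` and the stand-in classifier `toy_predict` are hypothetical names. It computes validity (fraction of CFs classified as requested), mean l1 proximity, and the mean number of changed features.

```python
import numpy as np

def evaluate_counterfactuals(predict, X_orig, X_cf, y_target, epsilon=1e-3):
    """Compute simple quality measures for a batch of counterfactuals."""
    X_orig = np.asarray(X_orig, dtype=float)
    X_cf = np.asarray(X_cf, dtype=float)
    abs_diff = np.abs(X_cf - X_orig)

    return {
        # Fraction of counterfactuals that receive the requested prediction
        "validity": float(np.mean(predict(X_cf) == np.asarray(y_target))),
        # Average l1 distance between original samples and counterfactuals
        "l1_proximity": float(np.mean(np.sum(abs_diff, axis=1))),
        # Average number of features changed by more than epsilon
        "n_changed": float(np.mean(np.sum(abs_diff > epsilon, axis=1))),
    }

# Hypothetical stand-in for clf.predict: class 1 if the first feature exceeds 0.5
def toy_predict(X):
    return (np.asarray(X)[:, 0] > 0.5).astype(int)

X_orig_toy = np.array([[0.2, 1.0], [0.4, 3.0]])
X_cf_toy = np.array([[0.7, 1.0], [0.9, 2.0]])

metrics = evaluate_counterfactuals(toy_predict, X_orig_toy, X_cf_toy, y_target=[1, 1])
print(metrics)
```

In this notebook, the same helper could be called as `evaluate_counterfactuals(clf.predict, X_test_neg, X_cf, np.ones(X_cf.shape[0], dtype=int))`. Since the NUN baseline only selects correctly classified training samples, its validity should be 1.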

Compute and inspect changes delta

In [None]:
Delta_cf = cf_baseline.compute_delta_cf_batch(X_orig=X_test_neg,
                                              y_target=np.array([1] * X_test_neg.shape[0]))

In [None]:
Delta_cf[0, :]

In [None]:
np.round(Delta_cf[0, :], 1)

Visualize the counterfactual

In [None]:
plot_barplot(Delta_cf[0, :], feature_names)
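
The features of this data set live on very different scales (e.g. areas vs. smoothness values), so a bar plot of raw changes tends to be dominated by the large-scale features. One possible remedy, sketched below under the assumption that dividing by a per-feature scale such as the training standard deviation is appropriate, is to rank features by their scaled change. The helper `top_changed_features` and the toy numbers are hypothetical.

```python
import numpy as np

def top_changed_features(delta, feature_scale, names, k=5):
    """Return the k features with the largest change relative to their scale."""
    scaled = np.asarray(delta, dtype=float) / np.asarray(feature_scale, dtype=float)
    order = np.argsort(-np.abs(scaled))[:k]   # Largest absolute scaled change first
    return [(names[i], float(scaled[i])) for i in order]

# Made-up change vector and per-feature scales
delta_toy = np.array([120.0, -0.02, 1.5])
scale_toy = np.array([350.0, 0.01, 4.0])
names_toy = ["mean area", "mean smoothness", "mean texture"]

top = top_changed_features(delta_toy, scale_toy, names_toy, k=2)
print(top)
```

Although "mean area" has by far the largest raw change, "mean smoothness" changes the most relative to its scale. With the notebook's variables, a call could look like `top_changed_features(Delta_cf[0, :], X_train.std(axis=0), feature_names)`.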

## DiCE: Diverse Counterfactual Explanations

#### Prerequisites

1. Predictive model (e.g. classifier)
2. (Training) data set (as a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html))
3. Query sample(s) -- i.e. sample(s) for which we want to generate a CF

#### How to use DiCE

1. Wrap (training) data set as a [dice_ml.Data](https://interpret.ml/DiCE/dice_ml.html#dice_ml.data.Data) instance
2. Wrap predictive model as a [dice_ml.Model](https://interpret.ml/DiCE/dice_ml.html#dice_ml.model.Model) instance
3. Wrap query samples as a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)
4. Compute CFs by creating a [dice_ml.Dice](https://interpret.ml/DiCE/dice_ml.html#dice_ml.dice.Dice) instance and calling the [generate_counterfactuals()](https://interpret.ml/DiCE/dice_ml.explainer_interfaces.html#dice_ml.explainer_interfaces.explainer_base.ExplainerBase.generate_counterfactuals) function

#### 1. Wrap (training) data set as a [dice_ml.Data](https://interpret.ml/DiCE/dice_ml.html#dice_ml.data.Data) instance

In [None]:
X_df = pd.DataFrame(X_train, columns=feature_names)
y_df = pd.DataFrame(y_train_pred, columns=["y"]).astype(np.int32)  # Alternatively, use only correctly classified samples!
data_df = pd.concat([X_df, y_df], axis=1)

data = dice_ml.Data(dataframe=data_df,
                    continuous_features=feature_names,  # All features are continuous!
                    outcome_name='y')

#### 2. Wrap predictive model as a [dice_ml.Model](https://interpret.ml/DiCE/dice_ml.html#dice_ml.model.Model) instance

In [None]:
model = dice_ml.Model(model=clf, backend='sklearn')

#### 3. Wrap query samples as a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)

In [None]:
X_queries = pd.DataFrame(X_test_neg, columns=feature_names)

#### 4. Compute CFs by creating a [dice_ml.Dice](https://interpret.ml/DiCE/dice_ml.html#dice_ml.dice.Dice) instance and calling the [generate_counterfactuals()](https://interpret.ml/DiCE/dice_ml.explainer_interfaces.html#dice_ml.explainer_interfaces.explainer_base.ExplainerBase.generate_counterfactuals) function

In [None]:
cf_algo = dice_ml.Dice(data, model, method="genetic")   # Evolutionary algorithm for computing CFs -- see documentation for alternatives!

cf_results = cf_algo.generate_counterfactuals(X_queries,
                                              total_CFs=3,        # 3 diverse CFs
                                              desired_class="opposite",  # Flip prediction!
                                              verbose=False)

In [None]:
cf_results

In [None]:
#cf_results.visualize_as_dataframe()

In [None]:
len(cf_results.cf_examples_list)

Inspect CFs of the first query sample

In [None]:
cf_results.cf_examples_list[0].final_cfs_df

Export to [NumPy arrays](https://numpy.org/doc/stable/reference/generated/numpy.array.html)

In [None]:
X_cf = cf_results.cf_examples_list[0].final_cfs_df[feature_names].to_numpy() 

In [None]:
X_cf

Compute change vector

In [None]:
X_cf - X_test_neg[0, :] 

In [None]:
np.round(X_cf - X_test_neg[0, :], 1)
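
Since several CFs are returned per query sample, one may also want to quantify how different they are from each other. The sketch below uses the mean pairwise l1 distance as a simple proxy for diversity; this is an assumption made for illustration and not necessarily the diversity objective DiCE optimizes internally. The helper `mean_pairwise_l1` and the toy array are hypothetical.

```python
import numpy as np

def mean_pairwise_l1(X_cfs):
    """Mean l1 distance over all pairs of counterfactuals (0.0 for fewer than two)."""
    X_cfs = np.asarray(X_cfs, dtype=float)
    n = X_cfs.shape[0]
    if n < 2:
        return 0.0
    dists = [np.sum(np.abs(X_cfs[i] - X_cfs[j]))
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

# Three made-up counterfactuals for the same query sample
cfs_toy = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])

diversity = mean_pairwise_l1(cfs_toy)
print(diversity)
```

The pairwise distances are 2, 1, and 1, so the mean is 4/3. Applied to the notebook, `mean_pairwise_l1(X_cf)` would give a rough indication of how diverse the three CFs of the first query sample are; as with the bar plots, scaling the features first may make the number easier to interpret.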

Visualize the counterfactual (i.e. change vector)

In [None]:
plot_barplot(X_cf[0, :] - X_test_neg[0, :], feature_names)