# TruSHAP

Add your model to a TruEra deployment with only two code changes to notebooks that already use SHAP.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/truera-examples/blob/release/prod/extensions/TruSHAP-example.ipynb)

### 0.1: Import Packages

In [None]:
from truera.client.experimental.trushap import trushap as shap
from sklearn.tree import DecisionTreeRegressor
import pandas as pd
import xgboost
from sklearn.model_selection import train_test_split

### 0.2: Train the Model

In [None]:
X, y = shap.datasets.adult()
x_cols = list(X.columns)
y_cols = ["y"]
y = y.astype(int)
model = xgboost.XGBRegressor().fit(X, y)

X = X.reset_index(names="id")

y = pd.DataFrame(y, columns=y_cols).reset_index(names="id")
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                    test_size = 0.3,
                                    random_state = 123)
                                    
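#refit on the feature columns only, so the "id" column is never used as a feature or a target
model = xgboost.XGBRegressor().fit(X[x_cols], y["y"])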
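The `id` column created by `reset_index` is a row identifier rather than a feature, which is why the feature list is captured before the column is added and then used to select columns explicitly. A minimal sketch on a toy frame (the values are made up for illustration, and `reset_index(names=...)` assumes pandas 1.5 or later):

```python
import pandas as pd

# Toy stand-in for the adult dataset; the values are illustrative only.
X_toy = pd.DataFrame({"Age": [39, 50, 38], "Hours per week": [40, 13, 40]})

# Capture the feature names before the id column exists.
feature_cols = list(X_toy.columns)

# Turn the positional index into an explicit "id" column.
X_toy = X_toy.reset_index(names="id")

print(list(X_toy.columns))                # ['id', 'Age', 'Hours per week']
print(list(X_toy[feature_cols].columns))  # ['Age', 'Hours per week']
```

Selecting `X_toy[feature_cols]` whenever a model is fitted keeps the identifier out of the feature matrix.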

### 1.0: Get Shapley Values

In [None]:
#connection details
CONNECTION_STRING = "https://app.truera.net/"
TOKEN = "..."

In [None]:
#just add connection string and token (optional)
explainer = shap.Explainer(model, connection_string = CONNECTION_STRING, token = TOKEN)

shap_values = explainer(X_train, pre_data_col_names = x_cols, id_col_name = "id")

### 1.1: Use existing SHAP Functionality

In [None]:
#plot against the same rows and feature columns the SHAP values were computed on
shap.summary_plot(shap_values, X_train[x_cols])
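By default, the summary plot orders features by the total magnitude of their SHAP values across rows. The sketch below reproduces that ordering with NumPy on a toy matrix; the numbers and the three feature names are made up for illustration and are not output from the model above.

```python
import numpy as np

# Toy SHAP matrix: 4 rows (samples) x 3 features. Values are illustrative only.
toy_shap = np.array([
    [ 0.20, -0.05,  0.01],
    [-0.30,  0.10,  0.00],
    [ 0.25, -0.15, -0.02],
    [-0.10,  0.05,  0.01],
])
toy_features = ["Age", "Education-Num", "Workclass"]

# Mean absolute SHAP value per feature, then sort from most to least important.
mean_abs = np.abs(toy_shap).mean(axis=0)
order = np.argsort(mean_abs)[::-1]
ranking = [toy_features[i] for i in order]

print(ranking)  # ['Age', 'Education-Num', 'Workclass']
```

Ranking by the mean or by the sum of absolute values gives the same order, because every feature is averaged over the same number of rows.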

### 1.2: Explore your new project in the TruEra Web App

Visit your TruEra application and explore to better understand the performance of your model.

### 1.3: Be intentional about naming

In [None]:
#add naming
explainer = shap.Explainer(model,
                            connection_string = CONNECTION_STRING,
                            token = TOKEN,
                            project = "Adult Census Example",
                            data_collection_name = "Adult Census Data Collection",
                            model_name = "XGBRegressor V1"
                            )

shap_values = explainer(X, data_split_name = "all", pre_data_col_names = x_cols, id_col_name = "id")

### 1.4: Add label data

In [None]:
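#select the target column and rename it to "label"; pd.DataFrame(y, columns=["label"]) would yield an all-NaN column
data_all = pd.concat([X, y["y"].astype(float).rename("label")], axis = 1)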

shap_values = explainer(data_all, data_split_name = "all_w_labels",
    pre_data_col_names = x_cols,
    id_col_name = "id",
    label_col_names=["label"])
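When building the label column, note that passing an existing DataFrame to `pd.DataFrame(..., columns=[...])` selects columns by name rather than renaming them. The sketch below shows the difference on a toy label frame (toy values, shaped like `y` after `reset_index`):

```python
import pandas as pd

# Toy label frame with the same layout as y: an id column and a target column.
y_toy = pd.DataFrame({"id": [0, 1, 2], "y": [0, 1, 1]})

# Column selection, not renaming: "label" does not exist, so it comes back all NaN.
wrong = pd.DataFrame(y_toy, columns=["label"])

# Selecting the target column and renaming it keeps the values.
right = y_toy["y"].astype(float).rename("label")

print(wrong["label"].isna().all())  # True
print(right.tolist())               # [0.0, 1.0, 1.0]
```

A quick `isna().any()` check on the label column before uploading a split is a cheap way to catch this.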

### 1.5: Compare models against each other and across multiple splits

In [None]:
#Add multiple models
#train on the feature columns only; "id" is an identifier, not a feature
xgb_model = xgboost.XGBRegressor(max_depth = 6, min_child_weight = 2).fit(X_train[x_cols], y_train["y"])

#use the same project name for every model you want to compare
explainer_xgb = shap.Explainer(xgb_model,
                            connection_string = CONNECTION_STRING,
                            token = TOKEN,
                            project = "Adult Census Model Comparison",
                            data_collection_name = "Adult Census Data Collection",
                            model_name = "XGBRegressor",
                            train_parameters = {"max_depth":6,
                                                "min_child_weight":2}
                            )

#select the target column and rename it to "label"; concat aligns rows by index
train_data = pd.concat([X_train, y_train["y"].astype(float).rename("label")], axis = 1)
test_data = pd.concat([X_test, y_test["y"].astype(float).rename("label")], axis = 1)

shap_values_xgb_train = explainer_xgb(train_data,
    data_split_name = "train",
    pre_data_col_names = x_cols,
    id_col_name = "id",
    label_col_names=["label"])
shap_values_xgb_test = explainer_xgb(test_data,
    data_split_name = "test",
    pre_data_col_names = x_cols,
    id_col_name = "id",
    label_col_names=["label"])

#train on the feature columns only; "id" is an identifier, not a feature
tree_model = DecisionTreeRegressor(max_depth=6).fit(X_train[x_cols], y_train["y"])

explainer_tree = shap.Explainer(tree_model,
                            connection_string = CONNECTION_STRING,
                            token = TOKEN,
                            project = "Adult Census Model Comparison",
                            data_collection_name = "Adult Census Data Collection",
                            model_name = "Decision Tree Regression",
                            train_parameters = {"max_depth": 6}
                            )

shap_values_tree_train = explainer_tree(train_data,
    data_split_name = "train",
    pre_data_col_names = x_cols,
    id_col_name = "id",
    label_col_names=["label"])
shap_values_tree_test = explainer_tree(test_data,
    data_split_name = "test",
    pre_data_col_names = x_cols,
    id_col_name = "id",
    label_col_names=["label"])
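Building `train_data` and `test_data` relies on `pd.concat(..., axis=1)` aligning rows by index label, so features and labels stay matched even after a shuffled split. A small sketch with toy frames illustrates this; `sample` stands in for the shuffle that `train_test_split` performs, and all values are made up.

```python
import pandas as pd

# Toy feature and label frames that share an "id" column and a default index.
X_toy = pd.DataFrame({"id": [0, 1, 2, 3], "Age": [39, 50, 38, 53]})
y_toy = pd.DataFrame({"id": [0, 1, 2, 3], "y": [0, 1, 1, 0]})

# Take a shuffled subset of the features, then the matching labels in reversed row order.
X_part = X_toy.sample(frac=0.5, random_state=0)
y_part = y_toy.loc[X_part.index].iloc[::-1]

# concat with axis=1 matches rows on index labels, not on position.
combined = pd.concat([X_part, y_part["y"].astype(float).rename("label")], axis=1)

# Verify against the original labels by joining on "id".
check = combined.merge(y_toy, on="id")
print((check["label"] == check["y"]).all())  # True
```

If either frame had its index reset after the split, this alignment would no longer hold, so it is safest to reset indices before splitting, as the notebook does.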


## 2.0: Unlock More Capabilities

### 2.1.0: Using the TruEra Explainer - Plotting

In [None]:
tru = explainer.get_truera_workspace()

tru_explainer = tru.get_explainer('all_w_labels')

tru_explainer.get_global_feature_importances()

tru_explainer.plot_isp('Age')

### 2.1.1: Using the TruEra Explainer - Find High Error Segments

In [None]:
tru_explainer.suggest_high_error_segments()

### 2.2: Get Serious with the Test Harness

Establish performance, fairness, feature importance and stability tests using the TruEra Test Harness.

1. Performance tests warn and/or fail if any of a number of metrics (accuracy, precision, AUC, etc.) crosses a specified threshold.

2. Fairness tests establish criteria to compare a protected segment against the rest of the population.

3. Feature Importance tests ensure there are not too many unimportant features in the model.

4. Stability tests ensure that the model behaves similarly across two distributions.
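To make the four test types concrete, here is a plain-Python sketch of the kind of threshold logic they describe. It is not TruEra's implementation: the helper names, the toy numbers, the sign convention for the score difference (protected minus rest), the use of an absolute difference for stability, and the strict comparisons are all assumptions made for illustration.

```python
import math
from statistics import mean

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def upper_bound_status(value, warn_if_greater_than=None, fail_if_greater_than=None):
    """Hypothetical helper: FAIL or WARN when value exceeds the given bounds."""
    if fail_if_greater_than is not None and value > fail_if_greater_than:
        return "FAIL"
    if warn_if_greater_than is not None and value > warn_if_greater_than:
        return "WARN"
    return "PASS"

def range_status(value, warn_if_outside, fail_if_outside):
    """Hypothetical helper: FAIL or WARN when value leaves the given ranges."""
    if not fail_if_outside[0] <= value <= fail_if_outside[1]:
        return "FAIL"
    if not warn_if_outside[0] <= value <= warn_if_outside[1]:
        return "WARN"
    return "PASS"

# 1. Performance: RMSE must not exceed 0.3 (toy labels and predictions).
perf_status = upper_bound_status(
    rmse([0, 1, 1, 0], [0.1, 0.8, 0.6, 0.3]), fail_if_greater_than=0.3)

# 2. Fairness: mean score of the protected segment minus the rest (toy scores).
score_diff = mean([0.30, 0.40]) - mean([0.45, 0.39])
fair_status = range_status(score_diff, warn_if_outside=(-0.05, 0.05),
                           fail_if_outside=(-0.1, 0.1))

# 3. Feature importance: count features below a minimum importance (toy values).
toy_importances = {"Age": 0.21, "Education-Num": 0.09, "Workclass": 0.01, "Race": 0.005}
n_unimportant = sum(1 for v in toy_importances.values() if v < 0.02)
fi_status = upper_bound_status(n_unimportant, warn_if_greater_than=0,
                               fail_if_greater_than=5)

# 4. Stability: difference between mean scores on two splits (toy scores).
mean_shift = abs(mean([0.3, 0.5, 0.85]) - mean([0.2, 0.4, 0.6]))
stab_status = upper_bound_status(mean_shift, warn_if_greater_than=0.1)

print(perf_status, fair_status, fi_status, stab_status)  # PASS WARN WARN WARN
```

Each test reduces to one number per model and split compared against warn and fail bounds, which is what makes the results comparable across models.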

In [None]:
#set environment to the remote project, set context in remote
tru.set_project("Adult Census Model Comparison")
tru.set_data_collection("Adult Census Data Collection")
tru.set_data_split("train")

#set up protected segment for fairness test
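#uncomment on the first run: the "Gender" segment group must exist before a segment can be marked as protected
#tru.add_segment_group(name = "Gender", segment_definitions = {"Male": 'Sex == 1', "Female": 'Sex == 0'})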
tru.set_as_protected_segment(segment_group_name = "Gender", segment_name = "Female")

#performance test
for split_name in ["train", "test"]:
    tru.tester.add_performance_test( test_name = "Performance Test 1",
        data_split_names = [split_name],
        fail_if_greater_than = 0.3,
        metric = "RMSE",
        overwrite = True)

    #fairness test
    tru.tester.add_fairness_test( test_name = "Fairness Test 1",
        data_split_names = [split_name],
        all_protected_segments=True,
        metric = "MEAN_SCORE_DIFFERENCE",
        fail_if_outside = [-0.1,0.1], warn_if_outside = [-0.05, 0.05],
        overwrite = True)

    #feature importance test
    tru.tester.add_feature_importance_test(test_name = "FI Test 1",
                                            data_split_names = [split_name],
                                            min_importance_value= 0.02,
                                            warn_if_greater_than = 0,
                                            fail_if_greater_than = 5,
                                            overwrite = True)

#stability test
tru.tester.add_stability_test(test_name = "Stability Test 1",
                                base_data_split_name="train",
                                comparison_data_split_names=["test"],
                                metric="DIFFERENCE_OF_MEAN",
                                warn_if_greater_than = 0.1)

### 2.3: Check the Model Leaderboard

In [None]:
tru.tester.get_model_leaderboard()

### 2.4: Explore the Test Results Further

In [None]:
tru.set_model("XGBRegressor")
tru.tester.get_model_test_results()

### 2.5: Explore failed tests in your TruEra Web App