# Guided Exercise: Fairness

### Goals 🎯
1. Train and ingest a credit-worthiness model.
2. Create a fairness test to evaluate its disparate impact ratio.
3. View the results of the fairness test.

### First, set the credentials for your TruEra deployment.

If you don't have credentials yet, get them by signing up for the free private beta: https://go.truera.com/diagnostics-free

```python
# Connection details for your TruEra deployment
CONNECTION_STRING = ""
AUTH_TOKEN = ""
```

### Install required packages

```python
! pip install --upgrade shap
! pip install --upgrade truera
# xgboost and smart_open are also imported by later cells
! pip install xgboost smart_open
```

### From here, you can run the rest of the notebook to follow the analysis.

```python
import pandas as pd
import xgboost as xgb
from sklearn import preprocessing
import sklearn.metrics
from sklearn.utils import resample
import logging

from truera.client.truera_workspace import TrueraWorkspace
from truera.client.truera_authentication import TokenAuthentication

auth = TokenAuthentication(AUTH_TOKEN)
tru = TrueraWorkspace(CONNECTION_STRING, auth, ignore_version_mismatch=True, log_level=logging.ERROR)

# Set the environment to local compute so predictions and feature influences are computed on this machine
tru.set_environment("local")
# Note: we'll periodically toggle between local and remote so we can interact with the remote deployment as well.
```

```python
# smart_open's open() can read directly from HTTPS URLs
from smart_open import open

data_s3_file_name = "https://truera-examples.s3.us-west-2.amazonaws.com/data/starter-fairness/starter-fairness-data.pickle"
with open(data_s3_file_name, 'rb') as f:
    data = pd.read_pickle(f)

feature_map_s3_file_name = "https://truera-examples.s3.us-west-2.amazonaws.com/data/starter-fairness/starter-fairness-feature-map.pickle"
with open(feature_map_s3_file_name, 'rb') as f:
    feature_map = pd.read_pickle(f)
```
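The next cells index `data` as `data[year][state][key]`, with the keys `data_preprocessed`, `data_postprocessed`, `label`, and `extra_data`. As a minimal sketch of that nested layout, the block below builds a hypothetical toy stand-in (the columns and values are invented; the real pickle's contents are not shown in this notebook) and lists the split names the loop will create. Because `range()` excludes its end value, `year_end = 2016` yields only 2014 and 2015.

```python
import pandas as pd

# Hypothetical toy stand-in for the pickled `data` object.
# Column names and values are invented purely for illustration.
def toy_split() -> dict:
    frame = pd.DataFrame({"feature_a": [1.0, 2.0], "feature_b": [0, 1]})
    return {
        "data_preprocessed": frame,
        "data_postprocessed": frame.copy(),
        "label": pd.Series([0, 1], name="label"),
        "extra_data": pd.DataFrame({"extra_col": ["a", "b"]}),
    }

states = ["CA", "NY"]
toy_data = {year: {state: toy_split() for state in states} for year in [2014, 2015]}

# Same loop bounds as the notebook: range() excludes year_end.
year_begin, year_end = 2014, 2016
split_names = [f"{year}-{state}" for year in range(year_begin, year_end) for state in states]
print(split_names)  # ['2014-CA', '2014-NY', '2015-CA', '2015-NY']

# Every (year, state) pair the loop visits must exist with all four keys.
expected_keys = {"data_preprocessed", "data_postprocessed", "label", "extra_data"}
for year in range(year_begin, year_end):
    for state in states:
        assert set(toy_data[year][state]) == expected_keys
```

These four names match the splits that appear in the fairness test results at the end of the notebook.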

```python
# Create the first project and data collection
tru.set_environment("local")
project_name = "Starter Example - Fairness"
tru.add_project(project_name, score_type="probits")

tru.add_data_collection("Data Collection v1",
                        pre_to_post_feature_map=feature_map,
                        provide_transform_with_model=False)

# Add one data split per (year, state) to the collection we just created
year_begin = 2014
year_end = 2016  # exclusive
states = ['CA', 'NY']
first_iteration = True

for year in range(year_begin, year_end):
    for state in states:
        tru.add_data_split(f'{year}-{state}',
                           pre_data=data[year][state]['data_preprocessed'],
                           post_data=data[year][state]['data_postprocessed'],
                           label_data=data[year][state]['label'],
                           extra_data_df=data[year][state]['extra_data'],
                           split_type="all")

        # Define the segment groups once, after the first split has been added
        if first_iteration:
            tru.add_segment_group("Sex", {"Male": "Sex == 'Male'", "Female": "Sex == 'Female'"})
            tru.add_segment_group("Language at home", {"English": "LANX == 1", "Not English": "LANX == 2"})
            first_iteration = False
```
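Each segment is defined by a filter expression over a split's columns. To show which rows each definition picks out, the sketch below assumes these expressions behave like pandas `DataFrame.query` strings. That is an assumption for illustration only, since TruEra evaluates segment filters with its own engine. The toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the column names match the segment expressions above.
rows = pd.DataFrame({
    "Sex": ["Male", "Female", "Female", "Male"],
    "LANX": [1, 2, 1, 1],
})

# Segment definitions copied from the add_segment_group calls.
segment_groups = {
    "Sex": {"Male": "Sex == 'Male'", "Female": "Sex == 'Female'"},
    "Language at home": {"English": "LANX == 1", "Not English": "LANX == 2"},
}

# Approximate each segment with DataFrame.query (assumed equivalent semantics).
membership = {}
for group, segments in segment_groups.items():
    for name, expression in segments.items():
        membership[(group, name)] = rows.query(expression).index.tolist()
        print(f"{group} / {name}: rows {membership[(group, name)]}")
```

On these toy rows, the segments within each group do not overlap and together cover every row. The protected segment used later (Sex = Female) is compared against the rest of the population.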

```python
# Train an XGBoost classifier on the 2014 NY split
models = {}
model_name_v1 = "model_1"

models[model_name_v1] = xgb.XGBClassifier(eta=0.2, max_depth=4)
models[model_name_v1].fit(data[2014]['NY']['data_postprocessed'],
                          data[2014]['NY']['label'])

# Record the training parameters along with the model's class name
train_params = {"model_type": str(type(models[model_name_v1])), "eta": 0.2, "max_depth": 4}

# Register the model, then upload the project to the remote deployment
tru.add_python_model(model_name_v1,
                     models[model_name_v1],
                     train_split_name='2014-NY',
                     train_parameters=train_params)

tru.upload_project()
```
```python
# Switch to the remote deployment and mark Sex = Female as the protected segment
tru.set_environment("remote")
tru.set_project(project_name)
tru.set_data_collection("Data Collection v1")
tru.set_data_split("2014-NY")
tru.set_as_protected_segment(segment_group_name="Sex", segment_name="Female")

# Fairness test: fail if the disparate impact ratio falls outside [0.8, 1.25]
tru.tester.add_fairness_test(test_name="Impact Ratio Test",
                             data_split_name_regex=".",
                             all_data_collections=True,
                             all_protected_segments=True,
                             metric="DISPARATE_IMPACT_RATIO",
                             fail_if_outside=[0.8, 1.25])
```
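The `DISPARATE_IMPACT_RATIO` metric compares how often each group receives a favorable prediction. The sketch below assumes the common definition: the positive-prediction rate in the protected segment divided by the rate in the comparison segment. It also applies a hypothetical 0.5 decision threshold to the model's probabilities, and TruEra's exact computation and threshold may differ. The band `[0.8, 1.25]` is symmetric, since 1.25 = 1/0.8. The test therefore fails whenever either group's favorable rate drops below four fifths of the other's, which is the margin commonly known as the four-fifths rule.

```python
import numpy as np

def disparate_impact_ratio(scores, is_protected, threshold=0.5):
    """Positive-prediction rate in the protected segment divided by the
    rate in the comparison segment (assumed definition; threshold is hypothetical)."""
    scores = np.asarray(scores, dtype=float)
    is_protected = np.asarray(is_protected, dtype=bool)
    if is_protected.all() or not is_protected.any():
        raise ValueError("both segments must be non-empty")
    positive = scores >= threshold
    protected_rate = positive[is_protected].mean()
    comparison_rate = positive[~is_protected].mean()
    if comparison_rate == 0:
        raise ValueError("comparison segment has no positive predictions")
    return float(protected_rate / comparison_rate)

def passes(ratio, bounds=(0.8, 1.25)):
    # Mirrors fail_if_outside=[0.8, 1.25]: values outside the band fail.
    low, high = bounds
    return low <= ratio <= high

# Hypothetical scores: first four rows protected, last four in the comparison segment.
scores = [0.9, 0.3, 0.2, 0.6, 0.7, 0.8, 0.4, 0.9]
is_protected = [True, True, True, True, False, False, False, False]
ratio = disparate_impact_ratio(scores, is_protected)
print(round(ratio, 4), passes(ratio))  # 0.6667 False
```

Here 2 of 4 protected rows and 3 of 4 comparison rows are predicted positive, so the ratio is 0.5 / 0.75 ≈ 0.667. That value falls below 0.8, so the test would fail.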

```python
# Retrieve the fairness test results for model_1
tru.set_model(model_name_v1)
tru.tester.get_model_test_results(test_types=["fairness"])
```

| Result | Name | Split | Protected Segment | Comparison Segment | Metric | Score |
|---|---|---|---|---|---|---|
| ❌ | Impact Ratio Test | 2014-CA | Sex--Female: Sex = 'Female' | REST OF POPULATION | DISPARATE_IMPACT_RATIO | 0.6358 |
| ❌ | Impact Ratio Test | 2014-NY | Sex--Female: Sex = 'Female' | REST OF POPULATION | DISPARATE_IMPACT_RATIO | 0.6227 |
| ❌ | Impact Ratio Test | 2015-CA | Sex--Female: Sex = 'Female' | REST OF POPULATION | DISPARATE_IMPACT_RATIO | 0.6713 |
| ❌ | Impact Ratio Test | 2015-NY | Sex--Female: Sex = 'Female' | REST OF POPULATION | DISPARATE_IMPACT_RATIO | 0.6453 |


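* What? The model test results show that the first version of the model (`model_1`) fails the Impact Ratio Test on every split. Its disparate impact ratio for the protected segment (Female) versus the rest of the population ranges from 0.6227 to 0.6713, which is below the 0.8 lower bound.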
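The failures follow directly from the reported scores. The sketch below re-checks the four values copied from the results table against the `fail_if_outside` band. It also shows why `data_split_name_regex="."` selects every split: under `re.search` semantics, `"."` matches any non-empty name. How TruEra applies the pattern is an assumption here, though search semantics are consistent with all four splits appearing in the results.

```python
import re

# Scores copied from the results table (split name -> DISPARATE_IMPACT_RATIO).
reported = {"2014-CA": 0.6358, "2014-NY": 0.6227, "2015-CA": 0.6713, "2015-NY": 0.6453}
low, high = 0.8, 1.25  # fail_if_outside bounds used when the test was added

# Assumption: the split regex is applied with search semantics.
selected = [name for name in reported if re.search(".", name)]
# For contrast, fullmatch semantics would select nothing, since every name is longer than one character.
fullmatch_selected = [name for name in reported if re.fullmatch(".", name)]

statuses = {}
for name in selected:
    score = reported[name]
    statuses[name] = "pass" if low <= score <= high else "fail"
    print(f"{name}: {score:.4f} -> {statuses[name]}")
```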

### From here, navigate to the TruEra Web App for analysis or continue on to Part 2!    [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wSCmWMeWlFPdLSYP4RnSvhsEh9lONHLQ)