# Fairness in healthcare utilization scoring model

<b> Reference - https://nbviewer.jupyter.org/github/IBM/AIF360/blob/master/examples/tutorial_medical_expenditure.ipynb
    </b>
  

### This tutorial demonstrates classification model learning with bias mitigation as a part of a Care Management use case using Medical Expenditure data.

The notebook demonstrates how the AIF 360 toolkit can be used to detect and reduce bias when learning classifiers, using a variety of fairness metrics and algorithms. It also demonstrates how LIME can be used to generate explanations for predictions made by models learned with the toolkit.

Classifiers are built using Logistic Regression as well as Random Forests.

Bias detection is demonstrated using several metrics, including disparate impact, average odds difference, statistical parity difference, equal opportunity difference, and Theil index.
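As a rough illustration of two of these metrics, the sketch below computes statistical parity difference and disparate impact on a made-up toy dataset. It is a simplified, unweighted version written for this note (the helper names are hypothetical, and the AIF 360 metric classes additionally take instance weights into account), so treat it as an aid to intuition rather than the toolkit's implementation.

```python
def favorable_rate(labels, groups, group_value, favorable=1):
    """Fraction of instances in one group that carry the favorable label."""
    members = [y for y, g in zip(labels, groups) if g == group_value]
    return sum(1 for y in members if y == favorable) / len(members)


def statistical_parity_difference(labels, groups, unprivileged=0, privileged=1):
    """Favorable rate of the unprivileged group minus that of the privileged group."""
    return (favorable_rate(labels, groups, unprivileged)
            - favorable_rate(labels, groups, privileged))


def disparate_impact(labels, groups, unprivileged=0, privileged=1):
    """Ratio of the unprivileged favorable rate to the privileged favorable rate."""
    return (favorable_rate(labels, groups, unprivileged)
            / favorable_rate(labels, groups, privileged))


# Toy data (invented for illustration): group 0 is unprivileged, group 1 is privileged.
labels = [1, 0, 0, 0, 1, 1, 0, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(labels, groups)  # 0.25 - 0.75
di = disparate_impact(labels, groups)                # 0.25 / 0.75
print(spd, di)
```

A statistical parity difference of 0 and a disparate impact of 1 both indicate equal favorable rates across the two groups.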

Bias alleviation is explored via a variety of methods, including reweighing (pre-processing algorithm), prejudice remover (in-processing algorithm), and disparate impact remover (pre-processing technique).
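To give a feel for what reweighing does, here is a minimal sketch of the classic weighting scheme: each (group, label) combination receives the weight expected-frequency / observed-frequency, so that group and label look statistically independent after weighting. The function name and toy data are assumptions made for this note; the toolkit's `Reweighing` class works on its own dataset objects and existing instance weights.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight per (group, label) cell: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for (g, y) in joint_counts
    }


def weighted_favorable_rate(groups, labels, weights, group_value, favorable=1):
    """Favorable rate of one group after applying the cell weights."""
    total = sum(weights[(g, y)] for g, y in zip(groups, labels) if g == group_value)
    fav = sum(weights[(g, y)] for g, y in zip(groups, labels)
              if g == group_value and y == favorable)
    return fav / total


# Toy data (invented for illustration): group 0 is unprivileged, group 1 is privileged.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [1, 0, 0, 0, 1, 1, 0, 1]

weights = reweighing_weights(groups, labels)
rate_unpriv = weighted_favorable_rate(groups, labels, weights, 0)
rate_priv = weighted_favorable_rate(groups, labels, weights, 1)
print(weights, rate_unpriv, rate_priv)
```

In this toy case the under-represented cells (unprivileged with favorable label, privileged with unfavorable label) are up-weighted, and the weighted favorable rates of the two groups become equal.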

Data from the [Medical Expenditure Panel Survey](https://meps.ahrq.gov/mepsweb/) is used in this tutorial. See [Section 2](#2.-Data-used) below for more details.


## [1.](#Table-of-Contents) Use case

To demonstrate how AIF 360 can be used to detect and mitigate bias in classifier models, we adopt the following use case:

1. a data scientist develops a 'fair' healthcare utilization scoring model with respect to defined protected classes. Fairness may be dictated by legal or government regulations, such as a requirement that additional care decisions not be predicated on factors such as the race of the patient.


2. a developer takes the model and its performance characteristics/specs (e.g. accuracy and fairness tests; essentially the model factsheet) and deploys the model in an enterprise app that prioritizes cases for care management.


3. the app is put into production and starts scoring people and making recommendations. 


4. explanations are generated for each recommendation


5. both recommendations and associated explanations are given to nurses as a part of the care management process. The nurses can evaluate the recommendations for quality and correctness and provide feedback.


6. nurse feedback, as well as analysis of usage data against the model's accuracy and fairness specs, is communicated periodically to the AI Ops specialist and the LOB (line-of-business) user.


7. when significant drift in model specs relative to the model factsheet is observed, the model is sent back for retraining.
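
The drift check in step 7 could be sketched roughly as follows. Everything here is hypothetical: the metric names, the factsheet values, the observed values, and the tolerances are invented for illustration, and a real deployment would define its own notion of "significant" drift.

```python
def drifted_metrics(factsheet, observed, tolerance):
    """Return the names of metrics whose observed value deviates from the
    factsheet value by more than the allowed tolerance."""
    return [
        name for name, reference in factsheet.items()
        if abs(observed[name] - reference) > tolerance[name]
    ]


# Hypothetical values, for illustration only.
factsheet = {"balanced_accuracy": 0.75, "disparate_impact": 0.90}
observed = {"balanced_accuracy": 0.74, "disparate_impact": 0.70}
tolerance = {"balanced_accuracy": 0.05, "disparate_impact": 0.10}

drifted = drifted_metrics(factsheet, observed, tolerance)
needs_retraining = len(drifted) > 0
print(drifted, needs_retraining)
```

Checking accuracy and fairness metrics separately makes it visible which part of the factsheet no longer holds in production.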

## [2.](#Table-of-Contents) Data used

The specific data used is the [2015 Full Year Consolidated Data File](https://meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-181) as well as the [2016 Full Year Consolidated Data File](https://meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-192).

The 2015 file contains data from rounds 3,4,5 of panel 19 (2014) and rounds 1,2,3 of panel 20 (2015). The 2016 file contains data from rounds 3,4,5 of panel 20 (2015) and rounds 1,2,3 of panel 21 (2016).

For this demonstration, three datasets were constructed: one from panel 19, round 5 (used for learning models); one from panel 20, round 3 (used for deployment/testing of model - steps); and one from panel 21, round 3 (used for re-training and deployment/testing of updated model).

## [3.](#Table-of-Contents) Training models on original 2015 Panel 19 data

First, load all necessary packages

In [25]:
import os 
import pandas as pd
from common_utils import compute_metrics

In [2]:
filepath = os.path.join(os.path.dirname(os.path.abspath("__file__")),
                                '..', 'data', 'raw', 'meps', 'h201.csv')

In [3]:
filepath

'/Users/vandanarao/PycharmProjects/AIF360_latest2611/examples/../data/raw/meps/h201.csv'

In [4]:
df = pd.read_csv("/Users/vandanarao/PycharmProjects/AIF360_latest2611/data/raw/meps/h201.csv", sep=',')

FileNotFoundError: [Errno 2] File b'/Users/vandanarao/PycharmProjects/AIF360_latest2611/data/raw/meps/h201.csv' does not exist: b'/Users/vandanarao/PycharmProjects/AIF360_latest2611/data/raw/meps/h201.csv'

In [45]:
for s in df.columns.values:
    if s.startswith('FTSTU'):
        print(s)

FTSTU31X
FTSTU42X
FTSTU53X
FTSTU17X


In [23]:
cols = df.columns.values

In [38]:
for col in ['FTSTU','ACTDTY','HONRDC','RTHLTH','MNHLTH','HIBPDX','CHDDX','ANGIDX','EDUCYR','HIDEG',
                     'MIDX','OHRTDX','STRKDX','EMPHDX','CHBRON','CHOLDX','CANCERDX','DIABDX',
                     'JTPAIN','ARTHDX','ARTHTYPE','ASTHDX','ADHDADDX','PREGNT','SOCLIM','COGLIM','DFHEAR42','DFSEE42','ADSMOK42',
                     'PHQ242']:
    if col not in cols:
        print(col)
#         print(col in cols)

FTSTU
ACTDTY
HONRDC
RTHLTH
MNHLTH
CHBRON
JTPAIN
PREGNT
SOCLIM
COGLIM


In [14]:
default_mappings = {
    'label_maps': [{1.0: '>= 10 Visits', 0.0: '< 10 Visits'}],
    'protected_attribute_maps': [{1.0: 'White', 0.0: 'Non-White'}]
}

In [32]:
def default_preprocessing(df):
    """
    1.Create a new column, RACE that is 'White' if RACEV2X = 1 and HISPANX = 2 i.e. non Hispanic White
      and 'Non-White' otherwise
    2. Restrict to Panel 21
    3. RENAME all columns that are PANEL/ROUND SPECIFIC
    4. Drop rows based on certain values of individual features that correspond to missing/unknown - generally < -1
    5. Compute UTILIZATION, binarize it to 0 (< 10) and 1 (>= 10)
    """
    def race(row):
        if ((row['HISPANX'] == 2) and (row['RACEV2X'] == 1)):  #non-Hispanic Whites are marked as WHITE; all others as NON-WHITE
            return 'White'
        return 'Non-White'

    df['RACEV2X'] = df.apply(lambda row: race(row), axis=1)
    df = df.rename(columns = {'RACEV2X' : 'RACE'})

    df = df[df['PANEL'] == 21]

    # RENAME COLUMNS
    df = df.rename(columns = {'FTSTU53X' : 'FTSTU', 'ACTDTY53' : 'ACTDTY', 'HONRDC53' : 'HONRDC', 'RTHLTH53' : 'RTHLTH',
                              'MNHLTH53' : 'MNHLTH', 'CHBRON53' : 'CHBRON', 'JTPAIN53' : 'JTPAIN', 'PREGNT53' : 'PREGNT',
                              'WLKLIM53' : 'WLKLIM', 'ACTLIM53' : 'ACTLIM', 'SOCLIM53' : 'SOCLIM', 'COGLIM53' : 'COGLIM',
                              'EMPST53' : 'EMPST', 'REGION53' : 'REGION', 'MARRY53X' : 'MARRY', 'AGE53X' : 'AGE',
                              'POVCAT16' : 'POVCAT', 'INSCOV16' : 'INSCOV'})

    df = df[df['REGION'] >= 0] # remove values -1
    df = df[df['AGE'] >= 0] # remove values -1

    df = df[df['MARRY'] >= 0] # remove values -1, -7, -8, -9

    df = df[df['ASTHDX'] >= 0] # remove values -1, -7, -8, -9

    df = df[(df[['FTSTU','ACTDTY','HONRDC','RTHLTH','MNHLTH','HIBPDX','CHDDX','ANGIDX','EDUCYR','HIDEG',
                     'MIDX','OHRTDX','STRKDX','EMPHDX','CHBRON','CHOLDX','CANCERDX','DIABDX',
                     'JTPAIN','ARTHDX','ARTHTYPE','ASTHDX','ADHDADDX','PREGNT','SOCLIM','COGLIM','DFHEAR42','DFSEE42','ADSMOK42',
                     'PHQ242']] >= -1).all(1)]  #for all other categorical features, remove values < -1

    def utilization(row):
        return row['OBTOTV16'] + row['OPTOTV16'] + row['ERTOT16'] + row['IPNGTD16'] + row['HHTOTD16']

    df['TOTEXP16'] = df.apply(lambda row: utilization(row), axis=1)
    lessE = df['TOTEXP16'] < 10.0
    df.loc[lessE,'TOTEXP16'] = 0.0
    moreE = df['TOTEXP16'] >= 10.0
    df.loc[moreE,'TOTEXP16'] = 1.0

    df = df.rename(columns = {'TOTEXP16' : 'UTILIZATION'})
    return df
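
The two row-level rules inside `default_preprocessing` (the race recoding and the utilization label) can be checked in isolation on plain dictionaries. The sketch below is a simplified restatement for illustration only: the record values are invented, and the helper names `race_label` and `utilization_label` are not part of the notebook.

```python
# The five count columns that utilization() sums in default_preprocessing.
VISIT_COLUMNS = ['OBTOTV16', 'OPTOTV16', 'ERTOT16', 'IPNGTD16', 'HHTOTD16']


def race_label(record):
    """'White' only when HISPANX == 2 and RACEV2X == 1; 'Non-White' otherwise."""
    if record['HISPANX'] == 2 and record['RACEV2X'] == 1:
        return 'White'
    return 'Non-White'


def utilization_label(record, threshold=10):
    """1.0 when the summed counts reach the threshold, 0.0 otherwise."""
    total = sum(record[col] for col in VISIT_COLUMNS)
    return 1.0 if total >= threshold else 0.0


# Invented records, for illustration only.
low_use = {'HISPANX': 2, 'RACEV2X': 1,
           'OBTOTV16': 4, 'OPTOTV16': 1, 'ERTOT16': 0, 'IPNGTD16': 0, 'HHTOTD16': 0}
high_use = {'HISPANX': 1, 'RACEV2X': 1,
            'OBTOTV16': 6, 'OPTOTV16': 2, 'ERTOT16': 1, 'IPNGTD16': 1, 'HHTOTD16': 0}

print(race_label(low_use), utilization_label(low_use))
print(race_label(high_use), utilization_label(high_use))
```

Note that a total of exactly 10 falls into the favorable class, matching the `>= 10.0` comparison in the function above.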

In [28]:
label_name='UTILIZATION'
favorable_classes=[1.0]
protected_attribute_names=['RACE']
privileged_classes=[['White']]
instance_weights_name='PERWT16F'
categorical_features=['REGION','SEX','MARRY', 'FTSTU','ACTDTY','HONRDC','RTHLTH','MNHLTH','HIBPDX','CHDDX','ANGIDX',
                      'MIDX','OHRTDX','STRKDX','EMPHDX','CHBRON','CHOLDX','CANCERDX','DIABDX',
                      'JTPAIN','ARTHDX','ARTHTYPE','ASTHDX','ADHDADDX','PREGNT','WLKLIM',
                      'ACTLIM','SOCLIM','COGLIM','DFHEAR42','DFSEE42', 'ADSMOK42', 'PHQ242',
                      'EMPST','POVCAT','INSCOV']
features_to_keep=['REGION','AGE','SEX','RACE','MARRY',
                     'FTSTU','ACTDTY','HONRDC','RTHLTH','MNHLTH','HIBPDX','CHDDX','ANGIDX',
                     'MIDX','OHRTDX','STRKDX','EMPHDX','CHOLDX','CANCERDX','DIABDX',
                     'ARTHDX','ARTHTYPE','ASTHDX','ADHDADDX','DFHEAR42','DFSEE42','ADSMOK42', 'PCS42',
                     'MCS42','K6SUM42','PHQ242','EMPST','UTILIZATION', 'PERWT16F']
features_to_drop=[]
na_values=[]
custom_preprocessing=default_preprocessing
metadata=default_mappings

In [29]:
filepath="/Users/dhanush/Documents/USC/Fall-2019/INF599/projects/AIF360/aif360/data/raw/meps/h201.csv"

In [19]:
df = pd.read_csv(filepath, sep=',', na_values=na_values)

In [20]:
from aif360.datasets import StandardDataset

In [33]:
x = StandardDataset(df=df, label_name=label_name,
            favorable_classes=favorable_classes,
            protected_attribute_names=protected_attribute_names,
            privileged_classes=privileged_classes,
            instance_weights_name=instance_weights_name,
            categorical_features=categorical_features,
            features_to_keep=features_to_keep,
            features_to_drop=features_to_drop, na_values=na_values,
            custom_preprocessing=custom_preprocessing, metadata=metadata)

KeyError: "['WLKLIM', 'ACTLIM', 'COGLIM', 'JTPAIN', 'POVCAT', 'CHBRON', 'INSCOV', 'SOCLIM', 'PREGNT'] not in index"

In [1]:
import sys
sys.path.insert(0, '../')

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import Markdown, display

# Datasets
from aif360.datasets import MEPSDataset19
from aif360.datasets import MEPSDataset20
from aif360.datasets import MEPSDataset21

# Fairness metrics
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.metrics import ClassificationMetric

# Explainers
from aif360.explainers import MetricTextExplainer

# Scalers
from sklearn.preprocessing import StandardScaler

# Classifiers
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Bias mitigation techniques
from aif360.algorithms.preprocessing import Reweighing

np.random.seed(1)

In [2]:
from sklearn.preprocessing import StandardScaler, MaxAbsScaler
import tensorflow as tf
from aif360.algorithms.inprocessing.adversarial_debiasing_v2 import AdversarialDebiasing



### 3.1. Load data & create splits for learning/validating/testing model

Get the dataset and split it into train (60%), validation (20%), and test (20%) sets.

In [3]:
(dataset_orig_panel19_train,
 dataset_orig_panel19_vt) = MEPSDataset19().split([0.6], shuffle=True)


(dataset_orig_panel19_valid,
 dataset_orig_panel19_test) = dataset_orig_panel19_vt.split([0.5], shuffle=True)

sens_ind = 0
sens_attr = dataset_orig_panel19_train.protected_attribute_names[sens_ind]

unprivileged_groups = [{sens_attr: v} for v in
                       dataset_orig_panel19_train.unprivileged_protected_attributes[sens_ind]]
privileged_groups = [{sens_attr: v} for v in
                     dataset_orig_panel19_train.privileged_protected_attributes[sens_ind]]
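
The arithmetic of this two-stage split (first 60% versus the rest, then the rest halved) can be sketched on a plain list. This is only an illustration with a hypothetical helper and 1000 dummy items; it does not reproduce how the toolkit's `split` method shuffles or rounds.

```python
import random


def two_stage_split(items, first_fraction=0.6, second_fraction=0.5, seed=1):
    """Shuffle, cut off the first fraction as train, then cut the remainder
    into validation and test."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    first_cut = round(len(shuffled) * first_fraction)
    train, rest = shuffled[:first_cut], shuffled[first_cut:]
    second_cut = round(len(rest) * second_fraction)
    return train, rest[:second_cut], rest[second_cut:]


train, valid, test = two_stage_split(range(1000))
print(len(train), len(valid), len(test))
```

Splitting 0.6 and then 0.5 of the remainder yields overall proportions of 60% / 20% / 20%.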



This function will be used throughout the notebook to print out some labels, names, etc.

In [4]:
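def describe(train=None, val=None, test=None):
    """Print dataset shapes and metadata; `test` must be provided because its metadata is always printed."""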
    if train is not None:
        display(Markdown("#### Training Dataset shape"))
        print(train.features.shape)
    if val is not None:
        display(Markdown("#### Validation Dataset shape"))
        print(val.features.shape)
    display(Markdown("#### Test Dataset shape"))
    print(test.features.shape)
    display(Markdown("#### Favorable and unfavorable labels"))
    print(test.favorable_label, test.unfavorable_label)
    display(Markdown("#### Protected attribute names"))
    print(test.protected_attribute_names)
    display(Markdown("#### Privileged and unprivileged protected attribute values"))
    print(test.privileged_protected_attributes, 
          test.unprivileged_protected_attributes)
    display(Markdown("#### Dataset feature names"))
    print(test.feature_names)

Show 2015 dataset details

In [5]:
describe(dataset_orig_panel19_train, None, dataset_orig_panel19_test)

#### Training Dataset shape

(9498, 138)


#### Test Dataset shape

(3166, 138)


#### Favorable and unfavorable labels

1.0 0.0


#### Protected attribute names

['RACE']


#### Privileged and unprivileged protected attribute values

[array([1.])] [array([0.])]


#### Dataset feature names

['AGE', 'RACE', 'PCS42', 'MCS42', 'K6SUM42', 'REGION=1', 'REGION=2', 'REGION=3', 'REGION=4', 'SEX=1', 'SEX=2', 'MARRY=1', 'MARRY=2', 'MARRY=3', 'MARRY=4', 'MARRY=5', 'MARRY=6', 'MARRY=7', 'MARRY=8', 'MARRY=9', 'MARRY=10', 'FTSTU=-1', 'FTSTU=1', 'FTSTU=2', 'FTSTU=3', 'ACTDTY=1', 'ACTDTY=2', 'ACTDTY=3', 'ACTDTY=4', 'HONRDC=1', 'HONRDC=2', 'HONRDC=3', 'HONRDC=4', 'RTHLTH=-1', 'RTHLTH=1', 'RTHLTH=2', 'RTHLTH=3', 'RTHLTH=4', 'RTHLTH=5', 'MNHLTH=-1', 'MNHLTH=1', 'MNHLTH=2', 'MNHLTH=3', 'MNHLTH=4', 'MNHLTH=5', 'HIBPDX=-1', 'HIBPDX=1', 'HIBPDX=2', 'CHDDX=-1', 'CHDDX=1', 'CHDDX=2', 'ANGIDX=-1', 'ANGIDX=1', 'ANGIDX=2', 'MIDX=-1', 'MIDX=1', 'MIDX=2', 'OHRTDX=-1', 'OHRTDX=1', 'OHRTDX=2', 'STRKDX=-1', 'STRKDX=1', 'STRKDX=2', 'EMPHDX=-1', 'EMPHDX=1', 'EMPHDX=2', 'CHBRON=-1', 'CHBRON=1', 'CHBRON=2', 'CHOLDX=-1', 'CHOLDX=1', 'CHOLDX=2', 'CANCERDX=-1', 'CANCERDX=1', 'CANCERDX=2', 'DIABDX=-1', 'DIABDX=1', 'DIABDX=2', 'JTPAIN=-1', 'JTPAIN=1', 'JTPAIN=2', 'ARTHDX=-1', 'ARTHDX=1', 'ARTHDX=2', 'ARTHTYPE=-1'

Metrics for original data

In [6]:
metric_orig_panel19_train = BinaryLabelDatasetMetric(
        dataset_orig_panel19_train,
        unprivileged_groups=unprivileged_groups,
        privileged_groups=privileged_groups)
explainer_orig_panel19_train = MetricTextExplainer(metric_orig_panel19_train)

print(explainer_orig_panel19_train.disparate_impact())

Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.49531639033990477


# Adversarial Debiasing

In [7]:
# Metric for the original dataset
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_panel19_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())
metric_orig_test = BinaryLabelDatasetMetric(dataset_orig_panel19_test, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_test.mean_difference())

#### Original training dataset

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.133351
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.155040


In [8]:
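# MaxAbsScaler divides each feature by its maximum absolute value (fitted on the training split only)
max_abs_scaler = MaxAbsScaler()
dataset_orig_panel19_train.features = max_abs_scaler.fit_transform(dataset_orig_panel19_train.features)
dataset_orig_panel19_test.features = max_abs_scaler.transform(dataset_orig_panel19_test.features)
# NOTE: dataset_orig_panel19_valid is not transformed here, so it keeps its unscaled features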
metric_scaled_train = BinaryLabelDatasetMetric(dataset_orig_panel19_train, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
display(Markdown("#### Scaled dataset - Verify that the scaling does not affect the group label statistics"))
print("Train set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_scaled_train.mean_difference())
metric_scaled_test = BinaryLabelDatasetMetric(dataset_orig_panel19_test, 
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
print("Test set: Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_scaled_test.mean_difference())


#### Scaled dataset - Verify that the scaling does not affect the group label statistics

Train set: Difference in mean outcomes between unprivileged and privileged groups = -0.133351
Test set: Difference in mean outcomes between unprivileged and privileged groups = -0.155040
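
The scaling step only touches the feature matrix, which is why the label-based mean difference is unchanged. As a rough sketch of what max-abs scaling does (a pure-Python restatement with hypothetical helper names, not scikit-learn's implementation), the scale is learned from the training rows and then reused for the test rows:

```python
def fit_max_abs(rows):
    """Per-column maximum absolute value; all-zero columns get a scale of 1."""
    n_cols = len(rows[0])
    scales = [max(abs(row[j]) for row in rows) for j in range(n_cols)]
    return [s if s != 0 else 1.0 for s in scales]


def apply_max_abs(rows, scales):
    """Divide every value by the scale of its column."""
    return [[value / scale for value, scale in zip(row, scales)] for row in rows]


# Invented feature rows, for illustration only.
train_rows = [[20.0, -4.0], [40.0, 2.0]]
test_rows = [[60.0, 1.0]]

scales = fit_max_abs(train_rows)                    # learned from training data only
train_scaled = apply_max_abs(train_rows, scales)
test_scaled = apply_max_abs(test_rows, scales)      # may fall outside [-1, 1]
print(scales, train_scaled, test_scaled)
```

Because the scale comes from the training split, test values can exceed 1 in absolute value, as the first test feature does here.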


In [9]:
tf.compat.v1.reset_default_graph()  # call the function; the bare attribute reference did nothing
sess = tf.compat.v1.Session()
# Learn parameters with debias set to True
debiased_model = AdversarialDebiasing(privileged_groups = privileged_groups,
                          unprivileged_groups = unprivileged_groups,
                          scope_name='debiased_classifier',
                          debias=True,
                          sess=sess)

In [10]:
debiased_model.fit(dataset_orig_panel19_train)








epoch 0; iter: 0; batch classifier loss: 0.216574; batch adversarial loss: 0.545828
epoch 1; iter: 0; batch classifier loss: 0.655460; batch adversarial loss: 0.436933
epoch 2; iter: 0; batch classifier loss: 0.738545; batch adversarial loss: 0.520480
epoch 3; iter: 0; batch classifier loss: 0.584028; batch adversarial loss: 0.509030
epoch 4; iter: 0; batch classifier loss: 0.911982; batch adversarial loss: 0.509030
epoch 5; iter: 0; batch classifier loss: 2.191361; batch adversarial loss: 0.530691
epoch 6; i

<aif360.algorithms.inprocessing.adversarial_debiasing_v2.AdversarialDebiasing at 0x1a40652d90>

In [11]:
debiased_model.pred_labels

<tf.Tensor 'debiased_classifier/classifier_model/Softmax:0' shape=(?, 2) dtype=float32>

In [12]:
pos_ind = np.where(np.asarray([0., 1.])== dataset_orig_panel19_train.favorable_label)[0][0]
pos_ind

1

In [None]:
# Apply the plain model to test data
#dataset_nodebiasing_train = plain_model.predict(dataset_orig_train)
#dataset_nodebiasing_test = plain_model.predict(dataset_orig_test)
#dataset_valid_pred = plain_model.predict(dataset_orig_valid)

In [36]:
dataset_debiasing_test = debiased_model.predict(dataset_orig_panel19_test)

In [13]:
dataset_debiasing_valid = debiased_model.predict(dataset_orig_panel19_valid)

In [14]:
dataset_debiasing_valid.probs

array([[6.87913655e-08],
       [1.00000000e+00],
       [7.48751132e-16],
       ...,
       [1.00000000e+00],
       [1.00000000e+00],
       [1.00000000e+00]])

In [37]:
dataset_debiasing_test.probs

array([[0.0011514 ],
       [0.00255091],
       [0.19314738],
       ...,
       [0.00053161],
       [0.00595715],
       [0.10970578]])

In [15]:
dataset_debiasing_valid.labels

array([[0.99999988],
       [0.        ],
       [1.        ],
       ...,
       [0.        ],
       [0.        ],
       [0.        ]])

In [38]:
dataset_debiasing_test.labels

array([[0.99884856],
       [0.9974491 ],
       [0.80685264],
       ...,
       [0.99946839],
       [0.99404281],
       [0.89029425]])

In [29]:
scale_orig = StandardScaler()
X_train = scale_orig.fit_transform(dataset_orig_panel19_train.features)
y_train = dataset_orig_panel19_train.labels.ravel()


dataset_orig_panel19_valid_pred = dataset_orig_panel19_valid.copy(deepcopy=True)
X_valid = scale_orig.transform(dataset_orig_panel19_valid_pred.features)
y_valid = dataset_orig_panel19_valid_pred.labels
#dataset_orig_test_pred.scores = dataset_nodebiasing_test.labels.reshape(-1,1)
#dataset_orig_valid_pred.scores = plain_model.predict((X_valid)[:,pos_ind].reshape(-1,1))
dataset_orig_panel19_valid_pred.scores = dataset_debiasing_valid.probs.reshape(-1,1)
dataset_orig_panel19_valid_pred.scores

array([[6.87913655e-08],
       [1.00000000e+00],
       [7.48751132e-16],
       ...,
       [1.00000000e+00],
       [1.00000000e+00],
       [1.00000000e+00]])

In [39]:
scale_orig = StandardScaler()
X_train = scale_orig.fit_transform(dataset_orig_panel19_train.features)
y_train = dataset_orig_panel19_train.labels.ravel()


dataset_orig_panel19_test_pred = dataset_orig_panel19_test.copy(deepcopy=True)
X_valid = scale_orig.transform(dataset_orig_panel19_test_pred.features)
y_valid = dataset_orig_panel19_test_pred.labels
#dataset_orig_test_pred.scores = dataset_nodebiasing_test.labels.reshape(-1,1)
#dataset_orig_valid_pred.scores = plain_model.predict((X_valid)[:,pos_ind].reshape(-1,1))
dataset_orig_panel19_test_pred.scores = dataset_debiasing_test.probs.reshape(-1,1)
dataset_orig_panel19_test_pred.scores

array([[0.0011514 ],
       [0.00255091],
       [0.19314738],
       ...,
       [0.00053161],
       [0.00595715],
       [0.10970578]])

In [40]:
cost_constraint = "fnr" # "fnr", "fpr", "weighted"
#random seed for calibrated equal odds prediction
randseed = 12345679 

In [32]:
# Odds equalizing post-processing algorithm
from aif360.algorithms.postprocessing.calibrated_eq_odds_postprocessing import CalibratedEqOddsPostprocessing
# from tqdm import tqdm

# Learn parameters to equalize odds and apply to create a new dataset
cpp = CalibratedEqOddsPostprocessing(privileged_groups = privileged_groups,
                                     unprivileged_groups = unprivileged_groups,
                                     cost_constraint=cost_constraint,
                                     seed=randseed)
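cpp = cpp.fit(dataset_orig_panel19_valid, dataset_orig_panel19_valid_pred)
# Apply the fitted post-processor to the validation predictions (used by the validation metrics below)
dataset_transf_valid_pred = cpp.predict(dataset_orig_panel19_valid_pred)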

In [41]:
# Odds equalizing post-processing algorithm
from aif360.algorithms.postprocessing.calibrated_eq_odds_postprocessing import CalibratedEqOddsPostprocessing
# from tqdm import tqdm

# Learn parameters to equalize odds and apply to create a new dataset
cpp = CalibratedEqOddsPostprocessing(privileged_groups = privileged_groups,
                                     unprivileged_groups = unprivileged_groups,
                                     cost_constraint=cost_constraint,
                                     seed=randseed)
cpp = cpp.fit(dataset_orig_panel19_test, dataset_orig_panel19_test_pred)

In [42]:
#dataset_transf_valid_pred = cpp.predict(dataset_orig_panel19_valid_pred)
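# NOTE: predict() reads the scores of the dataset it is given; the original test set is passed here,
# not dataset_orig_panel19_test_pred (which carries the debiased model's scores)
dataset_transf_test_pred = cpp.predict(dataset_orig_panel19_test)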

In [34]:
cm_transf_valid = ClassificationMetric(dataset_orig_panel19_valid, dataset_transf_valid_pred,
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
display(Markdown("#### Original-Transformed validation dataset"))
print("Difference in GFPR between unprivileged and privileged groups")
print(cm_transf_valid.difference(cm_transf_valid.generalized_false_positive_rate))
print("Difference in GFNR between unprivileged and privileged groups")
print(cm_transf_valid.difference(cm_transf_valid.generalized_false_negative_rate))
print("Validation set: Classification accuracy = %f" % cm_transf_valid.accuracy())


#### Original-Transformed validation dataset

Difference in GFPR between unprivileged and privileged groups
-0.05674665158367431
Difference in GFNR between unprivileged and privileged groups
-0.00035573091750576435
Validation set: Classification accuracy = 0.497713


In [43]:
cm_transf_test = ClassificationMetric(dataset_orig_panel19_test, dataset_transf_test_pred,
                             unprivileged_groups=unprivileged_groups,
                             privileged_groups=privileged_groups)
print("Test set: Classification accuracy = %f" % cm_transf_test.accuracy())
display(Markdown("#### Original-Transformed testing dataset"))
print("Difference in GFPR between unprivileged and privileged groups")
print(cm_transf_test.difference(cm_transf_test.generalized_false_positive_rate))
print("Difference in GFNR between unprivileged and privileged groups")
print(cm_transf_test.difference(cm_transf_test.generalized_false_negative_rate))

Test set: Classification accuracy = 0.995532


#### Original-Transformed testing dataset

Difference in GFPR between unprivileged and privileged groups
0.016652374763453805
Difference in GFNR between unprivileged and privileged groups
0.07274757736886617
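
For intuition about the GFPR and GFNR figures above: the generalized rates replace hard 0/1 predictions with scores, so the generalized false positive rate is the average score given to actual negatives, and the generalized false negative rate is the average of (1 - score) over actual positives. The sketch below is an unweighted illustration on invented data with hypothetical function names; the toolkit's metric also accounts for instance weights.

```python
def generalized_fpr(y_true, scores):
    """Mean predicted score over instances whose true label is 0."""
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    return sum(negatives) / len(negatives)


def generalized_fnr(y_true, scores):
    """Mean of (1 - predicted score) over instances whose true label is 1."""
    positives = [1 - s for y, s in zip(y_true, scores) if y == 1]
    return sum(positives) / len(positives)


# Invented labels and scores, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.25, 0.75, 0.5, 1.0]

gfpr = generalized_fpr(y_true, scores)  # (0.25 + 0.75) / 2
gfnr = generalized_fnr(y_true, scores)  # (0.5 + 0.0) / 2
print(gfpr, gfnr)
```

The reported differences are these rates computed per group, unprivileged minus privileged.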


### Results on Validation set

In [35]:
metric_valid_bef = compute_metrics(dataset_orig_panel19_valid, dataset_transf_valid_pred, 
                unprivileged_groups, privileged_groups)

Balanced accuracy = 0.6032
Statistical parity difference = -0.0355
Disparate impact = 0.9445
Average odds difference = 0.0060
Equal opportunity difference = 0.0325
Theil index = 0.0971
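
The Theil index reported above measures how unequally "benefit" is distributed across individuals. A common formulation, assumed here, defines each individual's benefit as 1 + prediction - true label (so a false positive gets 2, a correct prediction 1, and a false negative 0) and then computes the generalized entropy with alpha = 1. The sketch below follows that assumption on invented data and ignores instance weights.

```python
import math


def theil_index(y_true, y_pred):
    """Mean of (b / mu) * ln(b / mu), with benefit b = 1 + prediction - label.
    Terms with b == 0 contribute 0 by convention."""
    benefits = [1 + p - t for t, p in zip(y_true, y_pred)]
    mu = sum(benefits) / len(benefits)
    total = 0.0
    for b in benefits:
        if b > 0:
            total += (b / mu) * math.log(b / mu)
    return total / len(benefits)


# Invented labels and predictions, for illustration only.
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 1]   # one false negative and one false positive

theil = theil_index(y_true, y_pred)
theil_perfect = theil_index(y_true, y_true)
print(theil, theil_perfect)
```

A value of 0 means every individual receives the same benefit, which is the case for perfect predictions.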


### Results on Test set

In [44]:
metric_test_bef = compute_metrics(dataset_orig_panel19_test, dataset_transf_test_pred, 
                unprivileged_groups, privileged_groups)

Balanced accuracy = 0.9903
Statistical parity difference = -0.1665
Disparate impact = 0.4280
Average odds difference = -0.0421
Equal opportunity difference = -0.0842
Theil index = 0.0079
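
To make the remaining quantities in these summaries concrete, the following sketch computes balanced accuracy, average odds difference, and equal opportunity difference from hard predictions on a small invented dataset. It is an unweighted illustration with hypothetical helper names, not the `compute_metrics` helper used in this notebook.

```python
def tpr_fpr(y_true, y_pred):
    """True positive rate and false positive rate from hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def group_rates(y_true, y_pred, groups, group_value):
    """TPR and FPR restricted to one protected group."""
    idx = [i for i, g in enumerate(groups) if g == group_value]
    return tpr_fpr([y_true[i] for i in idx], [y_pred[i] for i in idx])


# Invented data: group 0 is unprivileged, group 1 is privileged.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]

tpr, fpr = tpr_fpr(y_true, y_pred)
balanced_accuracy = 0.5 * (tpr + (1 - fpr))          # mean of TPR and TNR

tpr_u, fpr_u = group_rates(y_true, y_pred, groups, 0)
tpr_p, fpr_p = group_rates(y_true, y_pred, groups, 1)
equal_opportunity_difference = tpr_u - tpr_p
average_odds_difference = 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))

print(balanced_accuracy, equal_opportunity_difference, average_odds_difference)
```

Both difference metrics are 0 when the two groups have identical error rates; negative values indicate lower rates for the unprivileged group.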
