# Veritas Fairness Assessment - Life Insurance Underwriting Study (sample code)



This notebook includes samples of code used in the analysis conducted during the life insurance underwriting case study.

It is applicable to insurance underwriting datasets, including a life insurance dataset available on
[Kaggle](https://www.kaggle.com/c/prudential-life-insurance-assessment/data).

## License

Written by Sankarshan Mridha (Swiss Re) and Laura Alvarez (Accenture) as an extension to Phase 1 Credit Scoring Use Case code https://github.com/veritas-project/phase1/tree/main/credit_scoring 

Contact email: Veritas@mas.gov.sg


Copyright © 2021 Monetary Authority of Singapore

Licensed under the Apache License, Version 2.0 (the "License"); you may not use
this file except in compliance with the License. You may obtain a copy of the
License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

## Imports

In [None]:
# Core Packages
import os
import sys

import pandas as pd
import numpy as np
import sklearn
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression 
from sklearn.compose import ColumnTransformer
from sklearn.compose import make_column_selector as selector
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.inspection import permutation_importance
import phik
from phik import resources, report
from phik.report import plot_correlation_matrix
import joblib
import seaborn as sns

SEED = 123

In [None]:
# Our code (autoreload)
%load_ext autoreload
%autoreload 2
sys.path.append("../utils")
import utility as utils

In [None]:
# High-res plots
%config InlineBackend.figure_format = 'retina'

In [None]:
import warnings
warnings.filterwarnings('ignore') 

## Load Data

Update the dataset file path in the following cell before running it.

In [None]:
all_data = pd.read_csv('../dataset.csv')

## Feature Engineering and Pipeline

In [None]:
all_data['BMI_Age'] = all_data['BMI'] * all_data['Ins_Age']

med_keyword_columns = all_data.columns[all_data.columns.str.startswith('Medical_Keyword_')]
all_data['Med_Keywords_Count'] = all_data[med_keyword_columns].sum(axis=1)

mapper = {
    'Id': 'Insured ID',
    'InsuredInfo_6': 'Gender',
    'InsuredInfo_1': 'Race',
    'InsuredInfo_4': 'Nationality',
    'Family_Hist_1': 'Marital Status',
    'InsuredInfo_3': 'Occupation Type',
    'Employment_Info_2': 'Occupation Industry',
    'Wt': 'Weight',
    'Ht': 'Height',
    'Medical_History_4': 'Smoker Status',
    'Ins_Age': 'Age at Policy Inception',
    'Insurance_History_3': 'No. of Life Policies',
    'Insurance_History_2': 'No. of Accident Policies',
    'Insurance_History_7': 'No. of CI Policies',
    'Product_Info_3': 'Duration in force for Medical Plan'
}

all_data.rename(mapper=mapper, axis=1, inplace=True)
# Drop columns we do not have confidence in mapping to
drop_columns = ('Medical', 'Family', 'Insurance', 'Product', 'Employment', 'InsuredInfo')
mask = all_data.columns.str.startswith(drop_columns)
all_data = all_data.iloc[:,~mask]
all_data.head()

### Dtypes

In [None]:
all_data = all_data.astype({"Occupation Industry": object, "Occupation Type": object, "Smoker Status": object, "Gender": object, \
             "Nationality": object, "Marital Status":object, "Race":object})

### Create binary labels

In [None]:
# create labels
# 0: {1,2}
# 1: {7,8}
# -1: the rest
all_data['Risk'] = pd.cut(all_data.Response, bins=[0,2,6,8], labels=[0,-1,1])
all_data = all_data.astype({"Risk": int})
all_data.Risk.value_counts()
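The binning above can be checked on a toy series. This is an illustrative sketch only: `toy_response` is a made-up input rather than data from the study, and it assumes `Response` takes integer values from 1 to 8. `pd.cut` uses right-closed intervals by default, so the bins are (0, 2], (2, 6] and (6, 8].

```python
import pandas as pd

# Hypothetical input covering every possible Response value (assumed 1..8)
toy_response = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# Same bins and labels as the cell above
toy_risk = pd.cut(toy_response, bins=[0, 2, 6, 8], labels=[0, -1, 1]).astype(int)

# Map each Response value to its Risk label
risk_by_response = dict(zip(toy_response.tolist(), toy_risk.tolist()))
print(risk_by_response)
```

Responses 3 to 6 receive the label -1 and are filtered out in the next cell.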

In [None]:
# Remove rows with Risk == -1 (Response values 3 to 6)
df = all_data.loc[all_data['Risk']!= -1].reset_index(drop=True)

### Train/test splits

In [None]:
# prepare train & test datasets
columns_to_drop = ['Insured ID','Response','Risk']  
X = df.drop(columns=columns_to_drop)
y = df['Risk']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)
print(f"X_train.shape: {X_train.shape}, X_test.shape: {X_test.shape}")
print(f"y_train.shape: {y_train.shape}, y_test.shape: {y_test.shape}")

### Create masks for Fairness Analysis

In [None]:
# Create a gender identifying mask
gender_mask = (X_test["Gender"] == 1)  # assuming 1: Male, 2: Female 
print('Fraction Male:', round(np.mean(gender_mask), 5), 'Fraction Female:', round(np.mean(~gender_mask), 5))

In [None]:
# Create a race identifying mask
race_mask = (X_test["Race"] == 1)  # assuming 1: Majority, 2: Other 
print('Fraction Major:', round(np.mean(race_mask), 5), 'Fraction Other:', round(np.mean(~race_mask), 5))

### Pre-processing

In [None]:
numeric_transformer = StandardScaler()
categorical_transformer = OneHotEncoder(handle_unknown='ignore')
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=["object", "category"])),
    ('cat', categorical_transformer, selector(dtype_include=["object", "category"]))
], remainder='passthrough')

X_train_transformed = preprocessor.fit_transform(X_train)
X_test_transformed = preprocessor.transform(X_test)
print(f"X_train_transformed.shape: {X_train_transformed.shape}, X_test_transformed.shape: {X_test_transformed.shape}")

print(f"Class distribution: {np.unique(y_train, return_counts=True)}")

### Get features names from transformer

In [None]:
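# Note: get_feature_names() was removed in scikit-learn 1.2; on newer versions use get_feature_names_out()
features_preprocessor_c=list(preprocessor.named_transformers_['cat'].get_feature_names())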
features_preprocessor_n=preprocessor.transformers_[0][2]
features_preprocessor_all=features_preprocessor_n+features_preprocessor_c

## Load Model

In [None]:
# load model containing personal attributes
model_all_v = joblib.load('model/model_all_variables_lr.pkl')

In [None]:
# predict probabilities
y_prob = model_all_v.predict_proba(X_test_transformed)[:,1]


# compute AUC
print(roc_auc_score(y_test, y_prob))

In [None]:
# compute classification metrics by 0.5 cutoff
y_pred = np.where(y_prob > 0.5, 1, 0)
print(classification_report(y_test, y_pred))

In [None]:
# compute ROC curve
fpr, tpr, th = roc_curve(y_test, y_prob)

In [None]:
# find optimal cutoff by max balanced accuracy
ba = (tpr + (1 - fpr))/2
best_ba = np.max(ba)
best_th = th[np.argmax(ba)]
best_th
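The threshold search above can be reproduced by hand on a toy example. The sketch below is a plain NumPy version, not the scikit-learn routine: `best_threshold_by_balanced_accuracy` is a hypothetical helper and the labels and scores are made up. It treats a score as positive when `score >= threshold`, which is the convention `roc_curve` uses for its reported thresholds; the notebook cells apply `>` instead, so a sample whose score equals the chosen threshold is classified differently under the two conventions.

```python
import numpy as np

def best_threshold_by_balanced_accuracy(y_true, y_score):
    """Hypothetical helper: scan every observed score as a cutoff and
    return the one with the highest balanced accuracy (TPR + TNR) / 2."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    best_th, best_ba = None, -1.0
    for th in np.unique(y_score)[::-1]:       # candidate cutoffs, high to low
        y_pred = y_score >= th                # positive when score >= cutoff
        tpr = np.mean(y_pred[y_true == 1])    # true positive rate
        tnr = np.mean(~y_pred[y_true == 0])   # true negative rate = 1 - FPR
        ba = (tpr + tnr) / 2
        if ba > best_ba:
            best_th, best_ba = float(th), float(ba)
    return best_th, best_ba

# Made-up labels and scores for illustration
toy_y = [0, 0, 1, 1, 1]
toy_scores = [0.1, 0.3, 0.2, 0.6, 0.9]
toy_th, toy_ba = best_threshold_by_balanced_accuracy(toy_y, toy_scores)
print(toy_th, round(toy_ba, 4))
```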

In [None]:
# compute classification metrics by optimal cutoff
y_pred_ba = np.where(y_prob > best_th, 1, 0)
print(classification_report(y_test, y_pred_ba))

## Fairness
Here we compute some fairness metrics with respect to gender.

####  Code corresponding to section 2.7.4 Part C – Measuring Disadvantage in Veritas Document 4 FEAT Principles Assessment Case Studies

In [None]:
X_test.head()

In [None]:
# Run fairness analysis
gender_analysis = utils.FairnessAnalysis(y_test.astype(int), y_prob, gender_mask)
gender_metrics = gender_analysis.compute(best_th)
for attr, name in utils.FairnessAnalysis.metric_names.items():
    print(name, ":", round(getattr(gender_metrics, attr), 3))
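The implementation of `utils.FairnessAnalysis` is not shown in this notebook. As a rough illustration of the kind of quantities involved, the sketch below computes false negative and false positive rates for two groups and compares them as a difference and as a ratio. The helper name `group_error_rates`, the sign convention (group minus rest) and the toy data are all assumptions and may not match the definitions used in `utils`.

```python
import numpy as np

def group_error_rates(y_true, y_pred, group_mask):
    """Hypothetical helper: FNR and FPR for the masked group and for the rest."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    group_mask = np.asarray(group_mask, dtype=bool)
    rates = {}
    for name, m in (("group", group_mask), ("rest", ~group_mask)):
        pos = y_true[m] == 1
        neg = y_true[m] == 0
        rates[name] = {
            "fnr": float(np.mean(y_pred[m][pos] == 0)),  # missed positives
            "fpr": float(np.mean(y_pred[m][neg] == 1)),  # false alarms
        }
    return rates

# Made-up labels, predictions and group membership
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0]
toy_mask = [True] * 8 + [False] * 6

r = group_error_rates(toy_true, toy_pred, toy_mask)
fnr_diff = r["group"]["fnr"] - r["rest"]["fnr"]    # 0 means parity (assumed convention)
fnr_ratio = r["group"]["fnr"] / r["rest"]["fnr"]   # 1 means parity (assumed convention)
fpr_diff = r["group"]["fpr"] - r["rest"]["fpr"]
fpr_ratio = r["group"]["fpr"] / r["rest"]["fpr"]
print(fnr_diff, fnr_ratio, fpr_diff, fpr_ratio)
```

A ratio is undefined when the reference group's rate is zero, so a production version would need to guard against that case.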

In [None]:
# Bootstrap Uncertainty
bs_metrics = []
np.random.seed(0)
for i in range(25):
    idx = np.random.choice(len(y_test), len(y_test), replace=True)
    tmp = utils.FairnessAnalysis(y_test.astype(int).values[idx], y_prob[idx], gender_mask.values[idx])
    tmp2 = tmp.compute(best_th)
    bs_metrics.append(tmp2)

bs_metrics = np.array(bs_metrics)

In [None]:
for i, attr in enumerate(gender_metrics._fields):
    print(utils.FairnessAnalysis.metric_names[attr], ":", 
          utils.format_uncertainty(bs_metrics[:, i].mean(), 2 * bs_metrics[:, i].std()))
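The bootstrap above resamples the test set with replacement 25 times and reports each metric as mean ± 2 standard deviations. The same idea can be sketched as a reusable function. `bootstrap_metric` is a hypothetical helper, the accuracy metric and toy arrays are placeholders, and it uses `np.random.default_rng` rather than `np.random.seed`, so its draws will not match the cell above. More resamples than 25 would usually give a steadier estimate of the spread.

```python
import numpy as np

def bootstrap_metric(metric_fn, arrays, n_boot=200, seed=0):
    """Hypothetical helper: resample rows with replacement and return the
    mean and standard deviation of metric_fn across resamples."""
    rng = np.random.default_rng(seed)
    arrays = [np.asarray(a) for a in arrays]
    n = len(arrays[0])
    values = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # row indices drawn with replacement
        values.append(metric_fn(*(a[idx] for a in arrays)))
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std()

def accuracy(y_true, y_pred):
    # Placeholder metric for the example
    return np.mean(y_true == y_pred)

# Made-up data: predictions identical to labels, so accuracy is always 1
toy_labels = np.array([0, 1] * 25)
boot_mean, boot_std = bootstrap_metric(accuracy, [toy_labels, toy_labels])
print(boot_mean, "+/-", 2 * boot_std)
```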

## Personal Attributes
Here we consider how we might justify the inclusion of personal attributes

####  Code corresponding to section 2.7.2.2 Part D: Justify the Use of Personal Attributes in Veritas Document 4 FEAT Principles Assessment Case Studies

In [None]:
personal_attrs = ['Gender', 'Race', 'Nationality', 'Marital Status']

In [None]:
numeric_transformer = StandardScaler()
categorical_transformer = OneHotEncoder(handle_unknown='ignore')
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=["object", "category"])),
    ('cat', categorical_transformer, selector(dtype_include=["object", "category"]))
], remainder='passthrough')

### Gender

In [None]:
# Leave one out analysis
loo_metrics_gender = []
model_loo = LogisticRegression(max_iter=150, random_state=SEED)
for i, attr in enumerate(personal_attrs):
    print('\nTraining model without:', attr)
    X_train_transformed_loo = preprocessor.fit_transform(X_train.drop([attr], axis=1))
    X_test_transformed_loo = preprocessor.transform(X_test.drop([attr], axis=1))

    model_loo.fit(X_train_transformed_loo, y_train)
    
    # Predict and compute fairness Metrics
    loo_test_probs = model_loo.predict_proba(X_test_transformed_loo)[:,1]
    loo_analysis = utils.FairnessAnalysis(y_test.astype(int).values, loo_test_probs, gender_mask)
    loo_metrics_gender.append(loo_analysis.compute(best_th))
    
    # Display results as they arrive
    for field, name in utils.FairnessAnalysis.metric_names.items():
        print(name, ":", round(getattr(loo_metrics_gender[i], field), 5))

In [None]:
# Compute difference (removed - included)
bal_acc_deltas = [loo.bal_acc - gender_metrics.bal_acc for loo in loo_metrics_gender]
fnr_par_deltas = [loo.fnr_parity - gender_metrics.fnr_parity for loo in loo_metrics_gender]
fnr_rat_deltas = [loo.fnr_ratio - gender_metrics.fnr_ratio for loo in loo_metrics_gender]
fpr_rat_deltas = [loo.fpr_ratio - gender_metrics.fpr_ratio for loo in loo_metrics_gender]
fpr_par_deltas = [loo.fpr_parity - gender_metrics.fpr_parity for loo in loo_metrics_gender]
equal_opp_deltas = [loo.equal_opp - gender_metrics.equal_opp for loo in loo_metrics_gender]

fnr_loo = [loo.fnr_parity for loo in loo_metrics_gender]
fnr_rat_loo = [loo.fnr_ratio for loo in loo_metrics_gender]
fpr_rat_loo = [loo.fpr_ratio for loo in loo_metrics_gender]

In [None]:
# Plot
plt.figure(figsize=(8,6))
plt.title('Impact of Personal Attributes', fontsize=18)
x = np.arange(len(personal_attrs))  # the label locations
width = 0.35  # the width of the bars
rects1 = plt.bar(x - width/2, bal_acc_deltas, width, label='Balanced Acc.')
rects2 = plt.bar(x + width/2, fnr_par_deltas, width, label='FNR Parity (gender)')
plt.axhline(0, c='k', ls='-', lw='1')
plt.axhline(-gender_metrics.fnr_parity, c='k', ls=':', label='Neutral FNR Parity') # show neutrality
plt.xticks(x, personal_attrs, rotation=45, ha='right', fontsize=12)
plt.yticks(fontsize=12)
plt.ylabel('Effect of Removal (removed - included)', fontsize=16)
plt.legend(fontsize=12)
plt.show()

In [None]:
# Plot for Fairness metrics based on Ratios
plt.figure(figsize=(8,6))
plt.title('Impact of Personal Attributes', fontsize=18)
x = np.arange(len(personal_attrs))  # the label locations
width = 0.35  # the width of the bars
rects1 = plt.bar(x - width/2, bal_acc_deltas, width, label='Balanced Acc. Delta')
rects2 = plt.bar(x + width/2, fnr_rat_loo, width, label='FNR Ratio (Gender) - LOO')
plt.axhline(0, c='k', ls='-', lw='1')
plt.axhline(gender_metrics.fnr_ratio, c='k', ls=':', label='FNR Ratio (Gender) ')
plt.axhline(1.0, c='k', ls='-', lw='2', label='FNR Parity') # show neutrality
plt.xticks(x, personal_attrs, rotation=45, ha='right', fontsize=12)
plt.yticks(fontsize=12)
plt.ylabel('Effect of Removal (removed - baseline)', fontsize=16)
plt.legend(fontsize=12,loc='center right')
plt.show()

In [None]:
# Plot
plt.figure(figsize=(8,6))
plt.title('Impact of Personal Attributes', fontsize=18)
x = np.arange(len(personal_attrs))  # the label locations
width = 0.35  # the width of the bars
rects1 = plt.bar(x - width/2, bal_acc_deltas, width, label='Balanced Acc.')
rects2 = plt.bar(x + width/2, equal_opp_deltas, width, label='Equal Opportunity (gender)')
plt.axhline(0, c='k', ls='-', lw='1')
plt.axhline(-gender_metrics.equal_opp, c='k', ls=':', label='Equal Opportunity') # show neutrality
plt.xticks(x, personal_attrs, rotation=45, ha='right', fontsize=12)
plt.yticks(fontsize=12)
plt.ylabel('Effect of Removal (removed - included)', fontsize=16)
plt.legend(fontsize=12)
plt.show()

In [None]:
# Plot for Fairness metrics based on Ratios
plt.figure(figsize=(8,6))
plt.title('Impact of Personal Attributes', fontsize=18)
x = np.arange(len(personal_attrs))  # the label locations
width = 0.35  # the width of the bars
rects1 = plt.bar(x - width/2, bal_acc_deltas, width, label='Balanced Acc. Delta')
rects2 = plt.bar(x + width/2, fpr_rat_loo, width, label='FPR Ratio (Gender) - LOO')
plt.axhline(0, c='k', ls='-', lw='1')
plt.axhline(gender_metrics.fpr_ratio, c='k', ls=':', label='FPR Ratio (Gender) ')
plt.axhline(1.0, c='k', ls='-', lw='2', label='FPR Parity') # show neutrality
plt.xticks(x, personal_attrs, rotation=45, ha='right', fontsize=12)
plt.yticks(fontsize=12)
plt.ylabel('Effect of Removal (removed - baseline)', fontsize=16)
plt.legend(fontsize=12,loc='upper right')
plt.show()

In [None]:
# Plot
plt.figure(figsize=(8,6))
plt.title('Impact of Personal Attributes', fontsize=18)
x = np.arange(len(personal_attrs))  # the label locations
width = 0.35  # the width of the bars
rects1 = plt.bar(x - width/2, bal_acc_deltas, width, label='Balanced Acc.')
rects2 = plt.bar(x + width/2, fpr_par_deltas, width, label='False Positive Rate Parity (gender)')
plt.axhline(0, c='k', ls='-', lw='1')
plt.axhline(-gender_metrics.fpr_parity, c='k', ls=':', label='FPR Parity') # show neutrality
plt.xticks(x, personal_attrs, rotation=45, ha='right', fontsize=12)
plt.yticks(fontsize=12)
plt.ylabel('Effect of Removal (removed - baseline)', fontsize=16)
plt.legend(fontsize=12)
plt.show()

### Justifying the use of Personal Attributes
After running a "leave-one-out" feature removal analysis, we can assess the approximate impact of each personal attribute on both fairness and model performance. We plot the effect of removing each personal attribute on the model's performance (balanced accuracy) and on several fairness metrics with respect to gender. We want balanced accuracy to be as high as possible, while ideally each fairness metric would sit at neutrality (0 for the parity differences, 1 for the ratios).

**Tradeoffs to be further examined**: Attributes whose removal negatively affects model performance but positively affects the fairness metric(s) of interest (or vice versa).

**Evidence for inclusion**: Attributes whose removal negatively affects both model performance and the fairness metric(s) of interest.

**Evidence for exclusion**: Attributes whose removal positively affects both model performance and the fairness metric(s) of interest.
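The three categories above can be written as a small decision rule. This is a sketch under stated assumptions, not part of the case study code: `classify_removal` is a hypothetical helper, fairness is summarised as a gap from neutrality (the parity difference itself, or the ratio minus 1), and "positively affects fairness" is taken to mean that the absolute gap shrinks. The numbers in the example calls are made up.

```python
def classify_removal(bal_acc_delta, gap_included, gap_removed):
    """Hypothetical helper: label the effect of removing one attribute.

    bal_acc_delta: balanced accuracy (removed - included)
    gap_included:  fairness gap from neutrality with the attribute included
    gap_removed:   fairness gap from neutrality with the attribute removed
    """
    perf_better = bal_acc_delta > 0
    perf_worse = bal_acc_delta < 0
    gap_change = abs(gap_removed) - abs(gap_included)
    fair_better = gap_change < 0   # closer to neutrality after removal
    fair_worse = gap_change > 0    # further from neutrality after removal

    if perf_worse and fair_worse:
        return "evidence for inclusion"
    if perf_better and fair_better:
        return "evidence for exclusion"
    if (perf_worse and fair_better) or (perf_better and fair_worse):
        return "tradeoff to examine"
    return "no clear effect"

# Made-up values for illustration
print(classify_removal(-0.02, 0.05, 0.08))  # performance and fairness both worsen
print(classify_removal(0.01, 0.05, 0.02))   # performance and fairness both improve
print(classify_removal(-0.02, 0.05, 0.01))  # performance worsens, fairness improves
```

In practice the deltas should be read against the bootstrap uncertainty computed earlier, since small changes may not be meaningful.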

### Permutation importance

In [None]:
# order features by perm importance and plot 
# for model containing all personal attributes
perm_importance = permutation_importance(model_all_v, X_train_transformed.toarray(), y_train, n_repeats=10,random_state=0)
sorted_idx = perm_importance.importances_mean.argsort()

In [None]:
features_array=np.array(features_preprocessor_all)#features

In [None]:
plt.figure(figsize=(5,12)) 
plt.barh(features_array[sorted_idx], perm_importance.importances_mean[sorted_idx],color ='#FF9933')
plt.xlabel("Permutation Importance")
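For intuition about what the plot shows, permutation importance measures how much a score drops when one feature column is shuffled. The sketch below is a manual NumPy version of that idea, not scikit-learn's `permutation_importance`: `manual_permutation_importance`, the toy model and the random data are all hypothetical, and accuracy is used as the score for simplicity.

```python
import numpy as np

def manual_permutation_importance(predict_fn, X, y, n_repeats=10, seed=0):
    """Hypothetical helper: mean drop in accuracy after shuffling each column."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    baseline = np.mean(predict_fn(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
            drops.append(baseline - np.mean(predict_fn(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances

def toy_predict(X):
    # Toy model that only looks at column 0
    return (X[:, 0] > 0.5).astype(int)

toy_rng = np.random.default_rng(1)
toy_X = toy_rng.random((200, 2))
toy_labels = (toy_X[:, 0] > 0.5).astype(int)

toy_importance = manual_permutation_importance(toy_predict, toy_X, toy_labels)
print(toy_importance)
```

Column 1 is ignored by the toy model, so shuffling it leaves the predictions unchanged and its importance is exactly zero.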

### Correlations

In [None]:
# compute correlation and plot for numerical features
corr_num = df.select_dtypes(exclude=object).corr()

fig, ax = plt.subplots(figsize=(20,17))
sns.heatmap(corr_num, xticklabels=corr_num.columns, yticklabels=corr_num.columns, ax=ax, cmap='RdGy', annot=True,
            fmt='.2f', square=True)

### Correlations Phik

In [None]:
corr_phik = df.phik_matrix()
fig, ax = plt.subplots(figsize=(20,17))
sns.heatmap(corr_phik, xticklabels=corr_phik.columns, yticklabels=corr_phik.columns, ax=ax, cmap="YlGnBu", annot=True,
            fmt='.2f', square=True)

In [None]:
fig, ax = plt.subplots(figsize=(18,15))
sns.heatmap(df.significance_matrix(), xticklabels=corr_phik.columns, yticklabels=corr_phik.columns, ax=ax, cmap="YlGnBu", annot=True,
            fmt='.2f', square=True)

In [None]:
global_correlation, global_labels = X_train.global_phik()
for c, l in zip(global_correlation, global_labels):
    print(l, c[0])

In [None]:
plot_correlation_matrix(global_correlation, x_labels=[''], y_labels=global_labels, 
                        vmin=0, vmax=1, figsize=(3.5,4),
                        color_map='Blues', title=r'$g_k$',
                        fontsize_factor=1.0)
plt.tight_layout()

## Race Fairness (major vs other)

####  Code corresponding to section 2.7.4 Part C – Measuring Disadvantage in Veritas Document 4 FEAT Principles Assessment Case Studies

In [None]:
# Run race analysis, side-by-side with gender analysis
race_analysis = utils.FairnessAnalysis(y_test.astype(int), y_prob, race_mask)
race_metrics = race_analysis.compute(best_th)
for attr, name in utils.FairnessAnalysis.metric_names.items():
    print(name, ":", round(getattr(race_metrics, attr), 3), " | ", round(getattr(gender_metrics, attr), 3),)

In [None]:
# Bootstrap Uncertainty
bs_metrics = []
np.random.seed(0)
for i in range(25):
    idx = np.random.choice(len(y_test), len(y_test), replace=True)
    tmp = utils.FairnessAnalysis(y_test.astype(int).values[idx], y_prob[idx], race_mask.values[idx])
    tmp2 = tmp.compute(best_th)
    bs_metrics.append(tmp2)

bs_metrics = np.array(bs_metrics)

In [None]:
for i, attr in enumerate(race_metrics._fields):
    print(utils.FairnessAnalysis.metric_names[attr], ":", 
          utils.format_uncertainty(bs_metrics[:, i].mean(), 2 * bs_metrics[:, i].std()))

### Personal Attributes
Here we consider how we might justify the inclusion of personal attributes

####  Code corresponding to section 2.7.2.2 Part D: Justify the Use of Personal Attributes in Veritas Document 4 FEAT Principles Assessment Case Studies

In [None]:
# Leave one out analysis
loo_metrics = []
model_loo = LogisticRegression(max_iter=150, random_state=SEED)
for i, attr in enumerate(personal_attrs):
    print('\nTraining model without:', attr)
    X_train_transformed_loo = preprocessor.fit_transform(X_train.drop([attr], axis=1))
    X_test_transformed_loo = preprocessor.transform(X_test.drop([attr], axis=1))

    model_loo.fit(X_train_transformed_loo, y_train)
    
    # Predict and compute fairness Metrics
    loo_test_probs = model_loo.predict_proba(X_test_transformed_loo)[:,1]
    loo_analysis = utils.FairnessAnalysis(y_test.astype(int).values, loo_test_probs, race_mask)
    loo_metrics.append(loo_analysis.compute(best_th))
    
    # Display results as they arrive
    for field, name in utils.FairnessAnalysis.metric_names.items():
        print(name, ":", round(getattr(loo_metrics[i], field), 5))

In [None]:
# Compute difference (removed - included)
bal_acc_deltas = [loo.bal_acc - race_metrics.bal_acc for loo in loo_metrics]
fnr_par_deltas = [loo.fnr_parity - race_metrics.fnr_parity for loo in loo_metrics]
fnr_rat_deltas = [loo.fnr_ratio - race_metrics.fnr_ratio for loo in loo_metrics]
fpr_rat_deltas = [loo.fpr_ratio - race_metrics.fpr_ratio for loo in loo_metrics]
fpr_par_deltas = [loo.fpr_parity - race_metrics.fpr_parity for loo in loo_metrics]
equal_opp_deltas = [loo.equal_opp - race_metrics.equal_opp for loo in loo_metrics]

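fnr_loo = [loo.fnr_parity for loo in loo_metrics]
# Recompute for race so the ratio plot below does not reuse the gender values
fnr_rat_loo = [loo.fnr_ratio for loo in loo_metrics]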

In [None]:
# Plot
plt.figure(figsize=(8,6))
plt.title('Impact of Personal Attributes', fontsize=18)
x = np.arange(len(personal_attrs))  # the label locations
width = 0.35  # the width of the bars
rects1 = plt.bar(x - width/2, bal_acc_deltas, width, label='Balanced Acc.')
rects2 = plt.bar(x + width/2, fnr_par_deltas, width, label='FNR Parity (race)')
plt.axhline(0, c='k', ls='-', lw='1')
plt.axhline(-race_metrics.fnr_parity, c='k', ls=':', label='Neutral FNR Parity') # show neutrality
plt.xticks(x, personal_attrs, rotation=45, ha='right', fontsize=12)
plt.yticks(fontsize=12)
plt.ylabel('Effect of Removal (removed - included)', fontsize=16)
plt.legend(fontsize=12, loc='lower right')
plt.show()

In [None]:
# Plot Fairness Metrics based on Ratios
plt.figure(figsize=(8,6))
plt.title('Impact of Personal Attributes', fontsize=18)
x = np.arange(len(personal_attrs))  # the label locations
width = 0.35  # the width of the bars
rects1 = plt.bar(x - width/2, bal_acc_deltas, width, label='Balanced Acc. Delta')
#rects2 = plt.bar(x + width/2, fnr_par_deltas, width, label='FNR Parity (gender)')
rects2 = plt.bar(x + width/2, fnr_rat_loo, width, label='FNR Ratio (Race) - LOO')
plt.axhline(0, c='k', ls='-', lw='1')
plt.axhline(race_metrics.fnr_ratio, c='k', ls=':', label='FNR Ratio (Race) ')
plt.axhline(1.0, c='k', ls='-', lw='2', label='FNR Parity') # show neutrality
plt.xticks(x, personal_attrs, rotation=45, ha='right', fontsize=12)
plt.yticks(fontsize=12)
plt.ylabel('Effect of Removal (removed - baseline)', fontsize=16)
plt.legend(fontsize=12,loc='lower right')
plt.show()

In [None]:
# Plot
plt.figure(figsize=(8,6))
plt.title('Impact of Personal Attributes', fontsize=18)
x = np.arange(len(personal_attrs))  # the label locations
width = 0.35  # the width of the bars
rects1 = plt.bar(x - width/2, bal_acc_deltas, width, label='Balanced Acc.')
rects2 = plt.bar(x + width/2, equal_opp_deltas, width, label='Equal Opportunity (race)')
plt.axhline(0, c='k', ls='-', lw='1')
plt.axhline(-race_metrics.equal_opp, c='k', ls=':', label='Neutral Equal Opportunity') # show neutrality
plt.xticks(x, personal_attrs, rotation=45, ha='right', fontsize=12)
plt.yticks(fontsize=12)
plt.ylabel('Effect of Removal (removed - included)', fontsize=16)
plt.legend(fontsize=12)
plt.show()

In [None]:
# Plot
plt.figure(figsize=(8,6))
plt.title('Impact of Personal Attributes', fontsize=18)
x = np.arange(len(personal_attrs))  # the label locations
width = 0.35  # the width of the bars
rects1 = plt.bar(x - width/2, bal_acc_deltas, width, label='Balanced Acc.')
rects2 = plt.bar(x + width/2, fpr_par_deltas, width, label='False Positive Rate Parity (race)')
plt.axhline(0, c='k', ls='-', lw='1')
plt.axhline(-race_metrics.fpr_parity, c='k', ls=':', label='Neutral FPR Parity') # show neutrality
plt.xticks(x, personal_attrs, rotation=45, ha='right', fontsize=12)
plt.yticks(fontsize=12)
plt.ylabel('Effect of Removal (removed - baseline)', fontsize=16)
plt.legend(fontsize=12,loc='upper right')
plt.show()

### Justifying the use of Personal Attributes
After running a "leave-one-out" feature removal analysis, we can assess the approximate impact of each personal attribute on both fairness and model performance. We plot the effect of removing each personal attribute on the model's performance (balanced accuracy) and on several fairness metrics with respect to race. We want balanced accuracy to be as high as possible, while ideally each fairness metric would sit at neutrality (0 for the parity differences, 1 for the ratios).

**Tradeoffs to be further examined**: Attributes whose removal negatively affects model performance but positively affects the fairness metric(s) of interest (or vice versa).

**Evidence for inclusion**: Attributes whose removal negatively affects both model performance and the fairness metric(s) of interest.

**Evidence for exclusion**: Attributes whose removal positively affects both model performance and the fairness metric(s) of interest.