# Detecting and mitigating age bias in a credit decision pipeline with the reweighing algorithm
---

Machine learning has greatly improved our ability to interpret data and make informed decisions across many fields. Supervised machine learning lets a system make predictions for new instances. Those predictions are driven by a model that learns patterns from the training data, including patterns tied to protected attributes.
The fairness of the system's predictions therefore depends on the model and, in turn, on the training data used to build it. The training data may be insufficient to support unbiased decisions, or it may produce a model that introduces bias on specific attributes.
The credit decision pipeline uses the German credit dataset. The pipeline is inspected for bias with age as the protected attribute: applicants aged 25 or older form the privileged group, and applicants younger than 25 form the unprivileged group. Following the usual machine learning workflow, the instances are first divided into training and test sets. The data is then tested for bias in the credit decision by comparing outcomes for the privileged and unprivileged groups. A second protected attribute, sex, is dropped from the dataset to reduce its influence on the model and hence on the predictions.
BinaryLabelDatasetMetric is used to compare favorable and unfavorable outcomes in the training data across the protected attribute 'age'. In the run shown here, the mean difference in outcomes between the two groups indicates that a favorable credit decision is about 10 percentage points more likely for the privileged group (age >= 25). Because the train/test split is shuffled, the exact figure varies from run to run.
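To make the metric concrete, here is a minimal sketch of the mean difference (the rate of favorable outcomes in the unprivileged group minus the rate in the privileged group). It is written in plain Python rather than with AIF360, and the function name `mean_difference` and the toy data are assumptions made up for illustration.

```python
def mean_difference(labels, protected, favorable_label=1.0):
    """P(favorable | unprivileged) - P(favorable | privileged).

    `protected` holds 1 for privileged and 0 for unprivileged instances.
    """
    def favorable_rate(group_value):
        group_labels = [y for y, g in zip(labels, protected) if g == group_value]
        favorable = sum(1 for y in group_labels if y == favorable_label)
        return favorable / len(group_labels)

    return favorable_rate(0) - favorable_rate(1)


# Toy data (made up): 6 privileged and 4 unprivileged applicants.
# Labels follow the notebook's convention: 1.0 favorable, 2.0 unfavorable.
protected = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
labels = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0]

toy_mean_difference = mean_difference(labels, protected)
print(round(toy_mean_difference, 4))  # -0.1667
```

A negative value means the unprivileged group receives the favorable outcome less often than the privileged group; zero means both groups receive it at the same rate.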


The bias mitigation process involves the following steps:

1. Import the necessary packages.

In [21]:
%pip install aif360 #'aif360 < 0.3.0'
%matplotlib inline
# Load all necessary packages
import sys
sys.path.append("../")
sys.path.insert(1, "../")  
import numpy as np
from tqdm import tqdm

from aif360.datasets import BinaryLabelDataset
from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

from IPython.display import Markdown, display



2. The dataset is loaded with age as the protected attribute; an age of 25 or older is considered privileged. We also drop the 'sex' feature, together with the related 'personal_status' feature, to reduce its impact on the dataset. The dataset is then split into training and test sets, 70% and 30% respectively.

In [0]:
## import dataset
dataset_orig = GermanDataset(
    protected_attribute_names=['age'],           # this dataset also contains protected
                                                 # attribute for "sex" which we do not
                                                 # consider in this evaluation
    privileged_classes=[lambda x: x >= 25],      # age >=25 is considered privileged
    features_to_drop=['personal_status', 'sex'] # ignore sex-related attributes
)

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]
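The group dictionaries above use 1 and 0 because the callable passed in `privileged_classes` turns the numeric age column into a binary flag. The sketch below only illustrates that effect in plain Python; it is not AIF360's implementation, and the helper name `binarize_age` and the sample ages are hypothetical.

```python
def binarize_age(ages, is_privileged=lambda age: age >= 25):
    """Map each age to 1.0 (privileged) or 0.0 (unprivileged)."""
    return [1.0 if is_privileged(age) else 0.0 for age in ages]


# Made-up ages around the threshold used in the notebook.
ages = [19, 24, 25, 40, 67]
encoded = binarize_age(ages)
print(encoded)  # [0.0, 0.0, 1.0, 1.0, 1.0]
```

Note that with `>=` an applicant aged exactly 25 falls in the privileged group.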

In [0]:
# print out some labels, names, etc.
display(Markdown("#### Training Dataset shape"))
print(dataset_orig_train.features.shape)
display(Markdown("#### Favorable and unfavorable labels"))
print(dataset_orig_train.favorable_label, dataset_orig_train.unfavorable_label)
display(Markdown("#### Protected attribute names"))
print(dataset_orig_train.protected_attribute_names)
display(Markdown("#### Privileged and unprivileged protected attribute values"))
print(dataset_orig_train.privileged_protected_attributes, 
      dataset_orig_train.unprivileged_protected_attributes)
display(Markdown("#### Dataset feature names"))
print(dataset_orig_train.feature_names)

#### Training Dataset shape

(700, 11)


#### Favorable and unfavorable labels

1.0 2.0


#### Protected attribute names

['age']


#### Privileged and unprivileged protected attribute values

[array([1.])] [array([0.])]


#### Dataset feature names

['age', 'sex', 'credit_history=Delay', 'credit_history=None/Paid', 'credit_history=Other', 'savings=500+', 'savings=<500', 'savings=Unknown/None', 'employment=1-4 years', 'employment=4+ years', 'employment=Unemployed']


3. Assessing bias in the original dataset. Here we check for bias between the privileged and unprivileged groups by comparing the difference in their mean outcomes.

In [23]:
# Metric for the original dataset
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train, 
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.104495


4. Bias identification tells us that favorable outcomes are about 10.45 percentage points more likely for the privileged group.

5. We then apply reweighing to the dataset. Reweighing assigns each instance a weight based on its group and label, so that favorable outcomes in the unprivileged group carry more weight.

In [0]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
               privileged_groups=privileged_groups)
#RW.fit(dataset_orig_train)
dataset_transf_train = RW.fit_transform(dataset_orig_train)
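For intuition about the weights, the following sketch computes them by hand under the usual reweighing formula, w(group, label) = P(group) * P(label) / P(group, label). It uses plain Python on made-up toy data; the function name `reweighing_weights` is an assumption for illustration and is not part of AIF360.

```python
from collections import Counter


def reweighing_weights(labels, protected):
    """Return one weight per (group, label) combination.

    weight = (count of group * count of label) / (n * count of group-and-label)
    """
    n = len(labels)
    group_counts = Counter(protected)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(protected, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for (g, y) in joint_counts
    }


# Toy data (made up): 1 = privileged, 0 = unprivileged;
# labels are 1.0 (favorable) and 2.0 (unfavorable).
protected = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
labels = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0]

weights = reweighing_weights(labels, protected)
for key in sorted(weights):
    print(key, round(weights[key], 3))
```

On this toy data, favorable outcomes in the unprivileged group and unfavorable outcomes in the privileged group get weights above 1, while the other two combinations get weights below 1.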

6. The final step is to assess the bias in the transformed dataset.

In [26]:
metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train, 
                                         unprivileged_groups=unprivileged_groups,
                                         privileged_groups=privileged_groups)
display(Markdown("#### Transformed training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

#### Transformed training dataset

Difference in mean outcomes between unprivileged and privileged groups = 0.000000
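The metric drops to zero because it takes instance weights into account: the labels themselves are unchanged, only how much each instance counts. The sketch below shows this on made-up toy data in plain Python. The function name `weighted_mean_difference` is hypothetical, and the hard-coded weights are the values that P(group) * P(label) / P(group, label) gives for this toy data.

```python
def weighted_mean_difference(labels, protected, weights, favorable_label=1.0):
    """Weighted P(favorable | unprivileged) - weighted P(favorable | privileged)."""
    def weighted_rate(group_value):
        total = sum(w for g, w in zip(protected, weights) if g == group_value)
        favorable = sum(
            w
            for y, g, w in zip(labels, protected, weights)
            if g == group_value and y == favorable_label
        )
        return favorable / total

    return weighted_rate(0) - weighted_rate(1)


# Toy data (made up): 1 = privileged, 0 = unprivileged;
# labels are 1.0 (favorable) and 2.0 (unfavorable).
protected = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
labels = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0]

uniform = [1.0] * len(labels)
# One weight per instance, in the same order as the data above.
reweighed = [0.9, 0.9, 0.9, 0.9, 1.2, 1.2, 1.2, 1.2, 0.8, 0.8]

before = weighted_mean_difference(labels, protected, uniform)
after = weighted_mean_difference(labels, protected, reweighed)
print(round(before, 4))   # -0.1667
print(abs(after) < 1e-9)  # True
```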


Reweighing mitigates the bias in the training dataset by assigning a weight to each combination of group and outcome for the age attribute: favorable outcomes in the unprivileged group and unfavorable outcomes in the privileged group are weighted up, and the other two combinations are weighted down. The effect is that the bias toward unfavorable outcomes for the unprivileged group is offset by the increased weight of its favorable outcomes. Measured on the transformed (reweighed) training dataset, the mean difference in outcomes is 0.0. Hence reweighing has removed the measured age bias from the training data on which a credit decision model would be trained.