# Assignment 4: Detecting and Mitigating Bias

The goal of this tutorial is to introduce the basic functionality of AI Fairness 360 for detecting and mitigating bias. As before, we will work with the German Credit dataset. There are many metrics one can use to detect the presence of bias. Likewise, there are many different bias mitigation algorithms one can employ. AI Fairness 360 provides some of the most common metrics and algorithms.


### Bias mitigation techniques

In class, we learnt about three families of bias mitigation techniques: _pre-processing_, _in-processing_, and _post-processing_.


We will use AI Fairness 360 (`aif360`) to detect and mitigate bias. We will look for bias in the creation of a machine learning model that predicts whether an applicant should be given credit based on various features from a typical credit application. The protected attribute will be "Age", with "1" (older than or equal to 25) and "0" (younger than 25) being the values for the _privileged_ and _unprivileged_ groups, respectively.

In this notebook, we will:

1. Install and import packages and modules
2. Load dataset, split between train and test, and compute fairness metrics on original training dataset
3. Mitigate bias using a pre-processing algorithm (reweighing)
4. Mitigate bias using an in-processing algorithm (adversarial debiasing)
5. Mitigate bias using a post-processing algorithm (equalized odds post processing)

## Note:

This assignment was completed on Colab due to local errors with the Python environment.


## 1. Import Statements

First, we install the necessary packages. Then we import several components from the `aif360` package. We are relying on aif360 for this assignment, so please start early to make sure that the dependencies are resolved and that the packages load correctly.

In [1]:
#!pip install numba



In [2]:
#!pip install tensorflow[and-cuda]

Collecting nvidia-cublas-cu12==12.5.3.2 (from tensorflow[and-cuda])
  Downloading nvidia_cublas_cu12-12.5.3.2-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.5.82 (from tensorflow[and-cuda])
  Downloading nvidia_cuda_cupti_cu12-12.5.82-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.5.82 (from tensorflow[and-cuda])
  Downloading nvidia_cuda_nvrtc_cu12-12.5.82-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.5.82 (from tensorflow[and-cuda])
  Downloading nvidia_cuda_runtime_cu12-12.5.82-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cudnn-cu12==9.3.0.75 (from tensorflow[and-cuda])
  Downloading nvidia_cudnn_cu12-9.3.0.75-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cufft-cu12==11.2.3.61 (from tensorflow[and-cuda])
  Downloading nvidia_cufft_cu12-11.2.3.61-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecti

In [3]:
# No need to re-install if you already did so in Assignment 2
#!pip install aif360

Collecting aif360
  Downloading aif360-0.6.1-py3-none-any.whl.metadata (5.0 kB)
Downloading aif360-0.6.1-py3-none-any.whl (259 kB)
Installing collected packages: aif360
Successfully installed aif360-0.6.1


In [5]:
#!pip install aif360[Reductions] aif360[inFairness]

Collecting fairlearn~=0.7 (from aif360[Reductions])
  Downloading fairlearn-0.13.0-py3-none-any.whl.metadata (7.3 kB)
Collecting skorch (from aif360[inFairness])
  Downloading skorch-1.2.0-py3-none-any.whl.metadata (11 kB)
Collecting inFairness>=0.2.2 (from aif360[inFairness])
  Downloading inFairness-0.2.3-py3-none-any.whl.metadata (8.1 kB)
Collecting scipy>=1.2.0 (from aif360[Reductions])
  Downloading scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting POT>=0.8.0 (from inFairness>=0.2.2->aif360[inFairness])
  Downloading pot-0.9.6.post1-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (40 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.6.77 (from torch>=1.13.0->inFairness>=0.2.2->

In [1]:
# import all necessary packages
import numpy as np
np.random.seed(0)

from numba import jit

from aif360.datasets import GermanDataset, BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, DatasetMetric

from aif360.algorithms.preprocessing import Reweighing, LFR, DisparateImpactRemover
from aif360.algorithms.inprocessing import AdversarialDebiasing
from aif360.algorithms.postprocessing import EqOddsPostprocessing

from aif360.explainers import MetricTextExplainer, MetricJSONExplainer

from sklearn.linear_model import LogisticRegression

import tensorflow as tf
print(tf.__version__)

from IPython.display import Markdown, display

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

import json
from collections import OrderedDict



2.19.0


In [3]:
# The Dataset class needs both a german.doc file and the german.data file; this cell only writes the content of the .doc file.
file_path = '/usr/local/lib/python3.12/dist-packages/aif360/data/raw/german/german.doc'
file_content = """Description of the German credit dataset.

1. Title: German Credit data

2. Source Information

Professor Dr. Hans Hofmann
Institut f"ur Statistik und "Okonometrie
Universit"at Hamburg
FB Wirtschaftswissenschaften
Von-Melle-Park 5
2000 Hamburg 13

3. Number of Instances:  1000

Two datasets are provided.  the original dataset, in the form provided
by Prof. Hofmann, contains categorical/symbolic attributes and
is in the file "german.data".

For algorithms that need numerical attributes, Strathclyde University
produced the file "german.data-numeric".  This file has been edited
and several indicator variables added to make it suitable for
algorithms which cannot cope with categorical variables.   Several
attributes that are ordered categorical (such as attribute 17) have
been coded as integer.    This was the form used by StatLog.


6. Number of Attributes german: 20 (7 numerical, 13 categorical)
   Number of Attributes german.numer: 24 (24 numerical)


7.  Attribute description for german

Attribute 1:  (qualitative)
               Status of existing checking account
               A11 :      ... <    0 DM
               A12 : 0 <= ... <  200 DM
               A13 :      ... >= 200 DM /
                     salary assignments for at least 1 year
               A14 : no checking account

Attribute 2:  (numerical)
              Duration in month

Attribute 3:  (qualitative)
              Credit history
              A30 : no credits taken/
                    all credits paid back duly
              A31 : all credits at this bank paid back duly
              A32 : existing credits paid back duly till now
              A33 : delay in paying off in the past
              A34 : critical account/
                    other credits existing (not at this bank)

Attribute 4:  (qualitative)
              Purpose
              A40 : car (new)
              A41 : car (used)
              A42 : furniture/equipment
              A43 : radio/television
              A44 : domestic appliances
              A45 : repairs
              A46 : education
              A47 : (vacation - does

"""

# Write the content to the file
with open(file_path, "w") as f:
    f.write(file_content)

print(f"Content successfully written to {file_path}")

Content successfully written to /usr/local/lib/python3.12/dist-packages/aif360/data/raw/german/german.doc


## 2. Load Data, Specify Protected Attribute, and Split Data

We will use the German Credit data, set the protected attribute to be age, create two variables to represent the privileged and unprivileged groups, and split the original dataset into training and test data subsets. Finally, we will build a typical machine learning workflow that involves training a machine learning model on the training dataset and using a test dataset to assess the model's efficacy (e.g., accuracy, fairness). For this dataset, we have a binary classification problem: predicting whether an individual is a good or a bad credit risk.

In this dataset, we consider older applicants (age >= 25) as the privileged group and younger applicants (age < 25) as the unprivileged group.

We will use the preprocessed GermanDataset with one-hot encoded data provided by the aif360 package.
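
To make the group encoding concrete, here is a minimal sketch of how a threshold predicate such as the one passed to `privileged_classes` maps raw ages to the 1/0 codes used for the privileged and unprivileged groups. The helper name `encode_age` and the sample ages are made up for illustration; this is not aif360's internal code.

```python
# Hypothetical helper mirroring the predicate `lambda x: x >= 25`.
def encode_age(age, threshold=25):
    """Return 1 for the privileged group (age >= threshold), else 0."""
    return 1 if age >= threshold else 0

sample_ages = [19, 24, 25, 40, 67]  # made-up ages, not taken from the dataset
age_codes = [encode_age(a) for a in sample_ages]
print(age_codes)  # [0, 0, 1, 1, 1]
```

These codes are what the dictionaries `{'age': 1}` and `{'age': 0}` refer to when the groups are defined.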

In [4]:
# note that we drop sex, which may also be a protected attribute
dataset_orig = GermanDataset(protected_attribute_names=['age'],
                             privileged_classes=[lambda x: x >= 25],
                             features_to_drop=['personal_status', 'sex'])

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]

In [5]:
print("Original data shape: ",dataset_orig.features.shape)
print("Train dataset shape: ", dataset_orig_train.features.shape)
print("Test dataset shape: ", dataset_orig_test.features.shape)

Original data shape:  (1000, 57)
Train dataset shape:  (700, 57)
Test dataset shape:  (300, 57)


The object ```dataset_orig``` is an aif360 dataset, which has some useful methods and attributes that you can explore. More documentation is available at https://aif360.readthedocs.io/en/latest/modules/datasets.html.
For now, we'll just transform the data into a pandas dataframe:

In [6]:
df, dict_df = dataset_orig.convert_to_dataframe()
print("Shape: ", df.shape)
# print(df.columns)
# df.head(5)

Shape:  (1000, 58)


## 3. Compute Fairness Metrics on Original Training Data
Now that we have identified the protected attribute "age" and defined privileged and unprivileged values, we can use aif360 to detect bias in the dataset.  

### Mean Outcomes

Compare the base rates (i.e., percentage of favorable results) for the privileged and unprivileged groups and report the difference (unprivileged base rate - privileged base rate). This is implemented in the ```mean_difference``` method on the BinaryLabelDatasetMetric class, as shown below:

In [7]:
metric_orig_train = BinaryLabelDatasetMetric(
     dataset_orig_train,
     unprivileged_groups=unprivileged_groups,
     privileged_groups=privileged_groups
  )
print("Original training dataset")
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

Original training dataset
Difference in mean outcomes between unprivileged and privileged groups = -0.169905


### Disparate Impact
We can calculate the ratio of the rate of favorable outcomes for the unprivileged group to that of the privileged group, as implemented in the ```disparate_impact``` method on the BinaryLabelDatasetMetric class:

In [8]:
print("Original training dataset")
print("Disparate Impact = %f" % metric_orig_train.disparate_impact())

Original training dataset
Disparate Impact = 0.766430


**Note:** The fairness metrics above will vary depending upon the train-test split. If the magnitude of mean difference is less than 10%, try another split.

### Built-In Explainers

```aif360``` has some useful explainers for the fairness metrics which can be used to interpret the fairness metric values:

In [9]:
json_expl = MetricJSONExplainer(metric_orig_train)
def format_json(json_str):
    return json.dumps(json.loads(json_str, object_pairs_hook=OrderedDict),
                      indent=2)

Let's print the mean difference explainer:

In [10]:
print(format_json(json_expl.mean_difference()))

{
  "metric": "Mean Difference",
  "message": "Mean difference (mean label value on unprivileged instances - mean label value on privileged instances): -0.1699054740619017",
  "numPositivesUnprivileged": 63.0,
  "numInstancesUnprivileged": 113.0,
  "numPositivesPrivileged": 427.0,
  "numInstancesPrivileged": 587.0,
  "description": "Computed as the difference of the rate of favorable outcomes received by the unprivileged group to the privileged group.",
  "ideal": "The ideal value of this metric is 0.0"
}


We can also print the disparate impact explainer:

In [11]:
print(format_json(json_expl.disparate_impact()))

{
  "metric": "Disparate Impact",
  "message": "Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.7664297113013201",
  "numPositivePredictionsUnprivileged": 63.0,
  "numUnprivileged": 113.0,
  "numPositivePredictionsPrivileged": 427.0,
  "numPrivileged": 587.0,
  "description": "Computed as the ratio of rate of favorable outcome for the unprivileged group to that of the privileged group.",
  "ideal": "The ideal value of this metric is 1.0 A value < 1 implies higher benefit for the privileged group and a value >1 implies a higher benefit for the unprivileged group."
}
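
The counts reported by the two explainers are enough to reproduce both metrics by hand. The sketch below uses only the four counts printed above; the variable names are ours.

```python
# Counts copied from the explainer output for the training split.
pos_unpriv, n_unpriv = 63, 113    # favorable / total, age < 25
pos_priv, n_priv = 427, 587       # favorable / total, age >= 25

rate_unpriv = pos_unpriv / n_unpriv
rate_priv = pos_priv / n_priv

mean_difference = rate_unpriv - rate_priv     # ideal value: 0
disparate_impact = rate_unpriv / rate_priv    # ideal value: 1

print(f"rate_unpriv = {rate_unpriv:.4f}, rate_priv = {rate_priv:.4f}")
print(f"mean difference  = {mean_difference:.6f}")
print(f"disparate impact = {disparate_impact:.6f}")
```

The two base rates are roughly 55.8% and 72.7%, which is where the mean difference of about -0.17 and the ratio of about 0.77 come from.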


**Q1:** Using the explainers above, interpret the difference in means and disparate impact in the German Credit data:

The mean difference between the unprivileged and privileged groups is `-0.169`: the favourable-outcome rate is about 17 percentage points lower for the unprivileged group (63 of 113, roughly 56%) than for the privileged group (427 of 587, roughly 73%). Younger applicants are therefore at a disadvantage, since they receive a favourable outcome noticeably less often.

The disparate impact is `0.766`, meaning that the unprivileged group receives favourable outcomes at about 77% of the rate of the privileged group. A value below 1 implies a higher benefit for the privileged group.

### Build a model on the training data

Let's build a logistic regression model on this training data, predict credit risk for test data and compute the same fairness metrics over the model predictions.

In [12]:
model = LogisticRegression(solver='liblinear', class_weight='balanced')

df_test, dict_df_test = dataset_orig_test.convert_to_dataframe()
df_train, dict_df_train = dataset_orig_train.convert_to_dataframe()

# Fit the model to the training data
x_train = df_train.drop(['credit'], axis=1)
y_train = df_train['credit']
model.fit(x_train, y_train)

x_test = df_test.drop(['credit'], axis=1)
y_test = df_test['credit']

y_pred = model.predict(x_test)

dataset_pred_test = dataset_orig_test.copy()
dataset_pred_test.labels = y_pred.copy()

metric_dataset_test = BinaryLabelDatasetMetric(
    dataset_pred_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )


In [13]:
# write code here to compute fairness metrics
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_test.mean_difference())
print("Disparate Impact = %f" % metric_dataset_test.disparate_impact())

Difference in mean outcomes between unprivileged and privileged groups = -0.303030
Disparate Impact = 0.523810


**Q2:** Using the fairness metric functions as before, report the bias observed in the model's predictions over test data. What do these values indicate? Are the model's predictions more biased or less biased compared to the bias observed in the training data?

The difference in mean outcomes between the groups is `-0.303`, which tells us that the unprivileged group is at a significant disadvantage: its rate of favourable predictions is about 30 percentage points lower than that of the privileged group. Furthermore, the disparate impact is `0.523`, so the unprivileged group receives favourable predictions at only about half the rate of the privileged group. A value `< 1` indicates a bias towards the privileged group.

Comparing the two sets and the metrics derived from them:

| Set | Mu-Diff | Disparate Impact |
| --- | --- | --- |
| Original | -0.169 | 0.766 |
| Predictions | -0.303 | 0.523 |

Generally, if bias mitigation isn't considered prior to training a model on the dataset, we would expect the model to reflect the same biases in its predictions, if not amplify them. The mean difference moves from `-0.169` in the original training data to `-0.303` in the predictions, which tells us that the gap between the two groups has widened, meaning that the bias has gotten worse. Similarly, the disparate impact drops from `0.766` to `0.523`; given that the ideal value of `1` indicates perfect parity, this has also worsened.

The predictions are amplifying the bias observed in the original dataset.

## 4. Bias Mitigation Techniques

We learnt in class that there are several families of bias mitigation techniques, namely pre-processing, in-processing, and post-processing algorithms.

_Pre-processing_ bias mitigation is performed at the data end, before the creation of the model. In other words, we transform the data such that a model learned on the transformed data produces less biased decisions.

_In-processing_ bias mitigation methods focus on the model training stage, as compared to pre-processing which focuses on transforming the data prior to model training. This suite of methods includes incorporating a fairness constraint during model training, tweaking the model's objective function, and adversarial learning.

_Post-processing_ bias mitigation focuses on adjusting the model's predictions after the model has been trained.



### 4.1 Bias Mitigation via Pre-Processing

AI Fairness 360 implements several pre-processing mitigation algorithms. We will use the **reweighing algorithm**, which is implemented in the `Reweighing` class in the `aif360.algorithms.preprocessing` package. As discussed in class, this algorithm will transform the dataset by assigning weights to instances in each (group, label) combination to change the base rates and ensure fairness before classification. The idea is to apply appropriate weights to different tuples in the training data to reduce discrimination with respect to the protected attributes.

You can find documentation for reweighing here:
https://aif360.readthedocs.io/en/latest/modules/generated/aif360.algorithms.preprocessing.Reweighing.html

Call the fit and transform methods to perform the transformation, producing a newly transformed training dataset (```dataset_transf_train```):

In [14]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_train = RW.fit_transform(dataset_orig_train)

We can print the weights. Each observation in the data should have a weight. For brevity, let's look at the weights for the first 10 rows:

In [15]:
len(dataset_transf_train.instance_weights)
dataset_transf_train.instance_weights[0:10]

array([0.96229508, 0.96229508, 0.96229508, 0.96229508, 0.96229508,
       0.96229508, 0.96229508, 0.96229508, 1.25555556, 0.678     ])
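
The weight values in the printed array can be reproduced from the group and label counts of the training split. The sketch below assumes the standard reweighing formula, in which each (group, label) cell is weighted by its expected count under independence divided by its observed count; we assume this is what `Reweighing` computes, and the three distinct values visible above agree with it. The counts come from the explainer output earlier in the notebook.

```python
# Training-split counts taken from the explainer output.
counts = {
    ("unprivileged", "favorable"): 63,
    ("unprivileged", "unfavorable"): 113 - 63,
    ("privileged", "favorable"): 427,
    ("privileged", "unfavorable"): 587 - 427,
}
n_total = sum(counts.values())

# Marginal counts per group and per label.
n_group, n_label = {}, {}
for (group, label), n in counts.items():
    n_group[group] = n_group.get(group, 0) + n
    n_label[label] = n_label.get(label, 0) + n

# Weight = expected count under independence / observed count.
weights = {
    (group, label): (n_group[group] * n_label[label]) / (n_total * n)
    for (group, label), n in counts.items()
}
for key, w in weights.items():
    print(key, round(w, 8))
```

Cells that are under-represented relative to independence (favorable labels in the unprivileged group) receive a weight above 1, and over-represented cells receive a weight below 1.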

### Compute Fairness Metrics in Transformed Data

We can check how effective the transformation was in removing bias by calculating the same metrics used for the original training dataset.

In [16]:
metric_rw_train = BinaryLabelDatasetMetric(
    dataset_transf_train,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

Print the difference in mean outcomes and disparate impact in the transformed data:

In [17]:
# write your code here
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_rw_train.mean_difference())
print("Disparate Impact = %f" % metric_rw_train.disparate_impact())

Difference in mean outcomes between unprivileged and privileged groups = 0.000000
Disparate Impact = 1.000000
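
These ideal values arise because the metrics are computed with the instance weights; the labels themselves are unchanged. As a minimal sketch, assuming the standard reweighing weights (expected count under independence divided by observed count) and the training-split counts from the explainer output, the weighted favorable rate works out to the same value in both groups:

```python
# (count, reweighing weight) per (group, label) cell in the training split.
cells = {
    ("unprivileged", "favorable"): (63, 113 * 490 / (700 * 63)),
    ("unprivileged", "unfavorable"): (50, 113 * 210 / (700 * 50)),
    ("privileged", "favorable"): (427, 587 * 490 / (700 * 427)),
    ("privileged", "unfavorable"): (160, 587 * 210 / (700 * 160)),
}

def weighted_rate(group):
    """Weighted share of favorable labels within one group."""
    fav_n, fav_w = cells[(group, "favorable")]
    unfav_n, unfav_w = cells[(group, "unfavorable")]
    fav_mass = fav_n * fav_w
    return fav_mass / (fav_mass + unfav_n * unfav_w)

rate_unpriv_w = weighted_rate("unprivileged")
rate_priv_w = weighted_rate("privileged")
print(rate_unpriv_w, rate_priv_w)
```

Both weighted rates equal the overall favorable rate of the training split (490 of 700), so their difference is 0 and their ratio is 1.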


**Q3:** How do these values compare to the difference in mean outcomes and disparate impact in the original data?

Reweighing balances outcomes between the two groups: the mean difference moves from `-0.169` to `0` and the disparate impact from `0.766` to `1`. Both metrics are now at their ideal values, indicating perfect parity and an equal likelihood of a favourable outcome in the (weighted) training data.

### Compute Fairness Metrics on Model Trained on Transformed Data

In the following, we will train a model on the transformed data and compute the metrics over predictions made on the test data.

**Q4:**  How do you expect the fairness metrics would be over a model trained on the transformed data?

I would expect some bias and disparity to remain in the model's predictions even after reweighing, though both metrics should move noticeably closer to their ideal values than for the previous model (depending on which model is being used).

Since the instances now have weights, we will use a classifier that can incorporate instance weights. In this case, we will use a Naive Bayes classifier (more details here: https://scikit-learn.org/stable/modules/naive_bayes.html).

In [18]:
df_train_rw, dict_df_train_rw = dataset_transf_train.convert_to_dataframe()

# Fit the model to the transformed training data
x_train_rw = df_train_rw.drop(['credit'], axis=1)
y_train_rw = df_train_rw['credit']

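from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
# Note: fit() is called without sample_weight, so the reweighing weights in
# dataset_transf_train.instance_weights do not influence this model.
model.fit(x_train_rw, y_train_rw)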

# Use the model to make predictions on the test data
y_pred_rw = model.predict(x_test)

dataset_pred_test_rw = dataset_orig_test.copy()
dataset_pred_test_rw.labels = y_pred_rw.copy()

# Construct the BinaryLabelDatasetMetric object over the test predictions
metric_dataset_test_rw = BinaryLabelDatasetMetric(
    dataset_pred_test_rw,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

# Print fairness metrics computed over test predictions
# write code here

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_test_rw.mean_difference())
print("Disparate Impact = %f" % metric_dataset_test_rw.disparate_impact())


Difference in mean outcomes between unprivileged and privileged groups = -0.123737
Disparate Impact = 0.810078


**Q5:** Are your observations in line with what you expected in Q4 above? Why or why not?

Yes. The reported metrics are `-0.123` and `0.810`: there is still some bias in the model's predictions, but it is much less severe than for the logistic regression model from Q2 (`-0.303` and `0.523`). The distance of each metric from its ideal value makes this apparent.

The remaining bias is likely due to algorithmic bias: different models learn inter-feature and feature-target relationships differently. Our classifier is probably still picking up correlations between membership in the unprivileged group and the target (dependent) variable, either directly or through a proxy variable (in other settings, neighbourhood or zip code information often plays this role).

**Q6:** Instead of reweighing, one could also apply techniques such as suppression, i.e. removing sensitive attributes. Write code below to train a model that does not use any information on the sensitive attribute, use this model to make predictions over the test data, and then compute the fairness metrics over the predictions.

In [21]:
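# write code here to implement suppression
# Note: DisparateImpactRemover repairs the non-protected feature values; it does
# not drop the 'age' column, so 'age' is still part of x_train_supp below.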
# !pip install BlackBoxAuditing
from aif360.algorithms.preprocessing import DisparateImpactRemover

SUPPRESSION = DisparateImpactRemover(repair_level=1.0, sensitive_attribute='age')
dataset_transf_train = SUPPRESSION.fit_transform(dataset_orig_train)

df_train_supp, dict_df_train_supp = dataset_transf_train.convert_to_dataframe()

# Fit the model to the transformed training data
x_train_supp = df_train_supp.drop(['credit'], axis=1)
y_train_supp = df_train_supp['credit']

# use GNB again
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(x_train_supp, y_train_supp)

y_pred_supp = model.predict(x_test)

dataset_pred_test_supp = dataset_orig_test.copy()
dataset_pred_test_supp.labels = y_pred_supp.copy()

metric_dataset_test_supp = BinaryLabelDatasetMetric(
    dataset_pred_test_supp,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_test_supp.mean_difference())
print("Disparate Impact = %f" % metric_dataset_test_supp.disparate_impact())

Difference in mean outcomes between unprivileged and privileged groups = -0.140152
Disparate Impact = 0.781065
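
For comparison, suppression in the literal sense of Q6 means that the classifier never sees the protected column, neither during training nor at prediction time. The sketch below shows the idea on made-up rows; the helper `suppress` is hypothetical and the values are illustrative only.

```python
# Toy rows standing in for the one-hot encoded dataframe; values are made up.
rows = [
    {"age": 1.0, "month": 12.0, "credit_amount": 2500.0, "credit": 1.0},
    {"age": 0.0, "month": 24.0, "credit_amount": 4000.0, "credit": 2.0},
]

def suppress(row, protected=("age",), target="credit"):
    """Return (features, label) with protected and target columns removed."""
    features = {k: v for k, v in row.items()
                if k not in protected and k != target}
    return features, row[target]

suppressed = [suppress(r) for r in rows]
feature_names = sorted(suppressed[0][0])
print(feature_names)  # ['credit_amount', 'month']
```

With the notebook's dataframes, the same idea would be `df_train.drop(['credit', 'age'], axis=1)` for training and the matching drop on `df_test` before calling `predict`. The cell above instead uses `DisparateImpactRemover`, which is a different technique.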


**Q7:** Interpret your results. How does the preprocessing technique in Q5 compare to the suppression technique?

From the metrics, suppression (implemented here with `DisparateImpactRemover`) performs slightly worse than reweighing. Its disparate impact is slightly lower (`0.781` vs. `0.810`), and its mean difference is a bit further from the ideal value of 0 (`-0.140` vs. `-0.123`).

### 4.2. Bias Mitigation via In-Processing

In-processing methods focus on the model training stage, as compared to pre-processing which focuses on transforming the data prior to model training. Broadly speaking, contemporary in-processing methods are stronger than pre-processing methods.

### Adversarial Debiasing

In this part of the notebook, we will use an in-processing algorithm, called _Adversarial Debiasing_, that we briefly discussed in class. From the aif360 documentation (https://aif360.readthedocs.io/en/v0.2.3/modules/inprocessing.html):

> Adversarial debiasing is an in-processing technique that learns a classifier to maximize prediction accuracy and simultaneously reduce an adversary’s ability to determine the protected attribute from the predictions. This approach leads to a fair classifier as the predictions cannot carry any group discrimination information that the adversary can exploit.

For intuition, you can think of adversarial debiasing as a model with two supervised learning tasks. The first task is to predict an outcome using the training data input. The second task, i.e. the adversary, is to predict a protected feature using these predictions and non-protected features in the training data input. The aim is to maximize the model's ability to carry out the first task (i.e. predict outcomes) while minimizing its ability to carry out the second task (i.e. predict protected features).
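
The two competing tasks can be written as a single objective. The sketch below is a deliberate simplification with made-up numbers: the classifier minimizes its own loss minus a weighted adversary loss, so it is rewarded when the adversary can only guess. The weight `adversary_weight=0.1` and all scores are hypothetical, and aif360's TensorFlow implementation differs in its details.

```python
import math

def binary_cross_entropy(y_true, p_pred):
    """Mean binary cross-entropy for labels in {0, 1}."""
    eps = 1e-12  # guards against log(0)
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for y, p in zip(y_true, p_pred)
    ) / len(y_true)

def debiasing_objective(clf_loss, adv_loss, adversary_weight=0.1):
    """Simplified objective the classifier minimizes: fit the labels
    well while making the adversary's task harder."""
    return clf_loss - adversary_weight * adv_loss

# Made-up outcomes, classifier scores, and protected attribute values.
y = [1, 0, 1, 1]
scores = [0.8, 0.3, 0.7, 0.6]
protected = [1, 0, 1, 0]
adv_confident = [0.9, 0.1, 0.9, 0.2]  # adversary recovers the group well
adv_guessing = [0.5, 0.5, 0.5, 0.5]   # adversary can only guess

clf_loss = binary_cross_entropy(y, scores)
obj_leaky = debiasing_objective(clf_loss, binary_cross_entropy(protected, adv_confident))
obj_fair = debiasing_objective(clf_loss, binary_cross_entropy(protected, adv_guessing))
print(obj_leaky > obj_fair)  # True
```

For the same classifier loss, the objective is lower when the adversary is reduced to guessing, which is the state the training procedure pushes towards.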

We implement adversarial debiasing below:

In [22]:
# reset tensorflow graph
tf.compat.v1.reset_default_graph()

# start tensorflow session
sess = tf.compat.v1.Session()
tf.compat.v1.disable_eager_execution()

# create AdversarialDebiasing model
debiased_model = AdversarialDebiasing(
    privileged_groups = privileged_groups,
    unprivileged_groups = unprivileged_groups,
    scope_name = 'debiased_classifier',
    debias = True,
    sess = sess)

# fit the model to training data
debiased_model.fit(dataset_orig_train)

# make predictions on training and test data
dataset_debiasing_train = debiased_model.predict(dataset_orig_train)
dataset_debiasing_test = debiased_model.predict(dataset_orig_test)

# metrics
metric_dataset_debiasing_test = BinaryLabelDatasetMetric(
    dataset_debiasing_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
  )

# Close session
sess.close()

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


epoch 0; iter: 0; batch classifier loss: 84.117470; batch adversarial loss: 0.609595
epoch 1; iter: 0; batch classifier loss: 60.385742; batch adversarial loss: 0.599298
epoch 2; iter: 0; batch classifier loss: 68.519501; batch adversarial loss: 0.537759
epoch 3; iter: 0; batch classifier loss: 49.546165; batch adversarial loss: 0.607168
epoch 4; iter: 0; batch classifier loss: 44.690742; batch adversarial loss: 0.563261
epoch 5; iter: 0; batch classifier loss: 76.991425; batch adversarial loss: 0.638900
epoch 6; iter: 0; batch classifier loss: 82.345169; batch adversarial loss: 0.609121
epoch 7; iter: 0; batch classifier loss: 35.181107; batch adversarial loss: 0.590382
epoch 8; iter: 0; batch classifier loss: 23.353128; batch adversarial loss: 0.546886
epoch 9; iter: 0; batch classifier loss: 48.510334; batch adversarial loss: 0.581755
epoch 10; iter: 0; batch classifier loss: 40.626083; batch adversarial loss: 0.529961
epoch 11; iter: 0; batch classifier loss: 33.520020; batch adver

### Fairness Metrics under Adversarial Debiasing

The `BinaryLabelDatasetMetric` object constructed above from the debiased test predictions has methods for the difference in mean outcomes (```.mean_difference()```) and disparate impact (```.disparate_impact()```). Print these below:

In [23]:
# write your code here
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_debiasing_test.mean_difference())
print("Disparate Impact = %f" % metric_dataset_debiasing_test.disparate_impact())

Difference in mean outcomes between unprivileged and privileged groups = 0.000000
Disparate Impact = 1.000000


**Q8:** Interpret the difference in means and disparate impact for the predicted outcomes under adversarial debiasing. How do these compare to the metrics calculated in Q2 and Q5?

The difference in mean outcomes and the disparate impact are at their ideal values of `0` and `1`, respectively. On the test set, the debiased model therefore gives both age groups favourable predictions at the same rate.

| Approach                 | Mu-Diff | Disparate Impact |
|---------------------------|-----------------|------------------|
| LR Model       | -0.303       | 0.523        |
| Reweighing           | -0.123       | 0.810         |
| Adversarial Debiasing | 0        | 1         |

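On these two metrics, adversarial debiasing mitigates bias best when compared to a plain model without interventions and a pre-processing method like reweighing. Because values of exactly 0 and 1 can also occur when a model predicts the same label for every applicant, the result should be read together with the model's accuracy.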
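
The following sketch illustrates why exact parity on its own says little about model quality. It uses made-up toy data and a classifier that assigns the favorable label to everyone; the helper `group_rates` is ours, not part of aif360.

```python
def group_rates(preds, groups, favorable=1):
    """Favorable-prediction rate per group code (0 = unprivileged, 1 = privileged)."""
    rates = {}
    for g in (0, 1):
        members = [p for p, grp in zip(preds, groups) if grp == g]
        rates[g] = sum(p == favorable for p in members) / len(members)
    return rates

# Made-up true labels and group codes; the classifier predicts 1 for everyone.
y_true = [1, 1, 0, 1, 0, 1, 1, 0]
groups = [0, 0, 0, 1, 1, 1, 1, 1]
constant_preds = [1] * len(y_true)

rates = group_rates(constant_preds, groups)
mean_difference = rates[0] - rates[1]
disparate_impact = rates[0] / rates[1]
accuracy = sum(p == t for p, t in zip(constant_preds, y_true)) / len(y_true)

print(mean_difference, disparate_impact, accuracy)  # 0.0 1.0 0.625
```

Comparing `dataset_debiasing_test.labels` with the true test labels (for example, the share of favorable predictions and the accuracy) would show whether the debiased model falls into this case.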

### 4.3. Bias Mitigation via Post-Processing

In this last section, we will use one of the post-processing algorithms in AI Fairness 360 called **equalized odds postprocessing**, which is implemented in the `EqOddsPostprocessing` class in the `aif360.algorithms.postprocessing` package. This technique solves a linear program to find probabilities with which to change output labels to optimize equalized odds.

You can find documentation for equalized odds postprocessing here:
https://aif360.readthedocs.io/en/latest/modules/generated/aif360.algorithms.postprocessing.EqOddsPostprocessing.html
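
Equalized odds asks that the true-positive rate and the false-positive rate be equal across groups, which is a different criterion from the statistical parity measured by the mean difference and disparate impact. As a minimal sketch on made-up labels and predictions (the helper `tpr_fpr` is hypothetical), the two gaps can be computed like this:

```python
def tpr_fpr(y_true, y_pred, groups, group, favorable=1):
    """True-positive and false-positive rates within one group."""
    tp = fp = pos = neg = 0
    for t, p, g in zip(y_true, y_pred, groups):
        if g != group:
            continue
        if t == favorable:
            pos += 1
            tp += p == favorable
        else:
            neg += 1
            fp += p == favorable
    return tp / pos, fp / neg

# Made-up labels, predictions, and group codes (0 = unprivileged, 1 = privileged).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

tpr_u, fpr_u = tpr_fpr(y_true, y_pred, groups, group=0)
tpr_p, fpr_p = tpr_fpr(y_true, y_pred, groups, group=1)
print(f"TPR gap = {tpr_u - tpr_p:+.2f}, FPR gap = {fpr_u - fpr_p:+.2f}")
```

Equalized odds postprocessing aims to shrink both gaps by randomly flipping some predicted labels with group-specific probabilities.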

Call the fit and predict methods to perform the transformation, producing a post-processed set of test predictions (```dataset_post_test```):

In [28]:
df_test, dict_df_test = dataset_orig_test.convert_to_dataframe()
df_train, dict_df_train = dataset_orig_train.convert_to_dataframe()

# Fit the model to the training data and predict for test data
x_train = df_train.drop(['credit'], axis=1)
y_train = df_train['credit']

model = GaussianNB()
model.fit(x_train, y_train)

x_test = df_test.drop(['credit'], axis=1)
y_test = df_test['credit']

y_pred = model.predict(x_test)

# dataset_pred_test -- dataset with predictions stored in labels
dataset_pred_test = dataset_orig_test.copy()
# Reshape y_pred, EO object errors out if we do it like we did earlier
dataset_pred_test.labels = y_pred.reshape(-1, 1).copy()

# create Equalized Odds Post processing object
eo_post = EqOddsPostprocessing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)

# fit the post-processor on the true test labels and the model's test predictions
eo_post.fit(dataset_orig_test, dataset_pred_test)

# make predictions on test data
dataset_post_test = eo_post.predict(dataset_pred_test)


# construct metrics object
metric_dataset_post_test = BinaryLabelDatasetMetric(
    dataset_post_test,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
)

# compute fairness metrics
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_dataset_post_test.mean_difference())
print("Disparate Impact = %f" % metric_dataset_post_test.disparate_impact())

Difference in mean outcomes between unprivileged and privileged groups = -0.002525
Disparate Impact = 0.995238


**Q9:** Interpret the difference in fairness metrics for the predicted outcomes under this post-processing technique. How do these compare to the metrics calculated in Q2, Q5 and Q8?

The mean difference is `-0.002` and the disparate impact is `0.995`. Both are very close to the ideal values of 0 and 1, respectively, so the post-processed predictions show almost no disparity between the groups, and this method is also quite effective at mitigating bias. Note that the post-processor was fit on the same test set it is evaluated on, so these values are likely optimistic.

Comparing other approaches in the table below:

| Approach                 | Mu-Diff | Disparate Impact |
|---------------------------|-----------------|------------------|
| LR Model       | -0.303       | 0.523         |
| Reweighing            | -0.123       | 0.810         |
| Adversarial Debiasing | 0        | 1         |
| Equalized Odds   | -0.002 | 0.995 |

Adversarial debiasing and equalized odds postprocessing mitigate bias more effectively than a regular model trained without bias mitigation or with reweighing alone. Reweighing is also reasonably good, but leaves a larger mean difference between groups than adversarial debiasing and equalized odds postprocessing.

# Submitting this Assignment Notebook

Once complete, please submit your assignment notebook as an attachment under "Assignments > Assignment 4" on Brightspace. You can download a copy of your notebook using ```File > Download .ipynb```. Please ensure you submit the `.ipynb` file (and not a `.py` file).