# Notebook Outline

The purpose of this notebook is to implement "fairness" evaluation metrics. These metrics measure how fair a model's predictions are with respect to a sensitive feature.<br><br>

### This notebook includes the following sections:
1. **Data Ingestion:** Ingest and prepare law-school prediction results from the following models:
  1. Full
  2. Unaware
  3. Counterfactually fair
  4. Individually fair <br><br>
2. **Metric Implementation:** 
  1. Demographic parity
  2. Equality of opportunity<br><br>
3. **Model Evaluation:** Evaluate predictions using ETT

**Note:** Sections 1 and 3 should be reused in other pipelines.

## Section 1: Data Ingestion

In [1]:
# Import libraries

import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

In [2]:
# Connect to Google Drive (to download raw data, upload clean data)

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [3]:
# Download Law School datasets (these are already augmented with Lily's predictions!):

csv_files = {
  'law_school_train': '1eNiQgZUyxL7Fu5zN80EYrdqja8xesKd1',
  'law_school_test': '1g8NNc3OPwLnqskkNkz8EocvDiC-z_sWw',
  'law_school_counterfactual': '1LZ8Pqfh86X8FXSOvojpcW3kvEey3HAAq',
}

dfs = {}

for key, value in csv_files.items():
  csv_name = key + '.csv'
  downloaded = drive.CreateFile({'id': value})
  downloaded.GetContentFile(csv_name)
  dfs[key] = pd.read_csv(csv_name, low_memory=False, index_col=0) # Re-use the original index 
  print("Saved: ", key, "\n")

ls_train = dfs['law_school_train']
ls_test = dfs['law_school_test']
ls_cf = dfs['law_school_counterfactual']

Saved:  law_school_train 

Saved:  law_school_test 

Saved:  law_school_counterfactual 



In [4]:
ls_train

| | LSAT | UGPA | region_first | sander_index | first_pf | Amerindian | Asian | Black | Hispanic | Mexican | Other | Puertorican | White | female | male | ZFYA | Knowledge | Init_class | Fair_pred | Fair_pred_class | full_pred | full_pred_class | unaware_pred | unaware_pred_class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9543 | 31 | 3.4 | GL | 0.711310 | 1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | -1.45 | 0.055840 | -1.0 | 0.087859 | 1.0 | 0.053080 | 1.0 | -0.100215 | -1.0 |
| 11945 | 36 | 3.1 | GL | 0.745238 | 1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | -0.01 | 0.263451 | -1.0 | 0.047502 | 1.0 | 0.142441 | 1.0 | 0.031020 | 1.0 |
| 19551 | 36 | 3.5 | GL | 0.783333 | 1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | -0.31 | -0.090122 | -1.0 | 0.116232 | 1.0 | 0.204204 | 1.0 | 0.144656 | 1.0 |
| 17378 | 29 | 3.1 | SC | 0.657738 | 1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | -0.40 | 0.949375 | -1.0 | -0.085834 | -1.0 | -0.106896 | -1.0 | -0.272026 | -1.0 |
| 17283 | 23 | 2.3 | Mt | 0.512798 | 1.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | -1.38 | 1.603534 | -1.0 | -0.212994 | -1.0 | -0.964055 | -1.0 | -0.759050 | -1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11964 | 33 | 3.1 | GL | 0.707738 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | -0.51 | 0.604640 | -1.0 | -0.018821 | -1.0 | 0.019030 | 1.0 | -0.098857 | -1.0 |
| 21576 | 38 | 2.7 | FW | 0.732143 | 1.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | -1.21 | 0.700493 | -1.0 | -0.037454 | -1.0 | -0.270339 | -1.0 | 0.003969 | 1.0 |
| 5390 | 47 | 3.8 | SC | 0.949405 | 1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1.27 | -1.166616 | 1.0 | 0.325489 | 1.0 | 0.647515 | 1.0 | 0.706097 | 1.0 |
| 860 | 37 | 2.6 | GL | 0.710119 | 1.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0.06 | 1.248656 | 1.0 | -0.144010 | -1.0 | 0.060511 | 1.0 | -0.067732 | -1.0 |


## Section 2: Metric Implementation

In [5]:
def neg_to_zero(val):
    """
    Map the -1.0 class label to 0 so that {-1, 1} labels become {0, 1};
    any other value is returned unchanged.
    """
    if val == -1.0:
        return 0
    else:
        return val
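
Because `neg_to_zero` is applied element by element, the same {-1, 1} to {0, 1} mapping can also be written as one vectorized call. This is a minimal sketch on a hypothetical toy label column; `Series.replace` is standard pandas, but using it here is a suggested alternative, not what the functions below actually call:

```python
import pandas as pd

# Hypothetical toy column of class labels encoded as {-1.0, 1.0}
labels = pd.Series([-1.0, 1.0, 1.0, -1.0])

# Vectorized equivalent of labels.apply(neg_to_zero):
# every -1.0 becomes 0 and all other values are left unchanged.
binary = labels.replace(-1.0, 0)
print(binary.tolist())
```

On a large test set the vectorized form avoids calling a Python function once per row, and it produces the same 0/1 values.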


### 1. Demographic Parity

A predictor $\hat{Y}$ satisfies *Demographic Parity* (with respect to a binary sensitive attribute $A$) if:

$P(\hat{Y}=1 \mid A=0) = P(\hat{Y}=1 \mid A=1)$

In practice these two probabilities are rarely exactly equal, so we define our Demographic Parity metric as the absolute value of their difference (0 means perfect parity; larger values mean less fair):

$DP = | P(\hat{Y}=1 \mid A=0) - P(\hat{Y}=1 \mid A=1) |$
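
To make the formula concrete, the sketch below computes the gap on a tiny, hypothetical table (the columns `A` and `pred` are made up for illustration). Each group's positive-prediction rate $P(\hat{Y}=1 \mid A=a)$ is simply the mean of its 0/1 predictions, which is the same quantity `demographic_parity` computes with `sum() / count()`:

```python
import pandas as pd

# Hypothetical toy data: binary predictions (1 = positive outcome) and a
# binary sensitive attribute A (1 = member of the sensitive group).
toy = pd.DataFrame({
    "A":    [0, 0, 0, 0, 1, 1, 1, 1],
    "pred": [1, 1, 1, 0, 1, 0, 0, 0],
})

# P(Y_hat = 1 | A = a) is the mean of the 0/1 predictions within each group.
positive_rate = toy.groupby("A")["pred"].mean()

# Demographic parity gap: absolute difference of the two positive rates.
dp = abs(positive_rate[1] - positive_rate[0])
print(dp)
```

Here the non-sensitive group receives a positive prediction 3 times out of 4 (rate 0.75) and the sensitive group 1 time out of 4 (rate 0.25), so the gap is 0.5.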

In [6]:
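def demographic_parity(df, label_col, prediction_col, sensitive_col, cols_to_condition_on=None):
  """
  Demographic parity gap |P(Y_hat=1 | A=0) - P(Y_hat=1 | A=1)|.

  Inputs:
    df: DataFrame with one row per individual.
    label_col: true class column, encoded as {-1, 1} (kept for a consistent
      signature; demographic parity does not use it).
    prediction_col: predicted class column, encoded as {-1, 1}.
    sensitive_col: sensitive attribute A; 0 is the non-sensitive group and
      any non-zero value is the sensitive group.
    cols_to_condition_on: optional column name or list of names to carry
      along (currently not used in the computation).
  Output:
    Absolute difference in positive-prediction rates between the two groups
    (0 means perfect parity).
  """
  cols_to_keep = [label_col, prediction_col, sensitive_col]
  if cols_to_condition_on:
    if isinstance(cols_to_condition_on, str):
      cols_to_condition_on = [cols_to_condition_on]
    cols_to_keep += list(cols_to_condition_on)

  # Clean up dataframe: work on a copy (avoids modifying a slice of the
  # caller's DataFrame) and map the -1 class to 0
  df = df[cols_to_keep].copy()
  df[label_col] = df[label_col].apply(neg_to_zero)
  df[prediction_col] = df[prediction_col].apply(neg_to_zero)

  # Split by sensitive variable; sum / count of 0/1 predictions is P(Y_hat=1 | A)
  non_sensitive = df[df[sensitive_col]==0]
  p_yhat_non_sensitive = non_sensitive[prediction_col].sum() / non_sensitive[prediction_col].count()

  sensitive = df[df[sensitive_col]!=0]
  p_yhat_sensitive = sensitive[prediction_col].sum() / sensitive[prediction_col].count()

  # Return final metric
  probability_difference = abs(p_yhat_sensitive - p_yhat_non_sensitive)

  return probability_difference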



### 2. Equality of Opportunity

A predictor $\hat{Y}$ satisfies *Equality of Opportunity* (with respect to a binary sensitive attribute $A$) if, among individuals whose true label is positive ($Y=1$):

$P(\hat{Y}=1 \mid A=0, Y=1) = P(\hat{Y}=1 \mid A=1, Y=1)$

As with *Demographic Parity*, these quantities are rarely exactly equal, so we define our *Equality of Opportunity* metric as the absolute value of their difference, i.e. the gap in true-positive rates between the two groups:

$EO = | P(\hat{Y}=1 \mid A=0, Y=1) - P(\hat{Y}=1 \mid A=1, Y=1) |$
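
The sketch below applies this definition to a hypothetical toy table (`A`, `label`, and `pred` are illustrative names, not columns from the law-school data). Conditioning on $Y=1$ means only truly positive rows count, so the metric becomes a gap in true-positive rates:

```python
import pandas as pd

# Hypothetical toy data: true labels Y, binary predictions Y_hat, and a
# binary sensitive attribute A.
toy = pd.DataFrame({
    "A":     [0, 0, 0, 0, 1, 1, 1, 1],
    "label": [1, 1, 1, 0, 1, 1, 0, 0],
    "pred":  [1, 1, 0, 1, 1, 0, 1, 0],
})

# Condition on Y = 1: keep only the truly positive rows.
positives = toy[toy["label"] == 1]

# P(Y_hat = 1 | A = a, Y = 1) is the true-positive rate within each group.
tpr = positives.groupby("A")["pred"].mean()

# Equality-of-opportunity gap: absolute difference of the two rates.
eo = abs(tpr[1] - tpr[0])
print(eo)
```

Group $A=0$ has predictions 1, 1, 0 on its positive rows (rate 2/3) and group $A=1$ has 1, 0 (rate 1/2), giving a gap of 1/6. The false positive in row 3 (`A=0`, `label=0`, `pred=1`) does not affect this metric, although it would change the demographic parity gap.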

In [7]:
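def equality_of_opportunity(df, label_col, prediction_col, sensitive_col, cols_to_condition_on=None):
  """
  Equality-of-opportunity gap |P(Y_hat=1 | A=0, Y=1) - P(Y_hat=1 | A=1, Y=1)|.

  Inputs:
    df: DataFrame with one row per individual.
    label_col: true class column Y, encoded as {-1, 1}.
    prediction_col: predicted class column Y_hat, encoded as {-1, 1}.
    sensitive_col: sensitive attribute A; 0 is the non-sensitive group and
      any non-zero value is the sensitive group.
    cols_to_condition_on: optional column name or list of names to carry
      along (currently not used in the computation).
  Output:
    Absolute difference in true-positive rates between the two groups
    (0 means equal opportunity).
  """
  cols_to_keep = [label_col, prediction_col, sensitive_col]
  if cols_to_condition_on:
    if isinstance(cols_to_condition_on, str):
      cols_to_condition_on = [cols_to_condition_on]
    cols_to_keep += list(cols_to_condition_on)

  # Clean up dataframe: work on a copy (avoids modifying a slice of the
  # caller's DataFrame) and map the -1 class to 0
  df = df[cols_to_keep].copy()
  df[label_col] = df[label_col].apply(neg_to_zero)
  df[prediction_col] = df[prediction_col].apply(neg_to_zero)

  # Condition on Y=1 (keep only truly positive rows)
  df = df[df[label_col]==1.0]

  # Split by sensitive variable; sum / count of 0/1 predictions is P(Y_hat=1 | A, Y=1)
  non_sensitive = df[df[sensitive_col]==0]
  p_yhat_non_sensitive = non_sensitive[prediction_col].sum() / non_sensitive[prediction_col].count()

  sensitive = df[df[sensitive_col]!=0]
  p_yhat_sensitive = sensitive[prediction_col].sum() / sensitive[prediction_col].count()

  # Return final metric
  probability_difference = abs(p_yhat_sensitive - p_yhat_non_sensitive)

  return probability_difference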


## Section 3: Model Evaluation

In [8]:
# Evaluate each model's test-set predictions, using 'female' as the sensitive attribute

full_demographic_parity = demographic_parity(df=ls_test, label_col='Init_class', prediction_col='full_pred_class', sensitive_col='female')
unaware_demographic_parity = demographic_parity(df=ls_test, label_col='Init_class', prediction_col='unaware_pred_class', sensitive_col='female')
fair_demographic_parity = demographic_parity(df=ls_test, label_col='Init_class', prediction_col='Fair_pred_class', sensitive_col='female')

print("Full demographic parity: ", full_demographic_parity)
print("Unaware demographic parity: ", unaware_demographic_parity)
print("Fair demographic parity: ", fair_demographic_parity)

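dp_scores = {'Full': full_demographic_parity, 'Unaware': unaware_demographic_parity, 'Fair': fair_demographic_parity}
print("\nLower is fairer. Models ranked by demographic parity:", ", ".join(sorted(dp_scores, key=dp_scores.get)))

Full demographic parity:  0.04088897275783987
Unaware demographic parity:  0.010939322720098899
Fair demographic parity:  0.006798960406230736

Lower is fairer. Models ranked by demographic parity: Fair, Unaware, Full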


In [9]:
full_eq_of_op = equality_of_opportunity(df=ls_test, label_col='Init_class', prediction_col='full_pred_class', sensitive_col='female')
unaware_eq_of_op = equality_of_opportunity(df=ls_test, label_col='Init_class', prediction_col='unaware_pred_class', sensitive_col='female')
fair_eq_of_op = equality_of_opportunity(df=ls_test, label_col='Init_class', prediction_col='Fair_pred_class', sensitive_col='female')

print("Full equality of opportunity: ", full_eq_of_op)
print("Unaware equality of opportunity: ", unaware_eq_of_op)
print("Fair equality of opportunity: ", fair_eq_of_op)

eo_scores = {'Full': full_eq_of_op, 'Unaware': unaware_eq_of_op, 'Fair': fair_eq_of_op}
print("\nLower is fairer. Models ranked by equality of opportunity:", ", ".join(sorted(eo_scores, key=eo_scores.get)))

Full equality of opportunity:  0.027013106294906164
Unaware equality of opportunity:  0.0204839428818121
Fair equality of opportunity:  0.0025312991721967437

Lower is fairer. Models ranked by equality of opportunity: Fair, Unaware, Full


In [10]:
results_df = pd.DataFrame({
    'model': ['Full','Unaware','Latent Fair Variable'],
    'Demographic_Parity': [full_demographic_parity, unaware_demographic_parity, fair_demographic_parity],
    'Equality_of_Opportunity': [full_eq_of_op, unaware_eq_of_op, fair_eq_of_op],
    })

In [11]:
results_df

| | model | Demographic_Parity | Equality_of_Opportunity |
|---|---|---|---|
| 0 | Full | 0.040889 | 0.027013 |
| 1 | Unaware | 0.010939 | 0.020484 |
| 2 | Latent Fair Variable | 0.006799 | 0.002531 |


In [14]:
# Check group sizes for the sensitive attribute in the test set
# ls_train.groupby('female').count()
ls_test.groupby('female').count()

| female | LSAT | UGPA | region_first | sander_index | first_pf | Amerindian | Asian | Black | Hispanic | Mexican | Other | Puertorican | White | male | ZFYA | Knowledge | Init_class | Fair_pred | Fair_pred_class | full_pred | full_pred_class | unaware_pred | unaware_pred_class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 | 2470 |
| 1 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 | 1888 |


In [15]:
# Ratio of male rows (female == 0) to female rows (female == 1) in the test set
2470 / 1888

1.3082627118644068

In [16]:
# Inverse ratio (from the rounded value above): about 0.76 female rows per male row
1 / 1.308

0.764525993883792
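
The two scratch cells above compare group sizes, and the second one divides by the rounded ratio 1.308. A minimal sketch that rebuilds a column with the reported counts (2470 rows with `female == 0`, 1888 with `female == 1`) and derives the group shares and the exact ratio with `value_counts`:

```python
import pandas as pd

# Rebuild a sensitive column with the group sizes reported in the counts above.
female = pd.Series([0] * 2470 + [1] * 1888, name="female")

counts = female.value_counts()                # rows per group
shares = female.value_counts(normalize=True)  # fraction of rows per group

# Female-to-male ratio from the exact counts instead of the rounded 1.308
ratio = counts[1] / counts[0]
print(f"ratio={ratio:.3f}, female share={shares[1]:.3f}")
```

`1888 / 2470` is about 0.764, so the sensitive group makes up roughly 43% of the test set.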