# EthicML

## Running experiments on the Adult dataset

### Installation

First, we need to install EthicML. The toolkit isn't on PyPI yet, but this will change soon.

For now, the toolkit has to be cloned and then installed as an editable package:
```
cd <Location to clone to>
git clone https://github.com/predictive-analytics-lab/EthicML.git
cd EthicML
pip install --editable ./
```
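
After installing, it can be handy to confirm that the package is visible to the interpreter the notebook is using. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for this example and is not part of EthicML.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be located on the current import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # `ethicml` is the import name used throughout this notebook.
    status = "available" if is_installed("ethicml") else "not installed"
    print(f"ethicml is {status} in this environment")
```

If this reports that the package is not installed, the usual cause is that `pip` and the notebook kernel point at different Python environments.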

(This notebook lives inside the package, so we can skip this step.)

### Loading the data

EthicML includes several datasets that are commonly used in the fairness literature.
In this example we load one of them, the UCI Adult dataset.

In [1]:
import ethicml as em

data: em.DataTuple = em.adult().load()
assert (45222, 101) == data.x.shape
assert (45222, 1) == data.s.shape
assert (45222, 1) == data.y.shape

This loads the dataset as a DataTuple, which comprises $x$ (features), $s$ (sensitive attribute) and $y$ (class label). Each member of the DataTuple is stored as a Pandas DataFrame.
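
To make the shape of this structure concrete, here is a rough sketch of three row-aligned tables bundled together. `ToyDataTuple` is a hypothetical stand-in written for this example, not EthicML's `DataTuple`; it uses plain lists of dictionaries instead of DataFrames, and the column names and values in `x` and `y` are made up.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, float]


@dataclass(frozen=True)
class ToyDataTuple:
    """Hypothetical stand-in for a DataTuple: three tables with aligned rows."""

    x: List[Row]  # features
    s: List[Row]  # sensitive attribute
    y: List[Row]  # class label

    def __post_init__(self) -> None:
        # Row i of x, s and y must describe the same individual.
        if not (len(self.x) == len(self.s) == len(self.y)):
            raise ValueError("x, s and y must have the same number of rows")


# Made-up values, for illustration only.
toy = ToyDataTuple(
    x=[{"age": 30.0, "hours": 40.0}, {"age": 45.0, "hours": 20.0}],
    s=[{"sex_Male": 1.0}, {"sex_Male": 0.0}],
    y=[{"label": 0.0}, {"label": 1.0}],
)
print(len(toy.x), len(toy.s), len(toy.y))
```

The row-count check mirrors the assertions in the cell above, where `x`, `s` and `y` all have 45222 rows.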

By default, the Adult dataset uses the binary attribute `sex_Male` as the sensitive feature.

In [2]:
data.s.head()

Unnamed: 0,sex_Male
0,1
1,1
2,1
3,1
4,0


If we want to run experiments using race as the sensitive attribute, we could change that manually. As this is a common task, however, EthicML can split the data for you.

In [3]:
data: em.DataTuple = em.adult(split="Race").load()
assert (45222, 98) == data.x.shape
assert (45222, 1) == data.s.shape
assert (45222, 1) == data.y.shape

In [4]:
data.s.head()

Unnamed: 0,race
0,4
1,4
2,4
3,2
4,4


However, we're going to repeat some of the experiments from FairGP. That paper uses race as the sensitive attribute, but as a binary value: White or Not_White.

Fortunately, race has been one-hot encoded, so to replicate this we can drop every column of the sensitive attribute except `race_White`.

The Dataset class is really just a guide that tells EthicML how to read the underlying CSV. So to remove the other race attributes, we can just not include them in our list of sensitive attribute columns.

In [5]:
from dataclasses import replace

dataset = em.adult("Race")
dataset = replace(dataset, sens_attr_spec="race_White")
data = em.load_data(dataset)

In [6]:
data.s.head()

Unnamed: 0,race_White
0,1
1,1
2,1
3,0
4,1
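
The idea behind keeping only `race_White` can be sketched without EthicML at all. In the example below, only `race_White` is a column name taken from the text; `race_Black` and `race_Other` are made-up names, and `keep_columns` is a hypothetical helper for illustration.

```python
from typing import Dict, List

# Hypothetical one-hot encoded rows: exactly one column is 1 in each row.
rows: List[Dict[str, int]] = [
    {"race_White": 1, "race_Black": 0, "race_Other": 0},
    {"race_White": 0, "race_Black": 1, "race_Other": 0},
    {"race_White": 0, "race_Black": 0, "race_Other": 1},
]


def keep_columns(table: List[Dict[str, int]], columns: List[str]) -> List[Dict[str, int]]:
    """Return a copy of `table` that keeps only the named columns."""
    return [{name: row[name] for name in columns} for row in table]


# Dropping every column except `race_White` leaves a binary attribute:
# 1 means White, 0 means any other category, i.e. Not_White.
binary_s = keep_columns(rows, ["race_White"])
labels = ["White" if row["race_White"] == 1 else "Not_White" for row in binary_s]
print(labels)
```

Because each row has exactly one active column, no information beyond the White / Not_White distinction is needed to rebuild the binary attribute.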


### Evaluating some models

In [7]:
datasets = [dataset, em.toy()]
preprocess_models = [em.Upsampler()]
# inprocess_models = [em.Agarwal(), em.Kamishima(), em.LR(), em.SVM(kernel='linear'), em.Kamiran()]
inprocess_models = [em.LR(), em.SVM(kernel='linear'), em.Kamiran()]
postprocess_models = []
metrics = [em.Accuracy(), em.CV()]
per_sens_metrics = [em.Accuracy(), em.TPR(), em.ProbPos()]

In [8]:
test123 = em.evaluate_models(
    datasets,
    preprocess_models,
    inprocess_models,
    postprocess_models,
    metrics,
    per_sens_metrics,
    test_mode=False,
    repeats=2,
)

100%|██████████| 28/28 [00:13<00:00,  2.12it/s, model=Kamiran & Calders LR, dataset=Toy, transform=Upsample uniform, repeat=1]              
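
One way to picture what this call iterates over is as a grid of datasets, transforms, models and repeats. The sketch below enumerates such a grid with `itertools.product`. How EthicML actually organizes its loop is an assumption here; the names are copied from the outputs in this notebook, and the split into "model runs" and "transform fits" is a guess.

```python
from itertools import product

datasets = ["Adult Race", "Toy"]
transforms = ["no_transform", "Upsample uniform"]  # untransformed data plus one pre-processor
models = ["Logistic Regression", "SVM", "Kamiran & Calders LR"]
repeats = 2

# Assumption: every model is trained on every (dataset, transform, repeat) combination.
model_runs = list(product(datasets, transforms, models, range(repeats)))

# Assumption: the pre-processor itself is fitted once per dataset and repeat.
transform_fits = list(product(datasets, ["Upsample uniform"], range(repeats)))

print(len(model_runs), len(transform_fits), len(model_runs) + len(transform_fits))
```

Under these assumptions the total is 24 model runs plus 4 transform fits. That is consistent with the 28 steps in the progress bar above, although nothing in this notebook confirms the breakdown.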


In [9]:
test123

dataset,transform,model,split_id,Accuracy,Accuracy_race_White_0,Accuracy_race_White_0-race_White_1,Accuracy_race_White_0/race_White_1,Accuracy_race_White_1,CV,TPR_race_White_0,TPR_race_White_0-race_White_1,TPR_race_White_0/race_White_1,TPR_race_White_1,...,Accuracy_sensitive-attr_0/sensitive-attr_1,Accuracy_sensitive-attr_1,TPR_sensitive-attr_0,TPR_sensitive-attr_0-sensitive-attr_1,TPR_sensitive-attr_0/sensitive-attr_1,TPR_sensitive-attr_1,prob_pos_sensitive-attr_0,prob_pos_sensitive-attr_0-sensitive-attr_1,prob_pos_sensitive-attr_0/sensitive-attr_1,prob_pos_sensitive-attr_1
Adult Race,no_transform,"Logistic Regression, C=1.0",0,0.85141,0.896743,0.052664,0.941272,0.844079,0.916525,0.602804,0.022318,0.964297,0.625122,...,,,,,,,,,,
Adult Race,no_transform,SVM,0,0.851962,0.898332,0.053868,0.940036,0.844464,0.91497,0.593458,0.026288,0.957583,0.619746,...,,,,,,,,,,
Adult Race,no_transform,"Logistic Regression, C=1.0",1,0.841902,0.884365,0.049134,0.944442,0.835231,0.931427,0.553922,0.016474,0.971119,0.570395,...,,,,,,,,,,
Adult Race,no_transform,SVM,1,0.843449,0.888436,0.052054,0.941409,0.836382,0.929112,0.558824,0.0135,0.976411,0.572324,...,,,,,,,,,,
Adult Race,no_transform,Logistic Regression (C=1.0),0,0.85141,0.897538,0.053587,0.940295,0.843951,0.91955,0.61215,0.011506,0.98155,0.623656,...,,,,,,,,,,
Adult Race,no_transform,SVM (linear),0,0.851962,0.898332,0.053868,0.940036,0.844464,0.91497,0.593458,0.026288,0.957583,0.619746,...,,,,,,,,,,
Adult Race,no_transform,Kamiran & Calders LR,0,0.850636,0.894361,0.050795,0.943205,0.843565,0.930913,0.621495,0.008103,0.986962,0.613392,...,,,,,,,,,,
Adult Race,no_transform,Logistic Regression (C=1.0),1,0.842676,0.889251,0.053892,0.939396,0.835359,0.925321,0.563725,0.015349,0.973494,0.579074,...,,,,,,,,,,
Adult Race,no_transform,SVM (linear),1,0.843449,0.888436,0.052054,0.941409,0.836382,0.929112,0.558824,0.0135,0.976411,0.572324,...,,,,,,,,,,
Adult Race,no_transform,Kamiran & Calders LR,1,0.844002,0.887622,0.050472,0.943138,0.83715,0.932603,0.578431,0.002572,0.995574,0.581003,...,,,,,,,,,,
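
The per-group columns follow a pattern that can be read off the numbers: in the first row, 0.896743 - 0.844079 = 0.052664 and 0.844079 / 0.896743 ≈ 0.941272, so the `-` column looks like an absolute difference between the groups and the `/` column like the smaller score divided by the larger. The sketch below reproduces that pattern in plain Python on made-up labels. The function names are hypothetical and this is not EthicML's implementation. `CV` is left out because its exact definition is not given here.

```python
from typing import Callable, Dict, List

Metric = Callable[[List[int], List[int]], float]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def tpr(y_true: List[int], y_pred: List[int]) -> float:
    """True positive rate: fraction of actual positives predicted positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_on_positives) / len(preds_on_positives)


def prob_pos(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of positive predictions (y_true is unused, kept for a uniform signature)."""
    return sum(y_pred) / len(y_pred)


def per_group(metric: Metric, y_true: List[int], y_pred: List[int], s: List[int]) -> Dict[str, float]:
    """Score each sensitive group separately, then add the difference and ratio."""
    scores: Dict[str, float] = {}
    for group in sorted(set(s)):
        idx = [i for i, g in enumerate(s) if g == group]
        scores[f"s_{group}"] = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
    low, high = min(scores.values()), max(scores.values())
    scores["diff"] = high - low  # absolute gap between groups
    scores["ratio"] = low / high if high > 0 else float("nan")  # smaller over larger
    return scores


# Made-up labels, predictions and sensitive attribute for eight individuals.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]

acc_scores = per_group(accuracy, y_true, y_pred, s)
tpr_scores = per_group(tpr, y_true, y_pred, s)
pos_scores = per_group(prob_pos, y_true, y_pred, s)
print(acc_scores, tpr_scores, pos_scores, sep="\n")
```

In this toy data both groups have the same accuracy, yet their true positive rates and positive-prediction rates differ. This is why the per-group metrics are reported alongside the overall ones.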


In [10]:
test123.groupby(level=[0,1,2]).agg(['mean', 'std'])[['Accuracy', 'Accuracy_race_White_0/race_White_1', 'TPR_race_White_0/race_White_1', 'prob_pos_race_White_0/race_White_1']]

dataset,transform,model,Accuracy (mean),Accuracy (std),Accuracy_race_White_0/race_White_1 (mean),Accuracy_race_White_0/race_White_1 (std),TPR_race_White_0/race_White_1 (mean),TPR_race_White_0/race_White_1 (std),prob_pos_race_White_0/race_White_1 (mean),prob_pos_race_White_0/race_White_1 (std)
Adult Race,Upsample uniform,Kamiran & Calders LR,0.846103,0.004596,0.943158,0.004163,0.995934,0.001713,0.678003,0.022108
Adult Race,Upsample uniform,Logistic Regression (C=1.0),0.846324,0.00434,0.941924,0.002738,0.99188,0.005797,0.674693,0.009641
Adult Race,Upsample uniform,SVM (linear),0.845384,0.005553,0.942148,0.001691,0.963814,0.018791,0.708329,0.022138
Adult Race,no_transform,Kamiran & Calders LR,0.847319,0.00383,0.943171,3.9e-05,0.991268,0.004972,0.676381,0.004406
Adult Race,no_transform,Logistic Regression (C=1.0),0.847043,0.005043,0.939846,0.000519,0.977522,0.004651,0.637285,0.001443
Adult Race,no_transform,"Logistic Regression, C=1.0",0.846656,0.006723,0.942857,0.002242,0.967708,0.004823,0.64209,0.02637
Adult Race,no_transform,SVM,0.847706,0.00602,0.940723,0.000971,0.966997,0.013314,0.629927,0.02706
Adult Race,no_transform,SVM (linear),0.847706,0.004915,0.940723,0.000793,0.966997,0.010871,0.629927,0.022094
Toy,Upsample uniform,Kamiran & Calders LR,0.9875,0.014434,,,,,,
Toy,Upsample uniform,Logistic Regression (C=1.0),0.9875,0.014434,,,,,,
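
The `mean` and `std` columns collapse the repeats of each (dataset, transform, model) combination. As a small worked check in plain Python, the snippet below takes the two per-split `Accuracy` values listed for the `SVM` rows on `Adult Race` with `no_transform` in the earlier output and recomputes both statistics. pandas' `std` defaults to the sample standard deviation (`ddof=1`), which corresponds to `statistics.stdev`.

```python
from statistics import mean, stdev

# Accuracy of the "SVM" model on "Adult Race" / "no_transform", one value per split_id.
svm_accuracy_by_split = [0.851962, 0.843449]

acc_mean = mean(svm_accuracy_by_split)
acc_std = stdev(svm_accuracy_by_split)  # sample standard deviation, ddof=1

print(f"mean={acc_mean:.4f} std={acc_std:.4f}")
```

These values agree with the `SVM` row of the aggregated table (0.847706 and 0.00602) up to rounding. With only two repeats the standard deviation is a very rough estimate of the spread.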


In [None]:
datasets = [em.adult()]
preprocess_models = []
inprocess_models = [em.Agarwal(), em.Kamishima(), em.LR(), em.SVM()]
postprocess_models = []
metrics = [em.Accuracy(), em.CV()]
per_sens_metrics = [em.Accuracy(), em.TPR(), em.ProbPos()]
test123 = await em.evaluate_models_async(
    datasets,
    preprocess_models,
    inprocess_models,
    postprocess_models,
    metrics,
    per_sens_metrics,
    test_mode=False,
    repeats=10,
)

synchronous algorithms...


 65%|██████▌   | 13/20 [06:51<03:41, 31.66s/it, model=SVM, dataset=Adult Sex - Train (3)]                        

In [None]:
test123.groupby(level=[0,1,2]).agg(['mean', 'std'])[['Accuracy', 'Accuracy_sex_Male_0/sex_Male_1', 'TPR_sex_Male_0/sex_Male_1', 'prob_pos_sex_Male_0/sex_Male_1']]
