# EthicML

## Running experiments on the Adult dataset

### Installation

First we need to install EthicML. Currently, the toolkit isn't on PyPI, but this will change soon.

For now, the toolkit has to be cloned and then installed as an editable package:
```
cd <Location to clone to>
git clone https://github.com/predictive-analytics-lab/EthicML.git
cd EthicML
pip install --editable ./
```

(This notebook lives inside the package, so we can skip this step.)

### Loading the data

EthicML includes some commonly used datasets from the fairness literature.
First, we load one of these; in this example, the UCI Adult dataset.

In [1]:
from ethicml.utility import DataTuple
from ethicml.data.load import load_data
from ethicml.data import Adult, Compas, Credit, German, Sqf, Toy

data: DataTuple = load_data(Adult())
assert (45222, 101) == data.x.shape
assert (45222, 1) == data.s.shape
assert (45222, 1) == data.y.shape

This loads the dataset as a DataTuple, which comprises $x$ (features), $s$ (sensitive attribute) and $y$ (class label). Each member of the DataTuple is stored as a Pandas DataFrame.
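
To make the structure concrete, here is a minimal, dependency-free sketch of a container with the same three parts. `SimpleDataTuple`, `shape` and `make_tuple` are hypothetical stand-ins that use plain lists instead of DataFrames; this is not EthicML's `DataTuple` implementation. The only property it enforces is the one the assertions above rely on: `x`, `s` and `y` have the same number of rows.

```python
from typing import List, NamedTuple


class SimpleDataTuple(NamedTuple):
    """Hypothetical stand-in for a DataTuple, using lists of rows."""

    x: List[List[float]]  # features, one inner list per sample
    s: List[List[int]]    # sensitive attribute(s), one inner list per sample
    y: List[List[int]]    # class label, one inner list per sample


def shape(rows: List[list]) -> tuple:
    """Return (n_rows, n_columns), mirroring DataFrame.shape."""
    return (len(rows), len(rows[0]) if rows else 0)


def make_tuple(x, s, y) -> SimpleDataTuple:
    """Build the tuple after checking that all parts describe the same samples."""
    if not (len(x) == len(s) == len(y)):
        raise ValueError("x, s and y must have the same number of rows")
    return SimpleDataTuple(x=x, s=s, y=y)


# Made-up data: three samples, two features, one sensitive column, one label.
toy = make_tuple(
    x=[[0.1, 1.0], [0.7, 0.0], [0.4, 1.0]],
    s=[[1], [0], [1]],
    y=[[0], [1], [1]],
)
print(shape(toy.x), shape(toy.s), shape(toy.y))
```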

By default, the Adult dataset uses the binary attribute `sex_Male` as the sensitive feature.

In [2]:
data.s.head()

|   | sex_Male |
|---|----------|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |


If we want to run experiments using race as the sensitive attribute, we could change that manually or, as this is a common task, let EthicML split the data for us.

In [3]:
data: DataTuple = load_data(Adult(split="Race"))
assert (45222, 98) == data.x.shape
assert (45222, 5) == data.s.shape
assert (45222, 1) == data.y.shape

In [4]:
data.s.head()

|   | race_Amer-Indian-Eskimo | race_Asian-Pac-Islander | race_Black | race_Other | race_White |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 1 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 1 |


However, we're going to repeat some of the experiments from FairGP. That paper runs experiments with race as the sensitive attribute, but treats it as binary: the value of race is either White or Not_White.

Fortunately, race has been one-hot encoded, so to replicate this we can just drop every sensitive-attribute column except `race_White`.

The Dataset class is really just a guide that tells EthicML how to read the underlying CSV. To remove the other race attributes, we simply leave them out of our list of sensitive attribute columns.

In [5]:
dataset = Adult("Race")
dataset.sens_attrs = ["race_White"]
data = load_data(dataset)

In [6]:
data.s.head()

|   | race_White |
|---|------------|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 1 |
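
Because the race columns are one-hot encoded, exactly one of them is 1 in each row, so `race_White == 0` is equivalent to "one of the other race columns is 1". The sketch below checks this on the five rows of the one-hot encoded race attribute printed earlier, using plain dictionaries rather than DataFrames. The helper name `to_binary_white` is made up for this illustration and is not part of EthicML.

```python
# The first five rows of the one-hot encoded race attribute shown earlier.
columns = [
    "race_Amer-Indian-Eskimo",
    "race_Asian-Pac-Islander",
    "race_Black",
    "race_Other",
    "race_White",
]
rows = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
one_hot = [dict(zip(columns, row)) for row in rows]


def to_binary_white(record: dict) -> int:
    """Hypothetical helper: keep only race_White (1 = White, 0 = Not_White)."""
    return record["race_White"]


binary = [to_binary_white(r) for r in one_hot]

# Each one-hot row sums to 1, so "not White" means exactly one other column is set.
for record, is_white in zip(one_hot, binary):
    others = sum(v for k, v in record.items() if k != "race_White")
    assert sum(record.values()) == 1
    assert others == 1 - is_white

print(binary)  # [1, 1, 1, 0, 1]
```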


### Evaluating some models

In [9]:
from ethicml.algorithms.inprocess import Agarwal, InAlgorithm, LR, SVM, Kamishima, Kamiran
from ethicml.algorithms.preprocess import Upsampler
from ethicml.metrics import Accuracy, CV, TPR, ProbPos
from ethicml.evaluators import evaluate_models

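# Datasets to evaluate on: the binary-race Adult dataset from above and the Toy dataset
datasets = [dataset, Toy()]
preprocess_models = [Upsampler()]
# inprocess_models = [Agarwal(), Kamishima(), LR(), SVM(kernel='linear'), Kamiran()]
inprocess_models = [LR(), SVM(kernel='linear'), Kamiran()]
postprocess_models = []
# Metrics computed over all test samples
metrics = [Accuracy(), CV()]
# Metrics computed separately for each value of the sensitive attribute
per_sens_metrics = [Accuracy(), TPR(), ProbPos()]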
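
The `per_sens_metrics` list asks for each metric to be computed separately for every value of the sensitive attribute. As a rough illustration of what that means (not EthicML's implementation), the sketch below computes accuracy, true positive rate and the probability of a positive prediction per group. The function `per_group_metrics` and the labels are made up for this example.

```python
from typing import Dict, List


def per_group_metrics(
    y_true: List[int], y_pred: List[int], s: List[int]
) -> Dict[int, Dict[str, float]]:
    """Compute accuracy, TPR and P(prediction = 1) separately for each group in s."""
    results = {}
    for group in sorted(set(s)):
        idx = [i for i, value in enumerate(s) if value == group]
        correct = sum(y_true[i] == y_pred[i] for i in idx)
        positives = [i for i in idx if y_true[i] == 1]
        true_pos = sum(y_pred[i] == 1 for i in positives)
        results[group] = {
            "accuracy": correct / len(idx),
            # TPR is undefined when a group has no positive labels.
            "tpr": true_pos / len(positives) if positives else float("nan"),
            "prob_pos": sum(y_pred[i] == 1 for i in idx) / len(idx),
        }
    return results


# Made-up labels, predictions and binary sensitive attribute.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
s = [0, 0, 0, 0, 1, 1, 1, 1]

group_results = per_group_metrics(y_true, y_pred, s)
for group, values in group_results.items():
    print(group, values)
```

In this toy data both groups have the same accuracy and the same rate of positive predictions, but different true positive rates, which is why it helps to report several per-group metrics side by side.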

In [10]:
test123 = evaluate_models(datasets, preprocess_models, inprocess_models, postprocess_models, metrics, per_sens_metrics, test_mode=False, repeats=2)

100%|██████████| 28/28 [00:26<00:00,  3.33it/s, task=Kamiran & Calders LR, dataset=Toy, transform=Upsample, repeat=1]      


In [12]:
test123

| dataset | transform | model | repeat | Accuracy | CV | Accuracy_race_White_0 | Accuracy_race_White_0-race_White_1 | Accuracy_race_White_0/race_White_1 | Accuracy_race_White_1 | TPR_race_White_0 | TPR_race_White_0-race_White_1 | TPR_race_White_0/race_White_1 | TPR_race_White_1 | ... | Accuracy_s_0/s_1 | Accuracy_s_1 | TPR_s_0 | TPR_s_0-s_1 | TPR_s_0/s_1 | TPR_s_1 | prob_pos_s_0 | prob_pos_s_0-s_1 | prob_pos_s_0/s_1 | prob_pos_s_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adult | no_transform | Agarwal LR | 0-2410 | 0.851410 | 0.916525 | 0.896743 | 0.052664 | 0.941272 | 0.844079 | 0.602804 | 0.022318 | 0.964297 | 0.625122 | ... |  |  |  |  |  |  |  |  |  |  |
| Adult | no_transform | Kamishima | 0-2410 | 0.849420 | 0.905814 | 0.888006 | 0.044826 | 0.949520 | 0.843180 | 0.546729 | 0.077416 | 0.875965 | 0.624145 | ... |  |  |  |  |  |  |  |  |  |  |
| Adult | no_transform | Logistic Regression | 0-2410 | 0.850304 | 0.928949 | 0.893566 | 0.050258 | 0.943756 | 0.843309 | 0.602804 | 0.003257 | 0.994626 | 0.606061 | ... |  |  |  |  |  |  |  |  |  |  |
| Adult | no_transform | SVM | 0-2410 | 0.861913 | 0.938460 | 0.901509 | 0.045999 | 0.948975 | 0.855510 | 0.588785 | 0.001786 | 0.996967 | 0.586999 | ... |  |  |  |  |  |  |  |  |  |  |
| Adult | no_transform | Kamiran & Calders LR | 0-2410 | 0.850525 | 0.926266 | 0.896743 | 0.053692 | 0.940126 | 0.843052 | 0.626168 | 0.006422 | 0.989743 | 0.619746 | ... |  |  |  |  |  |  |  |  |  |  |
| Adult | no_transform | Agarwal LR | 0-2410 | 0.851741 | 0.918604 | 0.897538 | 0.053202 | 0.940725 | 0.844336 | 0.607477 | 0.015691 | 0.974821 | 0.623167 | ... |  |  |  |  |  |  |  |  |  |  |
| Adult | no_transform | Logistic Regression | 0-2410 | 0.850415 | 0.928155 | 0.894361 | 0.051052 | 0.942918 | 0.843309 | 0.602804 | 0.003257 | 0.994626 | 0.606061 | ... |  |  |  |  |  |  |  |  |  |  |
| Adult | no_transform | SVM | 0-2410 | 0.851962 | 0.914970 | 0.898332 | 0.053868 | 0.940036 | 0.844464 | 0.593458 | 0.026288 | 0.957583 | 0.619746 | ... |  |  |  |  |  |  |  |  |  |  |
| Adult | no_transform | Kamiran & Calders LR | 0-2410 | 0.850636 | 0.930913 | 0.894361 | 0.050795 | 0.943205 | 0.843565 | 0.621495 | 0.008103 | 0.986962 | 0.613392 | ... |  |  |  |  |  |  |  |  |  |  |
| Adult | Upsample | Agarwal LR | 0-2410 | 0.799005 | 0.865892 | 0.854647 | 0.064639 | 0.924368 | 0.790008 | 0.869159 | 0.001809 | 0.997923 | 0.870968 | ... |  |  |  |  |  |  |  |  |  |  |
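
The column names ending in `_0-race_White_1` and `_0/race_White_1` compare the two groups. Judging only from the numbers in this table, the difference column matches the absolute difference of the per-group values, and the ratio column matches the smaller value divided by the larger one. The check below reproduces the first row (Agarwal LR) under that assumption; `group_diff` and `group_ratio` are hypothetical helper names, not EthicML functions.

```python
def group_diff(a: float, b: float) -> float:
    """Absolute difference between two per-group values."""
    return abs(a - b)


def group_ratio(a: float, b: float) -> float:
    """Smaller value divided by the larger, so the result lies in [0, 1]."""
    return min(a, b) / max(a, b)


# Per-group values from the first row of the table (Agarwal LR on Adult).
acc_0, acc_1 = 0.896743, 0.844079
tpr_0, tpr_1 = 0.602804, 0.625122

# Compare against the difference and ratio reported in the same row.
assert abs(group_diff(acc_0, acc_1) - 0.052664) < 1e-5
assert abs(group_ratio(acc_0, acc_1) - 0.941272) < 1e-5
assert abs(group_diff(tpr_0, tpr_1) - 0.022318) < 1e-5
assert abs(group_ratio(tpr_0, tpr_1) - 0.964297) < 1e-5

print(group_diff(acc_0, acc_1), group_ratio(acc_0, acc_1))
```

Note that the ratios reported for `sex_Male` further down exceed 1 (for example 1.13996), so that run evidently used a plain `_0 / _1` ratio rather than the smaller-over-larger form checked here.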


In [11]:
test123.groupby(level=[0,1,2]).agg(['mean', 'std'])[['Accuracy', 'Accuracy_race_White_0/race_White_1', 'TPR_race_White_0/race_White_1', 'prob_pos_race_White_0/race_White_1']]

| dataset | transform | model | Accuracy (mean) | Accuracy (std) | Accuracy_race_White_0/race_White_1 (mean) | Accuracy_race_White_0/race_White_1 (std) | TPR_race_White_0/race_White_1 (mean) | TPR_race_White_0/race_White_1 (std) | prob_pos_race_White_0/race_White_1 (mean) | prob_pos_race_White_0/race_White_1 (std) |
|---|---|---|---|---|---|---|---|---|---|---|
| Adult | Upsample | Agarwal LR | 0.798355 | 0.001289 | 0.928318 | 0.006361 | 0.99238 | 0.007494 | 0.671727 | 0.00515 |
| Adult | Upsample | Kamiran & Calders LR | 0.77958 | 0.015777 | 0.915295 | 0.00551 | 0.977441 | 0.012051 | 0.653193 | 0.010896 |
| Adult | Upsample | Logistic Regression | 0.780228 | 0.016157 | 0.915456 | 0.005604 | 0.977056 | 0.012277 | 0.65257 | 0.01126 |
| Adult | Upsample | SVM | 0.793694 | 0.000385 | 0.925306 | 0.003082 | 0.978822 | 0.00092 | 0.659572 | 0.003557 |
| Adult | no_transform | Agarwal LR | 0.85018 | 0.003217 | 0.940825 | 0.000616 | 0.974035 | 0.008606 | 0.632836 | 0.011559 |
| Adult | no_transform | Kamiran & Calders LR | 0.849458 | 0.002452 | 0.941908 | 0.001486 | 0.989484 | 0.003002 | 0.672548 | 0.006965 |
| Adult | no_transform | Kamishima | 0.84942 | 0.0 | 0.94952 | 0.0 | 0.875965 | 0.0 | 0.575863 | 0.0 |
| Adult | no_transform | Logistic Regression | 0.849005 | 0.003208 | 0.943269 | 0.001007 | 0.989815 | 0.008957 | 0.661139 | 0.007127 |
| Adult | no_transform | SVM | 0.853933 | 0.00649 | 0.943321 | 0.004103 | 0.974201 | 0.017703 | 0.639658 | 0.029777 |
| Toy | Upsample | Agarwal LR | 0.849167 | 0.011277 |  |  |  |  |  |  |
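
The `groupby(level=[0,1,2]).agg(['mean', 'std'])` call collapses the repeats of each (dataset, transform, model) combination into a mean and a standard deviation. pandas computes the sample standard deviation (`ddof=1`) by default, which is also what `statistics.stdev` returns. The dependency-free sketch below applies the same aggregation to the two SVM and two Logistic Regression accuracy values from the per-repeat table. That table and the aggregated one appear to come from different runs, so the numbers are not expected to match the summary above.

```python
from collections import defaultdict
from statistics import mean, stdev

# (dataset, transform, model, accuracy) for individual repeats.
records = [
    ("Adult", "no_transform", "SVM", 0.861913),
    ("Adult", "no_transform", "SVM", 0.851962),
    ("Adult", "no_transform", "Logistic Regression", 0.850304),
    ("Adult", "no_transform", "Logistic Regression", 0.850415),
]

# Group the accuracy values by the first three index levels.
groups = defaultdict(list)
for dataset, transform, model, accuracy in records:
    groups[(dataset, transform, model)].append(accuracy)

# Aggregate each group to (mean, sample standard deviation).
summary = {key: (mean(values), stdev(values)) for key, values in groups.items()}

for key, (avg, sd) in sorted(summary.items()):
    print(key, round(avg, 6), round(sd, 6))
```

With only two repeats per group, the sample standard deviation is simply the gap between the two values divided by the square root of 2, so it is a very rough estimate of run-to-run variation.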


In [9]:
from ethicml.algorithms.inprocess import Agarwal, InAlgorithm, LR, SVM, Kamishima
from ethicml.algorithms.preprocess import Kamiran
from ethicml.metrics import Accuracy, CV, TPR, ProbPos
from ethicml.evaluators.evaluate_models import evaluate_models

datasets = [Adult()]
preprocess_models = []
inprocess_models = [Agarwal(), Kamishima(), LR(), SVM()]
postprocess_models = []
metrics = [Accuracy(), CV()]
per_sens_metrics = [Accuracy(), TPR(), ProbPos()]
test123 = evaluate_models(datasets, preprocess_models, inprocess_models, postprocess_models, metrics, per_sens_metrics, test_mode=False, repeats=10)

 98%|█████████▊| 50/51 [41:59<00:50, 50.40s/it]


In [13]:
test123.groupby(level=[0,1,2]).agg(['mean', 'std'])[['Accuracy', 'Accuracy_sex_Male_0/sex_Male_1', 'TPR_sex_Male_0/sex_Male_1', 'prob_pos_sex_Male_0/sex_Male_1']]


| dataset | transform | model | Accuracy (mean) | Accuracy (std) | Accuracy_sex_Male_0/sex_Male_1 (mean) | Accuracy_sex_Male_0/sex_Male_1 (std) | TPR_sex_Male_0/sex_Male_1 (mean) | TPR_sex_Male_0/sex_Male_1 (std) | prob_pos_sex_Male_0/sex_Male_1 (mean) | prob_pos_sex_Male_0/sex_Male_1 (std) |
|---|---|---|---|---|---|---|---|---|---|---|
| Adult | no_transform | Agarwal | 0.848572 | 0.003176 | 1.13996 | 0.006937 | 0.931315 | 0.08621 | 0.354343 | 0.040643 |
| Adult | no_transform | Kamishima | 0.850384 | 0.002961 | 1.137531 | 0.007479 | 0.815207 | 0.037357 | 0.29161 | 0.01369 |
| Adult | no_transform | Logistic Regression | 0.850097 | 0.003253 | 1.138135 | 0.007381 | 0.825997 | 0.034157 | 0.297058 | 0.014001 |
| Adult | no_transform | SVM | 0.864725 | 0.002844 | 1.117742 | 0.006871 | 0.757971 | 0.044756 | 0.26067 | 0.018487 |
