# EthicML

## Running experiments on the Adult dataset

### Installation

First we need to install EthicML. Currently, the toolkit isn't on PyPI, but this will change soon.

For now, the toolkit has to be cloned and then installed as an editable package:
```
cd <Location to clone to>
git clone https://github.com/predictive-analytics-lab/EthicML.git
cd EthicML
pip install --editable ./
```

(This notebook lives inside the package, so we can skip this step.)

### Loading the data

EthicML includes several datasets that are commonly used in the fairness literature.
In this example we load one of them, the UCI Adult dataset.

In [2]:
import ethicml as em

data: em.DataTuple = em.adult().load()
assert data.x.shape == (45222, 101)
assert data.s.shape == (45222, 1)
assert data.y.shape == (45222, 1)

This loads the dataset as a `DataTuple`, which comprises $x$ (features), $s$ (sensitive attribute) and $y$ (class label). Each member of the `DataTuple` is stored as a pandas DataFrame.

By default, the Adult dataset uses the binary attribute `sex_Male` as the sensitive feature.

In [3]:
data.s.head()

,sex_Male
0,1
1,1
2,1
3,1
4,0


If we want to run experiments using race as the sensitive attribute, we could change that manually, but as this is a common task, EthicML can split the data for us.

In [4]:
data: em.DataTuple = em.adult(split="Race").load()
assert data.x.shape == (45222, 98)
assert data.s.shape == (45222, 1)
assert data.y.shape == (45222, 1)

In [5]:
data.s.head()

,race
0,4
1,4
2,4
3,2
4,4


However, we're going to repeat some of the experiments from FairGP. That paper uses race as the sensitive attribute, but as a binary value: White or Not_White.

Fortunately, race has been one-hot encoded, so to replicate this we can just drop every sensitive-attribute column other than `race_White`.
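
As a rough illustration of why dropping the other one-hot columns yields a binary attribute, the sketch below one-hot encodes a toy list of race labels and keeps only the `race_White` column. The `one_hot` helper and the toy records are assumptions made for this example; they are not part of EthicML or the real Adult data.

```python
# Illustrative only: toy records, not the real Adult data.
races = ["White", "Black", "White", "Asian-Pac-Islander", "White"]


def one_hot(values, prefix):
    """Return {column_name: [0/1, ...]} with one column per distinct value."""
    categories = sorted(set(values))
    return {f"{prefix}_{c}": [int(v == c) for v in values] for c in categories}


encoded = one_hot(races, prefix="race")

# Keeping only race_White leaves 1 for White and 0 for every other category,
# which is exactly a White / Not_White indicator.
binary_s = {"race_White": encoded["race_White"]}
print(binary_s)
```

Every row that had a 1 in one of the dropped columns ends up with `race_White == 0`, so no extra recoding step is needed.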

The Dataset class is really just a guide that tells EthicML how to read the underlying CSV. So to remove the other race attributes, we can just not include them in our list of sensitive attribute columns.

In [6]:
from dataclasses import replace

dataset = em.adult("Race")
dataset = replace(dataset, sens_attr_spec="race_White")
data = em.load_data(dataset)
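
The `replace` function used above comes from Python's standard `dataclasses` module: it returns a new instance with the named fields overridden and leaves the original untouched. The sketch below shows this on a hypothetical `DatasetSpec` class, which is a stand-in invented for this example and not EthicML's actual `Dataset` definition.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical stand-in for a dataset description (not EthicML's class)."""

    name: str
    sens_attr_spec: str


original = DatasetSpec(name="Adult Race", sens_attr_spec="race")

# replace() builds a copy with the given fields changed.
binary = replace(original, sens_attr_spec="race_White")

print(original.sens_attr_spec)  # the original object is unchanged
print(binary.sens_attr_spec)
```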

In [7]:
data.s.head()

,race_White
0,1
1,1
2,1
3,0
4,1


### Evaluating some models

In [8]:
datasets = [dataset, em.toy()]
preprocess_models = [em.Upsampler()]
# inprocess_models = [em.Agarwal(), em.Kamishima(), em.LR(), em.SVM(kernel='linear'), em.Kamiran()]
inprocess_models = [em.LR(), em.SVM(kernel='linear'), em.Kamiran()]
postprocess_models = []
metrics = [em.Accuracy(), em.CV()]
per_sens_metrics = [em.Accuracy(), em.TPR(), em.ProbPos()]

In [9]:
test123 = em.evaluate_models(
    datasets,
    preprocess_models,
    inprocess_models,
    postprocess_models,
    metrics,
    per_sens_metrics,
    test_mode=False,
    repeats=2,
)

100%|██████████| 28/28 [00:11<00:00,  2.42it/s, model=Kamiran & Calders LR, dataset=Toy, transform=Upsample uniform, repeat=1]


In [10]:
test123

dataset,transform,model,split_id,Accuracy,Accuracy_race_White_0,Accuracy_race_White_0-race_White_1,Accuracy_race_White_0/race_White_1,Accuracy_race_White_1,CV,TPR_race_White_0,TPR_race_White_0-race_White_1,TPR_race_White_0/race_White_1,TPR_race_White_1,...,Accuracy_sensitive-attr_0/sensitive-attr_1,Accuracy_sensitive-attr_1,TPR_sensitive-attr_0,TPR_sensitive-attr_0-sensitive-attr_1,TPR_sensitive-attr_0/sensitive-attr_1,TPR_sensitive-attr_1,prob_pos_sensitive-attr_0,prob_pos_sensitive-attr_0-sensitive-attr_1,prob_pos_sensitive-attr_0/sensitive-attr_1,prob_pos_sensitive-attr_1
Adult Race,no_transform,"Logistic Regression, C=1.0",0,0.851410,0.897538,0.053587,0.940295,0.843951,0.919550,0.612150,0.011506,0.981550,0.623656,...,,,,,,,,,,
Adult Race,no_transform,SVM,0,0.851962,0.898332,0.053868,0.940036,0.844464,0.914970,0.593458,0.026288,0.957583,0.619746,...,,,,,,,,,,
Adult Race,no_transform,Kamiran & Calders LR,0,0.850525,0.895155,0.051846,0.942081,0.843309,0.931660,0.621495,0.011525,0.981457,0.609971,...,,,,,,,,,,
Adult Race,no_transform,"Logistic Regression, C=1.0",0,0.851410,0.897538,0.053587,0.940295,0.843951,0.919550,0.612150,0.011506,0.981550,0.623656,...,,,,,,,,,,
Adult Race,no_transform,SVM,0,0.851962,0.898332,0.053868,0.940036,0.844464,0.914970,0.593458,0.026288,0.957583,0.619746,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Toy,Upsample uniform,SVM (linear),0,0.950000,,,,,0.677945,,,,,...,0.953668,0.928571,1.0,0.035714,0.964286,0.964286,0.368421,0.322055,0.533575,0.690476
Toy,Upsample uniform,Kamiran & Calders LR,0,0.975000,,,,,0.627820,,,,,...,0.952381,0.952381,1.0,0.000000,1.000000,1.000000,0.342105,0.372180,0.478947,0.714286
Toy,Upsample uniform,Logistic Regression (C=1.0),1,1.000000,,,,,0.818296,,,,,...,1.000000,1.000000,1.0,0.000000,1.000000,1.000000,0.476190,0.181704,0.723810,0.657895
Toy,Upsample uniform,SVM (linear),1,1.000000,,,,,0.818296,,,,,...,1.000000,1.000000,1.0,0.000000,1.000000,1.000000,0.476190,0.181704,0.723810,0.657895
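
The per-sensitive-attribute columns in the table above follow a naming pattern: `Accuracy_race_White_0` is the metric on the group with `race_White == 0`, while the `-` and `/` variants compare the two groups. The values shown are consistent with an absolute difference and a smaller-over-larger ratio (for the first row, 0.897538 - 0.843951 = 0.053587), although this is inferred from the output rather than taken from the library documentation. A minimal pure-Python sketch under that assumption, with made-up labels and hypothetical helper names:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_group(metric, y_true, y_pred, s):
    """Evaluate `metric` separately on each value of the sensitive attribute."""
    result = {}
    for group in sorted(set(s)):
        idx = [i for i, value in enumerate(s) if value == group]
        result[group] = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result


def diff_and_ratio(a, b):
    """Absolute difference and min/max ratio (assumed convention, see above)."""
    largest = max(a, b)
    ratio = min(a, b) / largest if largest else float("nan")
    return abs(a - b), ratio


# Made-up labels, predictions and sensitive attribute for illustration.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
s = [0, 0, 0, 0, 1, 1, 1, 1]

per_s = per_group(accuracy, y_true, y_pred, s)
diff, ratio = diff_and_ratio(per_s[0], per_s[1])
print(per_s, diff, ratio)
```

Under this convention a ratio of 1 and a difference of 0 both mean the two groups are treated identically by the metric.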


In [11]:
test123.groupby(level=[0, 1, 2]).agg(['mean', 'std'])[
    [
        'Accuracy',
        'Accuracy_race_White_0/race_White_1',
        'TPR_race_White_0/race_White_1',
        'prob_pos_race_White_0/race_White_1',
    ]
]

dataset,transform,model,Accuracy (mean),Accuracy (std),Accuracy_race_White_0/race_White_1 (mean),Accuracy_race_White_0/race_White_1 (std),TPR_race_White_0/race_White_1 (mean),TPR_race_White_0/race_White_1 (std),prob_pos_race_White_0/race_White_1 (mean),prob_pos_race_White_0/race_White_1 (std)
Adult Race,Upsample uniform,Kamiran & Calders LR,0.846282,0.004343,0.943993,0.003641,0.993049,0.006444,0.678947,0.01653
Adult Race,Upsample uniform,Logistic Regression (C=1.0),0.846103,0.004596,0.943158,0.004163,0.992139,0.006096,0.681747,0.017785
Adult Race,Upsample uniform,"Logistic Regression, C=1.0",0.846213,0.004472,0.942541,0.003595,0.992009,0.00595,0.67822,0.014873
Adult Race,Upsample uniform,SVM,0.845357,0.005522,0.942112,0.00165,0.963621,0.01857,0.70843,0.022022
Adult Race,Upsample uniform,SVM (linear),0.845384,0.005553,0.942148,0.001691,0.963814,0.018791,0.708329,0.022138
Adult Race,no_transform,Kamiran & Calders LR,0.847626,0.003439,0.942877,0.00047,0.989479,0.006133,0.677023,0.004261
Adult Race,no_transform,Logistic Regression (C=1.0),0.847485,0.004532,0.939919,0.000435,0.985575,0.004648,0.644207,0.009436
Adult Race,no_transform,"Logistic Regression, C=1.0",0.847651,0.005249,0.942011,0.002222,0.976032,0.005259,0.644771,0.01476
Adult Race,no_transform,SVM,0.848557,0.004663,0.940585,0.000752,0.965114,0.010313,0.6261,0.020961
Adult Race,no_transform,SVM (linear),0.847706,0.004915,0.940723,0.000793,0.966997,0.010871,0.629927,0.022094
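
The `groupby(level=[0, 1, 2]).agg(['mean', 'std'])` call above groups on the first three index levels (dataset, transform, model), so the repeats identified by `split_id` are collapsed into a mean and a standard deviation. The sketch below reproduces that idea with the standard library only; the rows and model names are invented for illustration and are not results from this notebook. Like the pandas default, `statistics.stdev` is the sample standard deviation (n - 1 denominator).

```python
from collections import defaultdict
from statistics import mean, stdev

# (dataset, transform, model, split_id, accuracy); the numbers are invented.
rows = [
    ("Toy", "no_transform", "model_a", 0, 0.80),
    ("Toy", "no_transform", "model_a", 1, 0.90),
    ("Toy", "no_transform", "model_b", 0, 0.70),
    ("Toy", "no_transform", "model_b", 1, 0.74),
]

# Group on the first three index levels, dropping split_id.
grouped = defaultdict(list)
for dataset, transform, model, _split_id, acc in rows:
    grouped[(dataset, transform, model)].append(acc)

# Mean and sample standard deviation across repeats for each group.
summary = {key: (mean(vals), stdev(vals)) for key, vals in grouped.items()}

for key, (avg, sd) in summary.items():
    print(key, round(avg, 4), round(sd, 4))
```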


In [12]:
datasets = [em.adult()]
preprocess_models = []
inprocess_models = [em.Agarwal(), em.Kamishima(), em.LR(), em.SVMAsync()]
postprocess_models = []
metrics = [em.Accuracy(), em.CV()]
per_sens_metrics = [em.Accuracy(), em.TPR(), em.ProbPos()]
test123 = await em.evaluate_models_async(
    datasets,
    preprocess_models,
    inprocess_models,
    postprocess_models,
    metrics,
    per_sens_metrics,
    test_mode=False,
    repeats=10,
    max_parallel=3,
)

synchronous algorithms...


100%|██████████| 10/10 [00:04<00:00,  2.46it/s, model=Logistic Regression (C=1.0), dataset=Adult Sex - Train (9)]

asynchronous algorithms...



100%|██████████| 30/30 [16:53<00:00, 33.78s/it, model=SVM, dataset=Adult Sex - Train (9), worker_id=1]
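
The `max_parallel=3` argument above suggests that at most three of the asynchronous runs are in flight at once. How EthicML implements this is not shown here; one common way to bound concurrency in Python is an `asyncio.Semaphore`, sketched below with hypothetical `run_one` and `run_all` helpers and a short sleep standing in for model training.

```python
import asyncio


async def run_one(name, semaphore, state):
    """Pretend to train one model while holding a slot of the semaphore."""
    async with semaphore:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)  # stand-in for the real work
        state["running"] -= 1
        return name


async def run_all(names, max_parallel):
    """Run all jobs, never more than `max_parallel` at the same time."""
    semaphore = asyncio.Semaphore(max_parallel)
    state = {"running": 0, "peak": 0}
    results = await asyncio.gather(*(run_one(n, semaphore, state) for n in names))
    return results, state["peak"]


jobs = [f"job{i}" for i in range(10)]
# In a script; inside a notebook cell use `await run_all(jobs, 3)` instead.
results, peak = asyncio.run(run_all(jobs, max_parallel=3))
print(results, peak)
```

`asyncio.gather` returns results in the order the jobs were submitted, regardless of which ones finish first.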


In [13]:
test123.groupby(level=[0, 1, 2]).agg(['mean', 'std'])[
    [
        'Accuracy',
        'Accuracy_sex_Male_0/sex_Male_1',
        'TPR_sex_Male_0/sex_Male_1',
        'prob_pos_sex_Male_0/sex_Male_1',
    ]
]

dataset,transform,model,Accuracy (mean),Accuracy (std),Accuracy_sex_Male_0/sex_Male_1 (mean),Accuracy_sex_Male_0/sex_Male_1 (std),TPR_sex_Male_0/sex_Male_1 (mean),TPR_sex_Male_0/sex_Male_1 (std),prob_pos_sex_Male_0/sex_Male_1 (mean),prob_pos_sex_Male_0/sex_Male_1 (std)
Adult Sex,no_transform,Agarwal LR,0.846136,0.003351,0.881397,0.006163,0.952476,0.035648,0.40037,0.0349
Adult Sex,no_transform,"Agarwal, LR, DP",0.8459,0.002629,0.879809,0.006846,0.942288,0.044544,0.388597,0.028221
Adult Sex,no_transform,Kamishima,0.848135,0.003341,0.881765,0.006238,0.824512,0.050735,0.307629,0.020741
Adult Sex,no_transform,Logistic Regression (C=1.0),0.847556,0.003511,0.880436,0.005718,0.828655,0.055215,0.30715,0.023203
Adult Sex,no_transform,"Logistic Regression, C=1.0",0.846965,0.003432,0.880456,0.005667,0.839254,0.049004,0.315986,0.022258
Adult Sex,no_transform,SVM,0.861606,0.002751,0.896412,0.00389,0.744793,0.04237,0.263773,0.018194
