# EthicML

## Running experiments on the Adult dataset

### Installation

First we need to install EthicML. The toolkit isn't on PyPI yet, but this will change soon.

For now, the toolkit has to be cloned and then installed as an editable package:
```
cd <Location to clone to>
git clone https://github.com/predictive-analytics-lab/EthicML.git
cd EthicML
pip install --editable ./
```

(This notebook lives inside the package, so we can skip this step.)
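If you are unsure whether the editable install worked, a quick check from Python can help. The sketch below uses only the standard library; the helper name `is_installed` is made up for this example and is not part of EthicML.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("ethicml"):
    print("ethicml is importable")
else:
    print("ethicml not found; run `pip install --editable ./` in the cloned repository")
```

The check only confirms that the package can be located; it does not import it or verify its version.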

### Loading the data

EthicML includes some commonly used datasets from the fairness literature.
First, we load one of these. In this example we load the UCI Adult dataset.

In [1]:
from ethicml.algorithms.utils import DataTuple
from ethicml.data.load import load_data
from ethicml.data import Adult, Compas, Credit, German, Sqf, Toy

data: DataTuple = load_data(Adult())
assert (48842, 102) == data.x.shape
assert (48842, 1) == data.s.shape
assert (48842, 1) == data.y.shape

This loads the dataset as a DataTuple, which comprises $x$ (features), $s$ (sensitive attribute) and $y$ (class label). Each member of the DataTuple is stored as a Pandas DataFrame.
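To make the structure concrete, here is a minimal sketch of a container with the same three members. `ToyDataTuple` is a hypothetical stand-in written for this example, not EthicML's `DataTuple`, and the feature and label column names and values are invented toy data.

```python
from typing import NamedTuple

import pandas as pd


class ToyDataTuple(NamedTuple):
    """Hypothetical stand-in: three DataFrames that share the same rows."""

    x: pd.DataFrame  # features
    s: pd.DataFrame  # sensitive attribute
    y: pd.DataFrame  # class label


toy = ToyDataTuple(
    x=pd.DataFrame({"age": [39, 50, 38], "hours-per-week": [40, 13, 40]}),
    s=pd.DataFrame({"sex_Male": [1, 1, 0]}),
    y=pd.DataFrame({"label": [0, 0, 1]}),
)

# Every member must describe the same individuals, row for row.
assert len(toy.x) == len(toy.s) == len(toy.y)
print(toy.x.shape, toy.s.shape, toy.y.shape)  # (3, 2) (3, 1) (3, 1)
```

The shape assertions in the cell above check the same property on the real data: all three members have 48842 rows.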

By default, the Adult dataset uses the binary attribute `sex_Male` as the sensitive feature.

In [2]:
data.s.head()

| | sex_Male |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |


If we want to run experiments using race as the sensitive attribute we could change that manually, or, as this is a common task, EthicML can split the data for you.

In [3]:
data: DataTuple = load_data(Adult(split="Race"))
assert (48842, 99) == data.x.shape
assert (48842, 5) == data.s.shape
assert (48842, 1) == data.y.shape

In [4]:
data.s.head()

| | race_Amer-Indian-Eskimo | race_Asian-Pac-Islander | race_Black | race_Other | race_White |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 1 | 0 | 0 |
| 4 | 0 | 0 | 1 | 0 | 0 |


However, we're going to repeat some of the experiments from FairGP. That paper uses race as the sensitive attribute, but as a binary value: White or Not_White.

Fortunately, race has been one-hot encoded, so to replicate this we can just drop every sensitive-attribute column other than `race_White`.

The Dataset class is really just a guide that tells EthicML how to read the underlying CSV. So to remove the other race attributes, we can just not include them in our list of sensitive attribute columns.

In [5]:
dataset = Adult(split="Race")
dataset.sens_attrs = ["race_White"]  # keep only the binary White / Not_White column
data = load_data(dataset)

In [6]:
data.s.head()

| | race_White |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 0 |
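For comparison, the manual route can be sketched in plain pandas. The frame below is toy data that mirrors the five one-hot rows shown earlier; it is built by hand rather than loaded through EthicML.

```python
import pandas as pd

# Toy one-hot sensitive attribute, mirroring the first five rows printed earlier.
s = pd.DataFrame(
    {
        "race_Amer-Indian-Eskimo": [0, 0, 0, 0, 0],
        "race_Asian-Pac-Islander": [0, 0, 0, 0, 0],
        "race_Black": [0, 0, 0, 1, 1],
        "race_Other": [0, 0, 0, 0, 0],
        "race_White": [1, 1, 1, 0, 0],
    }
)

# One-hot encoding means exactly one column is active per row, so
# `race_White == 0` is equivalent to "any of the other race columns is 1".
assert (s.sum(axis=1) == 1).all()

# Keeping only `race_White` therefore gives the binary White / Not_White attribute.
s_binary = s[["race_White"]]
print(s_binary["race_White"].tolist())  # [1, 1, 1, 0, 0]
```

Selecting with a list of column names (`s[["race_White"]]`) keeps the result a DataFrame rather than a Series, which matches how the sensitive attribute is stored.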


### Evaluating some models

In [7]:
from ethicml.algorithms.inprocess import Agarwal, InAlgorithm, LR, SVM, Kamishima
from ethicml.algorithms.preprocess import Kamiran
from ethicml.metrics import Accuracy, CV, TPR, ProbPos
from ethicml.evaluators.evaluate_models import evaluate_models

datasets = [dataset]  # Adult with the binary `race_White` sensitive attribute
preprocess_models = [Kamiran()]
inprocess_models = [Agarwal(), Kamishima(), LR(), SVM()]
postprocess_models = []
metrics = [Accuracy(), CV()]
per_sens_metrics = [Accuracy(), TPR(), ProbPos()]  # reported per sensitive group
test123 = evaluate_models(
    datasets, preprocess_models, inprocess_models, postprocess_models,
    metrics, per_sens_metrics, test_mode=False, repeats=10,
)

110it [1:28:00, 48.00s/it]


In [8]:
test123.groupby(level=[0,1,2]).agg(['mean', 'std'])[['Accuracy', 'Accuracy_race_White_0/race_White_1', 'TPR_race_White_0/race_White_1', 'prob_pos_race_White_0/race_White_1']]

| dataset | transform | model | `Accuracy` mean | `Accuracy` std | `Accuracy_race_White_0/race_White_1` mean | `Accuracy_race_White_0/race_White_1` std | `TPR_race_White_0/race_White_1` mean | `TPR_race_White_0/race_White_1` std | `prob_pos_race_White_0/race_White_1` mean | `prob_pos_race_White_0/race_White_1` std |
|---|---|---|---|---|---|---|---|---|---|---|
| Adult | Kamiran & Calders | Agarwal | 0.849831 | 0.002914 | 1.060362 | 0.004983 | 0.972334 | 0.051636 | 0.63627 | 0.041602 |
| Adult | Kamiran & Calders | Kamishima | 0.850578 | 0.00389 | 1.058375 | 0.006047 | 0.869808 | 0.050598 | 0.549457 | 0.045157 |
| Adult | Kamiran & Calders | Logistic Regression | 0.849299 | 0.002863 | 1.060375 | 0.006307 | 1.003061 | 0.048908 | 0.666103 | 0.042811 |
| Adult | Kamiran & Calders | SVM | 0.864664 | 0.003033 | 1.058535 | 0.006453 | 1.019931 | 0.039817 | 0.659173 | 0.035216 |
| Adult | no_transform | Agarwal | 0.849831 | 0.002914 | 1.060362 | 0.004983 | 0.972334 | 0.051636 | 0.63627 | 0.041602 |
| Adult | no_transform | Kamishima | 0.850599 | 0.003768 | 1.057965 | 0.006058 | 0.872979 | 0.04991 | 0.55375 | 0.043847 |
| Adult | no_transform | Logistic Regression | 0.849759 | 0.002845 | 1.061392 | 0.005757 | 0.971281 | 0.049727 | 0.631601 | 0.043435 |
| Adult | no_transform | SVM | 0.864705 | 0.003187 | 1.058761 | 0.006175 | 1.01146 | 0.036556 | 0.649305 | 0.036004 |
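The `groupby(level=[0,1,2]).agg(['mean', 'std'])` call collapses the repeated runs into one row per dataset, transform and model. The sketch below reproduces that aggregation on a small hand-made frame; it assumes, based on the index names in the output, that the results are indexed by (dataset, transform, model) with one row per repeat, and the accuracy numbers are invented.

```python
import pandas as pd

# Toy results: two repeats for each of two models (values are made up).
index = pd.MultiIndex.from_tuples(
    [
        ("Adult", "no_transform", "LR"),
        ("Adult", "no_transform", "LR"),
        ("Adult", "no_transform", "SVM"),
        ("Adult", "no_transform", "SVM"),
    ],
    names=["dataset", "transform", "model"],
)
results = pd.DataFrame({"Accuracy": [0.80, 0.90, 0.70, 0.90]}, index=index)

# Group rows that share all three index levels, then summarise each column.
summary = results.groupby(level=[0, 1, 2]).agg(["mean", "std"])
print(summary)
```

The result has two-level column labels such as `("Accuracy", "mean")`, which is why the exported table carries a metric name and a statistic for each column. Note that pandas computes the sample standard deviation (`ddof=1`) by default.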


In [9]:
from ethicml.algorithms.inprocess import Agarwal, InAlgorithm, LR, SVM, Kamishima
from ethicml.algorithms.preprocess import Kamiran
from ethicml.metrics import Accuracy, CV, TPR, ProbPos
from ethicml.evaluators.evaluate_models import evaluate_models

datasets = [Adult()]  # default sensitive attribute: `sex_Male`
preprocess_models = []
inprocess_models = [Agarwal(), Kamishima(), LR(), SVM()]
postprocess_models = []
metrics = [Accuracy(), CV()]
per_sens_metrics = [Accuracy(), TPR(), ProbPos()]  # reported per sensitive group
test123 = evaluate_models(
    datasets, preprocess_models, inprocess_models, postprocess_models,
    metrics, per_sens_metrics, test_mode=False, repeats=10,
)

 98%|█████████▊| 50/51 [41:59<00:50, 50.40s/it]


In [13]:
test123.groupby(level=[0,1,2]).agg(['mean', 'std'])[['Accuracy', 'Accuracy_sex_Male_0/sex_Male_1', 'TPR_sex_Male_0/sex_Male_1', 'prob_pos_sex_Male_0/sex_Male_1']]


| dataset | transform | model | `Accuracy` mean | `Accuracy` std | `Accuracy_sex_Male_0/sex_Male_1` mean | `Accuracy_sex_Male_0/sex_Male_1` std | `TPR_sex_Male_0/sex_Male_1` mean | `TPR_sex_Male_0/sex_Male_1` std | `prob_pos_sex_Male_0/sex_Male_1` mean | `prob_pos_sex_Male_0/sex_Male_1` std |
|---|---|---|---|---|---|---|---|---|---|---|
| Adult | no_transform | Agarwal | 0.848572 | 0.003176 | 1.13996 | 0.006937 | 0.931315 | 0.08621 | 0.354343 | 0.040643 |
| Adult | no_transform | Kamishima | 0.850384 | 0.002961 | 1.137531 | 0.007479 | 0.815207 | 0.037357 | 0.29161 | 0.01369 |
| Adult | no_transform | Logistic Regression | 0.850097 | 0.003253 | 1.138135 | 0.007381 | 0.825997 | 0.034157 | 0.297058 | 0.014001 |
| Adult | no_transform | SVM | 0.864725 | 0.002844 | 1.117742 | 0.006871 | 0.757971 | 0.044756 | 0.26067 | 0.018487 |
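Column names such as `TPR_sex_Male_0/sex_Male_1` suggest that each per-group metric is reported as the value for group 0 divided by the value for group 1. That reading is an assumption based on the names and on values above 1 appearing in the accuracy columns. The sketch below shows the idea in plain Python on invented toy labels; the helper functions are hypothetical and are not EthicML's implementation.

```python
from typing import Callable, Sequence

Metric = Callable[[Sequence[int], Sequence[int]], float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def tpr(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives that are predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_for_positives) / len(preds_for_positives)


def prob_pos(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of individuals that are predicted positive."""
    return sum(y_pred) / len(y_pred)


def group_ratio(metric: Metric, y_true, y_pred, s) -> float:
    """Metric on the s == 0 group divided by the metric on the s == 1 group."""

    def on_group(group: int) -> float:
        rows = [i for i, value in enumerate(s) if value == group]
        return metric([y_true[i] for i in rows], [y_pred[i] for i in rows])

    return on_group(0) / on_group(1)


# Toy data (made up): four individuals in each group, each group has positives.
s = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]

for name, metric in [("Accuracy", accuracy), ("TPR", tpr), ("prob_pos", prob_pos)]:
    print(name, group_ratio(metric, y_true, y_pred, s))
```

On this toy data the ratios are 2.0 for accuracy, 2.0 for TPR and 0.5 for the probability of a positive prediction. Under this reading, a ratio of 1 means both groups are treated alike on that metric, so the `prob_pos` ratios well below 1 in the tables indicate that group 0 receives positive predictions less often than group 1.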
