# EthicML

## Running experiments on the Adult dataset

### Installation

First we need to install EthicML. Currently, the toolkit isn't on PyPI, but this will change soon.

For now, the toolkit can be installed as an editable package:
```
pip install -e git+https://github.com/predictive-analytics-lab/EthicML.git#egg=ethicml
```

(This notebook lives inside the package, so we can skip this step.)

### Loading the data

EthicML includes some commonly used datasets from the fairness literature.
First we load one of these; in this example, the UCI Adult dataset.

In [1]:
from ethicml.algorithms.utils import DataTuple
from ethicml.data.load import load_data
from ethicml.data import Adult, Compas, Credit, German, Sqf, Toy

data: DataTuple = load_data(Adult())
assert (48842, 102) == data.x.shape
assert (48842, 1) == data.s.shape
assert (48842, 1) == data.y.shape

This loads the dataset as a DataTuple, which comprises $x$ (features), $s$ (sensitive attribute) and $y$ (class label). Each member of the DataTuple is stored as a Pandas DataFrame.
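
To make that structure concrete, here is a minimal pandas-only sketch of a container with the same three members. `ToyDataTuple`, the column names and the values are hypothetical stand-ins for illustration; this is not EthicML's actual `DataTuple` class.

```python
from typing import NamedTuple

import pandas as pd


class ToyDataTuple(NamedTuple):
    """Hypothetical stand-in for a DataTuple: three row-aligned DataFrames."""

    x: pd.DataFrame  # features
    s: pd.DataFrame  # sensitive attribute(s)
    y: pd.DataFrame  # class label


# Made-up rows, only to show the layout
toy = ToyDataTuple(
    x=pd.DataFrame({"age": [39, 50, 38], "hours-per-week": [40, 13, 40]}),
    s=pd.DataFrame({"sex_Male": [1, 1, 0]}),
    y=pd.DataFrame({"salary_>50K": [0, 0, 1]}),
)

# The three members must describe the same individuals, row for row
assert toy.x.shape[0] == toy.s.shape[0] == toy.y.shape[0]
```

The shape assertions in the cell above check the same property on the real data: `x`, `s` and `y` all have 48842 rows.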

By default, the Adult dataset uses the binary attribute `sex_Male` as the sensitive feature.

In [2]:
data.s.head()

|   | sex_Male |
|---|----------|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |


If we want to run experiments using race as the sensitive attribute, we could change that manually or, as this is a common task, let EthicML split the data for us.

In [3]:
data: DataTuple = load_data(Adult(split="Race"))
assert (48842, 99) == data.x.shape
assert (48842, 5) == data.s.shape
assert (48842, 1) == data.y.shape

In [4]:
data.s.head()

|   | race_Amer-Indian-Eskimo | race_Asian-Pac-Islander | race_Black | race_Other | race_White |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 1 | 0 | 0 |
| 4 | 0 | 0 | 1 | 0 | 0 |


However, we are going to repeat some of the experiments from FairGP. That paper also uses race as the sensitive attribute, but as a binary value: White or Not_White.

Fortunately, race has been one-hot encoded, so to replicate this we can simply drop every sensitive-attribute column other than `race_White`.

The Dataset class is really just a guide that tells EthicML how to read the underlying CSV. So to remove the other race attributes, we simply leave them out of our list of sensitive attribute columns.

In [5]:
dataset = Adult("Race")
dataset.sens_attrs = ["race_White"]
data = load_data(dataset)

In [6]:
data.s.head()

|   | race_White |
|---|------------|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 0 |
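
The reason dropping columns is enough can be seen in a small pandas-only sketch. The toy `race` series below is made up, and `pd.get_dummies` is used here only to imitate the encoding; how EthicML builds its columns internally is not shown in this notebook.

```python
import pandas as pd

# Toy race column (hypothetical values)
race = pd.Series(["White", "White", "White", "Black", "Black", "Other"], name="race")

# One-hot encode: one 0/1 column per category
one_hot = pd.get_dummies(race, prefix="race").astype(int)

# Keeping only race_White yields a binary attribute:
# 1 means White, 0 means any other category (Not_White)
binary_s = one_hot[["race_White"]]
```

Because exactly one column is set per row, the `race_White` column on its own already encodes the White / Not_White split.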


### Evaluating some models

In [None]:
from ethicml.algorithms.inprocess import Agarwal, InAlgorithm, LR, SVM, Kamishima
from ethicml.algorithms.preprocess import Kamiran
from ethicml.metrics import Accuracy, CV, TPR, ProbPos
from ethicml.evaluators.evaluate_models import evaluate_models

datasets = [dataset]
preprocess_models = [Kamiran()]
inprocess_models = [Agarwal(), Kamishima(), LR(), SVM()]
postprocess_models = []
metrics = [Accuracy(), CV()]
per_sens_metrics = [Accuracy(), TPR(), ProbPos()]
test123 = evaluate_models(
    datasets,
    preprocess_models,
    inprocess_models,
    postprocess_models,
    metrics,
    per_sens_metrics,
    test_mode=False,
    repeats=10,
)
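
As a rough illustration of what the per-sensitive-attribute metrics measure, the sketch below computes accuracy, true positive rate and the probability of a positive prediction separately for each value of `race_White`, using plain pandas. The data are made up, and `group_metrics` is a hypothetical helper rather than part of EthicML. Reading result columns such as `Accuracy_race_White_0/race_White_1` as a ratio between the two groups is an assumption based on the column names.

```python
import pandas as pd

# Hypothetical labels and predictions for six individuals
df = pd.DataFrame(
    {
        "race_White": [0, 0, 0, 1, 1, 1],
        "y_true": [1, 0, 1, 1, 0, 1],
        "y_pred": [1, 0, 0, 1, 1, 1],
    }
)


def group_metrics(group: pd.DataFrame) -> pd.Series:
    """Accuracy, TPR and P(y_pred = 1) for one sensitive group."""
    positives = group[group["y_true"] == 1]
    return pd.Series(
        {
            "accuracy": (group["y_true"] == group["y_pred"]).mean(),
            "tpr": (positives["y_pred"] == 1).mean(),
            "prob_pos": (group["y_pred"] == 1).mean(),
        }
    )


# One row per sensitive group, one column per metric
per_group = pd.DataFrame(
    {s: group_metrics(g) for s, g in df.groupby("race_White")}
).T

# Assumed interpretation: group 0 divided by group 1
ratios = per_group.loc[0] / per_group.loc[1]
```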


In [None]:
test123.groupby(level=[0, 1, 2]).agg(['mean', 'std'])[[
    'Accuracy',
    'Accuracy_race_White_0/race_White_1',
    'TPR_race_White_0/race_White_1',
    'prob_pos_race_White_0/race_White_1',
]]
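
The aggregation in the last cell can be tried on a toy frame. In this sketch the index level names, model names and accuracy values are all hypothetical; the assumption is only that the results hold one row per repeat, indexed by three levels.

```python
import pandas as pd

# Hypothetical results: two repeats for each of two models
index = pd.MultiIndex.from_tuples(
    [
        ("Adult", "no_transform", "LR"),
        ("Adult", "no_transform", "LR"),
        ("Adult", "no_transform", "SVM"),
        ("Adult", "no_transform", "SVM"),
    ],
    names=["dataset", "transform", "model"],
)
results = pd.DataFrame({"Accuracy": [0.80, 0.84, 0.78, 0.82]}, index=index)

# Collapse the repeats: mean and standard deviation per index combination
summary = results.groupby(level=[0, 1, 2]).agg(["mean", "std"])[["Accuracy"]]
```

Grouping on all three index levels leaves one row per combination, and each selected metric expands into a `mean` and a `std` sub-column.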