In this repository, we implement FDM in Streaming and Sliding-Window Models. As baselines, we also implement the algorithms Fair-Swap, Fair-Flow, and Fair-GMM from "Diverse Data Selection under Fairness Constraints" and Fair-Greedy-Flow from "Improved Approximation and Scalability for Fair Max-Min Diversification".
Our experiments use four real-world datasets (Adult, CelebA, Census, and Lyrics) and one synthetic dataset. Download links are included in the paper. To generate a synthetic dataset, run a command like the following:
python generate_synthetic_dataset.py -n 1000 -m 2 -d 2
| Dataset | Group | n | m | #feats | Distance Metric |
| --- | --- | --- | --- | --- | --- |
| Adult | Sex | 48,842 | 2 | 6 | Euclidean |
| | Race | | 5 | | |
| | S+R | | 10 | | |
| CelebA | Sex | 202,599 | 2 | 41 | Manhattan |
| | Age | | 2 | | |
| | S+A | | 4 | | |
| Census | Sex | 2,426,116 | 2 | 25 | Manhattan |
| | Age | | 7 | | |
| | S+A | | 14 | | |
| Lyrics | Genre | 122,448 | 15 | 15 | Angular |
| SYN | - | 1,000-10,000,000 | 2-20 | 2 | Euclidean |
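
For readers unfamiliar with the three entries in the Distance Metric column, the sketch below shows their standard textbook definitions in plain Python. The function names are hypothetical and are not taken from this repository. Angular distance is assumed here to be the angle between two vectors (the arccosine of their cosine similarity); the repository's own implementation may normalize it differently.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def angular(x, y):
    """Angle in radians between two non-zero vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("angular distance is undefined for zero vectors")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_sim = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cos_sim)
```

For example, `euclidean((0, 0), (3, 4))` evaluates to `5.0`, and `manhattan((0, 0), (3, 4))` evaluates to `7`.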
- Ubuntu 20.04 (or higher version)
- Python 3.8 (or higher version)
Once you have downloaded the datasets and placed them in the datasets directory, you can reproduce the experiments by running the following commands:
python run_exp_varying_k_m_n_stream.py
python run_exp_varying_k_window.py
python run_exp_varying_m_w_window.py
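
If you prefer to launch all three experiments in one go, a small wrapper along the following lines may help. It is only a sketch: the helper names `build_commands` and `run_all` are hypothetical and not part of this repository, and it assumes the scripts are run from the repository root with the datasets directory next to them.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the commands listed above.
EXPERIMENT_SCRIPTS = [
    "run_exp_varying_k_m_n_stream.py",
    "run_exp_varying_k_window.py",
    "run_exp_varying_m_w_window.py",
]

def build_commands(python_exe=sys.executable):
    """Return one command (as an argument list) per experiment script."""
    return [[python_exe, script] for script in EXPERIMENT_SCRIPTS]

def run_all(repo_root=".", datasets_dir="datasets"):
    """Run the experiments in order, stopping at the first failure."""
    root = Path(repo_root)
    # Assumption: the scripts expect the data in <repo_root>/datasets.
    if not (root / datasets_dir).is_dir():
        raise FileNotFoundError(f"missing directory: {root / datasets_dir}")
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, cwd=root, check=True)
```

Calling `run_all()` from the repository root runs the scripts sequentially and aborts early if the datasets directory is missing or a script exits with an error.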