Skip to content
Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time


This is an implementation (written by the authors) of the fair k-median clustering algorithm from the paper:

"Scalable Fair Clustering"

by Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, Tal Wagner

Published in ICML 2019, available at

For the definition of fair clustering, please refer to the paper:

"Fair Clustering Through Fairlets"

by Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii

Published in NIPS 2017, available at

The main file is Usage: p q k dataset.csv [N]

p,q: Two integers that define the desired cluster balance. Namely, for p<=q, each output cluster with R red points and B blue points would satisfy min(R/B, B/R) >= p/q.

k: number of output clusters (k in k-median)

dataset.csv: Dataset file name, in CSV format. Each row represents a data point, and is a list of comma-separated numerical values. The first value is 0/1 and represents the sensitive color class (red/blue) of the point, the remaining values are its features.

N (optional): Integer sample size. If given, the code would run on a random sample of N records from the dataset. If not given, the code would run on the whole dataset.

The main part of the code computes a fairlet decomposition of the dataset (as defined in Chierichetti et al). Then a usual (non-fair) k-median algorithm is invoked on the fairlet centers. The resulting clustering is then extended to the whole dataset by assigning each data point to the cluster that contains its fairlet center. The yields the final fair clustering.

For usual k-median, the code invokes MATLAB. This requires Python 3.6 or older. Alternatively, the MATLAB invocation can be replaced by any other k-median implementation.

ACKNOWLEDGMENT: This work was supported in part by MIT-IBM Watson AI Lab collaboration.


No description, website, or topics provided.




No releases published


No packages published