Skip to content


Folders and files

Last commit message
Last commit date

Latest commit



64 Commits

Repository files navigation

This repository lists the prototype algorithms our group developed for data collection/analysis under local differential privacy and is maintained by Tianhao Wang.



Frequency Oracle (primitive to estimate the histograms)

Related Paper: Locally Differentially Private Protocols for Frequency Estimation

Extended-OLH for Sparse Aggregation

Related Paper: Locally Differentially Private Sparse Vector Aggregation

Source code is to be open sourced.

Square Wave

Density Oracle (for numerical/ordinal values)

Related Paper: Estimating Numerical Distributions under Local Differential Privacy

Clarification: Citation 33 should be Ning Wang et al. Collecting and Analyzing Multidimensional Data with Local Differential Privacy. ICDE 2019.


Frequent Itemset Mining under LDP

Related Paper: Locally Differentially Private Frequent Itemset Mining

Errata: In Equation (10) of Section V, there are three terms, two of them misses the coefficient $\ell$.

Clarification: To find top-k itemsets, we also consider singleton estimates from SVIM (the method for singleton mining).


A list of Post-Porcess Methods for LDP

Related Paper: Locally Differentially Private Frequency Estimation with Consistency


Publish streaming data under LDP

Related Paper: Continuous Release of Data Streams under both Centralized and Local Differential Privacy

Code available at this repository.


Multi-Dimensional Range Query under LDP

Related Paper: Answering Multi-Dimensional Range Queries under Local Differential Privacy

Code available at this repository.

I am slowly cleaning code for the protocols below:


Heavy Hitter Identification

Related Paper: Locally Differentially Private Heavy Hitter Identification

Errata: For the AOL dataset, there are 0.12M, instead of 0.2M unique queries. This is a typo that does not change any result.

Clarification: For the QUANTCAST data, I downloaded the data for one month (which contains 10 billion clicks), and sample for 5min (i.e., divide the # clicks by 30 * 24 * 5 * 60). The dataset is available upon request.


Multi-Dimensional Analytics

Related Paper: Multi-Dimensional Analytics Related Paper: Answering Multi-Dimensional Analytical Queries under Local Differential Privacy


Marginal Estimation

The source code is not opened yet, but the similar code (plus a data synthesizing component) for the central DP setting is opened at DPSyn (related info at nist challenge 1) and DPSyn2 (related info at nist challenge 2).

Related Paper: CALM: Consistent Adaptive Local Marginal for Marginal Release under Local Differential Privacy


Shuffler Model

Related Paper: Improving Utility and Security of the Shuffler-based Differential Privacy


Training GBDTs in the Federated Model

Related Paper: Federated Boosted Decision Trees with Differential Privacy Source code is open sourced at Sam's repo


Clustering in the Vertical Federated Model

Related Paper: Differentially Private Vertical Federated Clustering (accepted to VLDB 23) Source code is to be open sourced.


I mainly used Python 3 with numpy. Although not tested, the code should be compatible with any recent version. I also use a special package called xxhash for high hashing throughput. It can be changed to the builtin hash function. In case your local environment does not work, this is the package versions I used from my local side.

Python 3.6

xxhash 1.0.1

numpy 1.11.3

pytest 3.4.0

Or, run

pip install -r requirements.txt


Sample LDP implementation in Python







No releases published


No packages published