Skip to content

vvv214/LDP_Protocols

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

64 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This repository lists the prototype algorithms our group developed for data collection/analysis under local differential privacy and is maintained by Tianhao Wang.

Protocols

OLH

Frequency Oracle (primitive to estimate the histograms)

Related Paper: Locally Differentially Private Protocols for Frequency Estimation

Extended-OLH for Sparse Aggregation

Related Paper: Locally Differentially Private Sparse Vector Aggregation

Source code is to be open sourced.

Square Wave

Density Oracle (for numerical/ordinal values)

Related Paper: Estimating Numerical Distributions under Local Differential Privacy

Clarification: Citation 33 should be Ning Wang et al. Collecting and Analyzing Multidimensional Data with Local Differential Privacy. ICDE 2019.

SVSM

Frequent Itemset Mining under LDP

Related Paper: Locally Differentially Private Frequent Itemset Mining

Errata: In Equation (10) of Section V, there are three terms, two of them misses the coefficient $\ell$.

Clarification: To find top-k itemsets, we also consider singleton estimates from SVIM (the method for singleton mining).

Post-Porcessing

A list of Post-Porcess Methods for LDP

Related Paper: Locally Differentially Private Frequency Estimation with Consistency

ToPL

Publish streaming data under LDP

Related Paper: Continuous Release of Data Streams under both Centralized and Local Differential Privacy

Code available at this repository.

HDG

Multi-Dimensional Range Query under LDP

Related Paper: Answering Multi-Dimensional Range Queries under Local Differential Privacy

Code available at this repository.

I am slowly cleaning code for the protocols below:

PEM

Heavy Hitter Identification

Related Paper: Locally Differentially Private Heavy Hitter Identification

Errata: For the AOL dataset, there are 0.12M, instead of 0.2M unique queries. This is a typo that does not change any result.

Clarification: For the QUANTCAST data, I downloaded the data for one month (which contains 10 billion clicks), and sample for 5min (i.e., divide the # clicks by 30 * 24 * 5 * 60). The dataset is available upon request.

HIO

Multi-Dimensional Analytics

Related Paper: Multi-Dimensional Analytics Related Paper: Answering Multi-Dimensional Analytical Queries under Local Differential Privacy

CALM

Marginal Estimation

The source code is not opened yet, but the similar code (plus a data synthesizing component) for the central DP setting is opened at DPSyn (related info at nist challenge 1) and DPSyn2 (related info at nist challenge 2).

Related Paper: CALM: Consistent Adaptive Local Marginal for Marginal Release under Local Differential Privacy

MURS

Shuffler Model

Related Paper: Improving Utility and Security of the Shuffler-based Differential Privacy

DP-FL-GBDT

Training GBDTs in the Federated Model

Related Paper: Federated Boosted Decision Trees with Differential Privacy Source code is open sourced at Sam's repo

DP-VFL-Clustering

Clustering in the Vertical Federated Model

Related Paper: Differentially Private Vertical Federated Clustering (accepted to VLDB 23) Source code is to be open sourced.

Environment

I mainly used Python 3 with numpy. Although not tested, the code should be compatible with any recent version. I also use a special package called xxhash for high hashing throughput. It can be changed to the builtin hash function. In case your local environment does not work, this is the package versions I used from my local side.

Python 3.6

xxhash 1.0.1

numpy 1.11.3

pytest 3.4.0

Or, run

pip install -r requirements.txt
pytest

About

Sample LDP implementation in Python

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages