Disclosure Avoidance Repository

Motivation

The Census Bureau is required by law to keep its survey responses confidential, and it is beginning the transition from “ad hoc” privacy techniques toward a formally private framework known as differential privacy. All public data releases must go through the Disclosure Review Board (DRB), whose newest policy states that any data release at the sub-state level or below must be protected with noise injection techniques.

External researchers using restricted Census data at Federal Statistical Research Data Centers (FSRDCs) are among the first affected by these policies, but all Census data products will eventually require these methods. Researchers generally do not have a background in formal privacy, so they face a roadblock if they want to publish sub-state results. This repository aims to provide the tools and documentation to address this problem. Content here is a work in progress, and all releases that use this library still require official approval from the DRB.

Differentially Private Computations

Differential privacy states that any information-related risk to a person should not change significantly as a result of that person's information being included, or not, in the analysis. It provides provable guarantees on the cumulative privacy risk from successive data releases by means of a privacy "budget." Algorithms achieve differential privacy by introducing carefully calibrated random noise into the computation. Types of computations that can be made differentially private include:

- Descriptive statistics
  - counts
  - means
  - medians
  - histograms
  - boxplots
  - CDFs
- Supervised and unsupervised ML tasks
  - regression
  - classification
- Generation of synthetic data
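Formally, a randomized mechanism M is ε-differentially private if, for every pair of datasets D and D' that differ in one person's data and every set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]. The sketch below illustrates the two ideas from the paragraph above, calibrated noise and a privacy budget, for counts and histograms. It is not code from census_dp: `PrivacyBudget`, `dp_count`, and `dp_histogram` are hypothetical names, and it assumes the standard Laplace mechanism (noise scale = sensitivity / ε) and sequential composition (the ε values of successive releases add up).

```python
import numpy as np


class PrivacyBudget:
    """Hypothetical tracker for a total epsilon under sequential composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        # Sequential composition: the epsilons of successive releases add up.
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise ValueError("privacy budget exhausted")
        self.spent += epsilon


def dp_count(records, predicate, epsilon, budget, rng):
    """Noisy count of the records satisfying predicate."""
    budget.spend(epsilon)
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1 (sensitivity 1).
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)


def dp_histogram(values, bins, epsilon, budget, rng):
    """Noisy histogram over fixed bins chosen without looking at the data."""
    budget.spend(epsilon)
    counts, _ = np.histogram(values, bins=bins)
    # Each person lands in one bin, so adding or removing a person changes the
    # vector of counts by 1 in total (L1 sensitivity 1); each bin gets Laplace(1/epsilon).
    return counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)


rng = np.random.default_rng(seed=0)  # seeded for reproducibility, not security
budget = PrivacyBudget(total_epsilon=1.0)
ages = [23, 35, 47, 52, 61, 29, 40]

noisy_over_40 = dp_count(ages, lambda a: a > 40, epsilon=0.5, budget=budget, rng=rng)
noisy_hist = dp_histogram(ages, bins=[0, 30, 50, 100], epsilon=0.5, budget=budget, rng=rng)
# The full budget of 1.0 is now spent, so any further query raises ValueError.
```

Smaller ε means larger noise and stronger protection; once the total budget is spent, no further releases are allowed from the same data.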

Getting started

Repository Overview

The notebooks/ folder contains tutorials for some of the main workflows researchers follow when releasing sub-state data analyses. These tutorials can be viewed statically in the browser, or run locally using Jupyter. See below for how to install and run a Jupyter notebook locally.

The census_dp/ folder contains implementations of common noise injection algorithms and error metrics. NOTE: These algorithms are not necessarily "formally" private. One reason is that many of our implementations currently use Python's NumPy library, whose random number generator is not cryptographically secure. Read more in dp-future.
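One way to remove the insecure-RNG concern for a single mechanism is to draw randomness from the operating system's cryptographically secure generator instead of NumPy. The sketch below is only an illustration and is not part of census_dp: the helper `laplace_noise_csprng` is hypothetical, and it uses Python's standard `secrets.SystemRandom` with inverse-CDF sampling of the Laplace distribution. It addresses the RNG alone; naive floating-point sampling like this is known to leak information through the low-order bits of its outputs, so on its own it still does not make a mechanism formally private.

```python
import math
import secrets

# The OS-backed CSPRNG from the standard library.
_SYSTEM_RNG = secrets.SystemRandom()


def laplace_noise_csprng(scale):
    """Hypothetical helper: draw Laplace(0, scale) noise via inverse-CDF sampling."""
    # u is uniform on [-0.5, 0.5); reject the endpoint -0.5 to avoid log(0).
    while True:
        u = _SYSTEM_RNG.random() - 0.5
        if u != -0.5:
            break
    # Inverse CDF of Laplace(0, scale): -scale * sign(u) * ln(1 - 2|u|).
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


noisy_count = 42 + laplace_noise_csprng(scale=1.0 / 0.5)  # count, sensitivity 1, epsilon 0.5
```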

The tests/ folder contains unit tests for the implementations in census_dp/ using the pytest library. See instructions for running these tests below.

How to install & run a notebook

Install Anaconda (Census employees must submit a Remedy ticket).
Open an Anaconda prompt.
Install Git by typing the following into your Anaconda prompt and pressing Enter.

conda install git

Navigate to the directory you would like to download this repository in.

For example:

cd Downloads/privacy/

Clone this repository

git clone https://github.com/umadesai/census-dp.git

Navigate to the notebooks folder.

cd census-dp/notebooks

Run Jupyter Notebook.

jupyter notebook

This command should launch Jupyter Notebook locally in your browser. If it does not, open your browser and navigate to the localhost address that is provided in your Anaconda prompt.

Click the notebook (.ipynb file) you would like to open. We recommend starting with dp-count.
Reference this sheet for help using Jupyter Notebook.

Setting up your conda environment

Create the environment from the env.yml file:

conda env create -f env.yml

Activate the new environment:

conda activate env

Once you've finished your work in this environment, you can deactivate the environment using:

conda deactivate

Importing a module from the library

If you want to use a module or algorithm from the library in your own Python script, you can follow the structure of the example below.

from census_dp import laplace

my_laplace = laplace.laplace_mech(mu=0, epsilon=1, sensitivity=1)
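The authoritative behavior of `laplace.laplace_mech` is defined in census_dp/. The standalone sketch below does not reproduce it; it is a hypothetical stand-in that assumes `mu` is the true statistic, `epsilon` is the privacy-loss parameter, and `sensitivity` is the query's L1 sensitivity, so the noise scale is sensitivity / epsilon. It applies that idea to a clamped mean (treating the record count n as public and neighbors as differing in one record's value) and computes a simple error metric. `laplace_release`, `dp_mean`, and `mean_absolute_error` are all hypothetical names.

```python
import numpy as np


def laplace_release(mu, epsilon, sensitivity, rng):
    """Hypothetical stand-in: mu + Laplace(0, sensitivity / epsilon) noise."""
    return mu + rng.laplace(loc=0.0, scale=sensitivity / epsilon)


def dp_mean(values, lower, upper, epsilon, rng):
    """Noisy mean of values clamped to [lower, upper], treating n as public."""
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    n = clipped.size
    # Changing one record's value moves the clamped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return laplace_release(clipped.mean(), epsilon, sensitivity, rng)


def mean_absolute_error(true_value, noisy_values):
    """Empirical error metric: average |noisy - true| over repeated releases."""
    return float(np.mean(np.abs(np.asarray(noisy_values) - true_value)))


rng = np.random.default_rng(seed=1)
incomes = [32_000, 45_000, 51_000, 250_000, 38_000]
true_clamped_mean = np.mean(np.clip(incomes, 0, 100_000))

# Repeating a release spends epsilon each time: do this only to study error
# offline, never to publish.
releases = [dp_mean(incomes, 0, 100_000, epsilon=1.0, rng=rng) for _ in range(1000)]
mae = mean_absolute_error(true_clamped_mean, releases)
print(f"mean absolute error: {mae:.0f}")
```

For Laplace noise the expected absolute error equals the scale, here (100,000 / 5) / 1.0 = 20,000, which shows why tight clamping bounds and larger n matter for useful sub-state estimates.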

Running tests

Each library module has tests, implemented with pytest. To run all the tests at once, run pytest from the project's root directory:

pytest
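Because noise mechanisms are random, their tests usually check statistical properties under a fixed seed. The sketch below shows that pattern for a file such as tests/test_laplace_sketch.py (a hypothetical name); `add_laplace_noise` is a hypothetical function standing in for a library mechanism, not part of census_dp. pytest collects functions named `test_*` in files named `test_*.py` by default, and plain `assert` statements are enough.

```python
import numpy as np


def add_laplace_noise(value, epsilon, sensitivity, rng, size=None):
    """Hypothetical function under test: value + Laplace(0, sensitivity / epsilon)."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=size)


def test_noise_is_centered_with_expected_scale():
    rng = np.random.default_rng(seed=42)  # fixed seed keeps the test deterministic
    epsilon, sensitivity = 0.5, 1.0
    draws = add_laplace_noise(10.0, epsilon, sensitivity, rng, size=50_000)
    scale = sensitivity / epsilon
    # Laplace(0, b) noise has mean 0 and mean absolute deviation b.
    assert abs(draws.mean() - 10.0) < 0.1
    assert abs(np.abs(draws - 10.0).mean() - scale) < 0.1


def test_same_seed_gives_same_draws():
    first = add_laplace_noise(0.0, 1.0, 1.0, np.random.default_rng(seed=7), size=5)
    second = add_laplace_noise(0.0, 1.0, 1.0, np.random.default_rng(seed=7), size=5)
    assert np.array_equal(first, second)
```

The tolerances are several standard errors wide, so the first test should not become flaky even if the seed changes.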

Contributors

This project is the work of members of the CED-Disclosure Avoidance team at the US Census Bureau.

Uma Desai (umadesai)
Sophie Song (sophiesong)
Rolando Rodríguez (rrod515)
Amy Lauger (amydlauger)
Caleb Floyd (calebfloyd)
Michael Freiman (mfreiman)

Hear more about the repository at the Annual Conference of the Federal Statistical Research Data Centers on September 5, 2019 at the Pyle Center, University of Wisconsin–Madison.

Acknowledgements

Thank you for the incredible contributions of those who have been researching differential privacy at the Census Bureau and at academic institutions, specifically:

Philip Leclerc, US Census Bureau
Simson Garfinkel, US Census Bureau
John Abowd, US Census Bureau
Ashwin Machanavajjhala, Duke University
Michael Hay, Colgate University
Gerome Miklau, University of Massachusetts Amherst
Daniel Kifer, Penn State University
Cynthia Dwork, Harvard University
