Privacy-Preserving Dynamic Learning of Tor Network Traffic

Overview

This is the landing page for the following research publication:

Privacy-Preserving Dynamic Learning of Tor Network Traffic
Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS 2018)
by Rob Jansen, Matthew Traudt, and Nicholas Hopper
[Full paper available here]

If you reference this paper or use any of the data or models provided on this page, please cite the paper. Here is a BibTeX entry for LaTeX users:

@inproceedings{tmodel-ccs2018,
author = {Rob Jansen and Matthew Traudt and Nicholas Hopper},
title = {Privacy-Preserving Dynamic Learning of {Tor} Network Traffic},
booktitle = {25th ACM Conference on Computer and Communications Security (CCS)},
year = {2018},
note = {See also \url{https://tmodel-ccs2018.github.io}},
}

The research included privacy-preserving measurement and Tor network simulation components.

Measurement

We measured Tor using PrivCount, a tool for privacy-preserving Tor statistics aggregation, together with a modified version of Tor. We modified both tools so that they can dynamically learn and model Tor traffic:

PrivCount Code

Traffic learning and modeling changes have been merged upstream!

The version we used for our experiments:

Tor Code

Traffic learning and modeling changes have been merged upstream!

  • git repo: git@github.com:privcount/tor.git
  • at branch: privcount-master
    (since commit 38d6e2dafbc0669b38d2564426b21e67d83fea3f)
  • git web: https://github.com/privcount/tor
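The repository, branch, and commit listed above can be fetched roughly as follows. This is a minimal sketch, not a verified procedure: the HTTPS clone URL is assumed to be equivalent to the SSH URL above, and the function name and default directory name are hypothetical.

```shell
# Values taken from the list above; the HTTPS URL is an assumed
# equivalent of git@github.com:privcount/tor.git.
TOR_REPO="https://github.com/privcount/tor.git"
TOR_BRANCH="privcount-master"
TOR_SINCE="38d6e2dafbc0669b38d2564426b21e67d83fea3f"

# Usage: fetch_privcount_tor [dest_dir]   (hypothetical helper)
fetch_privcount_tor() {
    tor_dest="${1:-tor-privcount}"
    # Clone only needs the branch name; git checks it out directly.
    git clone --branch "$TOR_BRANCH" "$TOR_REPO" "$tor_dest" || return 1
    # Succeeds only if the commit named above is in the branch history.
    git -C "$tor_dest" merge-base --is-ancestor "$TOR_SINCE" HEAD
}
```

Calling `fetch_privcount_tor` with no arguments clones into `./tor-privcount`. A non-zero exit status means either the clone failed or the branch does not contain the listed commit.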

The version we used for our experiments:

Data

See data/privcount in the repo. Measurement numbers match those listed in Table 2 of the paper. Measurements 1-7 are ground truth measurements, measurement 8 comprises 14 iterations for learning the packet model, and measurement 9 comprises 14 iterations for learning the stream model.

The best packet model was from measurement 8-8, and the best stream model was from measurement 9-8.

Simulation

Simulation was done using Shadow, a full network simulation tool that directly executes Tor.

Shadow Code

Changes have been merged upstream!

  • git repo: git@github.com:shadow/shadow.git
  • at branch: master
    (our experiments were run at commit: 322d8e047ae9adbc7ddbdfdae0c6aec073eb2374)
  • git web: https://github.com/shadow/shadow
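To reproduce the experiments on the same Shadow code, you can pin a checkout to the commit listed above. The following is a sketch under assumptions: the HTTPS clone URL is assumed to be equivalent to the SSH URL above, the commit is assumed to still be reachable in the upstream repository, and the function name is hypothetical.

```shell
# Values taken from the list above; the HTTPS URL is an assumed
# equivalent of git@github.com:shadow/shadow.git.
SHADOW_REPO="https://github.com/shadow/shadow.git"
SHADOW_COMMIT="322d8e047ae9adbc7ddbdfdae0c6aec073eb2374"

# Usage: fetch_shadow_at_commit [dest_dir]   (hypothetical helper)
fetch_shadow_at_commit() {
    shadow_dest="${1:-shadow}"
    git clone "$SHADOW_REPO" "$shadow_dest" || return 1
    # Detach HEAD at the exact commit so later upstream changes
    # do not affect the build.
    git -C "$shadow_dest" checkout --detach "$SHADOW_COMMIT"
}
```

Pinning to a commit rather than tracking the branch keeps the simulator code fixed, at the cost of missing later upstream fixes.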

You can run Shadow with your own version of Tor to help with your own research.

To export PrivCount events so that you can count streams, circuits, bytes, etc. as we did in the paper, use the Tor research/tmodel/train-v3-03210 branch listed above. Because of a bug in PrivCount, you should also apply our workaround patch to TGen so that all stream events are recorded correctly: data/shadow/workaround_for_privcount_stream_bug.patch
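One way to apply the workaround patch to a TGen source tree is sketched below. The function name and the location of your TGen checkout are hypothetical, and the sketch assumes the patch applies with git's default path stripping; adjust it if the patch was produced differently.

```shell
# Usage: apply_tgen_workaround <tgen_source_dir> <patch_file>
# (hypothetical helper; both arguments are paths on your machine)
apply_tgen_workaround() {
    tgen_dir="$1"
    # Make the patch path absolute because "git -C" changes directory
    # before the path is resolved.
    patch_abs="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
    # Dry run first: if the patch does not fit, the tree is left untouched.
    git -C "$tgen_dir" apply --check "$patch_abs" || return 1
    git -C "$tgen_dir" apply "$patch_abs"
}
```

For example, from the root of this repo: `apply_tgen_workaround /path/to/tgen data/shadow/workaround_for_privcount_stream_bug.patch`, where `/path/to/tgen` stands in for your own TGen checkout.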

Shadow Network Configuration

Section 6.1.1 in the paper describes our approach to creating an Internet model to use as Shadow's network configuration. That methodology yielded a network graph in GraphML format that we used in our Shadow simulations. We also back-ported the network graph for a previous stable version of Shadow. These files should be decompressed and copied to ~/.shadow/share.
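The decompress-and-copy step can be scripted along these lines. The compression format is not stated on this page, so this sketch guesses from the file extension and handles only `.xz` and `.gz`; the function name and any file name you pass to it are hypothetical.

```shell
# Usage: install_network_graph <archive> [share_dir]
# (hypothetical helper; share_dir defaults to ~/.shadow/share)
install_network_graph() {
    graph_archive="$1"
    share_dir="${2:-$HOME/.shadow/share}"
    mkdir -p "$share_dir" || return 1
    case "$graph_archive" in
        *.xz)
            # Decompress to stdout so the original archive is kept.
            xz -dc "$graph_archive" \
                > "$share_dir/$(basename "${graph_archive%.xz}")" ;;
        *.gz)
            gzip -dc "$graph_archive" \
                > "$share_dir/$(basename "${graph_archive%.gz}")" ;;
        *)
            # Not a recognized archive: copy the file as-is.
            cp "$graph_archive" "$share_dir/" ;;
    esac
}
```

The optional second argument makes it possible to stage the files in a scratch directory before touching `~/.shadow/share`.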

Shadow Host Configuration

Our Shadow experiments used the client behavior models that we discuss in the paper. You can incorporate these models into your own Shadow experiments.

If you want to repeat our experiments, Section 6.1.2 in the paper describes our host configuration for each of the three TGen models we tested. Here are the Shadow configurations needed to run each experiment. Run times are estimates and assume a 32-core server with 30 Shadow worker threads.

| Client Model | # Relays | # Clients | RAM | Run time |
|---|---|---|---|---|
| Single file | 2,000 | 60,000 | ~1.25 TiB | ~1 week |
| Protocol | 2,000 | 13,730 | ~300 GiB | ~1 week |
| PrivCount | 2,000 | 129,419 | ~2.75 TiB | ~1 month |
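Before starting a run of this size, it can help to check whether the machine has roughly enough memory. The sketch below encodes the approximate RAM figures from the table (converted with 1 TiB = 1024 GiB) and compares them with `MemTotal` from `/proc/meminfo`, so it is Linux-only. The model keys `single-file`, `protocol`, and `privcount` and both function names are hypothetical labels chosen for this example.

```shell
# Approximate RAM needs in GiB, taken from the table above.
required_ram_gib() {
    case "$1" in
        single-file) echo 1280 ;;   # ~1.25 TiB
        protocol)    echo 300 ;;    # ~300 GiB
        privcount)   echo 2816 ;;   # ~2.75 TiB
        *)           return 1 ;;    # unknown model key
    esac
}

# Usage: has_enough_ram <model> [meminfo_file]
# Returns 0 if total RAM meets the estimate, 1 if not, 2 for an unknown model.
has_enough_ram() {
    ram_need=$(required_ram_gib "$1") || return 2
    meminfo="${2:-/proc/meminfo}"
    # MemTotal is reported in kiB; 1 GiB = 1048576 kiB.
    ram_have=$(awk '/^MemTotal:/ { printf "%d", $2 / 1048576 }' "$meminfo")
    [ "$ram_have" -ge "$ram_need" ]
}
```

For example, `has_enough_ram protocol && echo "ok to run"`. The figures are estimates, so treat a pass as a rough indication rather than a guarantee.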