The Tor Path Simulator
Latest commit 9dce9ad (Jan 16, 2017) by ajohnson-nrl: fixing file close bug
ext Completely revamped the way congestion is passed and selected. Apr 20, 2013
in adding trace file for user models Sep 17, 2013
util adding more accurate description to consensus process examination uti… Jun 18, 2016
.gitignore Added .gitignore Apr 20, 2013
LICENSE adding explicit licensing terms Sep 5, 2013
README.md updating README and quick simulation script for new guard num default… Jun 18, 2016
TODO making output class an option Aug 18, 2014
analyze_and_plot.sh fixing pwd exec Aug 15, 2014
congestion_aware_pathsim.py fixing rare situation where need expires and is reinstated before cov… Aug 11, 2014
event_callbacks.py adding start fn to callback interface Jun 18, 2016
models.py adding best/worst models based on portscan results Apr 29, 2013
network_modifiers.py fixing file close bug Jan 16, 2017
pathsim.py making arguments to AdversaryInsertion explicit Jun 20, 2016
pathsim_analysis.py adding explicit binary read mode for pickle files Aug 17, 2014
pathsim_plot.py Revert "removing ntor stuff and setting 1-2 month guard rotation" Sep 16, 2015
plot_torcat-3guards.py scripts for plotting Tor vs CAT results May 9, 2013
plot_torcat-all.py scripts for plotting Tor vs CAT results May 9, 2013
plot_torcat.py scripts for plotting Tor vs CAT results May 9, 2013
process_consensuses.py fixing file close bug Jan 16, 2017
run_quick_simulation.sh updating num guards in quick simulation script Jun 19, 2016
run_simulations_cat.sh updating run scripts to use argparse argument format Aug 23, 2014
run_simulations_delayed_entry.sh updating run scripts to use argparse argument format Aug 23, 2014
run_simulations_guard_exit_bw.sh minor edits Jun 17, 2016
run_simulations_tot_bw.sh updating run scripts to use argparse argument format Aug 23, 2014
run_simulations_user_models.sh updating run scripts to use argparse argument format Aug 23, 2014
vcs_pathsim.py updating to use NetworkState object Aug 12, 2014

README.md

Top-level simulation code:

  • pathsim.py: Path simulator code. Requires Tor's Stem library, along with Tor consensuses and descriptors.
  • congestion_aware_pathsim.py: Path simulator code for congestion-aware Tor (CAT) variant
  • vcs_pathsim.py: Path simulator code for SAFEST (i.e. virtual-coordinate system) variant

Top-level analysis scripts:

  • pathsim_analysis.py: Turns simulator output into statistics.
  • pathsim_plot.py: Turns simulator statistics into plots.

Useful shell scripts:

  • run_quick_simulation.sh: Runs a simple simulation with the input parameters.
  • run_simulations_cat.sh: Runs parallel CAT simulations.
  • run_simulations_delayed_entry.sh: Runs parallel simulations where the adversary enters after the simulation starts.
  • run_simulations_guard_exit_bw.sh: Runs parallel simulations where guard/exit bandwidths are varied.
  • run_simulations_tot_bw.sh: Runs parallel simulations where total bandwidth is varied.
  • run_simulations_user_models.sh: Runs parallel simulations where user models are varied.
  • analyze_and_plot.sh: Moves simulation files around, runs analysis scripts on them, runs plot scripts on the output, and archives the output.

Directories:

  • ext: Code for the SAFEST extension
  • util: Code for various useful intermediate operations

For an example of how TorPS (the Tor Path Simulator) can be used, see the following paper:

Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries
by Aaron Johnson, Chris Wacek, Rob Jansen, Micah Sherr, and Paul Syverson
To appear in Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS 2013).

The BibTeX citation for this paper is

@inproceedings{usersrouted-ccs13,
  author = {Aaron Johnson and Chris Wacek and Rob Jansen and Micah Sherr and Paul Syverson},
  title = {Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries},
  booktitle = {Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS 2013)},
  year = {2013},
  publisher = {ACM}
}

Path Simulation HOWTO

Basic path simulation can be done entirely with pathsim.py. It requires Stem (https://stem.torproject.org/). Simulation is a two-step process:

  1. Process Tor consensuses and descriptors into a faster and more compact format for later path simulation. This is done with the following command:

    python pathsim.py process [args]

    Replace [args] with "-h" for argument details. An example of this command is:

    python pathsim.py process --start_year 2013 --start_month 8 --end_year 2014 --end_month 7 \
        --in_dir in --out_dir out --initial_descriptor_dir in/server-descriptors-2013-07
    
    TorPS expects to find all consensuses and descriptors for a given month in the format and organization of the metrics.torproject.org consensus archives. Extract the consensus archive for a month into a directory named "[in_dir]/consensuses-[year]-[month]", where [year] is in YYYY format and [month] is in MM format. Similarly, extract the archive of descriptors for a given month into the directory "[in_dir]/server-descriptors-[year]-[month]".

    The processing command will go through each month from [start_year]/[start_month] to [end_year]/[end_month]. It will output the processed "network state files" for a given month into the directory "[out_dir]/network-state-[year]-[month]", which will be created if it doesn't exist.

    If --fat is provided, then the network state files will contain all data from the Tor consensuses and descriptors. However, the resulting "fat" network state files cannot be used by TorPS for simulation; they may instead be useful for inspecting the network states of a given period more fully.

    If the consensuses being processed start at the very beginning of a month, which is the case if you simply extract the monthly consensus archives provided by Tor Metrics, then include the --initial_descriptor_dir argument with a directory containing the descriptors from the month before the first consensus month. If this argument is omitted, then the network state files for roughly the first 18 hours of the first month being processed will incorrectly contain far fewer relays than actually existed in the Tor network at that time. This is because a relay is only included if its descriptor has been found in a descriptor archive, and a relay can go up to ~18 hours before publishing a new descriptor. Thus, for the initial hours, the needed descriptors are in the descriptor archive of the month before the period being processed. You can see how many relays are included in each network state file by looking at the output lines of the process command. For example, the relevant lines should look something like:

    Processing consensus file 2013-09-01-00-00-00-consensus
    Wrote descriptors for 2 relays.
    Did not find descriptors for 4277 relays
    
    Notice that in this example nearly all relays are missing descriptors (and thus would be absent from the network state file), because the consensuses to process started at 2013-09-01-00-00-00 and --initial_descriptor_dir was omitted. Output from the second day of this example shows that there are indeed no missing descriptors after at most 24 hours of consensuses:
    Processing consensus file 2013-09-02-00-00-00-consensus
    ...
    Wrote descriptors for 4261 relays.
    Did not find descriptors for 0 relays
    
    The script util/examine_process_output.py can be fed the output of the process command to provide convenient statistics about the relays and descriptors produced in each network state file.

  2. Run simulations over a given period. This is done with the following command:
    python pathsim.py simulate [args]
    
    Replace [args] with "-h" for argument details. An example of the command for a 5000-sample simulation in which the client makes a connection to Google (74.125.131.105) every 10 minutes (i.e. 600 seconds) is:
    python pathsim.py simulate --nsf_dir out/ns-2013-08--2014-07 --num_samples 5000 \
        --user_model simple=600 --format normal tor
    
    The following is another example of the simulate command. It runs a simulation in which the user has the "typical" behavior given in the included trace file, a malicious guard relay is added with consensus bandwidth 15000, a malicious exit relay is added with consensus bandwidth 10000, the output indicates only whether a malicious guard and/or exit is selected, the number of client guards is set to 2, and each guard expires at a random time between 270 and 300 days after its initial selection:
    python pathsim.py simulate --nsf_dir out/ns-2013-08--2014-07 --num_samples 5000 \
        --trace_file in/users2-processed.traces.pickle --user_model typical --format relay-adv \
        --adv_guard_cons_bw 15000 --adv_exit_cons_bw 10000 --adv_time 0 --num_adv_guards 1 \
        --num_adv_exits 1 --num_guards 2 --guard_expiration 270 --loglevel INFO tor
    

    The included trace file (in/users2-processed.traces.pickle) contains six 20-minute traces recorded from a volunteer using Tor for the following activities: Facebook, Gmail / Google Chat (now Hangouts), Google Calendar / Google Docs, Web search, IRC, and BitTorrent. These traces are repeated on a weekly schedule to create user models that fill the simulated time period. A "typical" model is also provided whose schedule includes the first four traces (i.e., Facebook, Gmail/GChat, GCal/GDocs, and Web search), and "best" and "worst" models are provided by replacing the TCP ports in the typical model with ports 443 and 6523, respectively. See the paper "Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries" cited above for details on these traces and models.
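The directory conventions in step 1 can be summarized with a small Python sketch. It is not part of TorPS: the helper names `months`, `previous_month`, and `month_dirs` are hypothetical, and the code only reproduces the naming scheme described above for the consensus, descriptor, and network-state directories under --in_dir and --out_dir, plus the previous month's descriptor directory that --initial_descriptor_dir should point to.

```python
import os


def months(start_year, start_month, end_year, end_month):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def previous_month(year, month):
    """Return the (year, month) immediately before the given month."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def month_dirs(in_dir, out_dir, start_year, start_month, end_year, end_month):
    """Hypothetical helper: directories read and written for each processed month."""
    plan = []
    for year, month in months(start_year, start_month, end_year, end_month):
        ym = "{0:04d}-{1:02d}".format(year, month)
        plan.append({
            "consensuses": os.path.join(in_dir, "consensuses-" + ym),
            "descriptors": os.path.join(in_dir, "server-descriptors-" + ym),
            "network_state": os.path.join(out_dir, "network-state-" + ym),
        })
    return plan


if __name__ == "__main__":
    # Same period as the example process command: 2013-08 through 2014-07.
    plan = month_dirs("in", "out", 2013, 8, 2014, 7)
    prev_year, prev_month = previous_month(2013, 8)
    print("initial descriptor dir: " + os.path.join(
        "in", "server-descriptors-{0:04d}-{1:02d}".format(prev_year, prev_month)))
    # Report expected input directories that have not been extracted yet.
    for entry in plan:
        for key in ("consensuses", "descriptors"):
            if not os.path.isdir(entry[key]):
                print("missing: " + entry[key])
```

On a POSIX system, the printed initial descriptor directory for this period is in/server-descriptors-2013-07, matching the --initial_descriptor_dir argument of the example process command.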
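The check described in step 1, looking for consensuses whose relays lack descriptors, can also be automated. The following Python sketch is not the code of util/examine_process_output.py; it assumes the process output uses exactly the line formats shown in the examples above, and the function names and the 5% threshold are hypothetical choices.

```python
import re

CONSENSUS_RE = re.compile(r"Processing consensus file (\S+)")
WROTE_RE = re.compile(r"Wrote descriptors for (\d+) relays")
MISSING_RE = re.compile(r"Did not find descriptors for (\d+) relays")


def summarize_process_output(lines):
    """Map each consensus file to (relays written, relays missing descriptors)."""
    summary = {}
    current = None
    for line in lines:
        line = line.strip()
        match = CONSENSUS_RE.match(line)
        if match:
            current = match.group(1)
            summary[current] = [None, None]
        elif current is not None:
            match = WROTE_RE.match(line)
            if match:
                summary[current][0] = int(match.group(1))
            match = MISSING_RE.match(line)
            if match:
                summary[current][1] = int(match.group(1))
    return dict((name, tuple(counts)) for name, counts in summary.items())


def flag_incomplete(summary, max_missing_fraction=0.05):
    """Return consensus files in which too many relays lack descriptors.

    The 5% default threshold is an arbitrary, hypothetical choice."""
    flagged = []
    for name in sorted(summary):
        wrote, missing = summary[name]
        if wrote is None or missing is None:
            continue
        total = wrote + missing
        if total and float(missing) / total > max_missing_fraction:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Output lines copied from the examples above.
    sample = """Processing consensus file 2013-09-01-00-00-00-consensus
Wrote descriptors for 2 relays.
Did not find descriptors for 4277 relays
Processing consensus file 2013-09-02-00-00-00-consensus
Wrote descriptors for 4261 relays.
Did not find descriptors for 0 relays""".splitlines()
    summary = summarize_process_output(sample)
    for name in flag_incomplete(summary):
        print("few descriptors found for " + name)
```

Run on the sample, it flags only 2013-09-01-00-00-00-consensus, since 4277 of its 4279 relays lack descriptors while the 2013-09-02 consensus has none missing.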
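To make the user-model description in step 2 concrete, here is a Python sketch that repeats a one-week schedule of connections over a simulation period and derives port-substituted variants in the way the "best" (port 443) and "worst" (port 6523) models are described. The (seconds into the week, IP, port) tuple representation and the function names are hypothetical; TorPS's actual trace pickle format and model code may differ.

```python
WEEK_SECONDS = 7 * 24 * 60 * 60


def expand_weekly(schedule, duration):
    """Repeat a one-week schedule until `duration` seconds are covered.

    `schedule` is a list of (seconds_into_week, ip, port) connections;
    the result is a time-sorted list of (absolute_seconds, ip, port)."""
    events = []
    week_start = 0
    while week_start < duration:
        for offset, ip, port in schedule:
            when = week_start + offset
            if when < duration:
                events.append((when, ip, port))
        week_start += WEEK_SECONDS
    return sorted(events)


def with_port(events, new_port):
    """Return a copy of `events` with every destination port replaced."""
    return [(when, ip, new_port) for when, ip, _old_port in events]


if __name__ == "__main__":
    # Hypothetical two-connection weekly schedule (192.0.2.1 is a documentation address).
    typical_week = [(0, "74.125.131.105", 80), (3600, "192.0.2.1", 6667)]
    typical = expand_weekly(typical_week, 2 * WEEK_SECONDS)
    best = with_port(typical, 443)
    worst = with_port(typical, 6523)
    print(len(typical))
    print(best[0])
    print(worst[-1])
```

Only the ports change between these variants. Because exit relays accept or reject traffic by destination port through their exit policies, that alone changes which exits can carry the model's connections.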

Plotting Simulation Data

TorPS includes some basic functions to quickly analyze and view the results of your simulations. The shell script analyze_and_plot.sh gives an example of how to use this functionality.

  1. pathsim_analysis.py processes a number of log files in parallel and stores the result for each one as a file containing pickled objects. It has two commands, "simulation-set" and "simulation-top": simulation-set computes statistics for the case that the adversary controls a set of relays, and simulation-top computes statistics as if the adversary controls a varying number of the "top" relays. See the script's usage output for command options.
  2. pathsim_plot.py requires numpy and matplotlib. It takes the files output by pathsim_analysis.py and produces a set of graphs showing the CDFs of compromise time and compromise rate for the guard, the exit, and the guard & exit of user circuits. See the script's usage output for command options.
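As a rough illustration of the compromise-time CDFs that pathsim_plot.py draws, the Python sketch below computes an empirical CDF from one value per simulated sample. It is not TorPS code: the input format (days until first compromise, or None for a sample that was never compromised) and the function name are hypothetical. Because never-compromised samples stay in the denominator, the curve ends at the overall fraction of compromised samples instead of at 1.

```python
def compromise_cdf(first_compromise_days):
    """Empirical CDF of time to first compromise.

    Takes one entry per sample: the day of first compromise, or None if the
    sample was never compromised. Returns (xs, ys), where ys[i] is the
    fraction of all samples compromised by day xs[i]."""
    n = len(first_compromise_days)
    times = sorted(t for t in first_compromise_days if t is not None)
    xs, ys = [], []
    for count, t in enumerate(times, 1):
        fraction = float(count) / n
        if xs and xs[-1] == t:
            ys[-1] = fraction  # collapse ties into a single step
        else:
            xs.append(t)
            ys.append(fraction)
    return xs, ys


if __name__ == "__main__":
    xs, ys = compromise_cdf([1.5, None, 0.5, 1.5])
    print(xs)  # [0.5, 1.5]
    print(ys)  # [0.25, 0.75]
```

To draw the curve, the (xs, ys) pairs can be passed to matplotlib, for example plt.step(xs, ys, where="post"), which holds each value until the next compromise time.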

Versions

The latest version of TorPS (tag "tor-0.2.4.23") simulates path selection as performed by Tor stable release 0.2.4.23. The TorPS version at tag "tor-0.2.3.25" simulates path selection as performed by Tor stable release 0.2.3.25.