This software is an implementation of the following paper:
Huandong Wang, Chen Gao, Yong Li, Gang Wang, Depeng Jin, and Jingbo Sun "De-anonymization of Mobility Trajectories: Dissecting the Gaps between Theory and Practice", in Proceedings of the Network and Distributed System Security Symposium (NDSS), 2018.
The goal of this software is to de-aonymize ISP trajectories using external information, e.g., users' check-ins at online social network. In this software, our proposed algorithms and 9 baselines algorithms are all implemented to de-anonymize ISP trajectories.
- GM [NDSS 2018]
- POIS [WWW 2016]
- ME [AIHC 2016]
- HMM [IEEE SP 2011]
- WYCI [WOSN 2014]
- HIST [TIFS 2016]
- NFLX [IEEE SP 2008]
- MSQ [TON 2013]
- LRCF [WWW 2013]
- MKV [WPES 2008]
|input_path||Path of trace data||./input||-|
|output_path||Path of output||./output||-|
|save_weights||Save candidates' weight or not||False||Weight will be saved in output_path if set to True|
|region_centre||Location of regions' centers||./RegionCenters||RegionID and its location|
|region_popularity||Popularity of regions||./RegionPopularity||Only used in LRCF & POIS|
|Tmax||The number of time-bins||24*8||Only used in HMM & GMM|
|scale||Spatial granularity||0.01||Granularity of lng&lat|
|K||Top-K in hit-precision||10||-|
How to use (taking POIS as an example)
- Get help
$ python algorithms/pois.py --help
$ python algorithms/pois.py \ --input_path ./algorithms/input/ \ --output_path ./algorithms/output/pois \ --save_weights False \ --region_centre ./algorithms/RegionCenters \ --region_popularity ./algorithms/RegionPopularity \ --scale 0.01 \ --K 10
We leave a synthesized dataset of trajectories data_example, which can be used as input of the de-anonymization algorithms.
In this example, the first line data_example represents the external trajectories.
The string before semi-colon is the userID. Spatio-temporal points in the trajectories are divided by vertical bar. The element before comma is the ID of the time-bin (1-hour in this exmaple), and the element after comma is latitude and longitude.
From the second line to the last line are ISP trajectory, of which an example is:
Different with external trajectory, locations in ISP trajectories are represented by the ID of region. The geographic coordinates of the centers of regions can be found in algorithm\RegionCenters.
Run the code
You can simply use the following command to run all the algorithms on data_example:
algorithms/output/XX/result is the obtained result, where
XX represents the name of the de-anonymization algorithm. For each input file,
there is a line in
result with two fields separated by "\t". The first field is the name of input file. The second field is the hit-precision.
If you set
True, weight will be saved in output_path with the same name of input file.
RegionPopularity gives the total number of access logs in each region. De-anonymization algorithms of LRCF and POIS need this file as input.
You can use the scripts
ParaLearn.pyto learn the parameters of GM algorithm.
Please cite the reference appropriately in case they are used.
[NDSS 2018] H. Wang, C. Gao, Y. Li, G. Wang, D. Jin, J. Sun, "De-anonymization of Mobility Trajectories: Dissecting the Gaps between Theory and Practice," in Proc. NDSS, 2018.
[WWW 2016] C. Riederer, Y. Kim, A. Chaintreau, N. Korula, and S. Lattanzi, “Linking users across domains with location data: Theory and validation,” in Proc. WWW, 2016.
[AIHC 2016] A. Cecaj, M. Mamei, and F. Zambonelli, “Re-identification and information fusion between anonymized cdr and social network data,” Journal of Ambient Intelligence and Humanized Computing, vol. 7, no. 1, pp. 83–96, 2016.
[WOSN 2014] L. Rossi and M. Musolesi, “It’s the way you check-in: identifying users in location-based social networks,” in Proc. ACM WOSN, 2014.
[TIFS 2016] F. M. Naini, J. Unnikrishnan, P. Thiran, and M. Vetterli, “Where you are is who you are: User identification by matching statistics,” IEEE Transactions on Information Forensics and Security (TIFS), vol. 11, no. 2, pp. 358–372, 2016.
[IEEE SP 2008] A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparse datasets,” in Proc. IEEE SP, 2008.
[IEEE SP 2011] R. Shokri, G. Theodorakopoulos, J.-Y. Le Boudec, and J.-P. Hubaux, “Quantifying location privacy,” in Proc. IEEE SP, 2011.
[TON 2013] C. Y. Ma, D. K. Yau, N. K. Yip, and N. S. Rao, “Privacy vulnerability of published anonymous mobility traces,” IEEE/ACM Transactions on Networking (TON), vol. 21, no. 3, pp. 720–733, 2013.
[WWW 2013] O. Goga, H. Lei, S. H. K. Parthasarathi, G. Friedland, R. Sommer, and R. Teixeira, “Exploiting innocuous activity for correlating users across sites,” in Proc. WWW, 2013.
[WPES 2008] Y. De Mulder, G. Danezis, L. Batina, and B. Preneel, “Identification via location-profiling in gsm networks,” in Proc. ACM WPES, 2008.