Hi. I’m an engineer on the Topics API for Chrome. After seeing rather surprising results in the related paper, I took a brief look at your code and came across an issue that is important to point out, because it has a significant impact on the simulation’s (and therefore the paper’s) results.
You’re using a worker pool to create the topics for each user on sites A and B, but you’re not reseeding the random number generator on each worker (which is forked off the original process). The result is that each worker creates the same stream of random numbers!
This means that in your simulator, sites A and B are getting the same Topics for the same user, rather than Topics chosen independently at random.
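To illustrate the mechanism described above, here is a minimal sketch, not taken from the simulator itself. The helper names `draw` and `draws_from_forked_children` are hypothetical, and the example assumes a POSIX system where the `fork` start method is available. Each forked child copies the parent's global NumPy RNG state, so two children forked from the same state draw identical values.

```python
import multiprocessing as mp

import numpy as np


def draw(conn):
    # Runs in the forked child and uses the global NumPy RNG state
    # that was copied from the parent at fork time.
    conn.send(np.random.randint(0, 1000, size=5).tolist())
    conn.close()


def draws_from_forked_children(n_children=2):
    # Hypothetical helper: fork n_children processes one after another
    # and collect one batch of random draws from each.
    ctx = mp.get_context("fork")  # assumption: POSIX, fork is available
    results = []
    for _ in range(n_children):
        parent_end, child_end = ctx.Pipe()
        proc = ctx.Process(target=draw, args=(child_end,))
        proc.start()
        results.append(parent_end.recv())
        proc.join()
    return results


if __name__ == "__main__":
    np.random.seed(0)
    first, second = draws_from_forked_children()
    # The parent never advances its RNG between the two forks,
    # so both children start from the same state.
    print("children drew identical values:", first == second)
```

In a worker pool the effect is the same in spirit: every worker starts from the parent's RNG state, so draws that are meant to be independent end up correlated.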
This is a significant problem with your published work. For example, fixing this bug in your code reduces the 5-epoch reidentification rate from ~57% to ~3% with the parameters [1] provided in the README.
An easy fix is to add `os.register_at_fork(after_in_child=np.random.seed)` before creating your worker pool.

Josh
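A minimal sketch of how that one-line fix behaves, again with hypothetical helper names (`draw`, `draws_after_reseed`) and assuming a POSIX system with the `fork` start method. The hook calls `np.random.seed()` with no arguments in every forked child, which reseeds the global NumPy RNG from fresh OS entropy.

```python
import multiprocessing as mp
import os

import numpy as np

# Register the hook before any worker is forked. It runs in each child
# right after fork and reseeds NumPy's global RNG from OS entropy.
os.register_at_fork(after_in_child=np.random.seed)


def draw(conn):
    # Runs in the forked child, after the reseeding hook has fired.
    conn.send(np.random.randint(0, 2**31 - 1, size=8).tolist())
    conn.close()


def draws_after_reseed(n_children=2):
    # Hypothetical helper: fork n_children processes and collect one
    # batch of random draws from each.
    ctx = mp.get_context("fork")  # assumption: POSIX, fork is available
    results = []
    for _ in range(n_children):
        parent_end, child_end = ctx.Pipe()
        proc = ctx.Process(target=draw, args=(child_end,))
        proc.start()
        results.append(parent_end.recv())
        proc.join()
    return results


if __name__ == "__main__":
    np.random.seed(0)  # a fixed parent seed no longer leaks into children
    first, second = draws_after_reseed()
    print("children drew different values:", first != second)
```

Note that a hook registered with `os.register_at_fork` stays in place for the lifetime of the process, and that reseeding from OS entropy makes child draws non-reproducible across runs.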
Thanks for reaching out and bringing this to our attention!
We looked into this subtle bug regarding the initialization of the random number generator seed across the forked processes. We confirm that NumPy preserves the random state across forks and that the proposed solution fixes it by forcing an auto-seed in each new fork. We therefore re-ran our simulation on the real dataset of browsing histories.
While the results we now obtain have changed quantitatively (2.3%, 2.9%, and 4.1% of these users are uniquely re-identified after 1, 2, and 3 observations of their topics, respectively), our findings do not change qualitatively: real users can be fingerprinted by the Topics API, and the information leakage worsens over time as more users get uniquely re-identified.
[1] `python3 topics_simulator.py data/web_data/users_topics_5_weeks.tsv 5 topics_classifier/chrome4/config.json data/crux/crux_202401_chrome4_topics-api.tsv 10 1 data/reidentification_exp/5_weeks_10_unobserved`