Spec the multi-epoch unoptimization #211
Prior to this, the algorithm for deducting privacy budget used the L1 norm of the histogram, which the analysis shows is only possible when impressions from a single epoch are involved.
Again, this is pretty suboptimal if implemented directly, which is a recurring pattern. This iterates over all impressions twice. I didn't bother to cache the impressions that were selected. I didn't even bother to break out of the loop when the count hits two. Those would just make the "code" harder to read than it already is.
This is contrary to what is implied in #78.
Still, this closes #78, just in the opposite way to what was envisaged.
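For illustration only, here is a minimal Python sketch of the two shortcuts the description says were skipped: caching the selected impressions in a single pass, and stopping the epoch scan once a second epoch is seen. It is not the spec's algorithm. `Impression`, its fields, the `matches` predicate, and both function names are hypothetical stand-ins, and the sketch only answers "do the matching impressions span more than one epoch?", not how much budget to deduct.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Impression:
    # Hypothetical fields: the spec's impression record is richer than this.
    epoch: int   # index of the epoch the impression was stored in
    source: str  # stand-in for whatever the matching logic inspects


def select_and_count_epochs(
    impressions: Iterable[Impression],
    matches: Callable[[Impression], bool],
) -> Tuple[List[Impression], bool]:
    """Single pass: cache the matching impressions and report whether
    they come from more than one epoch."""
    selected: List[Impression] = []
    epochs_seen = set()
    for imp in impressions:
        if not matches(imp):
            continue
        selected.append(imp)
        # Only the distinction "one epoch" vs "two or more" matters,
        # so stop growing the set once it holds two epochs.
        if len(epochs_seen) < 2:
            epochs_seen.add(imp.epoch)
    return selected, len(epochs_seen) >= 2


def spans_multiple_epochs(
    impressions: Iterable[Impression],
    matches: Callable[[Impression], bool],
) -> bool:
    """Early-exit variant: return as soon as a second epoch is found."""
    first_epoch: Optional[int] = None
    for imp in impressions:
        if not matches(imp):
            continue
        if first_epoch is None:
            first_epoch = imp.epoch
        elif imp.epoch != first_epoch:
            return True  # the count "hits two": no need to look further
    return False
```

The first function suits a caller that needs the selected impressions afterwards. The second suits a caller that only needs the single-epoch versus multi-epoch decision.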
|
This doesn't look correct to me, and I think the existing spec is fine. The L1 norm of the contribution vector is not used in the current spec (nor in this PR). If we supported scaling noise based on this, we would need to output the actual L1 norm from the attribution algorithm and deduct privacy budget after the histogram is filled (e.g. after 4.4.1.6). In this case, for some attribution algorithms, the L1 norm would be smaller than |
|
I agree with @csharrison. Moreover, if I understand correctly, the change proposed in this PR is enabling/disabling multi-epoch accounting based on In case you'd like extra context, here is our reference implementation for Cookie Monster. There is a bit of extra code irrelevant for PPA Level 1 (epoch-source losses and two-phase commit are for Big Bird).
Also, one question: are you certain that your report global sensitivity is just |
Yes, I agree with this too.
Let me file an issue to clarify this. Our |
|
Ah, I see where this caused problems. Never mind that then. I'm not following the 2x question. I get the point about the effect of removing an impression from the database, but I think our privacy unit is the browser instance, not the impression. So I think that the difference is accounted for. @csharrison, if you think we need something more concrete, maybe open a separate issue to track that. |
Let's move the discussion to #212. I can also draft up an example. |