-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct noised reach calculation in fulfillDirectReachAndFrequencyMeasurement in EDP simulator #814
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @iverson52000, @kungfucraig, and @stevenwarejones)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
* @return Pair of noised reach value and frequency map */ fun addPublisherNoise(
I wonder if it's necessary to have this function public.
Previously, riemanli (Rieman) wrote…
I think so since it needs to be unit tested? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @stevenwarejones)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @stevenwarejones)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
I think so since it needs to be unit tested?
This is backwards. Functions don't get exposed so they can be tested. Testing should be done against the public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @SanjayVas)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
I think so since it needs to be unit tested?
This is backwards. Functions don't get exposed so they can be tested. Testing should be done against the public API.
I would argue this belongs in a public function but not in this class. It should be accessible by anyone setting up their own EDP. What about offering this as an interface of something like a Noiser
and add this as an initial impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
I would argue this belongs in a public function but not in this class. It should be accessible by anyone setting up their own EDP. What about offering this as an interface of something like a
Noiser
and add this as an initial impl.
Addressing these as two separate concerns:
- It is certainly a valid strategy to split functionality out into different components which could then be tested independently. The consideration is whether this functionality should be considered an implementation detail of the EDP simulator. This may indeed be something that we want more expansive testing on, which warrants splitting it out. This would mean that EDP simulator test could use a test double (e.g. mock) for this if desired.
- We technically do not yet offer an official EDP library. Code from this repo hasn't been designed to be depended on externally1. The EDP libraries we do offer are in separate repositories (e.g. consent-signaling-client, any-sketch). If we want to provide functionality from this repo as a separate library, we should have a separate design covering that which identifies everything that should be included.
Footnotes
-
Aside from gRPC APIs, namely the Reporting API. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @SanjayVas)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
Addressing these as two separate concerns:
- It is certainly a valid strategy to split functionality out into different components which could then be tested independently. The consideration is whether this functionality should be considered an implementation detail of the EDP simulator. This may indeed be something that we want more expansive testing on, which warrants splitting it out. This would mean that EDP simulator test could use a test double (e.g. mock) for this if desired.
- We technically do not yet offer an official EDP library. Code from this repo hasn't been designed to be depended on externally1. The EDP libraries we do offer are in separate repositories (e.g. consent-signaling-client, any-sketch). If we want to provide functionality from this repo as a separate library, we should have a separate design covering that which identifies everything that should be included.
My sense is that we should start moving these kinds of things into the main repo under some "EDP integration util" category. Eventually moving anysketch, the PBM, and similar things into that would also make sense.
However, I don't think it's a good use of time at the moment as we have more urgent things to concern ourselves with. Perhaps we can take this up after we get some of our more urgent items sorted out, and think about this more broadly in the context of the monorepo work.
Footnotes
-
Aside from gRPC APIs, namely the Reporting API. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @SanjayVas)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
Previously, kungfucraig (Craig Wright) wrote…
My sense is that we should start moving these kinds of things into the main repo under some "EDP integration util" category. Eventually moving anysketch, the PBM, and similar things into that would also make sense.
However, I don't think it's a good use of time at the moment as we have more urgent things to concern ourselves with. Perhaps we can take this up after we get some of our more urgent items sorted out, and think about this more broadly in the context of the monorepo work.
I'm fine with leaving it as-is, but I'd prefer the function keep a test with the idea that in the future, it won't be private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @SanjayVas)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
I'm fine with leaving it as-is, but I'd prefer the function keep a test with the idea that in the future, it won't be private.
@stevenwarejones that makes sense to me
8ed87de
to
b70ed32
Compare
Previously, kungfucraig (Craig Wright) wrote…
Thanks for your feedback Steven! I discussed with the team yesterday and there’s a need to support adding Gaussian noise in the future. In that case, it will make sense to create an interface to support different types of noise distributions. I would like to handle that in a separate PR since the main purpose of this PR is to fix the calculation of noised reach value. For this PR, I will keep this function private for now and test the changes with public API in EDP simulator as Sanjay suggested. |
Previously, iverson52000 (Albert Hsu) wrote…
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @kungfucraig, @riemanli, and @SanjayVas)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
Previously, iverson52000 (Albert Hsu) wrote…
Done
I agree we will need to add different types of noise -- Laplace, Gaussian etc and an interface makes sense.
Sorry - I realize I wasn't clear - I'd like to keep the tests that exercise this function. Please make this function not private so it can be tested. I'm fine with internal for now.... with a TODO to move it into its own class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @kungfucraig, @riemanli, and @stevenwarejones)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
I agree we will need to add different types of noise -- Laplace, Gaussian etc and an interface makes sense.
Sorry - I realize I wasn't clear - I'd like to keep the tests that exercise this function. Please make this function not private so it can be tested. I'm fine with internal for now.... with a TODO to move it into its own class.
I disagree with the above comment. Keep it private for now. Tests against that method would be added if/when it gets moved to another class.
Methods should not be exposed just so they can be tested directly at any time. Always test via public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @iverson52000, @kungfucraig, and @riemanli)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 856 at r2 (raw file):
// Differentially private reach value is calculated by reach_dp = (reach + noise) / // sampling_rate. val noisedReachValue =
it feels a little weird to be upscaling the value inside of `addPublisherNoise. I think the upscaling should be done outside this function. Even if a publisher chooses not to add noise, they still need to upscale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @kungfucraig, @riemanli, and @stevenwarejones)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 856 at r2 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
it feels a little weird to be upscaling the value inside of `addPublisherNoise. I think the upscaling should be done outside this function. Even if a publisher chooses not to add noise, they still need to upscale.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @iverson52000, @kungfucraig, and @riemanli)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 682 at r3 (raw file):
) { logger.info("Calculating direct reach and frequency...") val vidSampler = VidSampler(VID_SAMPLER_HASH_FUNCTION)
require(samplingRate > 0 && samplingRate <= 1.0) { "Invalid samplingRate $samplingRate" }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @iverson52000, @kungfucraig, and @riemanli)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 682 at r3 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
require(samplingRate > 0 && samplingRate <= 1.0) { "Invalid samplingRate $samplingRate" }
for vidSamplingIntervalWidth
I mean
Code quote:
vidSamplingIntervalWidth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @iverson52000, @riemanli, @stevenwarejones, and @VisibleForTesting)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
I disagree with the above comment. Keep it private for now. Tests against that method would be added if/when it gets moved to another class.
Methods should not be exposed just so they can be tested directly at any time. Always test via public API.
This is a fairly fundamental bit of functionality, and kind of a pain to test indirectly. In pure Java land I'd annotate it @VisibleForTesting and test the heck out of it.
Similarly the scaling method.
We've already agreed they belong in a library, but for sake of expediency we are not moving them right now, so making them public and writing tests doesn't really hurt much even if it's breaking a typically useful convention.
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 682 at r3 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
for
vidSamplingIntervalWidth
I mean
I think you want (start < 1 && start >=0 && width >0 && width+start < 1)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 851 at r3 (raw file):
val laplaceForFrequency = LaplaceDistribution(0.0, 1 / reachAndFrequency.frequencyPrivacyParams.epsilon) laplaceForFrequency.reseedRandomGenerator(1)
Please leave a comment in the code indicating why we are reseeding the RNG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @iverson52000, @kungfucraig, @riemanli, and @VisibleForTesting)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 682 at r3 (raw file):
Previously, kungfucraig (Craig Wright) wrote…
I think you want (start < 1 && start >=0 && width >0 && width+start < 1)
yeah - both those with separate requires would be great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @kungfucraig, @riemanli, @stevenwarejones, and @VisibleForTesting)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 682 at r3 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
yeah - both those with separate requires would be great
Sure. I will add both. In Craig's require, width+start <= 1
since width can be 1.0f?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 3 files reviewed, 5 unresolved discussions (waiting on @iverson52000, @kungfucraig, @riemanli, and @VisibleForTesting)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 693 at r3 (raw file):
vidSampler.vidIsInSamplingBucket(vid, vidSamplingIntervalStart, vidSamplingIntervalWidth) } val (reachValue, frequencyMap) = calculateDirectReachAndFrequency(vidList)
maybe call this sampledReachValue
val (sampledReachValue, frequencyMap) =
Code quote:
val (reachValue, frequencyMap)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 696 at r3 (raw file):
logger.info("Adding publisher noise to direct reach and frequency...") val (noisedReachValue, noisedFrequencyMap) =
ditto maybe sampledNoisedReachValue
35b3759
to
f88ecda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @iverson52000, @riemanli, @stevenwarejones, and @VisibleForTesting)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 851 at r3 (raw file):
Previously, kungfucraig (Craig Wright) wrote…
Please leave a comment in the code indicating why we are reseeding the RNG.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @riemanli, @stevenwarejones, and @VisibleForTesting)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 835 at r1 (raw file):
Previously, kungfucraig (Craig Wright) wrote…
This is a fairly fundamental bit of functionality, and kind of a pain to test indirectly. In pure Java land I'd annotate it @VisibleForTesting and test the heck out of it.
Similarly the scaling method.
We've already agreed they belong in a library, but for sake of expediency we are not moving them right now, so making them public and writing tests doesn't really hurt much even if it's breaking a typically useful convention.
Discussed with Sanjay and Steven yesterday and will leave the function private for now. The desired behavior, noised reach value should be scaled by sampling rate, is tested through public API executeRequisitionFulfillingWorkflow
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 693 at r3 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
maybe call this
sampledReachValue
val (sampledReachValue, frequencyMap) =
Sure it's more explicit with this name
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 696 at r3 (raw file):
Previously, stevenwarejones (Steven Ware Jones) wrote…
ditto maybe
sampledNoisedReachValue
Done.
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 851 at r3 (raw file):
Previously, kungfucraig (Craig Wright) wrote…
Please leave a comment in the code indicating why we are reseeding the RNG.
Done.
Previously, iverson52000 (Albert Hsu) wrote…
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @riemanli and @VisibleForTesting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1, 1 of 2 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @stevenwarejones)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r2, 1 of 1 files at r4.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @iverson52000)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 859 at r4 (raw file):
LaplaceDistribution(0.0, 1 / reachAndFrequency.reachPrivacyParams.epsilon) // Reseed random number generator so the results can be verified in tests. // TODO(alberthsuu): make reandomSeed an input once create noiser class
make randomSeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1, 1 of 2 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @iverson52000)
src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt
line 843 at r4 (raw file):
} // TODO(alberthsuu): Create a noiser class for this function when we need to add Gaussian noise
nit: move the TODO inside of the KDoc block rather than having a separate comment
Also follow TODO format described in code style doc: https://github.com/world-federation-of-advertisers/cross-media-measurement/blob/main/docs/code-style.md#format
…surement in EDP simulator
… private. Test the changes with public API executeRequisitionFulfillingWorkflow
…ing. Add code comments
f88ecda
to
16da062
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @iverson52000)
This PR should fix #808
This change is![Reviewable](https://camo.githubusercontent.com/23b05f5fb48215c989e92cc44cf6512512d083132bd3daf689867c8d9d386888/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)