Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct noised reach calculation in fulfillDirectReachAndFrequencyMeasurement in EDP simulator #814

Merged
merged 4 commits into from
Jan 30, 2023

Conversation

iverson52000
Copy link
Contributor

@iverson52000 iverson52000 commented Jan 17, 2023

This PR should fix #808


This change is Reviewable

Copy link
Member

@riemanli riemanli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion (waiting on @iverson52000, @kungfucraig, and @stevenwarejones)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

     * @return Pair of noised reach value and frequency map
     */
    fun addPublisherNoise(

I wonder if it's necessary to have this function public.

@iverson52000
Copy link
Contributor Author

src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, riemanli (Rieman) wrote…

I wonder if it's necessary to have this function public.

I think so since it needs to be unit tested?

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @stevenwarejones)

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @stevenwarejones)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

I think so since it needs to be unit tested?

This is backwards. Functions don't get exposed so they can be tested. Testing should be done against the public API.

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @SanjayVas)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

I think so since it needs to be unit tested?

This is backwards. Functions don't get exposed so they can be tested. Testing should be done against the public API.

I would argue this belongs in a public function but not in this class. It should be accessible by anyone setting up their own EDP. What about offering this as an interface of something like a Noiser and add this as an initial impl.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

I would argue this belongs in a public function but not in this class. It should be accessible by anyone setting up their own EDP. What about offering this as an interface of something like a Noiser and add this as an initial impl.

Addressing these as two separate concerns:

  1. It is certainly a valid strategy to split functionality out into different components which could then be tested independently. The consideration is whether this functionality should be considered an implementation detail of the EDP simulator. This may indeed be something that we want more expansive testing on, which warrants splitting it out. This would mean that EDP simulator test could use a test double (e.g. mock) for this if desired.
  2. We technically do not yet offer an official EDP library. Code from this repo hasn't been designed to be depended on externally1. The EDP libraries we do offer are in separate repositories (e.g. consent-signaling-client, any-sketch). If we want to provide functionality from this repo as a separate library, we should have a separate design covering that which identifies everything that should be included.

Footnotes

  1. Aside from gRPC APIs, namely the Reporting API.

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @SanjayVas)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Addressing these as two separate concerns:

  1. It is certainly a valid strategy to split functionality out into different components which could then be tested independently. The consideration is whether this functionality should be considered an implementation detail of the EDP simulator. This may indeed be something that we want more expansive testing on, which warrants splitting it out. This would mean that EDP simulator test could use a test double (e.g. mock) for this if desired.
  2. We technically do not yet offer an official EDP library. Code from this repo hasn't been designed to be depended on externally1. The EDP libraries we do offer are in separate repositories (e.g. consent-signaling-client, any-sketch). If we want to provide functionality from this repo as a separate library, we should have a separate design covering that which identifies everything that should be included.

My sense is that we should start moving these kinds of things into the main repo under some "EDP integration util" category. Eventually moving anysketch, the PBM, and similar things into that would also make sense.

However, I don't think it's a good use of time at the moment as we have more urgent things to concern ourselves with. Perhaps we can take this up after we get some of our more urgent items sorted out, and think about this more broadly in the context of the monorepo work.

Footnotes

  1. Aside from gRPC APIs, namely the Reporting API.

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @SanjayVas)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

My sense is that we should start moving these kinds of things into the main repo under some "EDP integration util" category. Eventually moving anysketch, the PBM, and similar things into that would also make sense.

However, I don't think it's a good use of time at the moment as we have more urgent things to concern ourselves with. Perhaps we can take this up after we get some of our more urgent items sorted out, and think about this more broadly in the context of the monorepo work.

I'm fine with leaving it as-is, but I'd prefer the function keep a test with the idea that in the future, it won't be private.

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @riemanli and @SanjayVas)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

I'm fine with leaving it as-is, but I'd prefer the function keep a test with the idea that in the future, it won't be private.

@stevenwarejones that makes sense to me

@iverson52000
Copy link
Contributor Author

src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

@stevenwarejones that makes sense to me

Thanks for your feedback Steven! I discussed with the team yesterday and there’s a need to support adding Gaussian noise in the future. In that case, it will make sense to create an interface to support different types of noise distributions. I would like to handle that in a separate PR since the main purpose of this PR is to fix the calculation of noised reach value. For this PR, I will keep this function private for now and test the changes with public API in EDP simulator as Sanjay suggested.

@iverson52000
Copy link
Contributor Author

src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, iverson52000 (Albert Hsu) wrote…

Thanks for your feedback Steven! I discussed with the team yesterday and there’s a need to support adding Gaussian noise in the future. In that case, it will make sense to create an interface to support different types of noise distributions. I would like to handle that in a separate PR since the main purpose of this PR is to fix the calculation of noised reach value. For this PR, I will keep this function private for now and test the changes with public API in EDP simulator as Sanjay suggested.

Done

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @kungfucraig, @riemanli, and @SanjayVas)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, iverson52000 (Albert Hsu) wrote…

Done

I agree we will need to add different types of noise -- Laplace, Gaussian etc and an interface makes sense.

Sorry - I realize I wasn't clear - I'd like to keep the tests that exercise this function. Please make this function not private so it can be tested. I'm fine with internal for now.... with a TODO to move it into its own class.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 1 unresolved discussion (waiting on @kungfucraig, @riemanli, and @stevenwarejones)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

I agree we will need to add different types of noise -- Laplace, Gaussian etc and an interface makes sense.

Sorry - I realize I wasn't clear - I'd like to keep the tests that exercise this function. Please make this function not private so it can be tested. I'm fine with internal for now.... with a TODO to move it into its own class.

I disagree with the above comment. Keep it private for now. Tests against that method would be added if/when it gets moved to another class.

Methods should not be exposed just so they can be tested directly at any time. Always test via public API.

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @iverson52000, @kungfucraig, and @riemanli)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 856 at r2 (raw file):

      // Differentially private reach value is calculated by reach_dp = (reach + noise) /
      // sampling_rate.
      val noisedReachValue =

it feels a little weird to be upscaling the value inside of `addPublisherNoise. I think the upscaling should be done outside this function. Even if a publisher chooses not to add noise, they still need to upscale.

Copy link
Contributor Author

@iverson52000 iverson52000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @kungfucraig, @riemanli, and @stevenwarejones)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 856 at r2 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

it feels a little weird to be upscaling the value inside of `addPublisherNoise. I think the upscaling should be done outside this function. Even if a publisher chooses not to add noise, they still need to upscale.

Done.

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @iverson52000, @kungfucraig, and @riemanli)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 682 at r3 (raw file):

  ) {
    logger.info("Calculating direct reach and frequency...")
    val vidSampler = VidSampler(VID_SAMPLER_HASH_FUNCTION)
require(samplingRate > 0 && samplingRate <= 1.0) { "Invalid samplingRate $samplingRate" }

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions (waiting on @iverson52000, @kungfucraig, and @riemanli)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 682 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…
require(samplingRate > 0 && samplingRate <= 1.0) { "Invalid samplingRate $samplingRate" }

for vidSamplingIntervalWidth I mean

Code quote:

vidSamplingIntervalWidth

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @iverson52000, @riemanli, @stevenwarejones, and @VisibleForTesting)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

I disagree with the above comment. Keep it private for now. Tests against that method would be added if/when it gets moved to another class.

Methods should not be exposed just so they can be tested directly at any time. Always test via public API.

This is a fairly fundamental bit of functionality, and kind of a pain to test indirectly. In pure Java land I'd annotate it @VisibleForTesting and test the heck out of it.

Similarly the scaling method.

We've already agreed they belong in a library, but for sake of expediency we are not moving them right now, so making them public and writing tests doesn't really hurt much even if it's breaking a typically useful convention.


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 682 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

for vidSamplingIntervalWidth I mean

I think you want (start < 1 && start >=0 && width >0 && width+start < 1)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 851 at r3 (raw file):

      val laplaceForFrequency =
        LaplaceDistribution(0.0, 1 / reachAndFrequency.frequencyPrivacyParams.epsilon)
      laplaceForFrequency.reseedRandomGenerator(1)

Please leave a comment in the code indicating why we are reseeding the RNG.

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @iverson52000, @kungfucraig, @riemanli, and @VisibleForTesting)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 682 at r3 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

I think you want (start < 1 && start >=0 && width >0 && width+start < 1)

yeah - both those with separate requires would be great

Copy link
Contributor Author

@iverson52000 iverson52000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @kungfucraig, @riemanli, @stevenwarejones, and @VisibleForTesting)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 682 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

yeah - both those with separate requires would be great

Sure. I will add both. In Craig's require, width+start <= 1 since width can be 1.0f?

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 3 files reviewed, 5 unresolved discussions (waiting on @iverson52000, @kungfucraig, @riemanli, and @VisibleForTesting)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 693 at r3 (raw file):

          vidSampler.vidIsInSamplingBucket(vid, vidSamplingIntervalStart, vidSamplingIntervalWidth)
        }
    val (reachValue, frequencyMap) = calculateDirectReachAndFrequency(vidList)

maybe call this sampledReachValue

val (sampledReachValue, frequencyMap) = 

Code quote:

val (reachValue, frequencyMap)

src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 696 at r3 (raw file):

    logger.info("Adding publisher noise to direct reach and frequency...")
    val (noisedReachValue, noisedFrequencyMap) =

ditto maybe sampledNoisedReachValue

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @iverson52000, @riemanli, @stevenwarejones, and @VisibleForTesting)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 851 at r3 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

Please leave a comment in the code indicating why we are reseeding the RNG.

lgtm

Copy link
Contributor Author

@iverson52000 iverson52000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @riemanli, @stevenwarejones, and @VisibleForTesting)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 835 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

This is a fairly fundamental bit of functionality, and kind of a pain to test indirectly. In pure Java land I'd annotate it @VisibleForTesting and test the heck out of it.

Similarly the scaling method.

We've already agreed they belong in a library, but for sake of expediency we are not moving them right now, so making them public and writing tests doesn't really hurt much even if it's breaking a typically useful convention.

Discussed with Sanjay and Steven yesterday and will leave the function private for now. The desired behavior, noised reach value should be scaled by sampling rate, is tested through public API executeRequisitionFulfillingWorkflow


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 693 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

maybe call this sampledReachValue

val (sampledReachValue, frequencyMap) = 

Sure it's more explicit with this name


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 696 at r3 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

ditto maybe sampledNoisedReachValue

Done.


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 851 at r3 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

Please leave a comment in the code indicating why we are reseeding the RNG.

Done.

@iverson52000
Copy link
Contributor Author

src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 693 at r3 (raw file):

Previously, iverson52000 (Albert Hsu) wrote…

Sure it's more explicit with this name

Done

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @riemanli and @VisibleForTesting)

Copy link
Member

@riemanli riemanli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1, 1 of 2 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @stevenwarejones)

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r2, 1 of 1 files at r4.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @iverson52000)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 859 at r4 (raw file):

        LaplaceDistribution(0.0, 1 / reachAndFrequency.reachPrivacyParams.epsilon)
      // Reseed random number generator so the results can be verified in tests.
      // TODO(alberthsuu): make reandomSeed an input once create noiser class

make randomSeed

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1, 1 of 2 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @iverson52000)


src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider/EdpSimulator.kt line 843 at r4 (raw file):

    }

    // TODO(alberthsuu): Create a noiser class for this function when we need to add Gaussian noise

nit: move the TODO inside of the KDoc block rather than having a separate comment

Also follow TODO format described in code style doc: https://github.com/world-federation-of-advertisers/cross-media-measurement/blob/main/docs/code-style.md#format

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @iverson52000)

@iverson52000 iverson52000 merged commit 9a2190e into main Jan 30, 2023
@iverson52000 iverson52000 deleted the alberthsuu-scale-reach-noise branch January 30, 2023 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reach value is not multiple of the reciprocal of vidSamplingRate
5 participants