Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

big difference in ArrayDigest vs AVLTreeDigest quantile result #89

Open
xysun opened this issue May 26, 2017 · 5 comments
Open

big difference in ArrayDigest vs AVLTreeDigest quantile result #89

xysun opened this issue May 26, 2017 · 5 comments

Comments

@xysun
Copy link

xysun commented May 26, 2017

Hi,

I'm running a very simple simulation and surprisingly found some big difference in quantile result from ArrayDigest and AVLTreeDigest. AVLTreeDigest is giving less accurate results.

Simulation setup:

  1. send 20k measurement, where each measurement has 0.1% chance of being 100, and 99.9% chance of being 5
  2. create a ArrayDigest or AVLTreeDigest with compression = 15
  3. Expected result should be: 95-99 quantile = 5; 99.9 quantile = 100

Result (from one simulation run, but each run is very close to each other):

ArrayDigest:

0.95, quantile value 5.0
0.96, quantile value 5.0
0.97, quantile value 5.0
0.98, quantile value 5.0
0.99, quantile value 5.0
0.999, quantile value 100.0

AVLTreeDigest:

0.95, quantile value 5.0
0.96, quantile value 5.0
0.97, quantile value 5.0
0.98, quantile value 8.790523690773068
0.99, quantile value 56.1720698254364
0.999, quantile value 98.81546134663341

Is this expected?

Code can be found here

@tdunning
Copy link
Owner

tdunning commented May 28, 2017

Very good observation. What you are seeing is a serious bug that is related to the handling of duplicate values in the AVLTreeDigest.

The ArrayDigest is now deprecated, but the good news is that the new MergingDigest behaves correctly. Here is the new test, similar to what you have, but a bit more extreme:

  public void testSingletonInACrowd() {
        final double compression = 100;
        TDigest dist = factory(compression).create();
        for (int i = 0; i < 10000; i++) {
            dist.add(10);
        }
        dist.add(20);
        dist.compress();
        assertEquals(10.0, dist.quantile(0), 0);
        assertEquals(10.0, dist.quantile(0.5), 0);
        assertEquals(10.0, dist.quantile(0.8), 0);
        assertEquals(10.0, dist.quantile(0.9), 0);
        assertEquals(10.0, dist.quantile(0.99), 0);
        assertEquals(10.0, dist.quantile(0.999), 0);
        assertEquals(20.0, dist.quantile(1), 0);
    }

I think I know a way to make the AVLDigest do the right thing. As I see it, this is a blocking bug for the 3.0 release. The work-around is to use the MergingDigest.

@xysun
Copy link
Author

xysun commented May 29, 2017

Thanks! This is very helpful.

@tdunning
Copy link
Owner

tdunning commented Aug 6, 2017

I am going to push this to after the 3.2 release so that 3.2 can go out.

@tdunning
Copy link
Owner

The new interpolation methods look like they may fix this issue.

@tdunning
Copy link
Owner

tdunning commented Apr 2, 2021

This is half fixed, but I won't finish checking that the fix is complete until after this 3.3 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants