
MinHasher32 test is flakey #500

Closed
johnynek opened this issue Nov 19, 2015 · 5 comments

@johnynek
Collaborator

[info] MinHasher32
[info] - should measure 0.5 similarity in 1024 bytes with < 0.1 error
[info] - should measure 0.8 similarity in 1024 bytes with < 0.05 error *** FAILED ***
[info]   0.05833333333333324 was not less than 0.05 (MinHasherTest.scala:30)
[info] - should measure 1.0 similarity in 1024 bytes with < 0.01 error

This failure is quite common. We need to either weaken the bound or figure out whether there is really a bug (I doubt it).
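
One way to separate a real bug from ordinary sampling noise is to simulate a plain MinHash estimator and count how often its error reaches the test's bound. The sketch below is hypothetical and does not reproduce Algebird's MinHasher32 internals. It assumes k linear hash functions h(x) = (a·x + b) mod p with p = 2^31 − 1, two illustrative sets whose Jaccard similarity is exactly 0.8, and k = 247 (the hash count reported later in this thread). The names `MinHashSim`, `LinearHash`, `estimate` and `failureRate` are illustrative only.

```scala
import scala.util.Random

// Hypothetical simulation of a plain MinHash estimator (not Algebird's MinHasher32).
object MinHashSim {
  // Mersenne prime 2^31 - 1. Inputs must lie in [0, P), so each hash is a bijection on them.
  val P: Long = (1L << 31) - 1

  // Assumed hash family: h(x) = (a * x + b) mod P, with a in [1, P-1] and b in [0, P-1].
  final case class LinearHash(a: Long, b: Long) {
    def apply(x: Int): Long = (a * x + b) % P
  }

  def randomHashes(k: Int, rng: Random): Vector[LinearHash] =
    Vector.fill(k)(LinearHash(1L + rng.nextInt((P - 1).toInt), rng.nextInt(P.toInt).toLong))

  // Signature: for each hash function, the minimum hash value over the (non-empty) set.
  def signature(set: Set[Int], hashes: Vector[LinearHash]): Vector[Long] =
    hashes.map(h => set.iterator.map(h(_)).min)

  // Jaccard estimate: fraction of signature positions where the two minima agree.
  def estimate(a: Set[Int], b: Set[Int], hashes: Vector[LinearHash]): Double = {
    val sa = signature(a, hashes)
    val sb = signature(b, hashes)
    sa.zip(sb).count { case (x, y) => x == y }.toDouble / hashes.size
  }

  // Fraction of trials (each with fresh hash functions) whose |estimate - trueJ| >= bound.
  def failureRate(a: Set[Int], b: Set[Int], trueJ: Double, k: Int,
                  bound: Double, trials: Int, seed: Long): Double = {
    val rng = new Random(seed)
    val failures = (1 to trials).count { _ =>
      math.abs(estimate(a, b, randomHashes(k, rng)) - trueJ) >= bound
    }
    failures.toDouble / trials
  }

  def main(args: Array[String]): Unit = {
    // A = {0..899}, B = {100..999}: |A ∩ B| = 800, |A ∪ B| = 1000, so J = 0.8.
    val a = (0 until 900).toSet
    val b = (100 until 1000).toSet
    val k = 247 // assumption: hash count reported in this thread for MinHasher32(0.8, 1024)
    for (bound <- Seq(0.05, 0.1)) {
      val rate = failureRate(a, b, 0.8, k, bound, trials = 200, seed = 42L)
      println(f"k=$k bound=$bound%.2f failure rate=$rate%.3f")
    }
  }
}
```

Because the same seed is reused for both bounds, both runs see identical hash draws, so the failure rate at 0.1 can never exceed the rate at 0.05. A non-trivial rate at 0.05 would point to noise rather than a bug.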

@Gabriella439
Contributor

Another instance of this occurring:

https://travis-ci.org/twitter/algebird/jobs/111090636

[info] - should measure 0.8 similarity in 1024 bytes with < 0.05 error *** FAILED ***
[info]   0.07916666666666672 was not less than 0.05 (MinHasherTest.scala:30)

@johnynek
Collaborator Author

We have that approach that @sid-kap added. @Gabriel439, want to take a stab at porting this test to that framework?

@Gabriella439
Contributor

Sure, I will give it a stab.

@Gabriella439
Contributor

So I think this is a case of the error bounds being incorrect. The test that occasionally fails is this one:

    "measure 0.8 similarity in 1024 bytes with < 0.05 error" in {
      test(new MinHasher32(0.8, 1024), 0.8, 0.05)
    }

... and if you follow the code for that specific MinHasher32 constructor call, it initializes numHashes to 247. The expected error for the MinHash algorithm is 1 / sqrt(numHashes), which in this case evaluates to approximately 0.064. This explains why the test occasionally fails: it requires an error of less than 0.05, which is below the expected error. We should probably raise the error bound to at least 0.1.
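
The arithmetic in this comment can be checked with a few lines of Scala. As an addition not in the original comment, the sketch also computes the binomial standard deviation sqrt(J(1 − J)/k), which assumes each of the k hash positions matches independently with probability J. Under that idealization and a normal approximation, the 0.05 bound sits only about two standard deviations from the true similarity (so failing roughly once in twenty runs would not be surprising), while 0.1 is about four standard deviations away. The object `ErrorBounds` and its methods are hypothetical names.

```scala
// Hypothetical helper reproducing the error-bound arithmetic from the comment above.
object ErrorBounds {
  // Rule of thumb quoted in the thread: expected MinHash error ~ 1 / sqrt(numHashes).
  def ruleOfThumb(numHashes: Int): Double = 1.0 / math.sqrt(numHashes.toDouble)

  // Assumption: each hash position matches independently with probability j,
  // so the similarity estimate is a binomial proportion with this standard deviation.
  def binomialStdDev(j: Double, numHashes: Int): Double =
    math.sqrt(j * (1.0 - j) / numHashes)

  def main(args: Array[String]): Unit = {
    val numHashes = 247 // value reported in this thread for MinHasher32(0.8, 1024)
    val j = 0.8
    val sd = binomialStdDev(j, numHashes)
    println(f"1/sqrt(k) = ${ruleOfThumb(numHashes)}%.4f") // 0.0636
    println(f"binomial sd = $sd%.4f")                     // 0.0255
    // How many standard deviations each candidate test bound allows (1.96 and 3.93).
    for (bound <- Seq(0.05, 0.1))
      println(f"bound $bound%.2f is ${bound / sd}%.2f standard deviations")
  }
}
```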

@johnynek
Collaborator Author

Sounds fine to me.
