Question: Proof of bounds on merging digest size #77

Closed
jcrist opened this issue Dec 12, 2016 · 3 comments

@jcrist

jcrist commented Dec 12, 2016

I've been working on a Python implementation of the merging digest (written mostly in C). Code is here. I'm running into out-of-bounds errors on the centroid array when it is sized to the theoretical bound of ceil(compression * pi/2) (as seen here).

I modified the Java implementation to test this bound and got out-of-bounds errors when running the existing test suite. I also modified the Java tests to output their input data and ran my implementation on those inputs, with the same results. So I trust that my implementation matches yours, and since both hit out-of-bounds errors with buffers of that size, I'm skeptical of that theoretical size.

From reading the paper, it appears you concluded that the bound on the number of maximally merged centroids is 2 * compression (see here), which differs from what's stated in the code. I can roughly convince myself of this by working through a worst-case scenario on paper, but I'm not math-y enough to write out a formal proof. Using a bound of 2 * compression does seem to squash all the out-of-bounds errors, though.

Question:

  • What is the actual theoretical bound?
  • Can you sketch out a proof of how you came by this number?
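For anyone wanting to sanity-check the 2 * compression figure, here is a minimal Python sketch. It is not the proof from this thread (that arrived by email and isn't shown here) and not the Java MergingDigest code. It assumes the arcsine scale function k(q) = compression * (asin(2q - 1) + pi/2) / pi, so k runs from 0 to compression, and a simplified greedy merge rule: a centroid may grow only while its span in k stays at most 1. Under those assumptions, a new centroid starts only when the current centroid plus the next point would span more than 1 in k, so any two adjacent centroids together span more than 1. Pairing centroids off, floor(n/2) disjoint pairs each span more than 1 out of a total of compression, so floor(n/2) < compression, and a buffer of 2 * ceil(compression) slots always suffices. The names `k_scale` and `greedy_merge` are hypothetical, not from any t-digest library.

```python
import math


def k_scale(q, compression):
    """Assumed arcsine scale function: maps quantile q in [0, 1] to [0, compression]."""
    q = min(max(q, 0.0), 1.0)  # clamp tiny floating-point overshoot so asin stays in domain
    return compression * (math.asin(2.0 * q - 1.0) + math.pi / 2.0) / math.pi


def greedy_merge(weights, compression):
    """Hypothetical single left-to-right merge pass over already-sorted points.

    A point joins the current centroid only if the centroid's span in k-space
    stays <= 1; otherwise it starts a new centroid. Returns centroid weights.
    """
    total = float(sum(weights))
    centroids = []
    left = 0.0      # cumulative weight before the current centroid
    current = None  # weight of the centroid being built
    for w in weights:
        if current is None:
            current = w
            continue
        q_left = left / total
        q_right = (left + current + w) / total
        if k_scale(q_right, compression) - k_scale(q_left, compression) <= 1.0:
            current += w  # still within one unit of k: merge
        else:
            centroids.append(current)  # close the centroid and start a new one
            left += current
            current = w
    if current is not None:
        centroids.append(current)
    return centroids


if __name__ == "__main__":
    compression = 100
    pi_bound = math.ceil(compression * math.pi / 2)  # size stated in the code comment
    safe_bound = 2 * math.ceil(compression)          # size from the pairing argument
    for n in (10, 1_000, 100_000):
        count = len(greedy_merge([1.0] * n, compression))
        print(f"n={n}: {count} centroids (pi/2 size {pi_bound}, 2*compression size {safe_bound})")
```

Unit weights in a single pass may well stay under the pi/2 figure; the overflows described above came from the Java test suite's inputs and buffered merging, which this sketch does not attempt to reproduce. It only illustrates why 2 * compression is a safe upper bound under the stated merge rule.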
@tdunning
Owner

tdunning commented Dec 13, 2016 via email

@jcrist
Author

jcrist commented Dec 13, 2016

Thanks, that was a very clear proof. I really appreciate you taking the time to sketch that out.

This comment should probably be updated to reflect this; a few implementations are using the incorrect ceil(compression * pi / 2) bound it states (see here and here).

jcrist closed this as completed Dec 13, 2016
@tdunning
Owner

tdunning commented Dec 13, 2016 via email
