Range queries for CountMinSketch #58
BTW: the above could also be used with
This looks good to me. Add some tests and I'll be convinced. About the monoid approach: I like the idea. The problem is that proving bounds is going to be hard. So I wonder: if we have a monoid on V and a metric on V, can we push the proof through to show that |sum(V) - approxSum(V)| < epsilon except with probability delta, for some known epsilon and delta? By the way, pushing this all through for any monoid with a metric is probably worth a paper, if it is possible. We may need to assume something about the structure of the metric.
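For reference, the property asked for above can be written as follows, with the classical count-min guarantee as the known special case (V = natural numbers under addition, metric |x - y|). Here ρ is a hypothetical name for the metric on V, a_i is the true count of key i, \hat{a}_i its sketch estimate, ‖a‖₁ the total count, and w, d the sketch width and depth.

```math
\begin{aligned}
&\text{Target: } \Pr\big[\,\rho(\mathrm{sum}(V),\ \mathrm{approxSum}(V)) \ge \epsilon\,\big] \le \delta \\
&\text{CMS case } \big(\rho(x,y) = |x-y|,\ w = \lceil e/\epsilon \rceil,\ d = \lceil \ln(1/\delta) \rceil\big)\text{: }\quad
a_i \le \hat{a}_i, \qquad \Pr\big[\,\hat{a}_i > a_i + \epsilon \lVert a \rVert_1\,\big] \le \delta
\end{aligned}
```

Note that the count-min error scales with the total count ‖a‖₁ rather than being an absolute ε.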
Nice! This blog post describes the dyadic range approach as well. Apparently http://madlib.incubator.apache.org/ uses this algorithm to do range queries and estimate percentiles on big datasets.
See, e.g., http://www.cse.cuhk.edu.hk/~taoyf/course/5705f10/lec8.pdf for how this works.
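The dyadic decomposition at the heart of this approach can be sketched as follows. This is a minimal Python illustration, not the DyadicRange code mentioned in this issue, and the function name `dyadic_intervals` is hypothetical. It greedily splits an inclusive range of non-negative integer keys into maximal aligned power-of-two blocks, giving O(log n) blocks for keys below n. A range count is then the sum of one point query per block, each made against the sketch for that block's level.

```python
def dyadic_intervals(lo: int, hi: int) -> list[tuple[int, int]]:
    """Split the inclusive range [lo, hi] of non-negative integer keys into
    maximal dyadic blocks.

    Each result (level, index) denotes the block
    [index * 2**level, (index + 1) * 2**level - 1].
    """
    out = []
    while lo <= hi:
        level = 0
        # Grow the block while it stays aligned at `lo` and still fits in [lo, hi].
        while lo % (1 << (level + 1)) == 0 and lo + (1 << (level + 1)) - 1 <= hi:
            level += 1
        out.append((level, lo >> level))
        lo += 1 << level
    return out
```

For example, `dyadic_intervals(3, 12)` returns `[(0, 3), (2, 1), (2, 2), (0, 12)]`, i.e. the blocks {3}, [4, 7], [8, 11] and {12}.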
I have a basic working implementation of the DyadicRange logic (for positive integer keys, to which other key types can be mapped), but it needs to be hooked up to `levels` instances of CMS. We also want to implement binary search on top of this to answer quantile queries.
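Here is one way the per-level sketches and the quantile search could fit together. This is a minimal Python sketch under stated assumptions, not the implementation referred to above. The names `CountMinSketch` and `DyadicCMS` are hypothetical, as are the hashing scheme (pairwise-independent `(a*x + b) mod p`) and the parameters. Keys are assumed to lie in `[0, 2**levels)`, and level `l` counts `key >> l`, giving `levels + 1` sketches. A range query walks the dyadic tree top-down, and a quantile is found by binary search over estimated prefix counts.

```python
import random


class CountMinSketch:
    """Plain count-min sketch over non-negative integer keys (illustrative only)."""

    PRIME = (1 << 61) - 1  # Mersenne prime for the (a*x + b) mod p hash family

    def __init__(self, width: int, depth: int, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        # One (a, b) pair per row: h(x) = ((a*x + b) mod p) mod width.
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key: int):
        for row, (a, b) in enumerate(self.hashes):
            yield row, ((a * key + b) % self.PRIME) % self.width

    def add(self, key: int, count: int = 1) -> None:
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key: int) -> int:
        # Never underestimates; overestimates only through hash collisions.
        return min(self.table[row][col] for row, col in self._cells(key))


class DyadicCMS:
    """Hypothetical wrapper: one CMS per dyadic level, keys in [0, 2**levels)."""

    def __init__(self, levels: int, width: int, depth: int):
        self.levels = levels
        self.sketches = [CountMinSketch(width, depth, seed=lvl) for lvl in range(levels + 1)]
        self.total = 0

    def add(self, key: int, count: int = 1) -> None:
        # Level `lvl` counts the dyadic block containing `key`, i.e. key >> lvl.
        for lvl, cms in enumerate(self.sketches):
            cms.add(key >> lvl, count)
        self.total += count

    def range_count(self, lo: int, hi: int) -> int:
        # Top-down walk: blocks fully inside [lo, hi] contribute one point
        # estimate; partially overlapping blocks are split into their two children.
        def visit(lvl: int, idx: int) -> int:
            start, end = idx << lvl, ((idx + 1) << lvl) - 1
            if end < lo or start > hi:
                return 0
            if lo <= start and end <= hi:
                return self.sketches[lvl].estimate(idx)
            return visit(lvl - 1, 2 * idx) + visit(lvl - 1, 2 * idx + 1)

        return visit(self.levels, 0)

    def quantile(self, q: float) -> int:
        # Binary search for a key whose estimated prefix count reaches q * total.
        target = q * self.total
        lo, hi = 0, (1 << self.levels) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.range_count(0, mid) >= target:
                hi = mid
            else:
                lo = mid + 1
        return lo
```

Because every block estimate is at least the true block count, estimated prefix counts never undercount. The binary search can therefore only return a key at or below the true q-quantile key. The error of a range count grows with the number of blocks summed, which is O(levels).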