Fix Qtree quantileBounds off-by-one error #472
Added a test that checks that, for a QTree constructed from the range (1 to 2k+1), the true median (k+1) is contained in the estimated median bounds (qtree.quantileBounds(0.5)).
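The property described above can be sketched without QTree itself. The following is a minimal, self-contained illustration, not the test added in this PR: `MedianBoundsSketch`, its fixed-width buckets, and the choice of `floor(p * n)` as the 0-based rank are all assumptions made for the example.

```scala
object MedianBoundsSketch {
  // Hypothetical stand-in for a quantile summary: fixed-width buckets over Long values.
  // Returns the [lower, upper) bounds of the bucket that holds the element whose
  // 0-based rank is floor(p * n). The rank convention is an assumption of this sketch.
  def quantileBounds(values: Seq[Long], bucketWidth: Long, p: Double): (Double, Double) = {
    require(values.nonEmpty, "need at least one value")
    require(bucketWidth > 0, "bucketWidth must be positive")
    require(p >= 0.0 && p < 1.0, "p must satisfy 0 <= p < 1")
    val sorted = values.sorted
    val rank = math.floor(sorted.size * p).toInt
    val lower = Math.floorDiv(sorted(rank), bucketWidth) * bucketWidth
    (lower.toDouble, (lower + bucketWidth).toDouble)
  }

  // For the range 1 to 2k+1 the true median is k+1; check that the bounds contain it.
  def medianIsContained(k: Int, bucketWidth: Long): Boolean = {
    val (lo, hi) = quantileBounds(1L to (2L * k + 1), bucketWidth, 0.5)
    val trueMedian = (k + 1).toDouble
    lo <= trueMedian && trueMedian <= hi
  }
}
```

With 2k+1 elements and p = 0.5, `floor(p * n)` is k, and the element at 0-based rank k in the sorted range is k+1, so a rank that is off by one in either direction would select a neighbouring element instead.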
val (leftCount, rightCount) = mapChildrenWithDefault(0L)(_.count)
val parentCount = count - leftCount - rightCount

if (0 <= rank && rank < leftCount) {
0 <= rank is redundant since we'll always run the require validation?
Yep, that's just for clarity.
I can remove it if you want.
you can comment // note 0 <= rank due to assert above
or something, but let's not repeat the code.
I think it's more confusing actually. If the code looks like it handles the case where rank < 0, a reasonable person may (initially) infer that it's possible for that to be true.
That's a good point. I've fixed this in the latest commit.
} else if (leftCount <= rank && rank < leftCount + parentCount) {
  (lowerBound, upperBound)
} else {
  // so leftCount + parentCount <= rank < count
IMO just the first and last comment lines here are sufficient
upperChild.flatMap{ _.findRankUpperBound(rank - lowerCount) }.orElse(Some(upperBound))
}
/**
 * Precondition: if 0 <= rank < count
IMO remove the 'if' here
Fixed
Random thought: we could easily optimize
On second thought, this probably won't yield huge benefits -- it would only make a difference for deep QTrees, and the default QTree depth is only 9. The benefits are probably small, but given that it won't uglify the code, it might be a good idea.
That's a great thing to benchmark, Sid. It might be great; good to quantify, though.
@sid-kap it would be interesting to quantify, but I doubt it will help. 1) The Scala compiler already does tailrec when it can. Since the method is private (isn't it?), it should have already done it here.
I ran some Caliper tests, and saw no difference between the tailrec'd version and the default version. Even at depth k=50, there was no difference. (scalac, it turns out, doesn't actually tail-rec this function by default.)
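For reference, a tail-recursive rank descent of the kind discussed here might look like the sketch below. It is not the QTree code from this PR: `Node`, its fields, and the left / parent / right ordering of ranks are simplified assumptions. The `@tailrec` annotation from `scala.annotation` makes the compiler reject the method if a recursive call is not in tail position.

```scala
import scala.annotation.tailrec

object TailrecDescentSketch {
  // Hypothetical, simplified node (not algebird's QTree):
  // `count` covers this node plus both of its subtrees.
  final case class Node(
      lowerBound: Double,
      upperBound: Double,
      count: Long,
      lowerChild: Option[Node] = None,
      upperChild: Option[Node] = None
  )

  // Finds the bounds of the node that holds the element of the given 0-based rank.
  // Assumption of this sketch: ranks are ordered lower child, then this node, then upper child.
  @tailrec
  def findRankBounds(node: Node, rank: Long): (Double, Double) = {
    require(0 <= rank && rank < node.count, s"rank $rank out of range")
    val leftCount = node.lowerChild.fold(0L)(_.count)
    val rightCount = node.upperChild.fold(0L)(_.count)
    val parentCount = node.count - leftCount - rightCount
    if (rank < leftCount) {
      // leftCount > rank >= 0, so the lower child exists.
      findRankBounds(node.lowerChild.get, rank)
    } else if (rank < leftCount + parentCount) {
      (node.lowerBound, node.upperBound)
    } else {
      // leftCount + parentCount <= rank < count, so rightCount > 0 and the upper child exists.
      findRankBounds(node.upperChild.get, rank - leftCount - parentCount)
    }
  }
}
```

Both recursive calls are the last expression of their branch, which is what allows the annotation to compile; a call wrapped in `flatMap` or `orElse` would not be in tail position.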
Sorry, my bad. Git foo on the command line broke stuff and closed all of these.
Conflicts: algebird-core/src/main/scala/com/twitter/algebird/QTree.scala
if (rank < leftCount) {
  // Note that 0 <= rank < leftCount because of the require above.
  // So leftCount > 0, so lowerChild is not None.
  lowerChild.get.findRankBounds(rank)
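The reasoning in the comments above (the `require` guarantees `0 <= rank`, so `rank < leftCount` implies the lower child exists) can be checked on a toy model. This is only a sketch under assumptions: `Node` is a simplified stand-in, `descendsIntoLowerChild` is a hypothetical helper, and `mapChildrenWithDefault` is re-implemented here as a guess at its behaviour based on how the diff uses it.

```scala
object ChildCountSketch {
  // Hypothetical, simplified node: `count` covers this node plus both subtrees.
  final case class Node(
      count: Long,
      lowerChild: Option[Node] = None,
      upperChild: Option[Node] = None
  ) {
    // Apply f to each child, substituting `default` for a missing child.
    def mapChildrenWithDefault[T](default: T)(f: Node => T): (T, T) =
      (lowerChild.map(f).getOrElse(default), upperChild.map(f).getOrElse(default))

    // True when the element of the given 0-based rank lives in the lower subtree.
    def descendsIntoLowerChild(rank: Long): Boolean = {
      require(0 <= rank && rank < count, s"rank $rank out of range [0, $count)")
      val (leftCount, _) = mapChildrenWithDefault(0L)(_.count)
      // No explicit 0 <= rank check here: the require above already guarantees it.
      // If this is true then leftCount > rank >= 0, so lowerChild must be defined.
      rank < leftCount
    }
  }
}
```

Because a missing child contributes a count of 0, a node without a lower child can never satisfy `rank < leftCount` for a valid rank, which is why calling `.get` on the child in that branch is safe.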
Should we use lowerChildNullable and upperChildNullable in this method?
If it's called from the 'hot path', i.e. in plus or sumOption, then yes, it's worth doing.
It's only called in quantileBounds, so optimizing it is probably not necessary
Yeah, I wouldn't bother there; best to keep the less functional stuff out of the non-critical speed paths. That said, it could be worth adding a benchmark on how the results are usually extracted from QTree, since all the benchmarks now are for combinations. We had some big perf woes in HLL around some of them, so it might be worth the experiment.
601715d to 4a71905
I added a benchmark for quantile bounds.
lgtm, merge when green
Fix Qtree quantileBounds off-by-one error
@sid-kap, thanks so much for working on this!
See issue #377