Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Qtree quantileBounds off-by-one error #472

Merged
merged 11 commits into from Aug 4, 2015

Conversation

sid-kap
Copy link
Contributor

@sid-kap sid-kap commented Jul 28, 2015

See issue #377

Added test that checks that for a QTree constructed from the range
(1 to 2k+1), the true median (k+1) is contained in the estimated
median (qtree.quantileBounds(0.5))
val (leftCount, rightCount) = mapChildrenWithDefault(0L)(_.count)
val parentCount = count - leftCount - rightCount

if (0 <= rank && rank < leftCount) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 <= rank is redundant since we'll always run the require validation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's just for clarity.
I can remove it if you want.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can comment // note 0 <= rank due to assert above
or something, but let's not repeat the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more confusing actually. If the code looks like it handles the case where rank < 0, a reasonable person may (initially) infer that it's possible for that to be true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I've fixed this in the latest commit.

} else if (leftCount <= rank && rank < leftCount + parentCount) {
(lowerBound, upperBound)
} else {
// so leftCount + parentCount <= rank < count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO just the first and last comment lines here are sufficient

upperChild.flatMap{ _.findRankUpperBound(rank - lowerCount) }.orElse(Some(upperBound))
}
/**
* Precondition: if 0 <= rank < count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO remove the 'if' here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@sid-kap
Copy link
Contributor Author

sid-kap commented Jul 29, 2015

Random thought: we could easily optimize findRankBounds with the following changes:

  • @tailrec the function (trivial)
  • (maybe) drop the require, since we can prove that it is satisfied everywhere it is called anyway.

On second thought, this probably won't yield huge benefits -- it would only make a difference for deep QTrees, and the default QTree depth is only 9.

The benefits are probably small, but given that it won't uglify the code, it might be a good idea.

@ianoc
Copy link
Collaborator

ianoc commented Jul 29, 2015

Thats a great thing to benchmark sid, might be great, good to quantify tho

@johnynek
Copy link
Collaborator

@sid-kap it would be interesting to quantify, but I doubt it will help. 1) the scala compiler already does tailrec when it can. Since the method is private (isn't it?), it should have already done it here. @tailrec just means error if you can't make it tailrec, it does not mean it won't happen without it. 2) require is probably free since if it is always true the branch predictor should notice that and assume it is true and keep on going.

@sid-kap
Copy link
Contributor Author

sid-kap commented Jul 29, 2015

I ran some Caliper tests, and saw no difference between the tailrec'd version and the default version. Even at depth k=50, there was no difference. (scalac, it turns out, doesn't actually tail-rec this function by default.)

@ianoc ianoc closed this Aug 4, 2015
@ianoc
Copy link
Collaborator

ianoc commented Aug 4, 2015

Sorry my bad, git foo on cmd line broke stuff and closed all of these

@ianoc ianoc reopened this Aug 4, 2015
Conflicts:
	algebird-core/src/main/scala/com/twitter/algebird/QTree.scala
if (rank < leftCount) {
// Note that 0 <= rank < leftCount because of the require above.
// So leftCount > 0, so lowerChild is not None.
lowerChild.get.findRankBounds(rank)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use lowerChildNullable and upperChildNullable in this method?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its called from the 'hot path' i.e. in plus or sumOption then yes worth doing it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only called in quantileBounds, so optimizing it is probably not necessary

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I wouldn't bother there, best to keep the less functional stuff out of the non-critical speed paths. -- That said, it could be worth adding a benchmark on how the results are usually extracted from QTree, all the benchmarks now are for combinations. We had some big perf woes in HLL around some of them. So might be worth the experiment

@sid-kap
Copy link
Contributor Author

sid-kap commented Aug 4, 2015

I added a benchmark for quantile bounds.
I didn't see any performance improvements from using nullables in findRankBounds, so I'll let that function be for now.
Can we merge this when green?

@ianoc
Copy link
Collaborator

ianoc commented Aug 4, 2015

lgtm, merge when green

johnynek added a commit that referenced this pull request Aug 4, 2015
Fix Qtree quantileBounds off-by-one error
@johnynek johnynek merged commit d1aed85 into twitter:develop Aug 4, 2015
@sid-kap sid-kap deleted the qtree_off_by_one branch August 4, 2015 23:24
@joshualande
Copy link
Contributor

@sid-kap, thanks so much for working on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants