Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-implement MedianIterator to use a proper sequence of medians #88

Merged
merged 2 commits into from Feb 26, 2019

Conversation

amichair
Copy link
Contributor

@amichair amichair commented Feb 14, 2019

The previous implementation of MedianIterator didn't actually iterate over medians, it just started from the global median and then flip-flopped between higher and lower elements in sequential order (not sure how the sign flipping was better than just a simple sequential loop.)

The new implementation actually follows a full sequence of medians in a recursive-like manner. This allows search trees constructed from a sorted wordlist to be well-balanced, providing much better tree search performance. In the existing performance unit test it shows around 40% improvement in search speed.

If you merge the other PRs from today first, you can also print out the tree structure before/after this change and see how much more balanced it looks, or print out the node depth histogram and see the numbers. I've also attached here for your amusement a plot of the histograms for various iterator implementations (depth of all end-of-word nodes vs how many times each depth occurs in the tree) - the orange curve is the previous implementation, the green curve is a random insertion order, the purple curve is an implementation that follows medians of a power of two greater than or equal to to the data size and skips the indices that are beyond the data size, and the yellow curve is the current implementation which is the series of medians for the exact data size (it almost exactly coincides with the purple curve, but ever so slightly better, and is more elegant in implementation and in theory). Note that even the longest paths in the current implementation are shorter than the average path length in the previous implementation!

This was fun :-)

tree hist3

@dfish3r
Copy link
Member

dfish3r commented Feb 26, 2019

If they give out awards for PRs, you should get one.
Kudos on your work. Clean, concise and quite the improvement.

@dfish3r dfish3r merged commit a24d807 into vt-middleware:master Feb 26, 2019
@amichair amichair deleted the tree-median-iterator branch Feb 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants