Change split function from largest axis to minimum overlap #6
The current splitting mechanism is quite simple, and based on recent papers on node splitting for R-trees I think there is a lot of room to improve it.
One option: instead of splitting a rectangle (when it reaches its maximum number of entries) along its largest axis,
we can split it along the axis that leaves the smaller overlapping area between the two resulting rectangles.
This produces more efficient splits, because less overlap means a search has fewer candidate rectangles to descend into before it finds the right one.
When there is no overlapping area (the overlapping area is zero), we use the largest axis as before.
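The axis choice described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `Rect` type, the helper names, and the "sort by lower edge and cut in half" way of evaluating a candidate split are all assumptions made for the example. It also falls back to the largest axis on any tie, which includes the case where both overlaps are zero.

```go
package main

import (
	"math"
	"sort"
)

// Rect is a hypothetical 2D bounding box; index 0 is x, index 1 is y.
type Rect struct {
	Min, Max [2]float64
}

// union returns the smallest rectangle covering both a and b.
func union(a, b Rect) Rect {
	for i := 0; i < 2; i++ {
		a.Min[i] = math.Min(a.Min[i], b.Min[i])
		a.Max[i] = math.Max(a.Max[i], b.Max[i])
	}
	return a
}

// bounds returns the bounding box of a non-empty slice of rectangles.
func bounds(rs []Rect) Rect {
	b := rs[0]
	for _, r := range rs[1:] {
		b = union(b, r)
	}
	return b
}

// overlapArea returns the area shared by a and b, or 0 if they are disjoint.
func overlapArea(a, b Rect) float64 {
	area := 1.0
	for i := 0; i < 2; i++ {
		lo := math.Max(a.Min[i], b.Min[i])
		hi := math.Min(a.Max[i], b.Max[i])
		if hi <= lo {
			return 0
		}
		area *= hi - lo
	}
	return area
}

// splitOverlap evaluates a candidate split on one axis (assumed strategy):
// sort a copy of the entries by their lower edge on that axis, cut the list
// in half, and measure how much the two halves' bounding boxes overlap.
// It expects at least two entries.
func splitOverlap(entries []Rect, axis int) float64 {
	sorted := append([]Rect(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Min[axis] < sorted[j].Min[axis]
	})
	mid := len(sorted) / 2
	return overlapArea(bounds(sorted[:mid]), bounds(sorted[mid:]))
}

// largestAxis returns the axis along which r is longest (the old behaviour).
func largestAxis(r Rect) int {
	if r.Max[1]-r.Min[1] > r.Max[0]-r.Min[0] {
		return 1
	}
	return 0
}

// chooseSplitAxis picks the axis with the smaller overlap after the split.
// On a tie (including zero overlap on both axes) it uses the largest axis.
func chooseSplitAxis(entries []Rect) int {
	ox := splitOverlap(entries, 0)
	oy := splitOverlap(entries, 1)
	switch {
	case ox < oy:
		return 0
	case oy < ox:
		return 1
	default:
		return largestAxis(bounds(entries))
	}
}
```

For example, with two wide rectangles stacked at y in [0, 1] and two more at y in [5, 6], the bounding box is wider than it is tall, so the largest-axis rule would split on x and leave two heavily overlapping halves, while `chooseSplitAxis` returns the y axis, where the overlap is zero.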
I ran this benchmark from the test file with 10 million records, 5 times:
go test -bench=. -run=TestGeoIndex -count=5
which gave these results.
Largest axis:
Minimum overlapping:
Searching took about 810 ns/op with minimum overlap versus about 1005 ns/op with the largest axis, roughly a 19% improvement, which seems fairly good.
Insertion time increased a bit, from about 669 ns/op to about 730 ns/op (roughly 9%), because we do more work in the splitting phase.
I think that in geo indexing people usually insert the data once and query it many times, so this algorithm should speed things up for most users.