Fix polygon merging failure. #1848
Conversation
Sometimes the GEOS/Shapely polygon merging routine `unary_union` fails with a `ValueError` and the message "No Shapely geometry can be created from null value." This means that the union failed and GEOS is passing a null value back to Shapely. This was repeatably happening with tile `11/1052/672`, near Monnickendam in the Netherlands, when trying to merge together a lot of very small and closely-spaced field polygons. The problem appears to be one of numerical instability: because the polygons are so closely spaced, some intersection tests come up with nonsensical answers.

I did initially try to fix this by isolating the problem, recursively attempting to merge subsets of the list of polygons until either some small set was unmergeable or the whole thing worked. Unfortunately, this turned out not to be robust, because the issue is in the merging *between* two polygons, not in any polygon itself, so it simply resurfaced when aggregating the results of the recursive merge.

An alternative way of fixing this is to `buffer` out a small distance, so that these points of numerical instability become rarer (although we can't guarantee to get rid of them entirely). This seems to be much more stable in practice. This patch detects failure of the merge and tries again with a small (1/16th-pixel) buffer. If that also fails, then it bails and returns an empty result set. The reasoning behind that is that I think we'd prefer to have a tile, even if it's missing some data, than not have a tile at all.
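The fallback described above can be sketched roughly as follows. This is a minimal illustration using Shapely's `unary_union` and `buffer`; the function name and the list return value are hypothetical, not the patch's actual code:

```python
from shapely.geometry import Point
from shapely.ops import unary_union


def merge_polygons(polygons, tolerance):
    """Union a list of polygons, retrying with a slightly buffered
    copy when GEOS hands a null geometry back to Shapely."""
    try:
        return unary_union(polygons)
    except ValueError:
        # don't buffer by the full pixel; a 1/16th-pixel buffer
        # shouldn't be noticeable in the rendered tile.
        buffer_size = tolerance / 16.0
        try:
            return unary_union([p.buffer(buffer_size) for p in polygons])
        except ValueError:
            # better to emit a tile missing some data than no tile at all.
            return []


# usage: merging two overlapping discs into a single polygon
merged = merge_polygons(
    [Point(0, 0).buffer(1.0), Point(1, 0).buffer(1.0)],
    tolerance=0.1,
)
```

The second `except` clause mirrors the patch's "bail and return an empty result set" behaviour rather than letting the error propagate and lose the whole tile.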
Looks good, with one possible efficiency catch.
```python
# don't buffer by the full pixel, instead choose a smaller value that
# shouldn't be noticeable.
buffer_size = tolerance / 16.0
```
Would it make sense to pre-divide the tolerance value so that python doesn't have to do floating point divisions for every pass through here?
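The suggestion here is essentially to hoist the division out of the hot path by computing it once at construction time. A hypothetical sketch, assuming the `RecursiveMerger` and `tolerance_16th` names discussed below:

```python
class RecursiveMerger:
    def __init__(self, tolerance):
        self.tolerance = tolerance
        # precompute once so each merge attempt avoids
        # repeating the floating point division
        self.tolerance_16th = tolerance / 16.0
```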
That's a good idea, thanks.
This method is called pretty dynamically; the caller can configurably call either this, `_merge_polygons_with_buffer`, or `_merge_lines`, and they all do different things with `tolerance`. We could have a `tolerance` and a `tolerance_16th` stored in the `RecursiveMerger` object, but that seems a little inelegant.
Also, I think the time saved by not repeating the division is probably dwarfed by the attempt (and failure, to get to this branch) to `unary_union` the list of polygons. For example, the `11/1052/672` tile that was failing had over 4,000 polygons in that list!
What do you think?
I was thinking this was called more frequently. You're right – let's keep it clean and as-is.