Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for edge cases with bbox overlap and out-of-bounds polygons #817

Merged
merged 4 commits into from
Feb 13, 2024

Conversation

nrabinowitz
Copy link
Collaborator

More follow-up for #800:

  • Adds tests for CONTAINMENT_OVERLAPPING_BBOX and fixes the out-of-bounds write when using bbox overlap on a one-vertex polygon
  • Adds a simpler version of this test case and fixes CONTAINMENT_OVERLAPPING results for polygons outside of the normal lat/lng range. The issue here was the test for cell-contains-polygon, which used latLngToCell - this function will return results for out-of-bounds input, presumably wrapping around the sphere, but because the rest of the logic (including the estimator) in polygonToCells is using cartesian coords, these false positives were unexpected. I updated this to include a bbox check first, which will exclude polygons with all points outside of the normal range.

@@ -142,24 +134,6 @@ SUITE(polyfillInternal) {
t_assert(iter.cell == H3_NULL, "Got null output for invalid cell");
}

TEST(iterStepPolygonCompact_invalidPolygonErrors) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test covered a block that AFAIK is now unreachable.

t_assertSuccess(H3_EXPORT(maxPolygonToCellsSizeExperimental)(
&pointGeoPolygon, res, CONTAINMENT_CENTER, &numHexagons));
t_assert(numHexagons >= 1 && numHexagons <= 5,
"got expected estimated size");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated these assertions to expect a reasonable range rather than a specific value

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 5 the actual max that could be contained? I guess it doesn't really matter as long as it's equal or above the actual number that could be contained.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the max I saw in tests was 4, but 5 seemed reasonable 🤷

@coveralls
Copy link

coveralls commented Feb 12, 2024

Coverage Status

coverage: 98.827% (-0.002%) from 98.829%
when pulling a260cfa on nrabinowitz:bbox-overlap-tests
into b57df8b on uber:master.

t_assertSuccess(H3_EXPORT(maxPolygonToCellsSizeExperimental)(
&pointGeoPolygon, res, CONTAINMENT_CENTER, &numHexagons));
t_assert(numHexagons >= 1 && numHexagons <= 5,
"got expected estimated size");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 5 the actual max that could be contained? I guess it doesn't really matter as long as it's equal or above the actual number that could be contained.

@@ -398,6 +437,22 @@ SUITE(polygonToCells) {
free(hexagons);
}

TEST(polygonToCellsContainsPolygon_OverlappingBBox) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TEST(polygonToCellsContainsPolygon_OverlappingBBox) {
TEST(polygonToCellsContainsPolygon_overlappingBBox) {

nitpicking

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We weren't consistent here - normalized all to _PascalCase and made sure the center containment tests were noted as such.

@@ -226,6 +234,21 @@ SUITE(polygonToCells) {
free(hexagons);
}

TEST(polygonToCells_OverlappingBBox) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TEST(polygonToCells_OverlappingBBox) {
TEST(polygonToCells_overlappingBBox) {

nitpicking

Comment on lines +460 to +467
// We have to check whether the point is in the expected range
// first, because out-of-bounds values will yield false
// positives with latLngToCell
if (bboxContains(&VALID_RANGE_BBOX, &firstVertex)) {
H3Index polygonCell;
H3Error polygonCellErr = H3_EXPORT(latLngToCell)(
&(iter->_polygon->geoloop.verts[0]), cellRes,
&polygonCell);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only an optimization so it does not matter if the first vertex would actually overlap the cell or if other parts of the polygon overlap the cell?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is here because our other logic didn't catch polygons that were entirely contained by the cell (a situation that only applies to CONTAINMENT_OVERLAPPING). This catches that case, and additionally catches other cases that would be caught by the more expensive line-intersection check. So it's not an optimization - it's required for the cell-contains-polygon case - but it might cheaply catch other cases as well.

Does that answer the question?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it be the case that a user passes in a polygon that is entirely contained by a cell, but the first vertex is expressed as over the antimeridian (> M_PI)? In that case, wouldn't it be necessary to check the later vertexes since one of them might be validly in range? Or that the user would expect that single out-of-range vertex to result in latLngToCell?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm comfortable requiring input for polygonToCells to be within valid coordinate range if you want reliable output. Because we use Cartesian logic for most of the checks here, coordinates > M_PI are likely to fail in a number of places. We should definitely guard against bad input causing segfaults or insecure memory access, but I don't feel like we need to try to produce good results for bad input. FWIW I'm pretty sure our previous algo wouldn't properly fill a polygon with out-of-bounds coords either.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with this if it not worse than the previous algorithm.

@nrabinowitz nrabinowitz merged commit a523fc6 into uber:master Feb 13, 2024
34 checks passed
@nrabinowitz nrabinowitz deleted the bbox-overlap-tests branch February 13, 2024 22:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants