Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid UB on converting NaN to int in annotations #2992

Merged
merged 2 commits into from
Apr 13, 2021

Conversation

averkhaturau
Copy link
Contributor

Issue

Speed annotation may fail with converting NaN to integer if the same geometry point in duplicated in a route.

Tasklist

  • Add tests
  • Add #fixes with the issue number that this PR addresses
  • Update the docs with any new request parameters or changes to behavior described
  • Update the changelog

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

@kevinkreiser
Copy link
Member

Why does the duplicate occur? I thought the cutting logic was robust to this which means it has to be duplicated shape in OSM itself? If so that is an easy gruka test to write to actually get test coverage here. Shouldn't we add that quickly before merging?

Copy link
Member

@danpat danpat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can someone add a test for this case? @kevinkreiser yeah, this is occuring when the underlying OSM data contains a zero-length, duplicate-coordinate node.

TEST(ShapeAttributes, test_shape_attributes_duplicated_point) {
tyr::actor_t actor(conf);

auto result_json = actor.trace_attributes(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test passes on master as well...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@purew the test will pass, but is ubsan triggered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to test UB without ubsan, I'll enable it on CI

Copy link
Member

@kevinkreiser kevinkreiser Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be fine to locally on your computer:

  1. undo the fix in triplegbuilder but keep the unit test
  2. turn on ubsan locally and run the unit test
  3. show that the sanitizer finds the issue via the unit test and post it to this PR

we shouldnt change the behavior of CI in this PR, iirc when we had that enabled we either got randomly OOM killed or the build took forever

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, CI should test valhalla for UB, it would help to avoid UB issues in future, It's not the only issue found with ubsan.

@kevinkreiser
Copy link
Member

i wonder if this line is a contributing factor:

double speed = (edge->length() * (tgt_pct - src_pct)) / edge_seconds;

maybe speed is 0 if at the beginning or end of a map match we use 0% of an edge or maybe edge_seconds is 0. @averkhaturau i assume you ran into this sometime. did you happen to debug it to see the root cause of the nan?

@averkhaturau
Copy link
Contributor Author

did you happen to debug it to see the root cause of the nan?

In short, if we have a duplicated coordinate in route geometry (we usually do in multileg routes), the speed is calculated as
speed = length / time
Both length and time are 0., so speed is set to NaN. Then it converts to integer in call of set_speed(int), which is undefined behavior. Happily (or actually unhappily) clang converts NaN to integer 0 and nothing awful happens. ubsan catches it.

So, the reason is actually a duplicated coordinate.

@kevinkreiser
Copy link
Member

kevinkreiser commented Apr 13, 2021

I do not understand your comment about multileg routes! TripLegBuilder only works on one leg at a time, so surely multileg routes cannot cause this problem. I'm just wondering where the point duplication comes in. Is it in the base OSM data like that or do we have a bug elsewhere that is erroneously repeating a coordinate (ie when cutting the shape for the first or last edge of the route, also done in triplegbuilder)? its good to handle it because OSM could have duplicate adjacent points (and im certain we dont weed that out at tile building time), but my main question is what is causing duplicated points and do we need to fix something more in a separate PR

@merkispavel
Copy link
Contributor

merkispavel commented Apr 13, 2021

I do not understand your comment about multileg routes! TripLegBuilder only works on one leg at a time, so surely multileg routes cannot cause this problem. I'm just wondering where the point duplication comes in. Is it in the base OSM data like that or do we have a bug elsewhere that is erroneously repeating a coordinate (ie when cutting the shape for the first or last edge of the route, also done in triplegbuilder)? its good to handle it because OSM could have duplicate adjacent points (and im certain we dont weed that out at tile building time), but my main question is what is causing duplicated points and do we need to fix something more in a separate PR

@kevinkreiser as far as i see we do have a request that reproduces the issue. I'll come back to you soon and let you know if we need one more fix somewhere else. But I believe this activity is not a blocker for this PR

@averkhaturau
Copy link
Contributor Author

This PR is a hot workaround of UB. Duplicated point is a separate issue.

@kevinkreiser
Copy link
Member

kevinkreiser commented Apr 13, 2021

yeah im not trying to block the PR. let me reiterate my points more concisely:

its good to handle it because OSM could have duplicate adjacent points

^^ we should ship this to protect us from NaN regardless of the cause

what is causing duplicated points and do we need to fix something more in a separate PR

^^ lets investigate the cause of the duplicate adjacent shape points in a separate issue/PR

we shouldnt change the behavior of CI in this PR

^^ revert the changes to CI

@averkhaturau
Copy link
Contributor Author

@kevinkreiser , I reverted CI changes. Should I revert a test as well?

Copy link
Contributor

@merkispavel merkispavel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@danpat danpat self-requested a review April 13, 2021 15:27
@merkispavel merkispavel merged commit 8d66f40 into master Apr 13, 2021
@danpat
Copy link
Member

danpat commented Apr 13, 2021

Duplicated coordinates can easily exist in the original OSM data - I don't think there's any need to look for another cause here. A simple scan of OSM I'm sure will expose thousands, if not millions, of cases of this. Valhalla needs to be robust against that, and as @averkhaturau pointed out, we've been lucky that it just so happens that NaN->0 for our serialization in most cases.

@purew purew deleted the av-avoid-NaN-in-annotations branch April 14, 2021 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants