-
Notifications
You must be signed in to change notification settings - Fork 660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ubsan findings fixes #2498
Ubsan findings fixes #2498
Conversation
5606bed
to
2bc3b5a
Compare
An update regarding performance. On CI benchmark results on
for
|
While testing locally I see the following numbers
for
|
src/baldr/nodeinfo.cc
Outdated
uint64_t shift = localidx * 8; // 8 bits per index | ||
return static_cast<uint32_t>(std::round( | ||
((headings_ & (static_cast<uint64_t>(255) << shift)) >> shift) * kHeadingExpandFactor)); | ||
if (localidx > kMaxLocalEdgeIndex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this actually happen? this should be limited by nodeinfo::local_edge_count_
which is 3 bits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does happen in the unit tests in my experience and others also ran into this issue (I don't know how exactly). See the discussion from the initial PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is it shouldn't happen and indicates another bug upstream, right? Instead of papering over it we should follow the call stack and figure our why bogus values are being sent here and fix that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I assumed it might be an expected behavior as long as there was such check in set_heading
. I'll try to look into it considering your remark regarding local_edge_count_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first time I encounter this issue is during "Building Utrecht Tiles..."
custom command, namely on
COMMAND ${CMAKE_BINARY_DIR}/valhalla_build_tiles
--inline-config '{"mjolnir":{"id_table_size":1000,"tile_dir":"test/data/utrecht_tiles","timezone":"test/data/tz.sqlite","admin":"${VALHALLA_SOURCE_DIR}/test/data/netherlands_admin.sqlite","hierarchy":true,"shortcuts":true,"concurrency":1,"logging":{"type":""}}}'
-s build -e cleanup
${VALHALLA_SOURCE_DIR}/test/data/utrecht_netherlands.osm.pbf
The callstack at that moment is
...
10 valhalla::baldr::NodeInfo::heading(unsigned int) const nodeinfo.cc 227
11 (anonymous namespace)::GetTurnTypes() graphenhancer.cc 214
12 (anonymous namespace)::UpdateTurnLanes() graphenhancer.cc 418
13 (anonymous namespace)::enhance(...) graphenhancer.cc 1790
...
When during second pass node with index 1116
is processed, it has edge_index
equal to 3310
(which is a valid value for edge?) and without any further checks it goes down to GetTurnTypes
function which requests heading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the index to get heading is calculated as nodeinfo.edge_index() + j
where j
is j = 0; j < nodeinfo.edge_count(); j++
. So this does not look like a correct calculation for localidx
to be used in this function. The heading
value is needed for GetTurnTypes
function. If enhancement should be performed for each edge (not only for those 8 local edges) looks like we need to get heading from somewhere else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this PR, should I create a separate issue for this problem and drop this change from in order to deal with it separately?
What's your opinion @kevinkreiser @dnesbitt61 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that would be great yes. it think its clear that something is wrong with the UpdateTurnLanes
code. It assumes that certain objects are gauranteed to be in vectors but i think sometimes the vectors are empty and so it uses junk data. splitting that out and working on that separately is a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping this change. Separate issue was created
I measured no difference between 0f41d17 and master when running 23886 routes. originally posted at SvetlanaBayarovich#1 (comment) |
@@ -61,7 +61,7 @@ struct TrafficSpeeds { | |||
}; | |||
|
|||
// Convert big endian bytes to little endian | |||
int16_t to_little_endian(const int16_t val) { | |||
int16_t to_little_endian(const uint16_t val) { | |||
return (val << 8) | ((val >> 8) & 0x00ff); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah so bitshifting signed numbers is implementation dependent so im not sure what was happening here. i'm not sure if we have any negative numbers in the input but if we do this change will definitely have an impact on what we do with the data. i'll follow up with you offline with some sample data to see if we need to worry about it
@@ -88,7 +88,7 @@ std::string encode(const container_t& points, const int precision = 1e6) { | |||
// handy lambda to turn an integer into an encoded string | |||
auto serialize = [&output](int number) { | |||
// move the bits left 1 position and flip all the bits if it was a negative number | |||
number = number < 0 ? ~(number << 1) : (number << 1); | |||
number = number < 0 ? ~(static_cast<unsigned int>(number) << 1) : (number << 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is again bit shifting a signed number and in this case we know its negative. im surprised this doesnt change lots of stuff.. one good test for this would be to run valhalla_export_edges on a larger tileset since it dumps out the whole tilesets shape. then we can look at the diffs in there to see what this does. maybe we were just lucky on the platforms that we run that the implementation dependent handling was equivalent to the above? we need to test this more rigorously
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even more lucky, because implementation dependent
is for right shift of negative value, for left it's UB.
What exactly do you suggest to test? As for correctness, there is a unit-test eg this one which does encoding-decoding and definitely creates the case with negative values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well my suggestion was as i stated:
one good test for this would be to run valhalla_export_edges on a larger tileset ...
yep i am aware of the unit test (i wrote the original), but its not extensive. the reason i want to do a broader test is because this touches literally everything. every single route, mapmatch, how the data is stored. this is a fundamental change so we better be sure its ok. same with the other one above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wrote a small program just to check int << 1
vs static_cast<unisigned int>(int) << 1
and it seems that at least on my computer the two operations have identical results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinkreiser I did the testing we discussed
- Build tiles
- export edges -> compare
- Add live traffic -> compare tiles
for Utrecht tiles. Comparison showed no difference. Is there any testing to be done, eg regarding this comment ?
58cbe10
to
6b80089
Compare
6b80089
to
58cbe10
Compare
Issue
What issue is this PR targeting? If there is no issue that addresses the problem, please open a corresponding issue and link it here.
Resolves findings of the Undefined behavior sanitizer.
This PR is to changes from PR in my fork (SvetlanaBayarovich#1)
Changes are the same only the one regarding directededge.cc file is dropped as it is already merged to master
Tasklist
Requirements / Relations
Link any requirements here. Other pull requests this PR is based on?