Ubsan findings fixes #2498

Merged: 10 commits merged on Aug 12, 2020


SvetlanaBayarovich (Contributor) commented Jul 28, 2020

Issue

What issue is this PR targeting? If there is no issue that addresses the problem, please open a corresponding issue and link it here.

Resolves findings of the Undefined behavior sanitizer.

This PR contains the changes from the PR in my fork (SvetlanaBayarovich#1).
The changes are the same, except that the one to directededge.cc is dropped because it has already been merged to master.

Tasklist

  • Add tests
  • Add #fixes with the issue number that this PR addresses
  • Generally use squash merge to rebase and clean comments before merging
  • Update the changelog

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

SvetlanaBayarovich (Contributor Author)

An update regarding performance. On CI, the benchmark results from the build-release job are as follows.
For master (link):

-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
BM_UtrechtCostMatrix/1        0.114 ms        0.114 ms         6026 Routes=8.76286k/s
BM_UtrechtCostMatrix/2         3.44 ms         3.44 ms          216 Routes=582.236/s
BM_UtrechtCostMatrix/4         32.5 ms         32.4 ms           22 Routes=123.34/s
BM_UtrechtCostMatrix/8         75.0 ms         74.9 ms            9 Routes=106.757/s
BM_UtrechtCostMatrix/16         182 ms          182 ms            4 Routes=87.9684/s
BM_UtrechtCostMatrix/32         395 ms          395 ms            2 Routes=81.1087/s
BM_UtrechtCostMatrix/64         992 ms          991 ms            1 Routes=64.5778/s
BM_UtrechtCostMatrix/128       2627 ms         2620 ms            1 Routes=48.8553/s
BM_UtrechtCostMatrix/256       8499 ms         8481 ms            1 Routes=30.1863/s

-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
OfflineMapmatchFixture/BasicOfflineMatch     520011 ns       517812 ns         1204
BM_ManyCases/0                              2533241 ns      2525737 ns          301
BM_ManyCases/1                              3206183 ns      3206149 ns          216
BM_ManyCases/2                              3386477 ns      3380603 ns          203
BM_ManyCases/3                              7969016 ns      7957341 ns           79

For ub-fixes (link):

-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
BM_UtrechtCostMatrix/1        0.115 ms        0.114 ms         6404 Routes=8.75265k/s
BM_UtrechtCostMatrix/2         2.69 ms         2.69 ms          256 Routes=743.667/s
BM_UtrechtCostMatrix/4         23.1 ms         23.1 ms           30 Routes=173.103/s
BM_UtrechtCostMatrix/8         58.8 ms         58.8 ms           12 Routes=136.057/s
BM_UtrechtCostMatrix/16         135 ms          135 ms            5 Routes=118.086/s
BM_UtrechtCostMatrix/32         292 ms          292 ms            2 Routes=109.459/s
BM_UtrechtCostMatrix/64         978 ms          978 ms            1 Routes=65.4333/s
BM_UtrechtCostMatrix/128       2721 ms         2721 ms            1 Routes=47.049/s
BM_UtrechtCostMatrix/256       7988 ms         7988 ms            1 Routes=32.0499/s

-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
OfflineMapmatchFixture/BasicOfflineMatch     587367 ns       587355 ns         1255
BM_ManyCases/0                              1921540 ns      1921491 ns          294
BM_ManyCases/1                              2217112 ns      2217072 ns          335
BM_ManyCases/2                              2549283 ns      2549260 ns          259
BM_ManyCases/3                              7325829 ns      7325599 ns           92

SvetlanaBayarovich (Contributor Author)

While testing locally, I see the following numbers.
For master:

-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
BM_UtrechtCostMatrix/1        0.036 ms        0.036 ms        19888 Routes=27.9032k/s
BM_UtrechtCostMatrix/2         1.59 ms         1.59 ms          422 Routes=1.25584k/s
BM_UtrechtCostMatrix/4         19.8 ms         19.7 ms           31 Routes=202.588/s
BM_UtrechtCostMatrix/8         51.5 ms         51.5 ms           10 Routes=155.326/s
BM_UtrechtCostMatrix/16         127 ms          127 ms            5 Routes=125.528/s
BM_UtrechtCostMatrix/32         294 ms          294 ms            2 Routes=108.929/s
BM_UtrechtCostMatrix/64         758 ms          758 ms            1 Routes=84.415/s
BM_UtrechtCostMatrix/128       1813 ms         1813 ms            1 Routes=70.6176/s
BM_UtrechtCostMatrix/256       5632 ms         5631 ms            1 Routes=45.4632/s

-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
OfflineMapmatchFixture/BasicOfflineMatch     597513 ns       597417 ns          866
BM_ManyCases/0                              1830187 ns      1829279 ns          319
BM_ManyCases/1                              2251921 ns      2251327 ns          281
BM_ManyCases/2                              2637502 ns      2636133 ns          263
BM_ManyCases/3                              9742034 ns      9737754 ns           57

For ub-fixes:

-----------------------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------
BM_UtrechtCostMatrix/1        0.035 ms        0.035 ms        20040 Routes=28.6054k/s
BM_UtrechtCostMatrix/2         1.62 ms         1.62 ms          424 Routes=1.23668k/s
BM_UtrechtCostMatrix/4         20.0 ms         20.0 ms           32 Routes=200.248/s
BM_UtrechtCostMatrix/8         55.9 ms         55.9 ms           12 Routes=143.227/s
BM_UtrechtCostMatrix/16         127 ms          127 ms            5 Routes=125.924/s
BM_UtrechtCostMatrix/32         305 ms          305 ms            2 Routes=105.028/s
BM_UtrechtCostMatrix/64         747 ms          747 ms            1 Routes=85.7047/s
BM_UtrechtCostMatrix/128       1805 ms         1805 ms            1 Routes=70.9263/s
BM_UtrechtCostMatrix/256       5664 ms         5662 ms            1 Routes=45.211/s

-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
OfflineMapmatchFixture/BasicOfflineMatch     605216 ns       605045 ns          893
BM_ManyCases/0                              1816144 ns      1815517 ns          329
BM_ManyCases/1                              2269957 ns      2268949 ns          295
BM_ManyCases/2                              2560568 ns      2559744 ns          262
BM_ManyCases/3                              9740998 ns      9736263 ns           57
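To compare the two local runs row by row, the relative change in time is (ub-fixes - master) / master. A tiny helper for that arithmetic; the function name is illustrative and the sample values are copied from the local tables above.

```cpp
#include <cassert>
#include <cmath>

// Percentage change of `after` relative to `before`;
// a positive result means the ub-fixes run took longer than master.
double relative_change_pct(double before, double after) {
  return (after - before) / before * 100.0;
}
```

For the local runs this gives roughly +8.5% for BM_UtrechtCostMatrix/8 (51.5 to 55.9 ms), +0.6% for BM_UtrechtCostMatrix/256 (5632 to 5664 ms) and -0.01% for BM_ManyCases/3.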

uint64_t shift = localidx * 8; // 8 bits per index
return static_cast<uint32_t>(std::round(
((headings_ & (static_cast<uint64_t>(255) << shift)) >> shift) * kHeadingExpandFactor));
if (localidx > kMaxLocalEdgeIndex) {
Member

Can this actually happen? This should be limited by nodeinfo::local_edge_count_, which is 3 bits.
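A minimal sketch of why the bound matters, assuming eight 8-bit headings packed into one uint64_t as in the snippet above and a 3-bit local index (so at most 7). The names `PackedHeadings`, `kMaxLocalIdx` and the 360/255 scale factor are illustrative assumptions, not Valhalla's actual definitions. With localidx <= 7 the shift is at most 56; any larger localidx makes the shift 64 or more, which is undefined behavior for a 64-bit operand and is exactly what UBSan reports.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a packed headings field: eight 8-bit
// headings in one 64-bit word, one byte per local edge index.
struct PackedHeadings {
  static constexpr uint32_t kLocalIdxBits = 3;                         // assumption: 3-bit index
  static constexpr uint32_t kMaxLocalIdx = (1u << kLocalIdxBits) - 1;  // 7
  static constexpr float kExpand = 360.0f / 255.0f;                    // assumption: byte -> degrees

  uint64_t headings = 0;

  void set(uint32_t localidx, uint32_t degrees) {
    check(localidx);
    const uint32_t shift = localidx * 8;  // at most 56, always < 64
    const uint64_t byte = static_cast<uint64_t>(std::round(degrees / kExpand)) & 0xff;
    headings &= ~(static_cast<uint64_t>(0xff) << shift);  // clear the old byte
    headings |= byte << shift;
  }

  uint32_t get(uint32_t localidx) const {
    check(localidx);
    const uint32_t shift = localidx * 8;
    return static_cast<uint32_t>(std::round(((headings >> shift) & 0xff) * kExpand));
  }

 private:
  // Without this guard, localidx >= 8 yields shift >= 64: undefined behavior.
  static void check(uint32_t localidx) {
    if (localidx > kMaxLocalIdx) throw std::out_of_range("local edge index > 7");
  }
};
```

Throwing is only one option; the point of the thread is that an out-of-range localidx signals a bug in the caller rather than something to silently mask.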

Contributor Author

It does happen in the unit tests in my experience, and others have also run into this issue (I don't know exactly how). See the discussion in the initial PR.

Member

The point is that it shouldn't happen, and it indicates another bug upstream, right? Instead of papering over it, we should follow the call stack, figure out why bogus values are being sent here, and fix that.

Contributor Author

Oh, I assumed it might be expected behavior, since there was already such a check in set_heading. I'll look into it, taking into account your remark about local_edge_count_.

Contributor Author

I first encounter this issue during the "Building Utrecht Tiles..." custom command, namely on:

  COMMAND ${CMAKE_BINARY_DIR}/valhalla_build_tiles
      --inline-config '{"mjolnir":{"id_table_size":1000,"tile_dir":"test/data/utrecht_tiles","timezone":"test/data/tz.sqlite","admin":"${VALHALLA_SOURCE_DIR}/test/data/netherlands_admin.sqlite","hierarchy":true,"shortcuts":true,"concurrency":1,"logging":{"type":""}}}'
      -s build -e cleanup
      ${VALHALLA_SOURCE_DIR}/test/data/utrecht_netherlands.osm.pbf

The call stack at that moment is:

...
10 valhalla::baldr::NodeInfo::heading(unsigned int) const      nodeinfo.cc   227
11 (anonymous namespace)::GetTurnTypes()                       graphenhancer.cc   214    
12 (anonymous namespace)::UpdateTurnLanes()                    graphenhancer.cc    418  
13 (anonymous namespace)::enhance(...)                         graphenhancer.cc    1790    
...

During the second pass, when the node with index 1116 is processed, its edge_index is 3310 (which is a valid value for an edge?), and without any further checks the code goes down to the GetTurnTypes function, which requests the heading.

SvetlanaBayarovich (Contributor Author), Jul 31, 2020

I see that the index used to get the heading is calculated as nodeinfo.edge_index() + j, where j runs over for (j = 0; j < nodeinfo.edge_count(); j++). That does not look like a correct localidx for this function. The heading value is needed by GetTurnTypes, so if enhancement should be performed for every edge (not only for the 8 local edges), it looks like we need to get the heading from somewhere else.
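The index mix-up described above can be sketched as follows, assuming a node owns edges at global indices [edge_index, edge_index + edge_count) and stores headings only for its first 8 edges, addressed by local position. `Node`, `heading` and `outbound_headings` are hypothetical names, not Valhalla's API. The heading lookup has to receive the loop counter j (a local index), not edge_index + j (a global edge id), and edges beyond the eighth local slot simply have no stored heading.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified node: outbound edges live at global indices
// [edge_index, edge_index + edge_count); headings are kept only for the
// first kMaxLocalEdges edges and are addressed by local index.
struct Node {
  static constexpr uint32_t kMaxLocalEdges = 8;
  uint32_t edge_index = 0;
  uint32_t edge_count = 0;
  std::vector<uint32_t> local_headings;  // size <= kMaxLocalEdges

  // Heading for a *local* index, or nothing if none is stored there.
  std::optional<uint32_t> heading(uint32_t localidx) const {
    if (localidx >= local_headings.size()) return std::nullopt;
    return local_headings[localidx];
  }
};

// Collect a heading per outbound edge. The global id (edge_index + j) is
// what you would use to fetch the edge itself; the heading lookup takes j.
std::vector<std::optional<uint32_t>> outbound_headings(const Node& node) {
  std::vector<std::optional<uint32_t>> out;
  for (uint32_t j = 0; j < node.edge_count; ++j) {
    out.push_back(node.heading(j));  // local index, not node.edge_index + j
  }
  return out;
}
```

Returning std::optional makes "no heading stored for this edge" explicit, which is the case the thread suggests needs a different data source.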

Contributor Author

Regarding this PR, should I create a separate issue for this problem and drop this change from this PR in order to deal with it separately?
What's your opinion, @kevinkreiser @dnesbitt61?

Member

I think that would be great, yes. I think it's clear that something is wrong with the UpdateTurnLanes code. It assumes that certain objects are guaranteed to be in vectors, but I think sometimes the vectors are empty, so it uses junk data. Splitting that out and working on it separately is a good idea.
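As a generic illustration of the failure mode described (code reading from a vector it assumes is non-empty), here is a minimal sketch; `TurnLaneInfo` and `first_lane_or_none` are hypothetical names, not the actual graphenhancer.cc code. Calling front() or operator[] on an empty std::vector is undefined behavior and typically returns junk, so the lookup checks first and lets the caller skip the node.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-edge record gathered before turn-lane enhancement.
struct TurnLaneInfo {
  std::string lanes;
};

// front() on an empty vector is undefined behavior, so check emptiness
// and report "nothing to do" instead of reading whatever is in memory.
std::optional<TurnLaneInfo> first_lane_or_none(const std::vector<TurnLaneInfo>& infos) {
  if (infos.empty()) return std::nullopt;
  return infos.front();
}
```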

Contributor Author

Dropping this change. A separate issue was created.

purew (Contributor) commented Jul 29, 2020

I measured no difference between 0f41d17 and master when running 23886 routes.

Originally posted at SvetlanaBayarovich#1 (comment).

@@ -61,7 +61,7 @@ struct TrafficSpeeds {
};

// Convert big endian bytes to little endian
-int16_t to_little_endian(const int16_t val) {
+int16_t to_little_endian(const uint16_t val) {
return (val << 8) | ((val >> 8) & 0x00ff);
Member

Yeah, so bit-shifting signed numbers is implementation-dependent, so I'm not sure what was happening here. I'm not sure whether we have any negative numbers in the input, but if we do, this change will definitely have an impact on what we do with the data. I'll follow up with you offline with some sample data to see if we need to worry about it.
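A minimal sketch of a 16-bit byte swap that never shifts a negative signed value, assuming the goal of to_little_endian is only to reverse the two bytes. `swap_bytes16` and `swap_bytes16_signed` are hypothetical names. Doing the shifts on uint16_t (promoted to a non-negative int) keeps them well defined; converting the result back to int16_t is a separate step, implementation-defined before C++20 for values above 0x7fff (two's-complement wraparound in practice) and guaranteed modular since C++20.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the two bytes of a 16-bit word using unsigned operands only.
// val is promoted to a non-negative int, so (val << 8) cannot overflow
// (max 0xFFFF00) and (val >> 8) never sees a negative operand.
uint16_t swap_bytes16(uint16_t val) {
  return static_cast<uint16_t>((val << 8) | (val >> 8));
}

// Signed view: convert to uint16_t first (well defined, modulo 2^16),
// swap, then convert back. The final conversion to int16_t wraps in
// two's complement in practice and is guaranteed to since C++20.
int16_t swap_bytes16_signed(int16_t val) {
  return static_cast<int16_t>(swap_bytes16(static_cast<uint16_t>(val)));
}
```

For negative inputs this gives the two's-complement byte swap (for example -2, i.e. 0xFFFE, becomes 0xFEFF, i.e. -257), which is what the reviewer's sample data would need to be checked against.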

@@ -88,7 +88,7 @@ std::string encode(const container_t& points, const int precision = 1e6) {
// handy lambda to turn an integer into an encoded string
auto serialize = [&output](int number) {
// move the bits left 1 position and flip all the bits if it was a negative number
-number = number < 0 ? ~(number << 1) : (number << 1);
+number = number < 0 ? ~(static_cast<unsigned int>(number) << 1) : (number << 1);
Member

This is again bit-shifting a signed number, and in this case we know it's negative. I'm surprised this doesn't change lots of stuff. One good test for this would be to run valhalla_export_edges on a larger tileset, since it dumps out the shape of the whole tileset; then we can look at the diffs to see what this does. Maybe we were just lucky that, on the platforms we run on, the implementation-dependent handling was equivalent to the above? We need to test this more rigorously.

Contributor Author

Even luckier, because it is the right shift of a negative value that is implementation-defined; for a left shift it's UB.
What exactly do you suggest testing? As for correctness, there is a unit test (e.g. this one) that does encoding-decoding and definitely covers negative values.
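The fix can be seen in isolation with a sketch of a single-value polyline encoder, modeled on the serialize lambda above and on the published Google encoded polyline algorithm (shift left by one, invert if negative, emit 5-bit chunks offset by 63). `encode_value` is a hypothetical name, not Valhalla's function. Casting to an unsigned type before << 1 makes the shift well defined (modulo 2^32), and the mapping is the same one the original expression was aiming for: -1 maps to 1, 1 to 2, -2 to 3.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one signed, already-scaled integer (e.g. a coordinate delta
// multiplied by the precision) as polyline characters.
// e.g. encode_value(-17998321) == "`~oia@" (Google's worked example)
std::string encode_value(int32_t number) {
  // Shift in unsigned arithmetic: well defined even when number < 0.
  const uint32_t shifted = static_cast<uint32_t>(number) << 1;
  // Flip all bits for negatives so small magnitudes stay small.
  uint32_t value = number < 0 ? ~shifted : shifted;

  std::string out;
  // Emit 5 bits at a time, low bits first; 0x20 marks "more chunks follow".
  while (value >= 0x20) {
    out.push_back(static_cast<char>((0x20 | (value & 0x1f)) + 63));
    value >>= 5;
  }
  out.push_back(static_cast<char>(value + 63));
  return out;
}
```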

Member

Well, my suggestion was as I stated:

one good test for this would be to run valhalla_export_edges on a larger tileset ...

Yep, I am aware of the unit test (I wrote the original), but it's not extensive. The reason I want to do a broader test is that this touches literally everything: every single route, every map match, and how the data is stored. This is a fundamental change, so we had better be sure it's OK. Same with the other one above.

Member

I wrote a small program just to check int << 1 vs static_cast<unsigned int>(int) << 1, and it seems that, at least on my computer, the two operations give identical results.
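A UB-free variant of such a check could look like the sketch below. Evaluating int << 1 directly on a negative value is itself undefined before C++20, so a program that does it can only show what one compiler happened to emit. This sketch instead compares the unsigned-cast form against n * 2 computed exactly in int64_t and reduced modulo 2^32, which is the two's-complement result the old code was implicitly relying on. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// New form used by the fix: convert to unsigned, then shift.
uint32_t shift_via_unsigned(int32_t n) {
  return static_cast<uint32_t>(n) << 1;
}

// Reference: exact doubling in a wider signed type, then reduction
// modulo 2^32 (conversion to an unsigned type is always well defined).
uint32_t doubled_reference(int32_t n) {
  return static_cast<uint32_t>(static_cast<int64_t>(n) * 2);
}

// Check every value in [lo, hi]; the int64_t counter avoids overflow at INT32_MAX.
bool forms_agree(int32_t lo, int32_t hi) {
  for (int64_t n = lo; n <= hi; ++n) {
    const auto v = static_cast<int32_t>(n);
    if (shift_via_unsigned(v) != doubled_reference(v)) return false;
  }
  return true;
}
```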

Contributor Author

@kevinkreiser I did the testing we discussed for the Utrecht tiles:

  1. Build tiles
  2. Export edges -> compare
  3. Add live traffic -> compare tiles

The comparison showed no difference. Is there any other testing to be done, e.g. regarding this comment?

@kevinkreiser kevinkreiser merged commit 87a6997 into valhalla:master Aug 12, 2020
@SvetlanaBayarovich SvetlanaBayarovich deleted the ub-fixes branch August 12, 2020 17:18

5 participants