fixes(live-speed): Fixes underflow for traffic speed > 140kph #2325

Merged: 10 commits, Apr 22, 2020

Conversation

purew
Contributor

@purew purew commented Apr 16, 2020

No description provided.

@purew purew force-pushed the fixes-live-traffic-underflow branch from aa21816 to b6f06c7 Compare April 16, 2020 21:52
@mandeepsandhu
Contributor

No description provided.

Can you add some notes on what the problem was and how this change fixes it? Are there tests we can add to verify the change? (I understand that it's already been tested in prod, but it would be good to have some unit tests, if possible.)

@kevinkreiser
Member

@mandeepsandhu i can. basically when we do costing we compute the time it took to traverse an edge. to do so we need to do m/(m/s) to get s. instead of doing the calculation like that, we compute a lookup table of m/s to s/m (the reciprocal). this allows us to use multiplication, which is a cheaper operation than division (at least historically speaking and as far as i'm aware). anyway the bounds of that lookup table were set by our maximum allowed speed value, 140kph. the problem though is that now that we have real time data in the mix (and its maximum speed is 255kph, since it's 8 bits wide) it's possible to send a speed value that accesses that table at a memory location that is past the end.
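
A minimal sketch of the reciprocal lookup table described in this comment. The names `kTableMaxKph`, `SpeedFactors`, `make_speed_factors` and `edge_seconds` are hypothetical and are not taken from Valhalla; the handling of 0 kph is also an assumption. The point is only that the table must have an entry for every speed that can be used as an index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bound: the table has to cover every speed that can index it.
// Live traffic speeds are 8 bits wide, so the largest value is 255 kph.
constexpr std::size_t kTableMaxKph = 255;
using SpeedFactors = std::array<float, kTableMaxKph + 1>;

// Build the table once: factors[kph] holds seconds per meter at that speed.
// 1 kph is 1000 m per 3600 s, so one meter takes 3.6 / kph seconds.
inline SpeedFactors make_speed_factors() {
  SpeedFactors factors{};
  factors[0] = 3.6f; // assumption: treat 0 kph like 1 kph to avoid dividing by zero
  for (std::size_t kph = 1; kph <= kTableMaxKph; ++kph) {
    factors[kph] = 3.6f / static_cast<float>(kph);
  }
  return factors;
}

// Traversal time becomes one multiplication per edge instead of a division.
inline float edge_seconds(const SpeedFactors& factors, float length_meters,
                          std::uint32_t speed_kph) {
  // With a table of only 140 + 1 entries, a traffic speed of 200 would read
  // past the end; the assert documents the bound in debug builds.
  assert(speed_kph < factors.size());
  return length_meters * factors[speed_kph];
}
```

With this table, a 1000 m edge at 200 kph costs roughly 18 seconds, a lookup that a table sized for 140 kph could not serve.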

@@ -314,7 +315,7 @@ class AutoCost : public DynamicCost {
// We expose it within the source file for testing purposes
public:
VehicleType type_; // Vehicle type: car (default), motorcycle, etc
- float speedfactor_[kMaxSpeedKph + 1];
+ float speedfactor_[valhalla::baldr::traffic::MAX_TRAFFIC_SPEED_KPH + 1];
Member

do no other costings do this? not even motorcycle, motorscooter or truck?

@kevinkreiser
Member

can anyone think of a way to write a unit test that exposes undefined behavior because of out of bounds access. i always struggle with this one. what is a good pattern for testing regressions of this type?

kevinkreiser previously approved these changes Apr 17, 2020
@danpat
Member

danpat commented Apr 17, 2020

@kevinkreiser That's a good use for assert() and a test that only works in debug mode - then just write any test that'd break the assertion before the fix. I think GTest can catch asserts if we run tests in debug mode (they're not enabled in release mode).

If we made it a std::array or std::vector, I think adding -D_GLIBCXX_DEBUG (see here) also enables range checks for operator[], at least on libstdc++ in debug mode.
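
A related option, sketched here under assumptions: `std::array::at()` checks the index in every build mode and throws `std::out_of_range`, so a regression test can observe the failure without relying on `assert()` or on `-D_GLIBCXX_DEBUG`. This is a different technique from the two suggested in this comment, and the names `kOldMaxKph`, `OldSpeedFactors`, `checked_factor` and `lookup_is_rejected` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical table sized for the old 140 kph limit.
constexpr std::size_t kOldMaxKph = 140;
using OldSpeedFactors = std::array<float, kOldMaxKph + 1>;

// at() validates the index and throws std::out_of_range on a bad one,
// whereas operator[] with a bad index is undefined behavior.
inline float checked_factor(const OldSpeedFactors& factors, std::size_t speed_kph) {
  return factors.at(speed_kph);
}

// Returns true when the lookup is rejected, which a regression test can assert on.
inline bool lookup_is_rejected(const OldSpeedFactors& factors, std::size_t speed_kph) {
  try {
    (void)checked_factor(factors, speed_kph);
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}

// Example regression check: a 200 kph traffic speed must not be read silently.
inline void example_regression_check() {
  OldSpeedFactors factors{};
  assert(lookup_is_rejected(factors, 200));
  assert(!lookup_is_rejected(factors, 140));
}
```

The trade-off is a bounds check on every lookup in release builds too, which may matter in a hot costing loop.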

@kevinkreiser
Member

@danpat that one did occur to me but it felt a bit like cheating TDD to put the assert in the function thats buggy haha. but i think you're right its the only way to easily unit test it without an extremely complicated test setup. ah and a quick google shows that cmake already does the work of defining NDEBUG if we arent in a debug build.

@mandeepsandhu
Contributor

@mandeepsandhu i can. basically when we do costing we compute the time it took to traverse an edge. to do so we need to do m/(m/s) to get s. instead of doing the calculation like that, we compute a lookup table of m/s to s/m (the reciprocal). this allows us to use multiplication, which is a cheaper operation than division (at least historically speaking and as far as i'm aware). anyway the bounds of that lookup table were set by our maximum allowed speed value, 140kph. the problem though is that now that we have real time data in the mix (and its maximum speed is 255kph, since it's 8 bits wide) it's possible to send a speed value that accesses that table at a memory location that is past the end.

Nice catch! And thanks for the clear explanation. Would love to see it in the commit message for posterity. I think your assumption about division being slower than multiplication still holds true for modern processors.

Small anecdote: I've been bitten by a very similar lookup table bug before. The code was trying to calculate "square" of a video pixel value by using a lookup table. The lookup table had 2000 entries since, when it was created, max resolution of a video frame was 1920x1080. Then came 4K videos and this program started crashing (since it was trying to find squares beyond the lookup table's bounds). On measuring performance of lookup table we found that just calculating the square (x*x) was much faster than fetching data from main memory for the lookup table (and this was on a measly ~1000 DMIPS mips processor). I ended up just deleting the lookup table :)

@purew
Contributor Author

purew commented Apr 17, 2020

do no other costings do this? not even motorcycle, motorscooter or truck?

See new code that renames the constants a bit. I'm trying to clarify the meaning of the various constants.

Would love to see it in the commit message for posterity.

Agreed, commit amended.

If we made it a std::array or std::vector, I think adding -D_GLIBCXX_DEBUG (see here) also enables range checks for operator[], at least on libstdc++ in debug mode.

This is a good idea but it seems difficult in practice. All dependencies seem to require being built with _GLIBCXX_DEBUG as well: https://circleci.com/gh/valhalla/valhalla/21431 The reason is the following sentence from the manual:

Note that this flag changes the sizes and behavior of standard class templates such as std::vector, and therefore you can only link code compiled with debug mode and code compiled without debug mode if no instantiation of a container is passed between the two translation units.

I'm still working on a unit-test.

@@ -270,7 +270,7 @@ class AutoCost : public DynamicCost {
* estimate is less than the least possible time along roads.
*/
virtual float AStarCostFactor() const {
- return speedfactor_[kMaxSpeedKph];
+ return speedfactor_[kMaxAssumedSpeed];
Contributor Author

@purew purew Apr 17, 2020

This breaks the assumptions of the heuristic. But changing it to use the new kMaxSpeedKph would change existing routes without traffic I believe...

EDIT: Err, the larger speed would lead to a potentially larger expansion of the graph before finding a route.
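
A small sketch of the trade-off raised in this comment, using hypothetical helper names (`seconds_per_meter`, `heuristic_seconds`, `is_admissible`) and made-up distances. An A* estimate is only a safe lower bound if its cost factor comes from the fastest speed that can actually occur; a faster assumed maximum keeps the bound safe but makes it looser.

```cpp
#include <cassert>

// Seconds per meter at a given speed in kph (1 kph = 1000 m / 3600 s).
constexpr float seconds_per_meter(float kph) {
  return 3.6f / kph;
}

// A* estimate of the remaining time: remaining distance times a cost factor.
constexpr float heuristic_seconds(float distance_meters, float cost_factor) {
  return distance_meters * cost_factor;
}

// The estimate is admissible only if it never exceeds the true remaining time.
constexpr bool is_admissible(float estimate_seconds, float true_seconds) {
  return estimate_seconds <= true_seconds;
}

inline void example_heuristic_bounds() {
  const float distance = 10000.0f; // 10 km left to the destination
  // Suppose traffic reports 200 kph on the remaining stretch.
  const float true_time = distance * seconds_per_meter(200.0f);

  // Factor from an assumed 140 kph maximum: overestimates when traffic is faster.
  assert(!is_admissible(heuristic_seconds(distance, seconds_per_meter(140.0f)), true_time));

  // Factor from the 255 kph traffic maximum: still a lower bound, but a looser
  // one, so the search may expand more of the graph before finding a route.
  assert(is_admissible(heuristic_seconds(distance, seconds_per_meter(255.0f)), true_time));
}
```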

@purew purew force-pushed the fixes-live-traffic-underflow branch 2 times, most recently from cfc0c44 to cf3ca96 Compare April 17, 2020 16:02
@purew purew added this to the Sprint 8 - Left Turn milestone Apr 17, 2020
@purew purew force-pushed the fixes-live-traffic-underflow branch from 0f8f9f2 to a3b4fff Compare April 17, 2020 20:42
@purew
Contributor Author

purew commented Apr 17, 2020

Ok, I just force pushed two commits.

The first commit adds a test that inserts a high traffic speed and thus the test fails.

The second commit fixes the out-of-bounds and thereby the test.

The apparent underflow happened because of the use of the precomputed
division tables in the costing functions. They were pre-computed using
the assumed max speed of 140kph.

Thus, when faced with speeds larger than that from traffic,
out-of-bounds access occurred and garbage data was read.

This change tweaks the max speed to account for higher speeds in traffic
data.

Checks speedfactor out-of-bounds in debug with assert
@purew purew force-pushed the fixes-live-traffic-underflow branch from b3752f4 to 43ca54d Compare April 20, 2020 13:35
src/sif/autocost.cc (Outdated)
@@ -7,6 +7,7 @@
#include "baldr/nodeinfo.h"
#include "midgard/constants.h"
#include "midgard/util.h"
+ #include <assert.h>
Member

same as above

@@ -31,6 +32,9 @@ constexpr uint16_t INVALID_SPEED_AGE_BUCKET = 0;
constexpr uint16_t MAX_SPEED_AGE_BUCKET = 15;
constexpr uint16_t SPEED_AGE_BUCKET_SIZE = 2; // 2 minutes per bucket

+ // Traffic speeds are encoded as 8 bits in `Speed` below
+ constexpr uint32_t MAX_TRAFFIC_SPEED_KPH = 255;
Member

this should be defined in graphconstants and used from there though right? its a bit annoying that we have the value in two places.

Contributor Author

Yea, I was struggling with this. I wanted to avoid pulling in graphconstants initially in order to keep the C-interface a single file, and instead settled on a static assert that the two have the same value.


// Assert these constants are the same
// (We want to avoid including this file in graphconstants.h)
static_assert(MAX_TRAFFIC_SPEED_KPH == valhalla::baldr::kMaxTrafficSpeed,
Member

perhaps there was another reason why you didnt want to use the value from graphconstants? i was kind of thinking a static assert that made sure the value wasnt above (1<<8) - 1
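
A sketch of the static assert suggested here, assuming the 8-bit width is expressed through `std::uint8_t`; only `MAX_TRAFFIC_SPEED_KPH` comes from the diff. Note that in C++ subtraction binds tighter than the shift, so the expression needs parentheses to mean the 8-bit maximum.

```cpp
#include <cstdint>
#include <limits>

// Mirrors the constant from the diff.
constexpr std::uint32_t MAX_TRAFFIC_SPEED_KPH = 255;

// Without parentheses, 1 << 8 - 1 parses as 1 << (8 - 1), which is 128.
static_assert((1 << (8 - 1)) == 128, "shift by 7, not the 8-bit maximum");
static_assert(((1 << 8) - 1) == 255, "parenthesized form is the 8-bit maximum");

// Tie the constant to the width of the encoded field rather than to a literal.
static_assert(MAX_TRAFFIC_SPEED_KPH <= std::numeric_limits<std::uint8_t>::max(),
              "traffic speed must fit in 8 bits");
```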

@purew
Contributor Author

purew commented Apr 21, 2020

Ugh, seems windows has a special C++

error C2039: 'max': is not a member of 'std'

@kevinkreiser
Member

@purew that usually just means that we are relying on some header that got included by another header we explicitly included. we probably just need to do an explicit include of <algorithm> inside of whatever source file that is happening in
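
For illustration, a minimal sketch of the explicit include. The source does not show where `std::max` is called, so the helper `clamp_speed_kph` below is hypothetical; the only point is that `<algorithm>` is included directly in the file that uses `std::max` or `std::min`.

```cpp
#include <algorithm> // std::max and std::min are declared here
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep a speed inside the range a lookup table covers.
inline std::uint32_t clamp_speed_kph(std::uint32_t speed_kph, std::uint32_t max_kph) {
  return std::min(std::max<std::uint32_t>(speed_kph, 1u), max_kph);
}

inline void example_clamp() {
  assert(clamp_speed_kph(300, 255) == 255);
  assert(clamp_speed_kph(0, 255) == 1);
  assert(clamp_speed_kph(90, 255) == 90);
}
```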

@kevinkreiser kevinkreiser merged commit a4e0db5 into master Apr 22, 2020
@kevinkreiser kevinkreiser deleted the fixes-live-traffic-underflow branch August 19, 2020 13:33