Introducing A Centroid/Converge/Rendezvous/Meet API #2734

Merged
merged 28 commits into from
Jan 25, 2021

@kevinkreiser (Member) commented Dec 14, 2020

What is this?

This past weekend I was thinking about a problem. That problem was specifically:

Given a bunch of people at different locations, what is the optimal location for those people to meet?

My first thoughts on such an algorithm were essentially that classical algorithms won't work because:

  • They require the destination to be known and use it as the stopping criterion, but here we only know the origin locations. Isochrones are an exception, but they use a maximum distance/time as the stopping criterion
  • We need to track multiple independent paths concurrently. We do this with the matrix, but it's very inefficient

My second thought was well what even is a "best" meeting place?

  • In the plane, or geometrically speaking, I immediately thought of the centroid, as it's the point that minimizes the sum of squared distances to all input points
  • Here our metric is actually path cost rather than distance. That is, we want to find a location from all input locations, for which the sum of all paths' costs to that location is minimal

Once I had formulated the problem properly it was time to consider how the algorithm should work. For that I came up with a few requirements:

  • Convergence happens when all origins find their shortest paths (settle) to the same edge in the graph
  • We need an efficient way to do path multiplexing rather than relying on tracking all of the paths in their own different queues
  • We need an API over HTTP
  • We need a demo UI

Problem 1: Efficient Path Multiplexing

The existing implementation of the priority queue (combo of edgelabel, edgestatus and double bucket queue) doesn't currently support tracking paths from multiple locations at once. In bidirectional A* and in cost matrix, for example, where we need this kind of implementation, we instantiate two queues for each origin/destination pair. This has some drawbacks:

  • This means that allocations happen across all copies of these data structures
  • The algorithms have to jump around in memory as we access each separate copy

So I wanted to see if I could find a way to mark entries in the queue so that I could tell which path expansion (location) they came from and differentiate them, thus using a single queue for many path expansions at the same time (multiplexing).

To do this I was able to mark both the edge labels in the labelset and the edge statuses (index into the labelset) with a path id/index/color to differentiate which location a particular path was tracking. I was able to use the 7 spare bits in the tile id of the edge status and free up some bits in the edge label. Double bucket queue didn't need any changes because it works with labelset indices directly.

After those changes everything worked as normal without any changes to the other algorithms since the path id/index/color is optional. But what I want to try out is to switch to using just one queue in bidirectional a* to see if I can get a performance boost from not having to jump back and forth to 2 different memory locations. I'm considering PRing just this change separately but I'll get to that a bit later!
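As a rough illustration of the marking described above, here is a minimal sketch of packing a path id into the spare high bits of a 32-bit key. The layout (25 bits of tile id plus 7 bits of path id) and the names `pack_key`, `tile_of`, `path_of` and `kMaxPathId` are assumptions made up for this example; they are not the actual edge status layout in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: the low 25 bits hold the tile id, the 7 spare high bits hold a path id.
constexpr uint32_t kPathIdBits = 7;
constexpr uint32_t kMaxPathId = (1u << kPathIdBits) - 1; // 127
constexpr uint32_t kTileBits = 32 - kPathIdBits;         // 25
constexpr uint32_t kTileMask = (1u << kTileBits) - 1;

// Pack a tile id and an optional path id into one key.
// With the default path id of 0 the key equals the plain tile id.
inline uint32_t pack_key(uint32_t tile_id, uint8_t path_id = 0) {
  assert(tile_id <= kTileMask);
  assert(path_id <= kMaxPathId);
  return tile_id | (static_cast<uint32_t>(path_id) << kTileBits);
}

// Recover the two parts of a packed key.
inline uint32_t tile_of(uint32_t key) { return key & kTileMask; }
inline uint8_t path_of(uint32_t key) { return static_cast<uint8_t>(key >> kTileBits); }
```

Because the path id defaults to 0, callers that never pass one see the same keys as before, which matches the point that the other algorithms needed no changes.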

Problem 2: Core Algorithm

The algorithm itself is quite simple. Remember that we need to come up with a destination, which means we can't use any of the directed search algorithms (A* et al.) that rely on knowing the destination. It would be nice if we could, because they have better performance than Dijkstra's, but the way they get that performance is precisely by using the goal heuristic to coach the path expansion toward the goal. In our case we cannot pick a goal that would generate an admissible heuristic, which means that's off the table for us. But there are other tricks we can do to cull the search space. I'll get to those in the future work section.

So it's Dijkstra's for us. No problem. Remember what the main objective is for convergence: we need to return the first edge in the graph to which all locations have found a shortest path. What that means is that, as we pop edges off the queue (i.e. find the shortest path from an origin to that edge), we need to track which other origins have also found paths to this edge. To do that I created a small struct which holds two 64-bit masks. Each bit in the mask represents an origin that has either found or not found its shortest path to this edge. When I get the callback from Dijkstra's that we've settled an edge, I check which path/origin it was for and I flip that bit on my tracking struct. If the bit I just flipped was the last one that needed flipping, then we have converged. Once that happens Dijkstra's stops and we call FormPaths, which recovers the edges of the path for each individual origin location (this looks like an alternate routes result, i.e. it has multiple routes in the output).
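The loop described above can be sketched on a toy graph. This is a simplified illustration, not the code in the PR: it settles nodes rather than edges, uses `std::priority_queue` instead of the double bucket queue, keeps a single mask per node, and the names `Graph` and `find_meeting_node` are made up for the example. It assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

// Toy graph: graph[node] lists (neighbor, cost) pairs. A stand-in for the tiled graph.
using Graph = std::vector<std::vector<std::pair<int, double>>>;

// Returns the first node settled by every origin, or -1 if the expansions never meet.
int find_meeting_node(const Graph& graph, const std::vector<int>& origins) {
  assert(!origins.empty() && origins.size() <= 64); // one mask bit per origin
  const uint64_t all =
      origins.size() == 64 ? ~uint64_t{0} : (uint64_t{1} << origins.size()) - 1;

  // A single queue for all expansions; each entry carries the path id of its origin.
  using Entry = std::tuple<double, int, uint8_t>; // cost, node, path id
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> queue;
  for (std::size_t i = 0; i < origins.size(); ++i)
    queue.emplace(0.0, origins[i], static_cast<uint8_t>(i));

  // settled[node] has bit i set once origin i has found its shortest path to node.
  std::vector<uint64_t> settled(graph.size(), 0);

  while (!queue.empty()) {
    auto [cost, node, path_id] = queue.top();
    queue.pop();
    const uint64_t bit = uint64_t{1} << path_id;
    if (settled[node] & bit)
      continue; // this origin already settled the node at a lower or equal cost
    settled[node] |= bit;
    if (settled[node] == all)
      return node; // the last outstanding bit was flipped: converged
    for (const auto& [next, edge_cost] : graph[node])
      if (!(settled[next] & bit))
        queue.emplace(cost + edge_cost, next, path_id);
  }
  return -1;
}
```

On a five-node line graph with unit costs and origins at both ends, the expansions meet at the middle node.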

Problem 3: HTTP API

More cool stuff to elaborate on here but the great news is that the input to the API already looks exactly like a normal /route request in that you need 2 or more locations. So I quickly added a new action to the request called centroid and focused on the output. The good news there is that output looks exactly like a route with alternates=n except now n is the total number of input locations. Another thing I did here was modify the valhalla route serializer to support alternates (THANK GOD). The current implementation has something like {"trip":{... your route here ...}}. This made it quite clunky to add alternates but I found a reasonable way: {"trip":{... your route here ...}, "alternates":[{... your alternate here ...}, ...]}. I can PR this separately as well.
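To make the response shape concrete, here is a small sketch that wraps already-serialized routes into the layout described above. It is only an illustration: `wrap_routes` is a hypothetical helper, it works on raw strings rather than the serializer's own JSON types, and omitting "alternates" when there is a single route is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// routes[i] is an already-serialized JSON object for one route (hypothetical input).
// The first route becomes "trip"; any remaining routes are listed under "alternates".
std::string wrap_routes(const std::vector<std::string>& routes) {
  assert(!routes.empty());
  std::string out = "{\"trip\":" + routes.front();
  if (routes.size() > 1) {
    out += ",\"alternates\":[";
    for (std::size_t i = 1; i < routes.size(); ++i) {
      if (i > 1)
        out += ',';
      out += routes[i];
    }
    out += ']';
  }
  return out + '}';
}
```

For a centroid request with n input locations this would yield one "trip" plus n - 1 entries in "alternates".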

Demo:

I cracked open vim and quickly hacked together a leaflet demo to show off this API. You simply click the locations on the map you want to use as your input locations and then press the button at the bottom to fire off the request. The green dots are origins and the red dot is the destination. PR in the demos repo is here: valhalla/demos#234

Future Work:

There are a number of things to do to make this work practical and useful. Some of them make sense to add to the API; others should be saved for other projects that make use of it. I'll list off a few:

Things that make sense to add:

  • We can make the process more efficient in the general case by pruning the Dijkstra's expansion when it leaves some bounding box of the input locations. In extreme cases this could cause failure of the path finding; however, we could fall back to no bounding box on a retry.
  • Allow the caller to specify a max road class to allow a meeting on. You don't want to meet someone on a limited access highway, that could be rough 😄
  • If we let the algorithm run a bit longer we could find multiple meeting locations. We'd have to come up with criteria for acceptance of alternate meeting locations, but a quick first one could be distance based: no alternate meeting locations within x meters of each other, as they are too similar.
  • A way to penalize areas in the graph that you wouldn't want to meet at. We have hard avoids which would work, but maybe soft avoids are better?
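The bounding box idea from the first item could look roughly like the sketch below. Everything here is hypothetical: `LatLng`, `BoundingBox`, `padded_bounds` and the margin in degrees are made up for the example, and the box ignores antimeridian wrap-around.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct LatLng {
  double lat;
  double lng;
};

struct BoundingBox {
  double min_lat, min_lng, max_lat, max_lng;
  // True when the point lies inside the box or on its border.
  bool contains(const LatLng& p) const {
    return p.lat >= min_lat && p.lat <= max_lat && p.lng >= min_lng && p.lng <= max_lng;
  }
};

// Box around all input locations, grown by a margin in degrees (a made-up tuning knob).
BoundingBox padded_bounds(const std::vector<LatLng>& pts, double margin_deg) {
  assert(!pts.empty());
  BoundingBox box{pts[0].lat, pts[0].lng, pts[0].lat, pts[0].lng};
  for (const auto& p : pts) {
    box.min_lat = std::min(box.min_lat, p.lat);
    box.min_lng = std::min(box.min_lng, p.lng);
    box.max_lat = std::max(box.max_lat, p.lat);
    box.max_lng = std::max(box.max_lng, p.lng);
  }
  box.min_lat -= margin_deg;
  box.min_lng -= margin_deg;
  box.max_lat += margin_deg;
  box.max_lng += margin_deg;
  return box;
}
```

During expansion, an edge whose end node falls outside the box would simply not be pushed onto the queue; if no meeting point is found, the request could be retried without the box, as suggested above.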

Things that make sense for users of the API to add themselves:

  • Seems like the results of this API could be intersected with a POI database and that could be used to give back more relevant results. Like if you knew a person wanted to eat and the lowest cost meeting place had no restaurants nearby but the first alternate had 10 restaurants then maybe you suggest the alternate

Apart from that there are a lot of nice TODOs listed in the code which I'm not really worried about tackling in this iteration. I think it's quite alright to offer this service as a fun little beta API for people to try out!

Unit Testing

This PR still needs unit tests; I'm working on those.

@@ -276,7 +276,7 @@ static void BM_Sif_Allowed(benchmark::State& state) {
// auto pred = sif::EdgeLabel(0, tgt_edge_id, edge, costs, 1.0, 1.0,
// sif::TravelMode::kDrive,10,sif::Cost());
auto pred = sif::EdgeLabel();
- int restriction_idx;
+ uint8_t restriction_idx;
Member Author

you'll see changes similar to this throughout, basically we didn't need 2^32 values to represent the restriction index for a restriction at a particular edge. if we ever find an edge that even has 256 restrictions i'll eat my hat 😄

Member

curious what the benefit is vs. the potential for overflow (though rare, do you now have to do some bounds checking anywhere?)

Member Author

yeah, inside the function that actually checks for restrictions (in dynamic cost), if the index is larger than 254 we can't tell what restriction was there in the route. note that this doesn't mean that we won't adhere to the restriction, it just means that the serializer doesn't know that a restriction was there.
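A minimal sketch of the kind of clamping described in this reply, assuming 255 is reserved as an "unknown" sentinel. The names `kInvalidRestrictionIdx` and `to_restriction_idx` are hypothetical and are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sentinel: 255 means "a restriction may exist but its index was not recorded".
constexpr uint8_t kInvalidRestrictionIdx = 255;

// Narrow a restriction index to 8 bits, mapping anything above 254 to the sentinel
// instead of letting it wrap around.
inline uint8_t to_restriction_idx(std::size_t index) {
  return index < kInvalidRestrictionIdx ? static_cast<uint8_t>(index) : kInvalidRestrictionIdx;
}
```

With this shape, the bounds check lives in one place and an out-of-range index degrades to "unknown" rather than pointing at the wrong restriction.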

* @param reader provides access to graph primitives
* @return the constructed location
*/
valhalla::Location make_centroid(const valhalla::baldr::GraphId& edge_id,
Member Author

use an edge to make a destination location for the route

namespace thor {

// constructor
PathIntersection::PathIntersection(uint64_t edge_id, uint64_t opp_id, uint8_t location_count)
Member Author

@kevinkreiser Dec 14, 2020

this class tracks potential convergence points. it uses masks to figure out which locations have found shortest paths to this edge. i had actually forgotten that i don't need to track both the edge id and its opposing edge id, since we only use the smaller of the 2 edge ids to track that particular meeting point. so this is another TODO, we can remove the opp_id and save some ram. <-- Done!
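A minimal sketch of such a tracker, under stated assumptions: the class name `MeetingCandidate`, the single `pending_` mask and the `settle` method are made up for illustration and do not mirror the actual `PathIntersection` members. It keys the candidate on the smaller of the two edge ids, as the comment describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

class MeetingCandidate {
public:
  MeetingCandidate(uint64_t edge_id, uint64_t opp_id, uint8_t location_count)
      : key_(std::min(edge_id, opp_id)), // both directions of the edge share one key
        pending_(location_count >= 64 ? ~uint64_t{0}
                                      : (uint64_t{1} << location_count) - 1) {
    assert(location_count > 0 && location_count <= 64);
  }

  // Record that the location with this path id settled the edge.
  // Returns true once no locations remain outstanding.
  bool settle(uint8_t path_id) {
    pending_ &= ~(uint64_t{1} << path_id);
    return pending_ == 0;
  }

  uint64_t key() const { return key_; }

private:
  uint64_t key_;     // smaller of the edge id and its opposing edge id
  uint64_t pending_; // bit i is set while location i has not settled this edge
};
```

Settling the same location twice is harmless here, since clearing an already cleared bit changes nothing.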


// this is fired when the edge in the label has been settled (shortest path found) so we need to check
// our intersections and add or update them
thor::ExpansionRecommendation Centroid::ShouldExpand(baldr::GraphReader& reader,
Member Author

this is the meat of the algorithm. here we get informed that a specific location settled a specific edge, we check if we are already tracking it and we flip the bit corresponding to the input location that settled it. if we flipped the last outstanding bit in the mask, then it's over and we found a least cost convergence point

// walk edge labels to form paths for each location to the centroid
template <typename label_container_t>
std::vector<std::vector<PathInfo>>
Centroid::FormPaths(const google::protobuf::RepeatedPtrField<valhalla::Location>& locations,
Member Author

nothing interesting here, just loop over the locations and recover their paths individually

@@ -41,7 +41,8 @@ namespace thor {

// Default constructor
Dijkstras::Dijkstras()
- : access_mode_(kAutoAccess), mode_(TravelMode::kDrive), adjacencylist_(nullptr) {
+ : access_mode_(kAutoAccess), mode_(TravelMode::kDrive), adjacencylist_(nullptr),
+ multipath_(false) {
Member Author

the main changes to dijkstras are a boolean to flag that we want to do path multiplexing (this means assigning an id to each location's initial edges in the labelset) as well as actually passing those ids to the different functions on the labelset/edgestatus

@@ -205,6 +205,37 @@ std::string thor_worker_t::expansion(Api& request) {
return rapidjson::to_string(dom, 5);
}

void thor_worker_t::centroid(Api& request) {
Member Author

this is the main place where new service/api related work was needed. here we do the same thing that a regular route request does but we call into the new algorithm and build a leg for each individual route that came out of it.

@@ -49,6 +49,16 @@ const std::unordered_map<std::string, float> kMaxDistances = {
constexpr float kDistanceScale = 10.f;
constexpr double kMilePerMeter = 0.000621371;

std::string serialize_to_pbf(Api& request) {
Member Author

moved this into the anonymous namespace

@@ -142,7 +142,7 @@ thor_worker_t::work(const std::list<zmq::message_t>& job,
}
case Options::isochrone:
result = to_response(isochrones(request), info, request);
denominator = options.sources_size() * options.targets_size();
Member Author

noticed this bug and fixed it

tyr::route_references(trip_json, api.trip().routes(0), api.options());
- auto json = json::map({{"trip", trip_json}});
+ auto json = json::map({});
+ auto alternates = json::array({});
Member Author

this is where i added support for alternates in the valhalla route response format, pretty straightforward actually!

- cost_(0, 0), sortcost_(0), distance_(0), transition_cost_(0, 0) {
+ origin_(0), toll_(0), not_thru_(0), deadend_(0), on_complex_rest_(0), path_id_(0),
+ restriction_idx_(0), cost_(0, 0), sortcost_(0), distance_(0), transition_cost_(0, 0) {
+ assert(path_id_ <= baldr::kMaxMultiPathId);
Member Author

two main things i did here were add support for specifying a path id (optionally) and make restriction index mandatory.

* @param set Label set for this directed edge.
* @param index Index of the edge label.
* @param tile Graph tile of the directed edge.
* @param path_id Identifies which path the edge status belongs to when tracking multiple paths
Member Author

basically all the functions here now have an optional path_id which gets bitwise or'd into the spare bits of the tile/level id so we can use the same edgestatus to track up to 127 paths

@dnesbitt61 (Member) previously approved these changes Dec 15, 2020
May want to add a maximum distance between locations as a protection against long-running requests. One location in MD and one in CO took 90 seconds on my laptop. For now this seems to be a nice feature for short distances (and I would argue that, for a "meetup", most requests would likely be over short distances anyway).
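One way such a guard might look is sketched below: reject the request when any pair of input locations is farther apart than a configured limit. The names `LatLng`, `haversine_m` and `within_max_distance` are hypothetical, and straight-line (great-circle) distance is used only as a cheap proxy for route length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct LatLng {
  double lat;
  double lng;
};

// Great-circle distance in meters using the haversine formula on a spherical earth.
double haversine_m(const LatLng& a, const LatLng& b) {
  constexpr double kEarthRadiusM = 6371000.0;
  constexpr double kRadPerDeg = 3.14159265358979323846 / 180.0;
  const double dlat = (b.lat - a.lat) * kRadPerDeg;
  const double dlng = (b.lng - a.lng) * kRadPerDeg;
  const double h = std::sin(dlat / 2) * std::sin(dlat / 2) +
                   std::cos(a.lat * kRadPerDeg) * std::cos(b.lat * kRadPerDeg) *
                       std::sin(dlng / 2) * std::sin(dlng / 2);
  return 2.0 * kEarthRadiusM * std::asin(std::sqrt(std::min(1.0, h)));
}

// True when every pair of locations is within max_m meters of each other.
bool within_max_distance(const std::vector<LatLng>& locations, double max_m) {
  for (std::size_t i = 0; i < locations.size(); ++i)
    for (std::size_t j = i + 1; j < locations.size(); ++j)
      if (haversine_m(locations[i], locations[j]) > max_m)
        return false;
  return true;
}
```

The pairwise check is quadratic in the number of locations, which should be acceptable if the service also caps the location count.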

@nilsnolde (Member) commented Dec 17, 2020

Just trying it out myself, love the idea. A little feedback from a user's POV:

  • more a matter of style I guess: it would be nice to find them all in alternates, incl. the first trip
  • name suggestion: gravity? IMO centroid is a little too strongly rooted in the geographic centroid and gravity comes closer to the concept. Actually, that's bs.. it's rather centroid than gravity..

+1 for maximum distance config, took around 3-5 seconds (depending on options) for total 200 km of trips.

Couldn't help but quickly implement it in the QGIS plugin to try it out;) (also returns the meeting point with total distance/duration) You can install from here if you wanna try: https://qgisrepo.gis-ops.com

@kevinkreiser (Member Author)

TODO:

  1. service limits on distance and number of locations
  2. unit tests

@kevinkreiser (Member Author)

Since I initially PRd this a lot has changed in the repo. I've resolved all the conflicts but I did do two large quality of life things in the tests:

  1. i removed all inline configs from all tests and made them use a test::build_config method. this helps the next time we make a configuration change: we only have to do it in one place instead of 50
  2. i collapsed all the main gurka route/match/locate functions into 2 generic functions to which you pass the action you want to call. this way we can add new apis to test easily rather than duplicating the same boilerplate over and over

I also added a minimal centroid unit test.

@kevinkreiser kevinkreiser merged commit d7a4e78 into master Jan 25, 2021

@dgearhart (Member) left a comment

🚢

4 participants