Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Smaller PBF Responses #3507

Merged
merged 21 commits into from
Jan 14, 2022
Merged

Allow Smaller PBF Responses #3507

merged 21 commits into from
Jan 14, 2022

Conversation

kevinkreiser
Copy link
Member

@kevinkreiser kevinkreiser commented Jan 12, 2022

NOTE: this pr is pointed at the other in progress PR #3506 once that is merged ill repoint this one at master

This PR tackles another one of the open items on the initial work we did to allow for for pbf interaction with the service namely this one:

Slim Responses: The Apiobject has lots of stuff the requester may not want, so we should allow for requesting only specific subfields to be returned (this will be the most efficient means of interaction).

The approach I took here was to add a small submessage to options proto which allows you to select which top level fields in the api protobuf message will be returned to the user. so if you are doing a route-like action you can say i just want directions to come back. this should make the response quite a lot slimmer. i'll get numbers on then shortly.

Ok I did a quick test with a medium sized route (Hummelstaedel to Altoona @dgearhart) and the size differences are as follows:

-rw-r--r--  1 kkreiser kkreiser 173678 Jan 13 10:50 pbf_fat
-rw-r--r--  1 kkreiser kkreiser  27897 Jan 13 10:51 json_output
-rw-r--r--  1 kkreiser kkreiser  20918 Jan 13 10:53 pbf_slim

So you'll see without slimming the pbf (ie keeping in the trip object) its about 625% larger than the json output, but when you slim it down to just the directions object you get a result that is 25% smaller and presumably way faster to parse.

CHANGELOG.md Outdated
* CHANGED: Moved all protos to proto3 for internal request/response handling [#3457](https://github.com/valhalla/valhalla/pull/3457)
* CHANGED: Allow up to 32 outgoing link edges on a node when reclassifying links [#3483](https://github.com/valhalla/valhalla/pull/3483)
* CHANGED: Reuse sample::get implementation [#3471](https://github.com/valhalla/valhalla/pull/3471)
* ADDED: Beta support for interacting with the http/bindings/library via serialized and pbf objects respectively [#3464](https://github.com/valhalla/valhalla/pull/3464)
* CHANGED: Update xcode to 12.4.0 [#3492](https://github.com/valhalla/valhalla/pull/3492)
* ADDED: Add JSON generator to conan [#3493](https://github.com/valhalla/valhalla/pull/3493)
* CHANGED: top_speed option: ignore live speed for speed based penalties [#3460](https://github.com/v alhalla/valhalla/pull/3460)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change log entry was out of order so i moved it down to where it belongs

@@ -23,7 +23,7 @@ To use protobuf as a request/input to the HTTP API you need to do two things:
* Send the proper HTTP header to signal a protobuf payload. The header should be: `Content-Type: application/x-protobuf`
* Send protocol buffer's serialized bytes as the body of the HTTP request

The message we use for the entire transaction is the `Api` message, whose definition you can find [here](../../proto/api.proto). All of the request parameters should be filled out via the `Options` message attached to the `Api` message. Most importantly, you will want to set your `format` to `pbf` and your `action` to the relevant API you are calling (though the HTTP request path also provides the latter). The rest of the request options depend on which API you are calling. For more information about what and which options to set for a given API please read that APIs specific docs regarding its request options.
The message we use for the entire transaction is the `Api` message, whose definition you can find [here](../../proto/api.proto). All of the request parameters should be filled out via the `Options` message attached to the `Api` message. Most importantly, you will want to set your `format` to `pbf` and your `action` to the relevant API you are calling (though the HTTP request path also provides the latter). The `options` object also contains a subobject named `pbf_field_selector` which can be used to turn on/off the top level fields in the response. For example, if you only want the `directions` part of the protobuf response to be present (much smaller payload) then turn on only that flag in the field selector. The rest of the request options depend on which API you are calling. For more information about what and which options to set for a given API please read that APIs specific docs regarding its request options.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a quick blerb about how to select only the parts of the response you want to have returned to you rather than the whole thing (which is the default)

@@ -43,4 +43,5 @@ message Info {
repeated Statistic statistics = 1; // stats that we collect during request processing
repeated CodedDescription errors = 2; // errors that occured during request processing
repeated CodedDescription warnings = 3; // warnings that occured during request processing
bool is_service = 4; // was this a service request/response rather than a direct call to the library
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this just marks whether the request was being handled by the server or by a direct library call, which is important to protobuf serialization because the former needs to know some of the options to properly handle serialization so they cant be omitted at certain parts of the processing

Comment on lines +49 to +52
bool options = 1;
bool trip = 2; // /trace_attributes
bool directions = 3; // /route /trace_route /optimized_route /centroid
bool status = 4; // /status
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are the parts of the response you can turn on and off. by default they are all on (ie if you omit setting this option at all in the larger options object.

return request.SerializeAsString();
return serializePbf(request);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll see this in a couple places. basically because pbf serialization had to get a bit more complicated i needed to throw it into a separate function

@@ -156,6 +156,45 @@ void openlr(const valhalla::Api& api, int route_index, rapidjson::writer_wrapper
}
writer.end_array();
}

std::string serializePbf(Api& request) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the new function for serailizing pbf. basically the only tricky part is, if its a service request, we need to keep the options bits attached to the object until we are done processing the request, so we swap it out before we serialize rather than actually clearing it and then we swap it back in after we have the bytes we want to return.

@@ -1123,9 +1123,18 @@ std::string serialize_error(const valhalla_exception_t& exception, Api& request)
err_stat->set_value(1);
err_stat->set_type(count);

// pbf format output
// pbf format output, we only send back the info with errors in it
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah so we also have to handle errors properly. in this case i think the best signal to whether or not there is an error is to send back nothing but the error info. it is true that its possible to send back partial results but i thought it was better to send back only the error. again we have the same trick for working around the difference between service requests and direct library calls regarding having the options object present after seralization even if it was not requested to be included in the response

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, partial results are tricky..

api.Clear();
api.mutable_info()->set_is_service(true);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the http service path we immedately mark the request as being handled by the http service

EXPECT_EQ(actual_pbf.trip().SerializeAsString(), expected_pbf.trip().SerializeAsString());
EXPECT_TRUE(actual_pbf.has_options());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first i extended the existing testing to check that in the default case all the fields come back, then i made sure that the setting of the field selector works by selecting only the trip part and checking that only it came back (except info which is non optional and except for the /status service which doesnt have a trip associated with it)

@@ -91,6 +105,10 @@ TEST(pbf_api, pbf_error) {
auto pbf_bytes = serialize_error(e, api);
Api actual;
EXPECT_TRUE(actual.ParseFromString(pbf_bytes));
EXPECT_FALSE(actual.has_options());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here i just make sure that you only get back the info object with the error message and nothign else when an error occurs

@@ -91,7 +91,7 @@ std::string serializeTransitAvailable(const Api& request,
* @param results The vector of trip paths and match results for each match found
*/
std::string serializeTraceAttributes(
const Api& request,
Api& request,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because these serializers now call the new serializer below which modifies the api object, the const could no longer be used

merkispavel and others added 3 commits January 12, 2022 16:44
* Not use live/predicted speeds for speed penalties

* Drop only live speeds from top_speed penalties

* Apply top speed changes for TaxiCost

* remove unnecessary dynamicflowmask anymore

* Test that live speed is disabled to top_speed option

* Update CHANGELOG.md

* Drop newly added flow constant, use bitwise negation inplace

The compiler is expected to handle any constant operations so there's no need to bring new variables

* Apply top_speed changes to other costings

* Fix closure test: use common expected route for motorscooter

* Refactor: extract speed based penalty calculation
* Use roads under construction

 * updated lua code to include roads under construction;
 * added a new config option to include or exclude roads
   under construction from valhalla graph;
 * exclude roads under construction from routing;
 * exclude roads under construction from shortcuts building.

* Added tests for roads under construction

* Update changelog

* Update taginfo.json

* Add 'hov' costing to test constructions

* Put 'construction' use above the transit uses

* Turn off access for roads under construction

* Remove needless 's' from the option 'include_constructions'

* Turn off access for construction in lua;

 - exclude construction from density calculation;
 - do not use construction as an intersecting edge;

* Exclude roads under construction from stopimpact calculation

* Fix isochrone test

* Update changelog
Base automatically changed from kk_pbf_refactor to master January 12, 2022 23:14
@kevinkreiser kevinkreiser changed the base branch from master to 1lessalgo January 12, 2022 23:15
@kevinkreiser kevinkreiser changed the base branch from 1lessalgo to master January 12, 2022 23:15
@kevinkreiser kevinkreiser changed the base branch from master to 1lessalgo January 12, 2022 23:23
@kevinkreiser kevinkreiser changed the base branch from 1lessalgo to master January 12, 2022 23:23
@kevinkreiser kevinkreiser changed the base branch from master to cleanup January 12, 2022 23:24
@kevinkreiser kevinkreiser changed the base branch from cleanup to master January 12, 2022 23:24
@@ -47,15 +47,16 @@
* ADDED: isochrone action for /expansion endpoint to track dijkstra expansion [#3215](https://github.com/valhalla/valhalla/pull/3215)
* CHANGED: remove boost from dependencies and add conan as prep for #3346 [#3459](https://github.com/valhalla/valhalla/pull/3459)
* CHANGED: Remove boost.program_options in favor of cxxopts header-only lib and use conan to install header-only boost. [#3346](https://github.com/valhalla/valhalla/pull/3346)
* CHANGED: top_speed option: ignore live speed for speed based penalties [#3460)(https://github.com/valhalla/valhalla/pull/3460)
Copy link
Member Author

@kevinkreiser kevinkreiser Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was at the wrong spot so i moved it to where it lands chronologically, also there was a typo in it

@@ -265,6 +265,7 @@ bool Options_Format_Enum_Parse(const std::string& format, Options::Format* f) {
{"json", Options::json},
{"gpx", Options::gpx},
{"osrm", Options::osrm},
{"pbf", Options::pbf},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed this bug in a previous pr. the result of not having this is that if you send a json request but you want a pbf response it would still send you json. pbf request with pbf response worked fine though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that worth a small test? (didn't check, maybe there is one now?)

nilsnolde
nilsnolde previously approved these changes Jan 14, 2022
Copy link
Member

@nilsnolde nilsnolde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm (well, apart from tiny fail on osx)

only thing I'm wondering: couldn't we infer from the action which pbf objects we return without the user needing to specify it? is it a clean mapping of action -> PBF object? e.g. I guess there's nothing useful in status or directions if it's a /trace_attributes call right?

EDIT: if that's the case, how about an auto member in the PbfFieldSelector (if possible defaulting to true)? then you'd still have the chance to get the entire thing for whatever reason, but opt-in instead of opt-out?

@@ -1123,9 +1123,18 @@ std::string serialize_error(const valhalla_exception_t& exception, Api& request)
err_stat->set_value(1);
err_stat->set_type(count);

// pbf format output
// pbf format output, we only send back the info with errors in it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, partial results are tricky..

@kevinkreiser
Copy link
Member Author

only thing I'm wondering: couldn't we infer from the action which pbf objects we return without the user needing to specify it

that was the other option that i considered but its very unflexible, here are some cases off the top of my head that you couldnt do if we did it that way:

  • what if you want to do a route but get back only the trip (ie all the metadata along the route only). that would not be possible by infering based on the action. yes you can get this type of information via trace_attributes, but map matching has big limitations compared to routing so you couldnt really get metadata for a route that way
  • another case you might be interested in is getting back the options object because it contains all the information about what edge candidates were found when doing the route
  • yet another option would be to do performance testing by getting back only the info object which contains the latencies for each part of the request

in any case i convinced myself that there was enough value in allowing the user to select it that i shouldnt select it for them. but i definitely did think about doing it the simplier way first

@nilsnolde
Copy link
Member

I'd argue that the majority of PBF requests will only want the most obvious, i.e. similar result as json. that's why I think design-wise it might more sense to opt-in to full responses due to the incurred size penalty (similar to e.g. opt-in verbose flag for /locate & /status). right now we'd have to opt-out.

not necessarily saying it should be done in this PR, could easily do a backwards-compatible follow-up PR (I think), but an auto approach defaulting to true while all others default to false could work no? if auto then infer from action, if any of the fields which are not inferred by the action are set, include those as well. if not you'd only get the inferred objects back. if auto is false and none of the others is set to true, raise (or silently put auto: true)?

@kevinkreiser
Copy link
Member Author

@nilsnolde its all beta at the moment so we dont need to worry about compatiblity at this point.

why dont we make the defaults the obvious thing then route -> directions and so forth and let people opt in to more or less if they want it? would that work for you? im happy to add that to this pr, its a tiny change

@nilsnolde
Copy link
Member

right, same thing.. I'd favor that, if it's cool with you too, perfect, thanks:)

CHANGELOG.md Outdated
* ADDED: Add `include_construction` option into the config to include/exclude roads under construction from the graph [#3455](https://github.com/valhalla/valhalla/pull/3455)
* CHANGED: Refactor options protobuf for Location and Costing objects [#3506](https://github.com/valhalla/valhalla/pull/3506)
* ADDED: New options to control what fields of the pbf are returned when pbf format responses are requested [#3207](https://github.com/valhalla/valhalla/pull/3507)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad merge will fix

@nilsnolde nilsnolde self-requested a review January 14, 2022 16:50
@kevinkreiser kevinkreiser merged commit a670418 into master Jan 14, 2022
@kevinkreiser kevinkreiser deleted the kk_slim_pbf branch January 14, 2022 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants