Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pass GTFS routing test #3988

Merged
merged 37 commits into from
Mar 10, 2023
Merged

pass GTFS routing test #3988

merged 37 commits into from
Mar 10, 2023

Conversation

nilsnolde
Copy link
Member

@nilsnolde nilsnolde commented Feb 24, 2023

Mostly fixes some erroneous stuff in the tests and some time related things to finally pass the transit routing test!

Things I noticed which need to be fixed or at least looked at at some point:

  • add a proper method to TimeInfo to advance the second_of_week properly with transfer cost in multimodal.cc
  • have timeinfo know about the pivot date (probably have a singleton pull it once on first use) so we can serve day
  • default a node's timezone to the agency's as most times stops don't have timezones specified (I guess mostly when the feed/agency is operating in a single timezone, so the vast majority) too much work without much benefit. let's see how it goes without overriding a stop's (mostly empty) timezone info with the agency's (required field) and fall back to our own timezone detection. there could be a rogue feed where that breaks, but I wouldn't expect so.
  • check ingest_transit.cc::add_stop_pair_shapes() again: looks like it's always returning the full shape length as dist_traveled for a stop_pair, in fact it should be the distance from the trip's shape origin to the current stop being passed to that function
  • adjust the test so time is not hard-coded but in some relation to the timepoint the test is running, otherwise it'll fail soon
  • generally test much more: start with the route response!
  • also more properly test isochrones to make sure we contain stations/stops, also with less time/dist in the request to check we also miss some stops
  • in both routing & isochrone multimodal algo we're pruning when we're exceeding the transfer_max_distance, however, we check with walking_distance which is not the transfer distance, it's collecting all walking parts. It's also unclear whether the costing option transit_transfer_max_distance is actually the total transfer distance or the max for each transfer. I'd assume the latter. If it's the former (as the code suggests), I think we'd need a new EdgeLabel member to keep track of this during the expansion.
  • in ingest_transit we don't assign a shape_id to a stop_pair. however, if there's shapes.txt, convert_transit would like to use that to set the right dist_traveled from the full shape, rather than our interpolated shapes for each transit edge
  • we currently assume that there'll be a shape associated with a trip. but that's rarely the case, often shapes.txt is missing. in ingest_traffic we should instead build shapes stuff in the pbf tiles from the stops. actually we do exactly that in convert_transit
  • obviously refactor the entire convert_transit.cc::AddToGraph thing, at the minimum it should remove all the crazy variable shadowing
  • make kTransitConnection edges (i.e. osm<->transit connection edges) visible for loki::Search, currently they're missing in the bins; also investigate why they're not being seen for node snaps
  • review what it's doing with the different frequencies.txt, not sure if that's proper, esp if there's frequencies.txt together with the exact stop_times.
  • frequencies.txt and how it's handled currently: if a stop_pair's trip has a frequency, we prefer that and don't create an exact departure for that stop_pair, which is totally fine. However, we assume that stop_pair's origin_departure_time is when that stop_pair's departure is and I don't think that's really how it's meant to be. From what I understand (but need to reason over real feeds): if there's an entry in frequencies.txt for a trip, commonly there'll one full sequence of all stops in stop_times.txt as an example to show how stops are timewise connected, but doesn't necessarily represent the start_time for that frequency. That should still come from the frequencies.txt entry, so we need to add it to the PBF definition. We should also add the exact_times field: if true, the frequency's start_time is the exact start time for the frequency and just rolls with one headway after the other. However, when false, then we can pick an approximate departure_time within the algorithm. I think it's fairly trivial to support both and just needs to extend a TransitDeparture::type_ to be an enum of 3 values (there 4 spare bits, so no problem IMO).
  • currently we only test schedule-based departures: if a frequency is defined we prefer that over exact stop_times, as we should IMO, but currently both test trips have a frequency assigned, need to un-assign one of them and do another routing test for the other trip's valid time range.
  • see if we can't come up with a way to limit the expansion of all those transit-related edges a bit more. Problem is they're so irrelevant, they hardly contribute cost, so they're encountered during the expansion super frequently but never settled.
  • similar to above: de-duplicate stations & egresses. it seems that it's quite common that providers only include stops in stops.txt without station/egress, even for train stations. which essentially means, in a train station with 10 platforms, we create 10 fake stations & egresses, each with their own set of directed edges & OSM connections
  • do we need a second pass for multimodal? not sure yet, what would be sensible to relax there, don't think there's anything?
  • transit schedules are only valid for 60 days after the transit tiles were created, why? There's a 64 bit mask, seems like the reason on first glance. (reasoning was that tilesets should be updated more often than 60 days and lots of feeds have validity ranges less than that) So meaning, transit tiles will have to be regenerated in case the feed is valid for longer than that (which I guess is quite common..).
  • implement transfers.txt for proper transfer times. Currently it allows for 30 seconds, otherwise it'll wait for the next departure.
  • implement pathways.txt maybe, though super low priority, apparently it's hardly ever there and if it is, it's not really correct
  • move the allowed check for continuing expansion based on amount of pedestrian distance walked back into costing. at the moment we pulled it out into the algorithms but now that we have it tracked separately from path distance in the edge label we can indeed check it in the costing again

Comment on lines +427 to +430
if (!trip_calendar || !service_dow->size()) {
LOG_WARN("Service ID " + currTrip->service_id + " has no calendar entry, skip.");
continue;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tentative: don't think we need to care about this stop pair when there's no valid days in the week. but gotta validate that this is fine.

src/thor/multimodal.cc Outdated Show resolved Hide resolved
test/gurka/test_gtfs.cc Outdated Show resolved Hide resolved
test/gurka/test_gtfs.cc Outdated Show resolved Hide resolved
@nilsnolde
Copy link
Member Author

BTW @kevinkreiser , I'm also happy to continue working on this branch until it's done. Shouldn't be much more. In case you wanna have a first look, feel free.

@nilsnolde
Copy link
Member Author

Conan being a PITA again..

@@ -754,26 +755,21 @@ void AddToGraph(GraphTileBuilder& tilebuilder_transit,
tilebuilder_transit.AddEdgeInfo(0, station_graphid, egress_graphid, 0, 0, 0, 0, shape,
names, tagged_values, pronunciations, 0, added);
directededge.set_edgeinfo_offset(edge_info_offset);
directededge.set_forward(true);
directededge.set_forward(false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the set_forward flags I edited so that opposing edges also point in opposing directions. Somewhat irrelevant as all these are intra-station edges which (currently) don't even have a shape..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here it's from station -> egress, which seems intuitive to have backwards

@@ -886,7 +883,7 @@ void AddToGraph(GraphTileBuilder& tilebuilder_transit,
names, tagged_values, pronunciations, 0, added);

directededge.set_edgeinfo_offset(edge_info_offset);
directededge.set_forward(true);
directededge.set_forward(false);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here it's from platform -> station, which seems intuitive to have backwards

Comment on lines -593 to -606
auto now = time(nullptr);
auto* utc = gmtime(&now);
utc->tm_year += 1900;
++utc->tm_mon; // TODO: use timezone code?

auto database = pt.get_optional<std::string>("mjolnir.timezone");
auto path = pt.get<std::string>("mjolnir.transit_feeds_dir");
// Initialize the tz DB (if it exists)
sqlite3* tz_db_handle = database ? GetDBHandle(*database) : nullptr;
if (!database) {
LOG_WARN("Time zone db not found. Not saving time zone information from db.");
} else if (!tz_db_handle) {
LOG_WARN("Time zone db " + *database + " not found. Not saving time zone information from db.");
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need that here.

Comment on lines -650 to -651
if (!IsAccessible(edge) || (!pred.deadend() && pred.opp_local_idx() == edge->localedgeidx()) ||
(edge->surface() > minimal_allowed_surface_) || edge->is_shortcut() ||
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uturn detection was duplicated, I removed the redundant one. we need to check if the predecessor wasn't a transit edge.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment for the future

Comment on lines +683 to +684
if (!IsAccessible(opp_edge) || (opp_edge->surface() > minimal_allowed_surface_) ||
opp_edge->is_shortcut() || IsUserAvoidEdge(opp_edgeid) ||
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -44,8 +44,8 @@ constexpr uint32_t kInitialEdgeLabelCount = 200000;
MultiModalPathAlgorithm::MultiModalPathAlgorithm(const boost::property_tree::ptree& config)
: PathAlgorithm(config.get<uint32_t>("max_reserved_labels_count", kInitialEdgeLabelCount),
config.get<bool>("clear_reserved_memory", false)),
walking_distance_(0), max_label_count_(std::numeric_limits<uint32_t>::max()),
mode_(travel_mode_t::kPedestrian), travel_type_(0) {
max_label_count_(std::numeric_limits<uint32_t>::max()), mode_(travel_mode_t::kPedestrian),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

walking_distance_ was only used in ExpandForward, similar to new_cost, no need to have it as a class member, was kinda confusing at first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we put it now in the MMEdgeLabel

Comment on lines 290 to 292
// TODO: this needs to update second_of_week: careful, needs to wrap around the
// end of the week and we also need to update the day_ and dow_ mask in that case
// maybe make it a TimeInfo function for convenience
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not right yet. I left a TODO in the main description of the PR. we're not looking at the TimeInfo.local_time because that's already corrected for timezone. we want the time without timezone. eventually it needs a new method in TimeInfo IMO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use the TimeInfo::forward method here too

@gknisely
Copy link
Member

gknisely commented Mar 7, 2023

@nilsnolde @kevinkreiser I can review and discuss on Thursday. Let me know if that works.

std::unordered_map<feed_object_t, GraphId> write_stops(Transit& tile,
const tile_transit_info_t& tile_info) {
std::unordered_map<feed_object_t, GraphId>
write_stops(Transit& tile, const tile_transit_info_t& tile_info, feed_cache_t feeds) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take the reference of feeds

src/thor/triplegbuilder.cc Outdated Show resolved Hide resolved
@kevinkreiser
Copy link
Member

@gknisely we just reviewed. from my side its good. the changes are actually pretty small there are just a lot of renames that really make it difficult to review. we could make a summary of the changes at a high level, that might help just for posterity? i'll let it up to @nilsnolde if he wants to wait until thursday for further review

@nilsnolde
Copy link
Member Author

I can summarize tmrw, just had a way too long review-addressing session before realizing my mistake.. I'm ok with thursday @kevinkreiser @gknisely

@@ -685,7 +698,7 @@ void stitch_tiles(const boost::property_tree::ptree& pt,
do {

// open tile make a hash of missing stop to invalid graphid
auto tile = read_pbf(current_path, lock);
auto tile = read_pbf(current_path);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check again..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, this one is fine to not lock @kevinkreiser , just couldn't think of it in the rush of the review.

this is the tile this thread will process and no other thread will ever write to (only maybe read); the current thread will write this tile only. there's another read_pbf further down, which needs locking (where it's trying to find missing nodes in other tiles) and that one is locked.

src/thor/triplegbuilder.cc Outdated Show resolved Hide resolved
Copy link
Member

@kevinkreiser kevinkreiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing i regret about our pairing up to do code review is that i dont write any of my review comments in the pr, i just give them to you verbally. so i think it might be harder for people to follow along. i know you usually put a note though while we are reviewing. i think i'll try harder to actually write my review comments in the review while we are going over it.

the reason i say this is because we have reviewed this code together at least 3 or 4 times, and now im putting a ship on it but it looks like i didnt even review it once. i need to improve my workflow here. bear with me 😄

@nilsnolde
Copy link
Member Author

haha yeah, maybe it's more explicit if you write the comments during review.

@nilsnolde nilsnolde merged commit 7bf7cde into master Mar 10, 2023
@nilsnolde nilsnolde deleted the nn-fix-gtfs-route-test branch March 10, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants