Recost shortcuts in bidastar: second approach #2711

genadz · 2020-12-07T10:49:03Z

Tasklist

Add tests
Add #fixes with the issue number that this PR addresses
Generally use squash merge to rebase and clean comments before merging
Update the changelog

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

kevinkreiser · 2020-12-07T13:21:15Z

src/thor/bidirectional_astar.cc

-
-    // Special case code if the last edge of the forward path is the destination edge
-    // which means we need to worry about partial distance on the edge
-    if (edgelabels_reverse_[idx2].predecessor() == kInvalidLabel) {


i know its a bit heavy to use recosting to fix this problem, but it feels really good to be able to remove this anyway 😄

src/sif/recost.cc

src/thor/bidirectional_astar.cc

test/astar.cc

kevinkreiser · 2020-12-07T15:30:56Z

src/sif/recost.cc

    // grab the edge
    edge = reader.directededge(edge_id, tile);
    if (!edge) {
      throw std::runtime_error("Edge cannot be found");
    }

+    // re-derive uturns, would have been nice to return this but we dont know the next edge yet
+    if (label.opp_local_idx() == edge->localedgeidx())


why add a branch to this code path, can't we simply store the value of this comparison?

because by default we use deadend flag from the edge https://github.com/valhalla/valhalla/blob/master/valhalla/sif/edgelabel.h#L67

hm. on the other hand, if it's deadend, this condition will always be true
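
To make the two options in this thread concrete, here is a minimal sketch. `Edge` and `Label` are hypothetical stand-ins, not Valhalla's real `DirectedEdge` and `EdgeLabel`, and the field names are assumptions made only for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration; not Valhalla's real classes.
struct Edge {
  uint32_t local_idx;  // index of this edge among the edges leaving its start node
  bool deadend;        // dead-end flag as stored on the edge
};

struct Label {
  uint32_t opp_local_idx;  // local index of the predecessor's opposing edge
};

// Branching form: start from the edge's flag, override it when a u-turn is detected.
inline bool deadend_with_branch(const Label& pred, const Edge& edge) {
  bool deadend = edge.deadend;
  if (pred.opp_local_idx == edge.local_idx)
    deadend = true;
  return deadend;
}

// Branch-free form: store the result of the comparison directly.
inline bool deadend_from_comparison(const Label& pred, const Edge& edge) {
  return pred.opp_local_idx == edge.local_idx;
}
```

The two forms only disagree when the edge's own flag is set and the comparison is false; per the last remark above, that case is not expected to occur.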

kevinkreiser · 2020-12-07T15:31:39Z

src/thor/alternates.cc

@@ -58,7 +58,8 @@ void filter_alternates_by_stretch(std::vector<CandidateConnection>& connections)

 // Limited Sharing. Compare duration of edge segments shared between optimal path and
 // candidate path. If they share more than kAtMostShared throw out this alternate.
-bool validate_alternate_by_sharing(GraphReader& graphreader,
+// Note that you should recover all shortcuts before call this function.
+bool validate_alternate_by_sharing(GraphReader& /*graphreader*/,


can you actually just remove the graphreader argument completely? now that we dont have to recover shortcuts its not used.

kevinkreiser

two things:

remove the branch in recosting
remove the graphreader arg in path sharing

kevinkreiser · 2020-12-07T22:36:34Z

well crap... I performance tested this and saw that it added approximately a 25% slowdown. assuming it must be the shortcut recovery, i did this little bit of work: #2714. after testing with that enabled on this branch it was just as slow 😄

so i suspect some things..

i disabled recovering shortcuts on this branch and it was still about 12% slower, which must mean that recosting is slower
since having a cache of recovered shortcuts didnt speed it up it must be a function of having so many edges in the route, so maybe its in triplegbuilder or other serializers.

i'll look a bit closer to figure out where the performance drop lies

kevinkreiser · 2020-12-08T05:03:39Z

Ok did a bit more digging. First I enabled the shortcut cache. Then I took the variable path_edges and made it a member variable path_edges_ (dont forget to clear it in the clear function). Then I wrote a bash alias like so:

alias stats='R -q -e "x <- read.csv(\"stdin\", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])"'
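
For readers without R at hand, the percentile part of that alias can be sketched in C++. This is only an illustration of what R's `quantile()` computes by default (linear interpolation between order statistics, its "type 7" definition); the function name is made up and the `sd()` part of the alias is not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample quantile with linear interpolation between order statistics,
// the default definition used by R's quantile().
inline double quantile(std::vector<double> values, double p) {
  assert(!values.empty() && p >= 0.0 && p <= 1.0);
  std::sort(values.begin(), values.end());
  const double h = (values.size() - 1) * p;  // fractional zero-based rank
  const std::size_t lo = static_cast<std::size_t>(std::floor(h));
  const std::size_t hi = std::min(lo + 1, values.size() - 1);
  return values[lo] + (h - lo) * (values[hi] - values[lo]);
}
```

Calling it with `p` set to 0, 0.5, 0.9, 0.99 and 1 on the list of response times gives the same five columns the alias prints.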

It seemed to me that the worst offenders in terms of routes were driving up my benchmarking scores to the point that i couldnt really make sense of it. Non insane routes were almost unaffected by this change. Anyway to prove out this thought I ran up my server with master. And did a run of RAD but with the json output format:

./run_with_server.py --test-file auto.txt --url http://localhost:8002/route --concurrency 24 --format json

then i did the same but ran this branch with my small changes added. then i compared the percentiles between the two:

master:

grep -F response_time 20201207_235004_auto/* | sed -e "s/.* //g" | stats
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100% 
0.002395153 0.036434412 0.122441101 0.709217098 1.643116236 
[1] 0.1298356
> 
>

this branch with my slight modifications:

grep -F response_time 20201207_235232_auto/* | sed -e "s/.* //g" | stats
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100% 
0.002358675 0.038644910 0.150891209 0.788010578 1.731230021 
[1] 0.1442929
> 
>

You can see that for the average case the performance is nearly unchanged, whereas the performance of the top 10 percent of the slowest routes is something like 10-15% worse. @genadz i tried to update your other branch where we dont recost the whole route but rather just the parts that need it as they are recovered. can you use the same testing method to see what kind of results you get?

genadz · 2020-12-08T13:03:25Z

on my machine I got the following results:

master (route not found for 76 requests)

grep -F response_time res_master/* | sed -e "s/.* //g" | stats 
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100% 
0.003356218 0.108471870 0.408698320 0.995351839 2.934854984 
[1] 0.2041199
> 
>

recost_shortcuts_new (second approach) (route not found for 126 requests)

grep -F response_time res_recost/* | sed -e "s/.* //g" | stats 
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
        0%        50%        90%        99%       100% 
0.00315094 0.13643813 0.55678535 1.43106830 3.76190114 
[1] 0.2874338
> 
>

recost_shortcuts (first approach) (route not found for 1183 requests)

grep -F response_time res_first/* | sed -e "s/.* //g" | stats 
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100% 
0.003399849 0.132326603 0.479046679 1.087140005 3.063050032 
[1] 0.2249207
> 
>

hm, that's interesting. despite the fact that times for the first approach are closer to master than second approach, I got approximately the same total time for both recosting branches (~25% slower).

kevinkreiser · 2020-12-08T13:31:28Z

first thing i would do is figure out which routes are failing and fix them in both branches so that at least correctnesswise the code is shippable. then we'll have to ponder about performance.

genadz · 2020-12-08T16:37:58Z

fixed first branch. it's ~15% slower than master

recost_shortcuts (first approach)

grep -F response_time res_first/* | sed -e "s/.* //g" | stats
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
        0%        50%        90%        99%       100% 
0.00292778 0.07737732 0.49527001 1.44068127 5.46187663 
[1] 0.2961684
> 
>

master

grep -F response_time res_master/* | sed -e "s/.* //g" | stats
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100% 
0.002770424 0.073756933 0.417046785 1.102108717 5.447159052 
[1] 0.2343167
> 
>

kevinkreiser · 2020-12-23T02:59:48Z

@genadz testing the latest code here i still see a very large performance difference, in absolute terms it takes me about 23% more time to complete a 14k set of routes. also there are about 10% diffs in a RAD of the same route set. i would have expected some diffs but 10% is staggering. we should verify that we are seeing the diffs we expect.

@dgearhart would you be able to take a look? the summary here is that the code now turns shortcut edges in the path into the list of underlying edges. it doesnt add the intersecting edges at the nodes that were previously non-existent.

at first i thought at sharp turns and stuff, the narrative might get an extra maneuver but without the intersecting edges that seems highly unlikely (i am pretty sure it wont happen). this leads me to believe we are somehow getting different paths but i fail to see how that is possible considering the change to recover the shortcuts is after the path is found!
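
A rough sketch of the transformation described in this comment, under stated assumptions: `EdgeId`, `RecoverFn` and `expand_shortcuts` are hypothetical names, and the recovery callback stands in for whatever the graph reader actually does. It is not the code from this branch.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using EdgeId = uint64_t;  // hypothetical edge identifier

// Hypothetical callback: returns the edges a shortcut stands for, or an
// empty vector when `id` is not a shortcut or cannot be recovered.
using RecoverFn = std::function<std::vector<EdgeId>(EdgeId)>;

// Replace every recoverable shortcut in an already-found path with its
// underlying edges. Only edges on the path are emitted; the intersecting
// edges at the newly exposed nodes are not added.
inline std::vector<EdgeId> expand_shortcuts(const std::vector<EdgeId>& path,
                                            const RecoverFn& recover) {
  std::vector<EdgeId> expanded;
  expanded.reserve(path.size());
  for (EdgeId id : path) {
    const std::vector<EdgeId> underlying = recover(id);
    if (underlying.empty()) {
      expanded.push_back(id);  // regular edge, or a shortcut left as-is
    } else {
      expanded.insert(expanded.end(), underlying.begin(), underlying.end());
    }
  }
  return expanded;
}
```

Because the function only rewrites a path that has already been chosen, it cannot by itself make the search pick a different route, which is why diffs in the route itself are surprising.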

dgearhart · 2020-12-23T18:29:16Z

@kevinkreiser I can kick off in the background - i am assuming master vs this branch?

dgearhart · 2021-01-04T17:42:24Z

@dgearhart would you be able to take a look? the summary here is that the code now turns shortcut edges in the path into the list of underlying edges. it doesnt add the intersecting edges at the nodes that were previously non-existent.

@kevinkreiser
I am seeing some time diffs - seem to be okay

I am seeing diffs like the following:

This looks good - we now have turn lanes with the recost_shortcuts_new branch

Why are we not adding the intersecting edge info? That would help with some missing maneuvers

I see a 32% delta for user routes - i do not have time to review all of them but someone should make a good pass at reviewing diffs

kevinkreiser · 2021-01-04T18:06:39Z

@dgearhart i would love to add all the edges too but it turns out that costs a ton of CPU and therefore really makes the request slower. that would be the ultimate goal but the original goal of this work is just to be able to get the right costing/time for the edges along the path. i recently got my perf tools working in my IDE so i might have a look here. I'll say this, one of the big contributors is the state shield verbal regexs in odin. They alone are about 7% of the total request latency. I have an idea how to refactor them to make them less expensive but haven't worked on it yet.

genadz · 2021-01-04T18:51:17Z

added commit with the logic that doesn't reject edges in recost function based on nodes/edges access restrictions.
don't know why, but I have different performance results: on my local machine current branch ~4% slower than master.

kevinkreiser · 2021-01-04T18:54:19Z

@genadz ill test it again today maybe i didnt get the updates for some reason!

genadz · 2021-01-04T18:54:44Z

@genadz ill test it again today maybe i didnt get the updates for some reason!

thanks!

kevinkreiser · 2021-01-04T23:40:51Z

ok looking at the 50th percentile yeah its about 5%. oddly the 90th is something like 15% slower but the 99th and 100th are 5 and 0 respectively. frankly its hard to tell what i should understand from this result. i guess mostly the 50th percentile is the most important since the bulk of requests will be in this range. when i look at the total time to complete the benchmark though its about a 7.5% performance drop. i think if we merge the optimizations branch and do a couple more optimizations we'll be able to pay for this with those 😄

i personally dont think we have to wait to merge this before we merge those, does anyone else have an opinion? @purew or @danpat ?

branch:
         0%         50%         90%         99%        100% 
0.002226353 0.038036823 0.144977236 0.778483784 1.566832542 

master:
         0%         50%         90%         99%        100% 
0.002390385 0.036790848 0.125865579 0.713516281 1.550947189
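
For anyone who wants to redo the per-percentile comparison, a small sketch of the arithmetic follows. The two rows are copied from the output above; the function and constant names are made up for illustration.

```cpp
#include <array>
#include <cstddef>

// Relative change of `branch` versus `master`, in percent, per percentile.
inline std::array<double, 5> percent_change(const std::array<double, 5>& master,
                                            const std::array<double, 5>& branch) {
  std::array<double, 5> out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (branch[i] / master[i] - 1.0) * 100.0;
  return out;
}

// Percentile rows (0%, 50%, 90%, 99%, 100%) copied from the comment above.
const std::array<double, 5> kMaster = {
    {0.002390385, 0.036790848, 0.125865579, 0.713516281, 1.550947189}};
const std::array<double, 5> kBranch = {
    {0.002226353, 0.038036823, 0.144977236, 0.778483784, 1.566832542}};
```

`percent_change(kMaster, kBranch)` returns one value per column, positive where the branch is slower. Note that this compares individual percentiles only and says nothing about the total time to complete the benchmark.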

kevinkreiser · 2021-01-04T23:50:09Z

src/sif/recost.cc

+        (!costing.Allowed(edge, label, tile, edge_id, localtime, offset_time.timezone_index,
+                          time_restrictions_TODO) &&
+         !ignore_access)) {


move the flag up in the if so that allowed doesnt get called at all

Suggested change

-        (!costing.Allowed(edge, label, tile, edge_id, localtime, offset_time.timezone_index,
-                          time_restrictions_TODO) &&
-         !ignore_access)) {
+        !ignore_access && !costing.Allowed(edge, label, tile, edge_id, localtime, offset_time.timezone_index,
+                          time_restrictions_TODO)) {

we can't do this because we need to evaluate time restrictions.

or we can explicitly call costing.EvaluateRestrictions in case ignore_access = true

the restrictions are based on access though as well, is there really a point of checking them? i guess im more thinking we should make the boolean called throw_if_not_allowed or something more generic like strict to just completely turn off checking allowed at all. maybe we can flip the meaning and say something like bool allow_all = false?

https://github.com/valhalla/valhalla/blob/master/test/astar.cc#L784 - this test fails if we skip Allowed here

hm. what if we go back to the first approach where we recost only shortcuts? - in this case we will save all time_restrictions that we calculated for regular edges.
and, when recosting shortcuts, we can 1) use flag to turn off checking Allowed or 2) don't use this flag but in case recosting fails - we just don't expand shortcut edge, add shortcut to the final path

i personally prefer this implementation as its a lot less complex (no splicing). you have a good point about showing the restrictions. i'd say just leave it as it is and maybe make a note that we have to put the flag second so we can get restriction information.
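
The reason the flag has to come second is short-circuit evaluation of `&&`. A minimal sketch, assuming the access check reports restriction information through an out-parameter; `allowed` and both wrapper functions are hypothetical stand-ins and do not mirror the real `Allowed` signature.

```cpp
#include <cstdint>

// Hypothetical stand-in for a costing check that also reports, through an
// out-parameter, which time restriction applies to the edge.
inline bool allowed(bool has_access, uint8_t restriction_on_edge, uint8_t& restriction_out) {
  restriction_out = restriction_on_edge;  // side effect the caller depends on
  return has_access;
}

// Flag second: allowed() always runs, so `out` is always filled in.
inline bool reject_flag_second(bool has_access, uint8_t restriction, bool ignore_access,
                               uint8_t& out) {
  return !allowed(has_access, restriction, out) && !ignore_access;
}

// Flag first: when ignore_access is true, && short-circuits and allowed()
// is never called, so `out` keeps whatever value it had before.
inline bool reject_flag_first(bool has_access, uint8_t restriction, bool ignore_access,
                              uint8_t& out) {
  return !ignore_access && !allowed(has_access, restriction, out);
}
```

Both orderings return the same boolean; they differ only in whether the side effect happens when `ignore_access` is true.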

kevinkreiser · 2021-01-20T14:21:18Z

@mandeepsandhu the main slowdown in odin is running over these regex's for every street name: https://github.com/valhalla/valhalla/blob/master/valhalla/baldr/verbal_text_formatter_us.h#L37

i think we can rewrite this to do our own form of matching that isnt as smart as regex to speed this up.
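
To illustrate the kind of rewrite meant here, a sketch that checks one pattern both ways. The pattern (two uppercase letters, a space, then digits) is made up for this example and is not taken from `verbal_text_formatter_us.h`; the function names are hypothetical too.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Regex form of the made-up pattern: two uppercase letters, a space, digits.
inline bool matches_with_regex(const std::string& name) {
  static const std::regex pattern("^[A-Z]{2} [0-9]+$");
  return std::regex_match(name, pattern);
}

// Hand-written check for the same made-up pattern, with no regex engine involved.
inline bool matches_by_hand(const std::string& name) {
  if (name.size() < 4 || name[2] != ' ')
    return false;
  if (name[0] < 'A' || name[0] > 'Z' || name[1] < 'A' || name[1] > 'Z')
    return false;
  for (std::size_t i = 3; i < name.size(); ++i) {
    if (name[i] < '0' || name[i] > '9')
      return false;
  }
  return true;
}
```

The hand-written version is less general, which is the trade-off described above: it only has to reject most street names quickly, so the expensive path runs rarely. Whether it is actually faster would need to be measured.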

* use functor instead of vector to get next edge
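
One way to read that commit message, sketched with hypothetical names (`EdgeId`, `kNoEdge`, `NextEdge`): the consumer pulls edges one at a time from a callable instead of requiring the caller to build a vector first. This is an assumption about the intent, not the code from the commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using EdgeId = uint64_t;                // hypothetical edge identifier
constexpr EdgeId kNoEdge = ~EdgeId{0};  // hypothetical "no more edges" sentinel
using NextEdge = std::function<EdgeId()>;

// Consumer: pulls edges until the sentinel is returned.
inline std::size_t count_edges(const NextEdge& next) {
  std::size_t n = 0;
  while (next() != kNoEdge)
    ++n;
  return n;
}

// Adapter: walks an existing container without copying it. The container
// must outlive the returned callable because it is captured by reference.
inline NextEdge from_vector(const std::vector<EdgeId>& edges) {
  std::size_t i = 0;
  return [&edges, i]() mutable { return i < edges.size() ? edges[i++] : kNoEdge; };
}
```

With this shape the caller can feed edges from any source (a vector, a list of labels walked through predecessors, a recovered shortcut) without materializing an intermediate container.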

kevinkreiser

thank you again so much for the very long path to getting here. i hope we can make a few more performance improvements and then even remove the optimization for not adding intersecting edges.

genadz requested a review from kevinkreiser December 7, 2020 10:49

genadz added 3 commits December 7, 2020 13:51

Recost path edges in FormPath method of bidiastar

5e5b8ef

Add unittest for path recosting in bidiastar

cc27a3e

Hack deadend flag in recost function to allow u-turns

c2798a3

genadz force-pushed the recost_shortcuts_new branch from e5ef728 to c2798a3 Compare December 7, 2020 10:56

Update changelog

f269762

genadz mentioned this pull request Dec 7, 2020

Recost shortcuts in bidastar: first approach #2661

Closed

4 tasks

Comment unused function params

aec3530