Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recost shortcuts in bidastar: second approach #2711

Merged
merged 24 commits into from
Jan 21, 2021

Conversation

genadz
Copy link
Contributor

@genadz genadz commented Dec 7, 2020

Tasklist

  • Add tests
  • Add #fixes with the issue number that this PR addresses
  • Generally use squash merge to rebase and clean comments before merging
  • Update the changelog

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?


// Special case code if the last edge of the forward path is the destination edge
// which means we need to worry about partial distance on the edge
if (edgelabels_reverse_[idx2].predecessor() == kInvalidLabel) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know its a bit heavy to use recosting to fix this problem, but it feels really good to be able to remove this anyway 😄

src/sif/recost.cc Outdated Show resolved Hide resolved
// grab the edge
edge = reader.directededge(edge_id, tile);
if (!edge) {
throw std::runtime_error("Edge cannot be found");
}

// re-derive uturns, would have been nice to return this but we dont know the next edge yet
if (label.opp_local_idx() == edge->localedgeidx())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add a branch to this code path, we simply store the value of this comparison?

Copy link
Contributor Author

@genadz genadz Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because by default we use deadend flag from the edge https://github.com/valhalla/valhalla/blob/master/valhalla/sif/edgelabel.h#L67

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm. from the other hand, if it's deadend, this condition will be always true

@@ -58,7 +58,8 @@ void filter_alternates_by_stretch(std::vector<CandidateConnection>& connections)

// Limited Sharing. Compare duration of edge segments shared between optimal path and
// candidate path. If they share more than kAtMostShared throw out this alternate.
bool validate_alternate_by_sharing(GraphReader& graphreader,
// Note that you should recover all shortcuts before call this function.
bool validate_alternate_by_sharing(GraphReader& /*graphreader*/,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you actually just remove the graphreader argument completely, now that we dont ahve to recover shortcuts its not used.

Copy link
Member

@kevinkreiser kevinkreiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two things:

  1. remove the branch in recosting
  2. remove the graphreader arg in path sharing

@kevinkreiser
Copy link
Member

well crap... I performance tested this and saw that it added approximately a 25% slowdown. assuming it must be the shortcut recovery. i did this little bit of work: #2714 after testing with that enabled on this branch it was just as slow 😄

so i suspect some things..

  1. i disabled recovering shortcuts on this branch and it was still about 12% slower, which must mean that recosting is slower
  2. since having a cache of recovered shortcuts didnt speed it up it must be a function of having so many edges in the route, so maybe its in triplegbuilder or other serializers.

i'll look a bit closer to figure out where the performance drop lies

@kevinkreiser
Copy link
Member

Ok did a bit more digging. First I enabled the shortcut cache. Then I took the variable path_edges and made it a member variable path_egdes_ (dont forget to clear it in the clear function). Then I wrote a bash alias like so:

alias stats='R -q -e "x <- read.csv(\"stdin\", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])"'

It seemed to me that the worst offenders in terms of routes were driving up my benchmarking scores to the point that i couldnt really make sense of it. Non insane routes were almost unaffected by this change. Anyway to prove out this thought I ran up my server with master. And did a run of RAD but with the json output format:

./run_with_server.py --test-file auto.txt --url http://localhost:8002/route --concurrency 24 --format json

then i did the same but ran this branch with my small changes added. then i compared the percentiles between the two:

master:

grep -F response_time 20201207_235004_auto/* | sed -e "s/.* //g" | stats
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100% 
0.002395153 0.036434412 0.122441101 0.709217098 1.643116236 
[1] 0.1298356
> 
> 

this branch with my slight modifications:

grep -F response_time 20201207_235232_auto/* | sed -e "s/.* //g" | stats
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100% 
0.002358675 0.038644910 0.150891209 0.788010578 1.731230021 
[1] 0.1442929
> 
> 

You can see that for the average case the performance is nearly unchanged where as the performance of the top 10 percent of the slowest routes is something like 10-15% worse. @genadz i tried to update your other branch where we dont recost the whole route but rather just the parts that need it as they are recovered. can you use the same testing method to see what kind of results you get?

@genadz
Copy link
Contributor Author

genadz commented Dec 8, 2020

on my machine I got the following results:

master (route not found for 76 request)

grep -F response_time res_master/* | sed -e "s/.* //g" | stats 
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100% 
0.003356218 0.108471870 0.408698320 0.995351839 2.934854984 
[1] 0.2041199
> 
> 

recost_shortcuts_new (second approach) (route not found for 126 request)

grep -F response_time res_recost/* | sed -e "s/.* //g" | stats 
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
        0%        50%        90%        99%       100% 
0.00315094 0.13643813 0.55678535 1.43106830 3.76190114 
[1] 0.2874338
> 
> 

recost_shortcuts (first approach) (route not found for 1183 request)

grep -F response_time res_first/* | sed -e "s/.* //g" | stats 
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100% 
0.003399849 0.132326603 0.479046679 1.087140005 3.063050032 
[1] 0.2249207
> 
> 

hm, that's interesting. despite the fact that times for the first approach are closer to master than second approach, I got approximately the same total time for both recosting branches (~25% slower).

@kevinkreiser
Copy link
Member

first thing i would do is figure out which routes are failing and fix them in both branches so that at least correctnesswise the code is shippable. then we'll have to ponder about performance.

@genadz
Copy link
Contributor Author

genadz commented Dec 8, 2020

fixed first branch. it's ~15% slower than master

recost_shortcuts (first approach)

grep -F response_time res_first/* | sed -e "s/.* //g" | stats
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
        0%        50%        90%        99%       100% 
0.00292778 0.07737732 0.49527001 1.44068127 5.46187663 
[1] 0.2961684
> 
> 

master

grep -F response_time res_master/* | sed -e "s/.* //g" | stats
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100% 
0.002770424 0.073756933 0.417046785 1.102108717 5.447159052 
[1] 0.2343167
> 
> 

@kevinkreiser
Copy link
Member

@genadz testing the latest code here i still see very large performance difference, in absolute terms it takes me about 23% more time to complete a 14k set of routes. also there are about 10% diffs in a RAD of the same route set. i would have expected some diffs but 10% is staggering. we should that we are seeing diffs we expect.

@dgearhart would you be able to take a look? the summary here is that the code now turns shortcut edges in the path into the list of underlying edges. it doesnt add the intersecting edges at the nodes that were previously non-existant.

at first i thought at sharp turns and stuff, the narrative might get an extra maneuver but without in the intersecting edges that seems highly unlikely (i am pretty sure it wont happen). this leads me to believe we are somehow getting different paths but i fail to see how that is possible considering the change to recover the shortcuts is after the path is found!

@dgearhart
Copy link
Member

@kevinkreiser I can kick off in the background - i am assuming master vs this branch?

@dgearhart
Copy link
Member

@dgearhart would you be able to take a look? the summary here is that the code now turns shortcut edges in the path into the list of underlying edges. it doesnt add the intersecting edges at the nodes that were previously non-existant.

@kevinkreiser
I am seeing some time diffs - seem to be okay

I am seeing diffs like the following:
image
This looks good - we now have turn lanes with the recost_shortcuts_new branch

Why are we not adding the intersecting edge info? That would help with some missing maneuvers

I see a 32% delta for user routes - i do not have time to review all of them but someone should make a good pass at reviewing diffs

@kevinkreiser
Copy link
Member

@dgearhart i would love to add all the edges too but it turns out that costs a ton of CPU and therefore really makes the request slower. that would be the ultimate goal but the original goal of this work is just to be able to get the right costing/time for the edges along the path. i recently got my perf tools working in my IDE so i might have a look here. I'll say this, one of the big contributors is the state shield verbal regexs in odin. They alone are about 7% of the total request latency. I have an idea how to refactor them to make them less expensive but haven't worked on it yet.

@genadz
Copy link
Contributor Author

genadz commented Jan 4, 2021

  1. added commit with the logic that doesn't reject edges in recost function based on nodes/edges access restrictions.
  2. don't know why, but I have different performance results: on my local machine current branch ~4% slower than master.

@kevinkreiser
Copy link
Member

@genadz ill test it again today maybe i didnt get the updates for some reason!

@genadz
Copy link
Contributor Author

genadz commented Jan 4, 2021

@genadz ill test it again today maybe i didnt get the updates for some reason!

thanks!

@kevinkreiser
Copy link
Member

ok looking at the 50th percentile yeah its about 5%. oddly the 90th is something like 15% slower but the 99th and 100th are 5 and 0 respectively. frankly its hard to tell what i should understand from this result. i guess mostly the 50th percentile is the most important since the bulk of requests will be in this range. when i look at the total time to complete the benchmark though its about a 7.5% performance drop. i think if we merge the optimizations branch and do a couple more optimizations we'll be able to pay for this with those 😄

i personally dont think we have to wait to merge this before we merge those, does anyone else have an opinion? @purew or @danpat ?

branch:
         0%         50%         90%         99%        100% 
0.002226353 0.038036823 0.144977236 0.778483784 1.566832542 

master:
         0%         50%         90%         99%        100% 
0.002390385 0.036790848 0.125865579 0.713516281 1.550947189 

Comment on lines +100 to +102
(!costing.Allowed(edge, label, tile, edge_id, localtime, offset_time.timezone_index,
time_restrictions_TODO) &&
!ignore_access)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the flag up in the if so that allowed doesnt get called at all

Suggested change
(!costing.Allowed(edge, label, tile, edge_id, localtime, offset_time.timezone_index,
time_restrictions_TODO) &&
!ignore_access)) {
!ignore_access && !costing.Allowed(edge, label, tile, edge_id, localtime, offset_time.timezone_index,
time_restrictions_TODO)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't do this because we need to evaluate time restrictions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we can explicitly call costing.EvaluateRestrictions in case ignore_access = true

Copy link
Member

@kevinkreiser kevinkreiser Jan 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the restrictions are based on access though as well, is there really a point of checking them? i guess im more thinking we should make the boolean called throw_if_not_allowed or something more generic like strict to just completely turn off checking allowed at all. maybe we can flip the meaning and say something like bool allow_all = false?

Copy link
Contributor Author

@genadz genadz Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm. what if we back to the first approach where we recost only shortcuts ? - in this case we will save all time_restrictions that we calculated for regular edges.
and, when recosting shortcuts, we can 1) use flag to turn off checking Allowed or 2) don't use this flag but in case recosting fails - we just don't expand shortcut edge, add shortcut to the final path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i personally prefer this implementation as its a lot less complex (no splicing). you have a good point about showing the restrictions. id say just let it as it is and maybe make a note that we have to put the flag second so we can get restriction information.

@valhalla valhalla deleted a comment from mandeepsandhu Jan 20, 2021
@kevinkreiser
Copy link
Member

@mandeepsandhu the main slowdown in odin is running over these regex's for every street name: https://github.com/valhalla/valhalla/blob/master/valhalla/baldr/verbal_text_formatter_us.h#L37

i think we can rewrite this to do our own form of matching that isnt as smart as regex to speed this up.

Copy link
Member

@kevinkreiser kevinkreiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you again so much for the very long path to getting here. i hope we can make a few more performance imporovements and then even remove the optimization for not adding intersecting edges.

@kevinkreiser kevinkreiser merged commit e785ed9 into valhalla:master Jan 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants