Fix possible null pointer dereferencing #3065
Rather than blanketing the code with these checks, could you figure out which one you are seeing in practice and fix that one? A lot of these will be unnecessary. For example, in the Origin function of bidirectional A* the edge ids come from loki using the same graph reader. It is possible to write code that clears the tile cache between loki and thor, but it's hard to understand why someone would do that. So please focus the PR on the specific case you were running into. Or maybe you are using the library in another project and the cache can be cleared at any time by another thread? A bit more explanation of what is going on here would be great!
@kevinkreiser Sorry for the incomplete description. This fixes the problem for our project. And yes, in our case the cache can be cleared from another thread.
ce0b5b3
to
54fb3c7
Compare
that's a lot of branches and print statements. we should clean up the print statements and check what impact all of these branches have on performance
@kevinkreiser |
54fb3c7
to
a98dbb4
Compare
24129f1
to
f4b2ab9
Compare
Algorithms Should Not Be Robust to Data Disappearing On-the-fly
We'll start by clearing up one thing. None of the algorithms are robust to randomly removing tiles out from underneath them. The
Yes, the code is robust to a tile not existing or not being on the device, and that is fine, but it's not robust to a tile being there for part of the computation and going away the next. And actually it SHOULDN'T be anyway. Think about what would happen in a couple of scenarios and you'll realize why it's not worth trying to make the code "gracefully" handle this.
As you can see, we can keep playing little scenarios like this out over and over, and every time it doesn't make sense to continue our work. We need
Your Particular Problem: Routing on Regional Extracts
I'm pretty sure the scenario you are worrying about is the following: you configure graphreader to cache tiles in memory, but the in-memory cache is backed by disk, and the disk only has a subset of the planet's tiles on it, so it is backed by a network call. Your cache hierarchy is like this:
To screw us even worse we have multiple threads sharing synchronized access to this tile cache and they all have the power to
Now I get that you still want to route with the data you have and you want to be able to clear some of it to stay within the boundaries of what the configuration demands. The router already supports that, so long as you understand you can only shrink back down to your limit if a route isn't in progress. For the reasons described above it doesn't make sense to have the algorithms constantly checking whether the data they just saw still exists (for the same routing calculation).
Potential Solution
I believe you are using this library in another application where you have the individual workers (loki, thor, odin) and they share a thread-synchronized tile cache via the graphreader. I also believe when you do a route or a map match etc. you probably call
I still don't think this is the right way to go. It is pointless to make the router robust to "disappearing" tiles. We should be robust to running on regional tile sets (i.e. nullptrs are allowed), but tiles can't be there in one iteration of the loop and gone in the next.
Please see my comments here: #3065 (comment)
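To illustrate what "nullptrs are allowed" means for a regional tile set, here is a minimal sketch. It does not use Valhalla's real types: `MockTile`, `MockTileSet`, `TileOf` and `Reachable` are hypothetical names made up for this example, and the node-to-tile mapping (`node / 100`) is an arbitrary assumption. The point is only that a tile which was never part of the extract prunes that search path, and the expansion carries on with the rest.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a graph tile: adjacency lists for the nodes it owns.
struct MockTile {
  std::unordered_map<uint32_t, std::vector<uint32_t>> edges; // node -> neighbors
};

// Hypothetical regional tile set: tiles outside the extract yield nullptr.
struct MockTileSet {
  std::unordered_map<uint32_t, MockTile> tiles;
  const MockTile* Get(uint32_t tile_id) const {
    auto it = tiles.find(tile_id);
    return it == tiles.end() ? nullptr : &it->second;
  }
};

// Assumption for this sketch only: a node belongs to tile (node / 100).
inline uint32_t TileOf(uint32_t node) {
  return node / 100;
}

// Breadth-first expansion that prunes every node whose tile is not in the extract.
std::unordered_set<uint32_t> Reachable(const MockTileSet& set, uint32_t origin) {
  std::unordered_set<uint32_t> seen;
  std::queue<uint32_t> frontier;
  frontier.push(origin);
  while (!frontier.empty()) {
    uint32_t node = frontier.front();
    frontier.pop();
    const MockTile* tile = set.Get(TileOf(node));
    if (tile == nullptr)
      continue; // tile is outside the extract: cut this path and keep going
    if (!seen.insert(node).second)
      continue; // node was already expanded
    auto it = tile->edges.find(node);
    if (it == tile->edges.end())
      continue; // node has no outgoing edges
    for (uint32_t next : it->second)
      frontier.push(next);
  }
  return seen;
}
```

Note that the sketch assumes the tile set does not change while `Reachable` runs, which is exactly the assumption argued for above.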
if we assume that we need the router to deal with tiles blinking into and out of existence, then we need to look at each one of these changes and determine whether the algorithm can even continue past this point if it can't get the tile. if it can, we continue; if it can't, we throw and abort the computation. for continuing, some common cases would be:
for aborting the computation, common cases would be:
@mskurydin @DenisPryt @SiarheiFedartsou it seems to me the only "recoverable" one above is when a tile goes away and we basically cut all search paths (expansions) that are in it out and continue. any other failure should
if that is the case i kind of wonder if we should just always throw when we detect a tile disappeared out from under us and then retry. like maybe we make a method on graphreader that is called
anyway i'll go through all of these and mark each type
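The "always throw and then retry" idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: `TileDisappeared` and `RetryOnTileLoss` are hypothetical names, not existing Valhalla APIs, and the retry limit is arbitrary. The computation is restarted from scratch whenever it reports that a tile vanished, and the exception is rethrown once the attempts are used up.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical exception: thrown by a computation when a tile it had
// already used is no longer available.
struct TileDisappeared : std::runtime_error {
  explicit TileDisappeared(uint64_t tile_id)
      : std::runtime_error("tile " + std::to_string(tile_id) +
                           " disappeared during the computation") {}
};

// Runs `compute` from scratch up to `max_attempts` times. Only
// TileDisappeared triggers a retry; any other exception propagates at once.
template <typename Compute>
auto RetryOnTileLoss(Compute&& compute, int max_attempts) -> decltype(compute()) {
  assert(max_attempts > 0);
  for (int attempt = 1;; ++attempt) {
    try {
      return compute();
    } catch (const TileDisappeared&) {
      if (attempt >= max_attempts)
        throw; // out of attempts: let the caller see the failure
    }
  }
}
```

A caller would wrap the whole routing call, for example `RetryOnTileLoss([&] { return ComputeRoute(request); }, 3)`, where `ComputeRoute` is again a placeholder. Restarting the whole computation keeps the algorithms free of per-step checks for data they have already seen.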
i just went over the whole PR. please make the requested changes. after that we need to do a performance check in server mode just to make sure we don't have any regressions there (hopefully branch prediction will help us there)
@kevinkreiser should I add new exception like |
@DenisPryt it would be a nice-to-have but i wouldn't say it is required. it's your choice! we definitely need a changelog entry explaining that you made an enhancement to continue the routing computation in the case that a graphtile disappears from the tilecache
2b42b1e
to
f2959e2
Compare
3396310
to
aeef6f3
Compare
@kevinkreiser so now we need to run performance tests, right? How do I do that? Just run make run-benchmarks?
@DenisPryt different people do it in different ways. Perhaps @genadz could show you the way he and i have been doing it lately. or if you can wait until the evening i can run the check on my end. @danpat did have a way of running the algorithm benchmarks with larger datasets (like a server would) but i can't remember if it's documented or not.
@kevinkreiser I've been downloading tiles for benchmarks since this morning. If it's not difficult, can you try to run a benchmark? |
Yep I can. I'll do it tonight |
other than the small nitpick, please add a changelog entry to record that you made this enhancement
i checked the performance and there was no impact, as we had hoped. i am certain there are many many more of these types of fixes throughout the code base that we will want to do (in loki and in other parts of the algorithms in thor) but we can get them in another pull request.
i do think that in the future we should consider adding to GraphReader a method, RequireGraphTile, that returns a tile just like GetGraphTile does but throws an exception immediately if the tile is not there
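A rough sketch of that proposal, under stated assumptions: `MockGraphReader` and `GraphTile` here are simplified stand-ins written for this example, not Valhalla's classes, the tile id is reduced to a plain integer, and the choice of `std::runtime_error` is arbitrary. Only the two method names come from the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Simplified stand-in for a graph tile.
struct GraphTile {
  uint64_t id;
};

// Simplified stand-in for the reader and its in-memory tile cache.
class MockGraphReader {
public:
  void Put(uint64_t id) {
    cache_[id] = std::make_shared<GraphTile>(GraphTile{id});
  }
  void Clear() {
    cache_.clear();
  }

  // Lenient lookup: a missing tile is a normal outcome (regional extract).
  std::shared_ptr<const GraphTile> GetGraphTile(uint64_t id) const {
    auto it = cache_.find(id);
    if (it == cache_.end())
      return nullptr;
    return it->second;
  }

  // Strict lookup: the caller has already relied on this tile, so its
  // absence means the computation cannot continue.
  std::shared_ptr<const GraphTile> RequireGraphTile(uint64_t id) const {
    auto tile = GetGraphTile(id);
    if (!tile)
      throw std::runtime_error("required tile " + std::to_string(id) + " is not available");
    return tile;
  }

private:
  std::unordered_map<uint64_t, std::shared_ptr<const GraphTile>> cache_;
};
```

With this split, call sites that can legitimately skip a missing tile keep using `GetGraphTile` and test for null, while call sites that revisit data they have already seen use `RequireGraphTile` and need no branch of their own.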
…an disappear from the reader
853b4d4
to
129238d
Compare
@kevinkreiser done |
Issue
#3064
Tasklist
Requirements / Relations
Link any requirements here. Other pull requests this PR is based on?