Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graph performance is very bad in worst case scenarios #10331

Open
michaeljb opened this issue Feb 14, 2024 · 13 comments
Open

Graph performance is very bad in worst case scenarios #10331

michaeljb opened this issue Feb 14, 2024 · 13 comments
Assignees
Labels
graph Problems computing the graph

Comments

@michaeljb
Copy link
Collaborator

michaeljb commented Feb 14, 2024

Discussed in the site chat this morning:

Can anyone please tell me what is wrong with the following. Trying to place tile 24 on B25 in game https://18xx.games/game/151565

it does let me when I clone that game
though the game is running super slowly on my browser, takes forever to load

Thanks it is running slow here too and keeps refreshing and losing the tile

I repro'd the slowness but haven't had time to profile or do a cursory investigation of whether the slowness is the game processing or the rendering. (also need to see whether the issue is specific to 18GB)

Copied the game data to this gist

Maybe similar? #10123

@michaeljb
Copy link
Collaborator Author

Firefox profiling shows the slowness coming from computing the graph during rendering, starting from the connected_hexes call in tracker_available_hex:

https://github.com/tobymao/18xx/blob/3794ce0dd/lib/engine/step/tracker.rb#L494

From there it's a pretty deep stack of walking the graph

Screenshot 2024-02-17 092523

@michaeljb michaeljb added graph Problems computing the graph and removed needs triage labels Feb 17, 2024
@michaeljb
Copy link
Collaborator Author

connected = hex_neighbors(entity, hex)

@game.graph_for_entity(entity).connected_hexes(entity)[hex]

compute(corporation) unless @connected_hexes[corporation]

18xx/lib/engine/graph.rb

Lines 234 to 252 in 3794ce0

node.walk(visited: visited, corporation: walk_corporation, skip_track: @skip_track,
skip_paths: skip_paths, converging_path: false) do |path, _, _|
next if paths[path]
paths[path] = true
path.nodes.each do |p_node|
nodes[p_node] = true
local_nodes[p_node] = true
yield p_node if block_given?
end
hex = path.hex
path.exits.each do |edge|
hexes[hex][edge] = true
hexes[hex.neighbors[edge]][hex.invert(edge)] = true if !@check_regions || !@game.region_border?(hex, edge)
end
end

def walk(

@michaeljb
Copy link
Collaborator Author

michaeljb commented Feb 17, 2024

Adding a bunch of temporary puts statements to debug this, including some I've added dozens of times before, inspired me to finally make them permanent and set up a global logger for the game engine and views - #10350

If that PR was already merged and this problem was encountered, identifying it as a graph problem would be as simple as adding ?l=0 to the game URL and watching the console.

@michaeljb michaeljb changed the title [18GB] cannot lay a tile, slow processing and/or rendering [18GB] cannot lay a tile, slow rendering due to Graph computation Feb 18, 2024
@michaeljb
Copy link
Collaborator Author

michaeljb commented Feb 19, 2024

The problem is something to do with the deletion on the converging check from #7788, removing these lines lets my browser compute the graph (probably not correctly) in about 0.02 seconds instead of 55 seconds:

visited.delete(self) if converging_path

visited.delete(self) if converging

michaeljb added a commit to michaeljb/18xx.games that referenced this issue Feb 19, 2024
michaeljb added a commit to michaeljb/18xx.games that referenced this issue Feb 19, 2024
@michaeljb
Copy link
Collaborator Author

michaeljb commented Feb 19, 2024

Still have some failing tests, but I think I've got some progress on my graph-converging branch - https://github.com/michaeljb/18xx.games/commits/graph-converging?since=2024-02-19

@michaeljb michaeljb self-assigned this May 24, 2024
@michaeljb michaeljb changed the title [18GB] cannot lay a tile, slow rendering due to Graph computation Graph performance is very bad in worst case scenarios Jun 12, 2024
@michaeljb
Copy link
Collaborator Author

I have some real progress on my bfs-graph branch. https://github.com/michaeljb/18xx.games/commits/bfs-graph/

There's still a lot of work to make it to support the custom args on Engine::Graph and make it a drop-in replacement.

The key part that will fix the algorithm for these bad cases isn't switching from DFS to BFS, but instead of the converging_path logic, it tracks from which direction paths and nodes are added to the graph, i.e., once a path has been reached from both sides it can be skipped.

My graph also advances one function call at a time, instead of recursively traversing the whole graph. This enables visualizing the graph as it grows, and would be required for theoretical future features like persistent lazy graphs that don't get cleared, or bidirectional search for more efficient home-destination connection checking.

@michaeljb
Copy link
Collaborator Author

michaeljb commented Jun 23, 2024

In game 151565 the tile lays at action 536 and 537 on F19 and G20 raise the number of total node.walk and path.walk calls needed to compute LNWR's graph from 445 to 2711.

at 535:

Screenshot 2024-06-22 185127

at 537:

Screenshot 2024-06-22 185134

While not adding any actual new connectivity for LNWR, these tile lays add a lot more converging exits, and they're first really big jump in the number of walk calls made.

As the game progresses the number of calls gets huge.

Action 650 upgrades Derby, adding another token slot to the city and unblocking LNWR. The walk calls here rise from 196,238 to 1,659,446. By the end of the game, it's 5,862,969, plus 1,117,663 calls that return quickly via guard statements.


I was hoping that finding the first big unnecessary jump in walk calls would make it easier to describe what exactly the problem in the current algorithm is, but the track network at that point in the game is already fairly complex. I can't be more specific than saying the converging_paths logic is buggy and leads to lengthy loops.

@michaeljb
Copy link
Collaborator Author

michaeljb commented Jun 23, 2024

When a check is made to see if a corporation can run routes, route_info calls compute(routes_only: true). This still does a full DFS search, but only starting from one token. It could probably be optimized to return as soon as multiple nodes are found.

@michaeljb
Copy link
Collaborator Author

Action 650 upgrades Derby, adding another token slot to the city and unblocking LNWR. The walk calls here rise from 196,238 to 1,659,446. By the end of the game, it's 5,862,969, plus 1,117,663 calls that return quickly via guard statements.

It looks like these numbers are wrong, but the right numbers are still in the 100Ks if not millions. Looks like I was messing up the default argument when plumbing walk_calls through.

@michaeljb
Copy link
Collaborator Author

Action 650 upgrades Derby, adding another token slot to the city and unblocking LNWR. The walk calls here rise from 196,238 to 1,659,446. By the end of the game, it's 5,862,969, plus 1,117,663 calls that return quickly via guard statements.

It looks like these numbers are wrong, but the right numbers are still in the 100Ks if not millions. Looks like I was messing up the default argument when plumbing walk_calls through.

With #10951 I am actually getting the same endgame numbers at action 759 (LNWR's last turn of the game, in OR 8.1). Not sure why some of the earlier counts are different than what I was seeing before 🤷

@michaeljb
Copy link
Collaborator Author

Some good progress this weekend, I was able to configure 18GB to use my new adapter class instead of Engine::Graph here's screenshot of the side-by-side logs for loading up the problematic game to action 759:

Screenshot 2024-06-23 134642

I can't step through the game yet to see how much that performance has improved, getting some errors I haven't had time to fix yet, but it's certainly coming along nicely!

@michaeljb
Copy link
Collaborator Author

The real source of trouble appears to be in eliminating overlapping paths from connected_paths. If a path is overlapping (e.g., the pink path in the screenshot here), to see if it's legal, it needs to search for every possible route back to a token that avoids the path(s) it overlaps with, so if there is no such route it could be a very long process to confirm that.

Screenshot 2024-06-29 213332

@michaeljb
Copy link
Collaborator Author

Been a while since I've spent any time on this, but I think I'm still hung up on the overlapping problem from the previous comment.

I think that seeing if the overlapping path connects back to a token without overlapping requires checking all possible routes from that path. I've tried thinking of clever ways to track more info as the graph is built, but that would end up being both time and memory intensive if you want every path/node to know about all possible paths to it.

An alternate approach would be to rework the engine to stop auto-skipping steps so that the full graph doesn't need to be computed as much, and maybe getting rid of the highlighting in the UI for the track-laying step.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
graph Problems computing the graph
Projects
None yet
Development

No branches or pull requests

1 participant