route calculation takes O(n^2) time #1761

rade · 2015-12-08T16:49:51Z

When running something like WEAVE_NO_FASTDP=1 NUM_WEAVES=40 bin/multiweave launch --conn-limit 100 and taking a profile snapshot, about two-thirds of the time is spent in route calculation.

Looking at the code, I am pretty sure it is O(n_peers^2). Hopefully there's a better way.


rade · 2015-12-09T12:06:16Z

One way we could improve matters here is by calculating individual broadcast routes lazily, i.e. only when requested.

We can do the same for unicast routes, though for the whole set rather than for individual routes, since a single traversal of the topology gives us all unicast routes.
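To illustrate why one traversal is enough for unicast, here is a minimal sketch in Python rather than the project's actual code. It assumes the topology is a plain adjacency dict, and the name `unicast_next_hops` is hypothetical. A single breadth-first search from the local peer yields the next hop towards every reachable destination.

```python
from collections import deque


def unicast_next_hops(topology, source):
    """Return {destination: next hop from source} using one BFS.

    `topology` is an assumed representation: {peer: set of neighbours}.
    """
    next_hop = {}
    visited = {source}
    queue = deque()

    # Direct neighbours are their own next hop.
    for neighbour in sorted(topology.get(source, ())):
        if neighbour not in visited:
            visited.add(neighbour)
            next_hop[neighbour] = neighbour
            queue.append(neighbour)

    # Every peer discovered later inherits the next hop of the peer
    # through which it was first reached.
    while queue:
        peer = queue.popleft()
        for neighbour in sorted(topology.get(peer, ())):
            if neighbour not in visited:
                visited.add(neighbour)
                next_hop[neighbour] = next_hop[peer]
                queue.append(neighbour)
    return next_hop


topology = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b"},
}
routes = unicast_next_hops(topology, "a")
```

Broadcast routes differ in that they depend on the originating peer, so each source needs its own traversal. That is where computing everything up front becomes quadratic in the number of peers.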


Instead of calculating all routes in one go, we calculate them when needed. The exceptions are unicast routes and broadcast routes from ourselves. We still calculate those eagerly because they are needed to route locally captured/fdp-missed packets, and performing route calculations in that critical path increases the chances of dropping packets due to not keeping up. Well, technically only the established & symmetric routes are needed for the data plane, but we might as well calculate the equivalent routes for the control plane - it keeps the code simple and uniform.

Calculating all routes is O(n_peers^2), so this change represents a huge saving in situations where not all of them are needed, which is quite common, especially when the topology is evolving rapidly.

One downside of this new approach is that we can no longer suppress OnChange callbacks when the data plane routes remain unchanged. This means we will be purging the FDP flows more frequently than before, for example when only the control plane topology changed because new connections appeared that haven't been marked as 'established' yet. It's a small price to pay, fixable if we really care, and in any case making FDP flow invalidation more fine-grained would be a better approach.

Fixes #1761.
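The eager/lazy split described in this commit message can be sketched roughly as follows. This is an illustrative Python model, not the code from the pull request. The names `LazyRoutes` and `broadcast_hops` are hypothetical, the topology is an assumed adjacency dict, and a broadcast route is simplified to "our children in a BFS tree rooted at the source". The eager unicast calculation is left out to keep it short.

```python
from collections import deque


def broadcast_hops(topology, source, me):
    """Peers that `me` forwards to for a broadcast originating at `source`.

    Simplifying assumption: these are the children of `me` in a BFS tree
    rooted at `source`.
    """
    visited = {source}
    queue = deque([source])
    children = []
    while queue:
        peer = queue.popleft()
        for neighbour in sorted(topology.get(peer, ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
                if peer == me:
                    children.append(neighbour)
    return children


class LazyRoutes:
    def __init__(self, me, topology):
        self.me = me
        self.calculations = 0  # counts traversals, for illustration only
        self.recalculate(topology)

    def recalculate(self, topology):
        """On a topology change, drop all cached routes and eagerly
        recompute only the broadcast route from ourselves."""
        self.topology = topology
        self.cache = {self.me: self._compute(self.me)}

    def _compute(self, source):
        self.calculations += 1
        return broadcast_hops(self.topology, source, self.me)

    def broadcast(self, source):
        """Routes for other sources are computed on first request."""
        if source not in self.cache:
            self.cache[source] = self._compute(source)
        return self.cache[source]


# A four-peer chain a - b - c - d, seen from peer "b".
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
routes = LazyRoutes("b", chain)
```

In this model a topology change costs one traversal instead of one per peer, and routes for sources that never send a broadcast before the next change are never computed at all.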

rade added chore [component/mesh] performance labels Dec 8, 2015

This was referenced Dec 8, 2015

route calculation performed too frequently under high load #1762

Open

prevent stalling all broadcasts when single connection is blocked #1695

Closed

rade mentioned this issue Dec 10, 2015

lazy route calculation #1773

Merged

bboreham closed this as completed in #1773 Dec 23, 2015

bboreham added this to the 1.5.0 milestone Jan 12, 2016

murali-reddy mentioned this issue Feb 4, 2019

optimise Peer.Routes() for fully connected mesh of node topology weaveworks/mesh#102

Open
