-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: peer_manager - extend the number of connection requests to known peers #2534
Conversation
You can find the image built from this PR at
Built from f595bf6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
N.B.
The experimental peer manager try to connect to all possible nodes here -> https://github.com/waku-org/nwaku/blob/0894f0cfea9fac17348607cb50768e17bd788660/waku/node/peer_manager/peer_manager.nim#L750C1-L753C51
We could try to enable it in waku.test
?
Good point, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm. I think we'd need a more comprehensive solution, unless I'm missing something. This ignores the fact that there is a maximum number of connections in general (we can't just connect to everyone). In L670 this proc is exited if we have already reached our max out conn target. Previously we could exceed this, but by no more than 9 (10 minus 1) as we wouldn't attempt more than 10 connections if under this threshold. Now we will continue attempting connections whether we've reached the underlying libp2p conn limit or not. Furthermore, it's almost more important to ask why the peer store seemingly has so many peers stored that are almost certainly unreachable (given that > 700 peers in the store with less than 50 good connections). We should also consider having a larger inRelayPeers threshold and maintaining that, something that has been disabled (probably with good reason) higher up in the proc.
Thanks for the comments @jm-clius !
I completely agree with that, I've replicated the L670 condition within the connection loop in order to keep the same "out" limit.
That high number of peers within PeerStore, all come from Discv5. I don't know whether all of them are reachable or not.
Excuse me, but I can't quite get this comment :) |
0528713
to
bc65d75
Compare
Indeed. My point is that we may need a separate investigation into why the discv5 dht would have so many stale peers. Perhaps it should be tuned to ping peers more regularly, but it could also be an indication that many peers may somehow be reachable for discv5 exchanges but not be useful on the libp2p layer. In either case, having a very large proportion of "known" peers be essentially useless would cause fragile connectivity, loads of bandwidth/resource overhead, etc. in the network. My point is that this will also affect the number of successful "in" connections.
Similar to how we have a target number for out connections, we also have target number of in connections - however, the logic that would keep us to this target seems to be disabled higher up in the proc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the out target now included, I can approve. However, I'm not convinced that this will necessary improve the situation (since we'll be dialing in batches of 10 in any case, always waiting for the last connection/dial timeout before attempting to dial a new batch). We may need to monitor impact on resource usage here as well.
Description
@danisharora099 informed about a very low number of
in
connections on his node:We've also noticed this behavior in other nodes.
This is how
peer_manager
establishes newout
connections in anwaku
node:Every node tries to establish connections to its known peers, every minute.
On every iteration,
await pm.connectToNodes(outsideBackoffPeers[0..<numPeersToConnect])
is run but the number of parallel connections is bound to just 10 (MaxParallelDials
.) And that means that any other node will try to establish a connection with 10 peers, every minute.Then, given the number of known peers that each other node has (754 in the Danish node) that makes it more difficult, or slower, to try to connect to other nodes.
With this change, we make any node to establish as many connections as possible to other nodes, on each
relayConnectivityLoop
iteration. And with that, we enlarge the chances of other nodes to receivein
connections.Changes
relayConnectivityLoop
iteration.