
fix: peer_manager - extend the number of connection requests to known peers #2534

Merged

Ivansete-status merged 3 commits into master from dial-issue on Mar 19, 2024

Conversation

Ivansete-status
Collaborator

Description

@danisharora099 reported a very low number of in connections on his node:

[screenshot: node metrics showing very few in connections]

We've also noticed this behavior in other nodes.


This is how peer_manager establishes new out connections in a nwaku node:
Every node tries to establish connections to its known peers, every minute.
On every iteration, await pm.connectToNodes(outsideBackoffPeers[0..<numPeersToConnect]) is run, but the number of parallel connections is bounded by just 10 (MaxParallelDials). That means a node will only attempt to connect to 10 peers, every minute.
Given the number of known peers each node has (754 on @danisharora099's node), that makes it more difficult, or at least slower, to connect to other nodes.
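
For illustration, here is a minimal sketch of that per-minute loop. PeerId, PeerManager, getOutsideBackoffPeers, and the connectToNodes body are made-up stubs to keep the example self-contained; only the batching shape mirrors the description above, not the actual nwaku implementation.

```nim
import chronos

const MaxParallelDials = 10

type
  PeerId = string              # stand-in for the real libp2p peer identity
  PeerManager = ref object
    knownPeers: seq[PeerId]

# Hypothetical helpers; the real procs live in peer_manager.nim.
proc getOutsideBackoffPeers(pm: PeerManager): seq[PeerId] =
  pm.knownPeers                # real code filters out peers in backoff

proc connectToNodes(pm: PeerManager, peers: seq[PeerId]) {.async.} =
  discard                      # real code dials the given peers in parallel

# Pre-change behavior: a single batch of at most MaxParallelDials peers
# per iteration, i.e. at most 10 out-connection attempts per minute,
# however many known peers (e.g. 754) are available.
proc relayConnectivityLoop(pm: PeerManager) {.async.} =
  while true:
    let outsideBackoffPeers = pm.getOutsideBackoffPeers()
    let numPeersToConnect = min(outsideBackoffPeers.len, MaxParallelDials)
    await pm.connectToNodes(outsideBackoffPeers[0 ..< numPeersToConnect])
    await sleepAsync(1.minutes)
```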


With this change, we make every node establish as many connections as possible to other nodes on each relayConnectivityLoop iteration. That enlarges the chances of other nodes receiving in connections.

Changes

  • Try to connect to all known peers on each relayConnectivityLoop iteration (see the sketch below).
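
Reusing the stub types and helpers from the sketch in the description, the changed loop would look roughly like this; it is a sketch of the intent, not the verbatim patch. Instead of dialing a single batch, it walks through all outside-backoff peers, MaxParallelDials at a time.

```nim
# Post-change sketch: iterate over *all* outside-backoff peers in slices
# of MaxParallelDials, rather than stopping after the first batch.
proc relayConnectivityLoop(pm: PeerManager) {.async.} =
  while true:
    let outsideBackoffPeers = pm.getOutsideBackoffPeers()
    var i = 0
    while i < outsideBackoffPeers.len:
      let batchEnd = min(i + MaxParallelDials, outsideBackoffPeers.len)
      await pm.connectToNodes(outsideBackoffPeers[i ..< batchEnd])
      i = batchEnd
    await sleepAsync(1.minutes)
```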


github-actions bot commented Mar 15, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2534-rln-v2-false

Built from f595bf6

@Ivansete-status Ivansete-status marked this pull request as ready for review March 15, 2024 11:55
@Ivansete-status Ivansete-status self-assigned this Mar 15, 2024
Contributor

@SionoiS SionoiS left a comment


LGTM!

N.B.
The experimental peer manager tries to connect to all possible nodes here -> https://github.com/waku-org/nwaku/blob/0894f0cfea9fac17348607cb50768e17bd788660/waku/node/peer_manager/peer_manager.nim#L750C1-L753C51

We could try to enable it in waku.test?

@Ivansete-status
Collaborator Author

LGTM!

N.B. The experimental peer manager tries to connect to all possible nodes here -> https://github.com/waku-org/nwaku/blob/0894f0cfea9fac17348607cb50768e17bd788660/waku/node/peer_manager/peer_manager.nim#L750C1-L753C51

We could try to enable it in waku.test?

Good point, thanks!
I think we should properly separate the two networks so that we can see the impact of enabling such an experimental feature.

Contributor

@jm-clius jm-clius left a comment


Mmm. I think we'd need a more comprehensive solution, unless I'm missing something. This ignores the fact that there is a maximum number of connections in general (we can't just connect to everyone). In L670 this proc is exited if we have already reached our max out conn target. Previously we could exceed this, but by no more than 9 (10 minus 1), as we wouldn't attempt more than 10 connections if under this threshold. Now we will continue attempting connections whether we've reached the underlying libp2p conn limit or not.

Furthermore, it's almost more important to ask why the peer store seemingly has so many stored peers that are almost certainly unreachable (given that there are > 700 peers in the store but fewer than 50 good connections). We should also consider having a larger inRelayPeers threshold and maintaining that, something that has been disabled (probably with good reason) higher up in the proc.

@Ivansete-status
Collaborator Author

Thanks for the comments, @jm-clius!

Mmm. I think we'd need a more comprehensive solution, unless I'm missing something. This ignores the fact that there is a maximum number of connections in general (we can't just connect to everyone). In L670 this proc is exited if we have already reached our max out conn target. Previously we could exceed this, but by no more than 9 (10 minus 1), as we wouldn't attempt more than 10 connections if under this threshold. Now we will continue attempting connections whether we've reached the underlying libp2p conn limit or not.

I completely agree. I've replicated the L670 condition within the connection loop in order to keep the same "out" limit (roughly as sketched below).
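
Sketched with the same stub types as in the description's example, and with numOutRelayConns and outRelayPeersTarget as assumed names for the current out-connection count and its target, the guard would sit inside the batch loop like this:

```nim
# Assumed additions to the earlier PeerManager stub:
#   outRelayPeersTarget: int          (target number of out connections)
#   proc numOutRelayConns(pm): int    (current number of out connections)

# Sketch: mirror the L670 exit condition inside the batch loop, so we
# stop dialing as soon as the "out" target is reached.
proc connectToRelayPeers(pm: PeerManager) {.async.} =
  let outsideBackoffPeers = pm.getOutsideBackoffPeers()
  var i = 0
  while i < outsideBackoffPeers.len:
    if pm.numOutRelayConns() >= pm.outRelayPeersTarget:
      return                  # same condition that exits the proc at L670
    let batchEnd = min(i + MaxParallelDials, outsideBackoffPeers.len)
    await pm.connectToNodes(outsideBackoffPeers[i ..< batchEnd])
    i = batchEnd
```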


Furthermore, it's almost more important to ask why the peer store seemingly has so many stored peers that are almost certainly unreachable (given that there are > 700 peers in the store but fewer than 50 good connections).

All those peers in the PeerStore come from Discv5. I don't know whether all of them are reachable or not.


We should also consider having a larger inRelayPeers threshold and maintaining that, something that has been disabled (probably with good reason) higher up in the proc.

Excuse me, but I don't quite get this comment :)

@jm-clius
Contributor

I don't know whether all of them are reachable or not.

Indeed. My point is that we may need a separate investigation into why the discv5 DHT would have so many stale peers. Perhaps it should be tuned to ping peers more regularly, but it could also be an indication that many peers are somehow reachable for discv5 exchanges without being useful on the libp2p layer. In either case, having a very large proportion of "known" peers be essentially useless would cause fragile connectivity, loads of bandwidth/resource overhead, etc. in the network. This will also affect the number of successful "in" connections.

Excuse me, but I don't quite get this comment :)

Similar to how we have a target number of out connections, we also have a target number of in connections; however, the logic that would keep us at this target seems to be disabled higher up in the proc.

Contributor

@jm-clius jm-clius left a comment


With the out target now included, I can approve. However, I'm not convinced that this will necessarily improve the situation (since we'll be dialing in batches of 10 in any case, always waiting for the last connection/dial timeout before attempting to dial a new batch). We may need to monitor the impact on resource usage here as well.
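
To illustrate the batching concern: assuming connectToNodes awaits all of its per-peer dial futures before returning (a simplified guess at its shape, with dialPeer and the 10-second timeout as made-up stand-ins), each batch of 10 lasts as long as its slowest dial.

```nim
import chronos

type
  PeerId = string              # stand-in, as in the earlier sketches
  PeerManager = ref object

proc dialPeer(pm: PeerManager, p: PeerId): Future[void] {.async.} =
  discard                      # hypothetical per-peer dial

# All per-peer futures are awaited together, so one peer hitting the
# dial timeout delays the start of the next batch of 10.
proc connectToNodes(pm: PeerManager, peers: seq[PeerId]) {.async.} =
  var futs: seq[Future[void]]
  for p in peers:
    futs.add pm.dialPeer(p)
  discard await allFutures(futs).withTimeout(10.seconds)
```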

@Ivansete-status Ivansete-status merged commit 2173fe2 into master Mar 19, 2024
14 of 15 checks passed
@Ivansete-status Ivansete-status deleted the dial-issue branch March 19, 2024 18:07