feat(networking): add relay connectivity loop #1482

Merged: 4 commits into master on Jan 18, 2023

Conversation

@alrevuelta (Contributor) commented Jan 10, 2023

Related:

Summary:

  • Decouples peer discovery from peer connection.
  • The discv5 loop now only adds peers to the peer store instead of also trying to connect to them.
  • A new loop, relayConnectivityLoop, ensures healthy relay connections according to some criteria, taking peers from the peer store previously added by discv5 (or other ambient discovery techniques); see the sketch below.
  • This PR is to be used by feat(networking): add backoff period after failed dial #1462
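
For illustration, a minimal sketch of what such a connectivity loop could look like (not the merged code: getNotConnectedPeers and connectToNodes follow the names used in this PR, while maxConnections and the 30-second interval are assumptions):

import std/[random, sequtils]
import chronos

proc relayConnectivityLoop*(pm: PeerManager) {.async.} =
  ## Periodically tops up relay connections from peers sitting in the peer store.
  while true:
    let numConPeers = pm.peerStore.peers.countIt(it.connectedness == Connected)
    if numConPeers < pm.maxConnections:
      # Candidates previously added to the peer store by discv5 (or other discovery)
      var notConnectedPeers = pm.peerStore.getNotConnectedPeers().mapIt(
        RemotePeerInfo.init(it.peerId, it.addrs))
      shuffle(notConnectedPeers)
      let numToConnect = min(notConnectedPeers.len, pm.maxConnections - numConPeers)
      await pm.connectToNodes(notConnectedPeers[0..<numToConnect])
    await sleepAsync(chronos.seconds(30))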

@status-im-auto (Collaborator) commented Jan 10, 2023

Jenkins Builds

Commit      #   Finished (UTC)        Duration  Platform  Result
✔️ 2f4df17   #1  2023-01-10 23:00:27   ~13 min   macos     📦bin
✔️ 4773283   #2  2023-01-11 23:00:58   ~13 min   macos     📦bin
✔️ ba89315   #3  2023-01-12 23:02:19   ~15 min   macos     📦bin
✔️ 6542289   #4  2023-01-13 22:59:48   ~12 min   macos     📦bin

@alrevuelta alrevuelta marked this pull request as ready for review January 12, 2023 10:10
@alrevuelta alrevuelta self-assigned this Jan 16, 2023
@jm-clius (Contributor) left a comment

Generally makes sense to me! Would like an opportunity to revisit this tomorrow. Can we get one more reviewer? In the meantime, see two minor comments below.

waku/v2/node/peer_manager/waku_peer_store.nim (outdated, resolved)
apps/wakunode2/wakunode2.nim (outdated, resolved)
@alrevuelta alrevuelta requested review from LNSD and rymnc January 16, 2023 16:40
@rymnc (Contributor) left a comment

LGTM

@jm-clius (Contributor) left a comment

LGTM as an initial decoupling, thanks!

A suggestion for subsequent PRs is to extract the connectivity logic and decisions that we've made (e.g. number of slots for service peers, max parallel dials, number of peers to connect). This is not only to avoid magic numbers, but so that we can select which of these to specify in an RFC. An incremental process, of course.
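
For example, something along these lines at the top of the peer manager module (hypothetical names and values, just to illustrate the kind of extraction meant here):

# Hypothetical module-level constants; the actual names and values would be
# agreed in follow-up PRs and, eventually, specified in an RFC.
const
  MaxParallelDials = 10                            # cap on simultaneous dial attempts
  RelayConnectivityInterval = chronos.seconds(30)  # how often the relay loop runs
  ServicePeerSlots = 2                             # slots reserved for service peers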

continue

# TODO: Respect backoff before attempting to connect a relay peer
var disconnectedPeers = pm.peerStore.getNotConnectedPeers().mapIt(RemotePeerInfo.init(it.peerId, it.addrs))

Contributor:
Minor: notConnectedPeers

@LNSD (Contributor) left a comment

Please, check my comments.

# ...public key
var publicKey: PublicKey
discard remotePeerInfo.peerId.extractPublicKey(publicKey)

if pm.peerStore[AddressBook][remotePeerInfo.peerId] == remotePeerInfo.addrs and

Contributor:

Could you abstract these bracket access statements (this one and the ones below) with procs? For example:

Suggested change:
- if pm.peerStore[AddressBook][remotePeerInfo.peerId] == remotePeerInfo.addrs and
+ if pm.addressBook.getByPeerId(remotePeerInfo.peerId) == remotePeerInfo.addrs and

Note that addressBook is a "property": a getter proc, called without parentheses, that returns the field and has a noun name.
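
Something like the following could do it (hypothetical procs; names are only suggestions):

# Hypothetical wrappers illustrating the suggestion above.
proc addressBook*(pm: PeerManager): AddressBook =
  # noun-named "property" getter, called without parentheses
  pm.peerStore[AddressBook]

proc getByPeerId*(book: AddressBook, peerId: PeerId): seq[MultiAddress] =
  # hides the bracket access behind a named proc
  book[peerId]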

Author (@alrevuelta):

Mm, I kind of agree, but we would need quite a lot of boilerplate code to wrap this: one new proc per "book". One of the cool things about nim-libp2p's extended peer store is that it expands the original peer store with just peerStore[whatever].

Since it implies modifications that don't per se add functionality, I would like to leave it like that for now.

shuffle(disconnectedPeers)

# limit the amount of paralel dials
let maxParalelDials = 10

Contributor:

Is this a parametrizing constant? Can you move it outside of the proc? To the top of the module with the rest of the constants.

Author (@alrevuelta):

Sure, done.

@@ -392,6 +392,9 @@ proc startRelay*(node: WakuNode) {.async.} =
protocolMatcher(WakuRelayCodec),
backoffPeriod)

# Maintain relay connections
asyncSpawn node.peerManager.relayConnectivityLoop()

Contributor:

IMHO, we should avoid calling asyncSpawn within the Waku node module. All asyncSpawn methods should be spawned by the application (all at the same place), in our case, the wakunode2 module.

Doing it that way will make it simpler to understand how different scheduled tasks interact and compete for CPU time, as there will be no unseen tasks launched.

I know this is not new, and there are more asyncSpawn calls scattered within the protocols and the Waku node module code, but it is in our hands to make the codebase debuggable and maintainable, and understanding where coroutines are spawned is a big part of that.
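
Roughly, the suggested arrangement would look like this (hypothetical sketch; the loop names come from the PR, the placement is only illustrative):

# Hypothetical: the node and peer manager modules only expose the coroutines;
# the wakunode2 application owns every asyncSpawn, so all long-running tasks
# are launched, and visible, in one place.
proc startTasks(node: WakuNode) =
  asyncSpawn node.keepaliveLoop(defaultKeepalive)
  asyncSpawn node.peerManager.relayConnectivityLoop()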

Author (@alrevuelta):

Yep, agree. Adding it to the TODO list.

@@ -897,9 +900,6 @@ proc startKeepalive*(node: WakuNode) =

asyncSpawn node.keepaliveLoop(defaultKeepalive)

Contributor:

This statement is an example of that case: an asyncSpawn within the Waku node module.

Comment on lines +317 to +320
info "Relay connectivity loop",
connectedPeers = numConPeers,
targetConnectedPeers = maxConnections,
availableDisconnectedPeers = disconnectedPeers.len

Contributor:

Could we move this to debug level?

Author (@alrevuelta):

IMHO this log is quite important, and it's not spammy. It's quite common in blockchains to have this constant feedback on connected peers. It's also useful to quickly see if there are any problems: say I don't have enough peers, then looking into availableDisconnectedPeers might help. And it's always good to see the difference between my target amount of peers and the ones that are actually connected.

Contributor:

Ok.

One observation about "quite common in blockchains to have this constant feedback on connected peers.":

Logs, exceptions, and other "out of the code execution flow" elements are part of an application's public-facing API. Good API design states that a library module, like the Waku node, should not make assumptions about this matter.

A blockchain node is an application, not a library.

waku/v2/node/peer_manager/peer_manager.nim (resolved)
Comment on lines +187 to +190
if pm.peerStore[AddressBook][remotePeerInfo.peerId] == remotePeerInfo.addrs and
pm.peerStore[KeyBook][remotePeerInfo.peerId] == publicKey:
# Peer already managed
return

Contributor:

For readability reasons, this complex if statement condition should be replaced by a pm.addressBook.isPeerManaged() proc returning a boolean. If you do that, the comment on line 189 will be unnecessary.

Author (@alrevuelta):

I just tried to implement that, but then I realized that you need to extractPublicKey twice, because it's not a field in the remotePeerInfo, unless I'm missing something.

proc isPeerManaged(pm: PeerManager, remotePeerInfo: RemotePeerInfo): bool =
    var publicKey: PublicKey
    discard remotePeerInfo.peerId.extractPublicKey(publicKey)

    # Peer already managed
    return pm.peerStore[AddressBook][remotePeerInfo.peerId] == remotePeerInfo.addrs and
           pm.peerStore[KeyBook][remotePeerInfo.peerId] == publicKey

Why is that not part of remotePeerInfo? Well, perhaps it should be, but that's beyond this PR.

Contributor:

> Why is that not part of remotePeerInfo? Well, perhaps it should be, but that's beyond this PR.

🤷🏼

If I were working in Waku's peer management, I would modify the RemotePeerInfo type and add it. This is a fundamental issue that should be fixed if it helps improve code readability and reduces the cognitive load of the code.

As a temporary solution until the remote peer type includes the public key, I don't see any problem with extracting the key twice.
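
For reference, the kind of change being suggested would be something like this (hypothetical sketch; the real RemotePeerInfo has more fields):

# Hypothetical: carry the public key inside RemotePeerInfo so it is extracted
# from the PeerId once, instead of being re-derived at every check.
type RemotePeerInfo* = ref object
  peerId*: PeerId
  addrs*: seq[MultiAddress]
  protocols*: seq[string]
  publicKey*: PublicKey   # new field, filled once via peerId.extractPublicKey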

@@ -143,3 +143,6 @@ proc selectPeer*(peerStore: PeerStore, proto: string): Option[RemotePeerInfo] =

proc getPeersByDirection*(peerStore: PeerStore, direction: Direction): seq[StoredInfo] =
return peerStore.peers().filterIt(it.direction == direction)

proc getNotConnectedPeers*(peerStore: PeerStore): seq[StoredInfo] =
return peerStore.peers().filterIt(it.connectedness != Connected)

Contributor:

Nit: You can remove the parentheses from the peers property, as it is a noun.

Suggested change:
- return peerStore.peers().filterIt(it.connectedness != Connected)
+ return peerStore.peers.filterIt(it.connectedness != Connected)

Author (@alrevuelta):

Sure, fixed.


# Ensures a healthy amount of connected relay peers
proc relayConnectivityLoop*(pm: PeerManager) {.async.} =
  let defaultInterval = chronos.seconds(30)

Contributor:

Is this a parametrizing constant? Can you move it outside of the proc? To the top of the module with the rest of the constants.

Author (@alrevuelta):

Yup, thanks, done.

@alrevuelta alrevuelta merged commit fd1ec4a into master Jan 18, 2023
@alrevuelta alrevuelta deleted the add-connectivity-loop branch January 18, 2023 14:17
rymnc pushed a commit that referenced this pull request Jan 22, 2023
* feat(networking): add relay connectivity loop

* Add unit tests

* feat(networking): fix comments

* Fix lnsd comments

6 participants