feat(networking): add relay connectivity loop #1482
Conversation
Generally makes sense to me! I would like an opportunity to revisit this tomorrow. Can we get one more reviewer? In the meantime, see two minor comments below.
LGTM
LGTM as an initial decoupling, thanks!
A suggestion for subsequent PRs is to extract the connectivity logic and decisions that we've made (e.g. number of slots for service peers, max parallel dials, number of peers to connect). This is not only to avoid magic numbers, but so that we can select which of these to specify in an RFC. An incremental process, of course.
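For illustration, extracting those decisions could look something like this at the top of the peer manager module (a sketch only; the names and the ServicePeerSlots value are assumptions, not this PR's actual identifiers):

  # Hypothetical module-level constants replacing in-proc magic numbers
  const
    ServicePeerSlots = 2          # connection slots reserved for service peers (assumed value)
    MaxParallelDials = 10         # upper bound on simultaneous dial attempts
    RelayConnectivityInterval = chronos.seconds(30)  # how often the loop checks connectivity

Each of these then becomes a single documented place that an RFC can point at.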
continue

# TODO: Respect backoff before attempting to connect a relay peer
var disconnectedPeers = pm.peerStore.getNotConnectedPeers().mapIt(RemotePeerInfo.init(it.peerId, it.addrs))
Minor: notConnectedPeers
Please check my comments.
# ...public key
var publicKey: PublicKey
discard remotePeerInfo.peerId.extractPublicKey(publicKey)

if pm.peerStore[AddressBook][remotePeerInfo.peerId] == remotePeerInfo.addrs and
Could you abstract these bracket access statements (this one and the ones below) with procs? For example:
if pm.peerStore[AddressBook][remotePeerInfo.peerId] == remotePeerInfo.addrs and
if pm.addressBook.getByPeerId(remotePeerInfo.peerId) == remotePeerInfo.addrs and
Note that addressBook is a "property": a getter proc, called without parentheses, that returns the field and has a noun name.
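A minimal sketch of the suggested pattern (the signatures, the AddressBook return type, and the bracket lookup on the book are assumptions, not existing nwaku API):

  # Hypothetical noun-named "property" getter, callable without parentheses
  proc addressBook(pm: PeerManager): AddressBook =
    pm.peerStore[AddressBook]

  # Hypothetical helper wrapping the bracket lookup
  proc getByPeerId(book: AddressBook, peerId: PeerId): seq[MultiAddress] =
    book[peerId]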
Mm, I kind of agree, but we would need quite a lot of boilerplate code to wrap this: one new proc per "book". One of the cool things about nim-libp2p's extended peerstore is that it expands the original peerstore with just peerStore[whatever].
Since this implies modifications that don't per se add functionality, I would like to leave it as is for now.
shuffle(disconnectedPeers)

# limit the amount of parallel dials
let maxParallelDials = 10
Is this a parametrizing constant? Can you move it outside of the proc, to the top of the module with the rest of the constants?
Sure, done.
@@ -392,6 +392,9 @@ proc startRelay*(node: WakuNode) {.async.} =
    protocolMatcher(WakuRelayCodec),
    backoffPeriod)

  # Maintain relay connections
  asyncSpawn node.peerManager.relayConnectivityLoop()
IMHO, we should avoid calling asyncSpawn within the Waku node module. All asyncSpawn calls should be made by the application, all in the same place; in our case, the wakunode2 module.
Doing it that way will make it simpler to understand how different scheduled tasks interact and compete for CPU time, as there will be no unseen tasks launched.
I know this is not new, and there are more asyncSpawn calls scattered within the protocols and the Waku node module code, but it is in our hands to keep the codebase debuggable and maintainable, and knowing where coroutines are spawned is key to that.
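A sketch of the suggested shape, with every coroutine spawned from one place in the application module (hypothetical code, not part of this PR; the proc name is made up):

  # Hypothetical: wakunode2 (the application) owns all scheduled tasks, so
  # every coroutine competing for CPU time is visible in a single place.
  proc spawnBackgroundTasks(node: WakuNode) =
    asyncSpawn node.keepaliveLoop(defaultKeepalive)
    asyncSpawn node.peerManager.relayConnectivityLoop()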
Yep, agree. Adding it to the TODO list.
@@ -897,9 +900,6 @@ proc startKeepalive*(node: WakuNode) =

  asyncSpawn node.keepaliveLoop(defaultKeepalive)
This statement is an example of that: an asyncSpawn within the Waku node module.
info "Relay connectivity loop", | ||
connectedPeers = numConPeers, | ||
targetConnectedPeers = maxConnections, | ||
availableDisconnectedPeers = disconnectedPeers.len |
Could we move this to debug level?
IMHO this log is quite important, and it's not spammy. It's quite common in blockchains to have this constant feedback on connected peers. It's also useful to quickly spot problems: say I don't have enough peers; looking into availableDisconnectedPeers might help. And it's always good to see the difference between my target amount of peers and the ones that are actually connected.
Ok.
One observation about "quite common in blockchains to have this constant feedback on connected peers":
Logs, exceptions, and other "out of the code execution flow" elements are part of an application's public-facing API. Good API design states that a library module, like the Waku node, should not make assumptions about this matter.
A blockchain node is an application, not a library.
if pm.peerStore[AddressBook][remotePeerInfo.peerId] == remotePeerInfo.addrs and
   pm.peerStore[KeyBook][remotePeerInfo.peerId] == publicKey:
  # Peer already managed
  return
For readability, this complex if condition should be replaced by a pm.addressBook.isPeerManaged() proc returning a boolean. If you do that, the comment on line 189 becomes unnecessary.
I just tried to implement that, but then I realized you'd need to call extractPublicKey twice, because it's not a field in the remotePeerInfo, unless I'm missing something.
proc isPeerManaged(pm: PeerManager, remotePeerInfo: RemotePeerInfo): bool =
  var publicKey: PublicKey
  discard remotePeerInfo.peerId.extractPublicKey(publicKey)
  if pm.peerStore[AddressBook][remotePeerInfo.peerId] == remotePeerInfo.addrs and
     pm.peerStore[KeyBook][remotePeerInfo.peerId] == publicKey:
    # Peer already managed
    return true
Why isn't that part of remotePeerInfo? Well, perhaps it should be, but that's beyond this PR.
Why isn't that part of remotePeerInfo? Well, perhaps it should be, but that's beyond this PR.
🤷🏼
If I were working on Waku's peer management, I would modify the RemotePeerInfo type and add it. This is a fundamental issue that should be fixed if it helps improve code readability and reduce the cognitive load of the code.
As a temporary solution, until the remote peer type includes the public key, I don't see any problem with extracting the key twice.
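For illustration, the type change being discussed might look like this (a sketch; the existing RemotePeerInfo fields are approximated and the new field is an assumption):

  # Hypothetical: RemotePeerInfo carrying the public key extracted once at
  # init, so callers no longer need to call extractPublicKey themselves
  type RemotePeerInfo = ref object
    peerId: PeerId
    addrs: seq[MultiAddress]
    protocols: seq[string]
    publicKey: PublicKey   # new field, populated once at construction

With that in place, isPeerManaged reduces to two book lookups with no key extraction.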
@@ -143,3 +143,6 @@ proc selectPeer*(peerStore: PeerStore, proto: string): Option[RemotePeerInfo] =

proc getPeersByDirection*(peerStore: PeerStore, direction: Direction): seq[StoredInfo] =
  return peerStore.peers().filterIt(it.direction == direction)

proc getNotConnectedPeers*(peerStore: PeerStore): seq[StoredInfo] =
  return peerStore.peers().filterIt(it.connectedness != Connected)
Nit: you can remove the parentheses from the peers property, as it is a noun.
return peerStore.peers().filterIt(it.connectedness != Connected)
return peerStore.peers.filterIt(it.connectedness != Connected)
Sure, fixed.
# Ensures a healthy amount of connected relay peers
proc relayConnectivityLoop*(pm: PeerManager) {.async.} =
  let defaultInterval = chronos.seconds(30)
Is this a parametrizing constant? Can you move it outside of the proc, to the top of the module with the rest of the constants?
Yup, thanks, done.
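Putting the snippets from this review together, the overall shape of the loop is roughly the following (a reconstruction for the reader, not the merged code verbatim; connectRelay, MaxConnections, MaxParallelDials, and RelayConnectivityInterval are assumed names):

  # Sketch assembled from the fragments discussed above
  proc relayConnectivityLoop*(pm: PeerManager) {.async.} =
    while true:
      let numConPeers = pm.peerStore.peers.countIt(it.connectedness == Connected)
      if numConPeers < MaxConnections:
        # TODO: Respect backoff before attempting to connect a relay peer
        var disconnectedPeers = pm.peerStore.getNotConnectedPeers().mapIt(RemotePeerInfo.init(it.peerId, it.addrs))
        shuffle(disconnectedPeers)

        info "Relay connectivity loop",
          connectedPeers = numConPeers,
          targetConnectedPeers = MaxConnections,
          availableDisconnectedPeers = disconnectedPeers.len

        # Dial at most MaxParallelDials candidates concurrently (hypothetical helper)
        let numToDial = min(MaxParallelDials, disconnectedPeers.len)
        await allFutures(disconnectedPeers[0 ..< numToDial].mapIt(pm.connectRelay(it)))
      await sleepAsync(RelayConnectivityInterval)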
* feat(networking): add relay connectivity loop
* Add unit tests
* feat(networking): fix comments
* Fix lnsd comments
Related: conectionLoop — loop that ensures we keep x relay connections (#1477)

Summary:
* The discv5 loop now only adds peers to the peer store instead of also trying to connect to them.
* relayConnectivityLoop ensures healthy Relay connections according to some criteria, taking peers from the peer store previously added by discv5 (or other ambient discovery techniques).