feat(networking): integrate gossipsub scoring #1769
Conversation
Force-pushed from 264f792 to b7ec0b1
@@ -264,7 +264,7 @@ proc start*(bridge: WakuBridge) {.async.} =

   # Always mount relay for bridge.
-  # `triggerSelf` is false on a `bridge` to avoid duplicates
-  await bridge.nodev2.mountRelay(triggerSelf = false)
+  await bridge.nodev2.mountRelay()
Is the `bridge.nodev2.wakuRelay.triggerSelf = false` missing here?
indeed!
done b1df0dd
I wonder if we could set `triggerSelf = false` inside `mountRelay()` itself, so that we don't potentially miss it in other places.
By now I would say `triggerSelf = true` should be the default value; the bridge is a particular case.
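A rough sketch of that direction (the names, signature, and body are illustrative assumptions, not the actual nwaku code): make `triggerSelf` a defaulted parameter of `mountRelay`, so only special cases like the bridge override it.

```nim
# Hypothetical sketch: `triggerSelf` defaults to true inside mountRelay,
# so the bridge is the only caller that needs to override it.
proc mountRelay*(node: WakuNode, triggerSelf = true) {.async.} =
  let wakuRelay = WakuRelay.new(node.switch)  # illustrative constructor
  wakuRelay.triggerSelf = triggerSelf         # only the bridge passes false
  node.wakuRelay = wakuRelay

# Bridge case:
#   await bridge.nodev2.mountRelay(triggerSelf = false)
```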
Thanks. Minor nits below, but LGTM! Not approving, as I know you want to test the selected parameters first. One more uncertainty is what the performance/memory hit would be if we decode all messages twice, i.e. for both validation and handling. The assumption here is that it's negligible, but it may be worth double-checking in simulation.
waku/v2/waku_relay/protocol.nim
Outdated
    fut.complete()
    return fut
  else:
    return handler(pubsubTopic, decMsg.get)
Nit:
-    return handler(pubsubTopic, decMsg.get)
+    return handler(pubsubTopic, decMsg.get())
sure b1df0dd
waku/v2/waku_relay/protocol.nim
Outdated
      debug "message decode failure", pubsubTopic=pubsubTopic, error=decodeRes.error
      return

    # rejects messages that are not WakuMessage
    proc validator(pubsubTopic: string, message: messages.Message): Future[ValidationResult] {.async.} =
Do we need a separate validator for each subscribed pubsubTopic? Why not define this outside the `subscribe` block, in other words?
tldr: indeed, done b1df0dd
Longer version: related to your other comment here, I had the validator inside `subscribe` because I was trying to optimize this, to avoid decoding the message twice (or as many times as we have validators). The optimization was similar to the one in Nimbus, where the validator also acts as the handler. Meaning that inside the validator you also call the handler, and then you don't have to register the handler (just pass nil in `subscribe`) because the validator calls it for you. Like this you don't have to decode the message both i) in the validation stage and ii) in the handler stage, just once.
Something like:

proc validator(pubsubTopic: string, message: messages.Message): Future[ValidationResult] {.async.} =
  let msg = WakuMessage.decode(message.data)
  if msg.isOk():
    yourHandler(pubsubTopic, msg.get())  # the validator also invokes the handler
    return ValidationResult.Accept
  return ValidationResult.Reject
But in the end I discarded this solution, as it has some implications. For example, when having multiple validators (RLN, signed topic, etc.), this hacked validator containing the handler would always need to be executed last, and a hack on top of a hack seemed too much.
I haven't done a detailed analysis of the impact of decoding the message twice, but I haven't seen a significant increase in any parameter in the simulations. So at this stage I would say it's over-optimizing. Leaving this idea here in case we want to implement it later on.
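For contrast, a minimal sketch of the approach that was kept (consistent with the validator diff later in this thread; the completed body is an assumption): the validator only checks that the payload decodes as a `WakuMessage`, and handlers are registered separately.

```nim
# rejects messages that are not WakuMessage; handlers are registered
# separately and always receive an already-validated message
proc validator(pubsubTopic: string, message: messages.Message): Future[ValidationResult] {.async.} =
  if WakuMessage.decode(message.data).isOk():
    return ValidationResult.Accept
  return ValidationResult.Reject
```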
We have an issue somewhere (status-im/nimbus-eth2#3043) about adding a "user data" field to the Message type, to let the app decode a single time and store the decoded version there.
That will require some trickery since libp2p doesn't know the type of the user data, but feel free to pick that up
Ok, very interesting about both possible approaches to avoiding double decoding. Agree with doing what is simple and supported in libp2p for now.
waku/v2/waku_relay/protocol.nim
Outdated
)

# see: https://rfc.vac.dev/spec/29/#gossipsub-v10-parameters
const gossipsubParams = GossipSubParams(
Nit: any chance to make this and the other const PascalCase? I know this may lead to symbol clash, so feel free to ignore.
missed the comment, done b1df0dd
Force-pushed from 3d8ba07 to b1df0dd
@jm-clius thanks for the comments. Replied to this here. Indeed, it seems negligible; I left an idea in that comment in case we want to optimize this in the future.
LGTM! This PR is very cool! I just have a doubt re the `unsubscribe` change.
  discard await nodes[0].wakuRelay.publish(topic, urandom(1*(10^3)))

  # long wait, must be higher than the configured decayInterval (how often score is updated)
  await sleepAsync(20.seconds)
I wonder if we somehow could enforce a shorter `decayInterval` so that we can have a faster test.
Yep, this is a high delay, but I would like to leave it as it is, so that we don't end up testing something different from how it looks in reality.
waku/v2/node/waku_node.nim
Outdated
 let wakuRelay = node.wakuRelay
-wakuRelay.unsubscribe(@[(topic, handler)])
+wakuRelay.unsubscribe(topic)
Sorry, but I don't quite see why we don't need `handler` now.
Good catch, this was a leftover. I have replaced `unsubscribeAll` by `unsubscribe`, so whenever we now call `unsubscribe`, we unsubscribe all handlers from the topic, meaning we fully unsubscribe from the topic.
Unsubscribing individual handlers seemed very specific, and since we now wrap the handler to only accept a `WakuMessage`, it wasn't trivial to adapt. If in the future we need to unsubscribe individual handlers, we can revisit this, but IMHO it would be better to incentivize people to use a single handler that does everything.
Fixed dc70df9
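A sketch of the described semantics (illustrative, pieced together from the snippets in this thread, not the exact merged code): `unsubscribe` now drops every handler for the topic by delegating to gossipsub's `unsubscribeAll`.

```nim
# Unsubscribing from a topic now removes all of its handlers,
# i.e. it fully unsubscribes the node from the pubsub topic.
proc unsubscribe*(w: WakuRelay, pubsubTopic: PubsubTopic) =
  debug "unsubscribe", pubsubTopic=pubsubTopic
  procCall GossipSub(w).unsubscribeAll(pubsubTopic)
```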
waku/v2/waku_relay/protocol.nim
Outdated
  fanoutTTL: chronos.minutes(1),
  seenTTL: chronos.minutes(2),

  # no gossip is send to peers below this score
Tiny typo:
-  # no gossip is send to peers below this score
+  # no gossip is sent to peers below this score
thanks dc70df9
  # p6: penalizes peers sharing more than threshold ips
  ipColocationFactorWeight: -50.0,
  ipColocationFactorThreshold: 5.0,
Does that implicitly cover the action we perform to prune peers with a big number of IPs?
This indeed lowers the peer score based on the IP (it kicks in with more than 5 IPs). But note that if 6 peers are behind the same IP and they perform great on the other scores, we won't really prune any peer.
Also note that this doesn't replace our IP-based peer pruning, since this score only applies to peers participating in gossipsub.
waku/v2/waku_relay/protocol.nim
Outdated
    let decMsg = WakuMessage.decode(data)
    if decMsg.isErr():
      # fine if triggerSelf enabled, since validators are bypassed
      error "failed to decode WakuMessage, validator passed a wrong message"
-      error "failed to decode WakuMessage, validator passed a wrong message"
+      error "failed to decode WakuMessage, validator passed a wrong message", error = decMsg.error
thanks dc70df9
waku/v2/waku_relay/protocol.nim
Outdated
  ok(w)
Not a big deal at all, but I personally prefer being explicit and always using `return`. Just for newcomers to Nim :)
-  ok(w)
+  return ok(w)
Yep, I actually prefer to be explicit with this. dc70df9
Did some simulations and everything seems fine. No disconnections were triggered and everything was stable.
However, when creating 20 attackers (nodes publishing wrong messages, affecting the p4 score), disconnections from them were triggered (as expected), but these nodes were not entirely kicked out of the network. I guess that's expected, since AFAIK nim-libp2p doesn't check the score before a peer tries to connect (meaning a peer gets a low score, is kicked out, then can connect again, is kicked out again, etc.). It would be better to entirely kick these peers out; perhaps out of scope for this PR, but something to take into account.
waku/v2/waku_relay/protocol.nim
Outdated
proc unsubscribeAll*(w: WakuRelay, pubsubTopic: PubsubTopic) =
  debug "unsubscribeAll", pubsubTopic=pubsubTopic

  procCall GossipSub(w).unsubscribeAll(pubsubTopic)


proc publish*(w: WakuRelay, pubsubTopic: PubsubTopic, message: WakuMessage|seq[byte]): Future[int] {.async.} =
If the idea is to no longer return a `seq[byte]`, only `WakuMessage`s, in the subscription handler, then to be consistent you should not allow publishing a `seq[byte]`, only `WakuMessage`s.
yup, done dc70df9
waku/v2/waku_relay/protocol.nim
Outdated
    let decMsg = WakuMessage.decode(data)
    if decMsg.isErr():
      # fine if triggerSelf enabled, since validators are bypassed
      error "failed to decode WakuMessage, validator passed a wrong message"
This `error` level can be extremely noisy. Think of a "DoS" scenario: it would blow up the logging system. Use a `debug` or a `trace` log level instead.
Not really: since we now enforce that every subscribed topic only gets valid `WakuMessage`s in its handler, we should never enter `if decMsg.isErr()`, even in a DoS scenario. That's what the validator is for.
@@ -120,6 +179,12 @@ method stop*(w: WakuRelay) {.async.} =
   debug "stop"
   await procCall GossipSub(w).stop()

+# rejects messages that are not WakuMessage
+proc validator(pubsubTopic: string, message: messages.Message): Future[ValidationResult] {.async.} =
+  let msg = WakuMessage.decode(message.data)
Optimization note: you don't need to decode a message to know whether it is correct. Although Protocol Buffers libraries typically perform both at the same time when decoding, validation and deserialization can be performed independently. For example, using nim-libp2p's protobuf library, you can check whether the required fields are present without allocating memory.
interesting, added a note here: dc70df9 for future optimizations.
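A hedged sketch of the suggested optimization; the `minprotobuf` calls and the field number are assumptions for illustration, not the actual nwaku code.

```nim
import libp2p/protobuf/minprotobuf

# Hypothetical: check that the WakuMessage payload field (assumed to be
# protobuf field 1) is present, without building a full WakuMessage object.
proc hasRequiredFields(data: seq[byte]): bool =
  var pb = initProtoBuffer(data)
  var payload: seq[byte]
  let res = pb.getField(1, payload)  # was the field found and well-formed?
  res.isOk() and res.get()
```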
Force-pushed from b1df0dd to dc70df9
LGTM, thanks!
closes #1756

Changes:
- Subscribed handlers now deal only with `WakuMessage`. Just one type of handler; no more handling `seq[byte]` and parsing. The Waku relay message handler takes a `WakuMessage` directly.
- A gossipsub validator checks that every received message is a valid `WakuMessage`. If ok the message is accepted, otherwise rejected. With this, "upper layers" now always get a `WakuMessage`.