
feat(networking): integrate gossipsub scoring #1769

Merged · 11 commits · Jun 6, 2023

Conversation

@alrevuelta (Contributor) commented May 30, 2023

closes #1756

Changes:

  • Only allow handling messages containing a valid WakuMessage: a single handler type, no more handling seq[byte] and parsing. The Waku relay message handler now takes a WakuMessage directly.
  • Add a default validator to all subscribed topics, which enforces that the payload decodes into a valid WakuMessage. If it decodes, the message is accepted; otherwise it is rejected. With this, "upper layers" now always get a WakuMessage.
  • Integrate gossipsub scoring and trigger disconnections from bad peers.
  • Set safe default parameters for gossipsub scoring.
  • Add tests ensuring the feature is integrated correctly, verifying that disconnections based on score are triggered.
  • Change the in/out peer ratio, enforcing 50/50. The previous 10% share of outgoing peers seems to cause some issues in simulations; to be continued.
  • Tested on a sandbox machine with 200 nodes.
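The default-validator idea from the list above can be sketched roughly as follows. This is an illustrative Python sketch of the control flow only, not the actual Nim implementation; `decode_waku_message` is a stand-in for `WakuMessage.decode`:

```python
from enum import Enum

class ValidationResult(Enum):
    ACCEPT = "accept"
    REJECT = "reject"

def decode_waku_message(data: bytes):
    # Stand-in for WakuMessage.decode(): here we only require a
    # non-empty payload; the real decoder parses the protobuf fields.
    if not data:
        return None
    return {"payload": data}

def default_validator(pubsub_topic: str, data: bytes) -> ValidationResult:
    # Attached to every subscribed topic: accept only payloads that
    # decode into a valid WakuMessage, so upper layers never see raw bytes.
    msg = decode_waku_message(data)
    return ValidationResult.ACCEPT if msg is not None else ValidationResult.REJECT
```

With this in place, a handler registered on a topic only ever fires for messages that passed the decode step.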

@alrevuelta alrevuelta marked this pull request as ready for review June 1, 2023 07:21
@@ -264,7 +264,7 @@ proc start*(bridge: WakuBridge) {.async.} =

# Always mount relay for bridge.
# `triggerSelf` is false on a `bridge` to avoid duplicates
await bridge.nodev2.mountRelay(triggerSelf = false)
await bridge.nodev2.mountRelay()
Member:

Is the bridge.nodev2.wakuRelay.triggerSelf = false missing here?

Contributor Author:

indeed!

@alrevuelta (Contributor Author), Jun 2, 2023:

done b1df0dd

Collaborator:

I wonder if we could set triggerSelf = false inside mountRelay() itself so that we don't potentially miss it in other places.

Contributor Author:

For now I would say triggerSelf = true should be the default value; the bridge is a particular case.

@jm-clius (Contributor) left a comment:

Thanks. Minor nits below, but LGTM! Not approving yet, as I know you want to test the selected parameters first. One remaining uncertainty is the performance/memory cost of decoding every message twice, i.e. once for validation and once for handling. The assumption here is that it's negligible, but it may be worth double-checking in simulation.

fut.complete()
return fut
else:
return handler(pubsubTopic, decMsg.get)
Contributor:

Nit:

Suggested change
return handler(pubsubTopic, decMsg.get)
return handler(pubsubTopic, decMsg.get())

Contributor Author:

sure b1df0dd

debug "message decode failure", pubsubTopic=pubsubTopic, error=decodeRes.error
return
# rejects messages that are not WakuMessage
proc validator(pubsubTopic: string, message: messages.Message): Future[ValidationResult] {.async.} =
Contributor:

Do we need a separate validator for each subscribed pubsubTopic? Why not define this outside the subscribe block, in other words?

Contributor Author:

tldr: indeed, done b1df0dd

Longer version: related to your other comment, I had the validator inside because I was trying to optimize this, to avoid decoding the message twice (or as many times as we have validators). This optimization was similar to the one in nimbus, where the validator also acts as the handler. Meaning that inside the validator you also call the handler, and then you don't have to register the handler (just pass nil in subscribe) because the validator calls it for you. This way you don't decode the message both i) in the validation stage and ii) in the handler stage, just once.

something like:

proc validator(pubsubTopic: string, message: messages.Message): Future[ValidationResult] {.async.} =
  let msg = WakuMessage.decode(message.data)
  if msg.isOk():
    # the validator also invokes the registered handler,
    # so the message is decoded only once
    await yourHandler(pubsubTopic, msg.get())
    return ValidationResult.Accept
  return ValidationResult.Reject

But in the end I discarded this solution, as it has some implications. For example, when having multiple validators (RLN, signed topic, etc.), this hacked validator containing the handler would always need to run last, and this hack on top of a hack seemed too much.

I haven't done a detailed analysis of the impact of decoding the message twice, but I haven't seen a significant increase in any parameter in the simulations. So at this stage I would say it's over-optimizing. Leaving this idea here in case we want to implement it later on.

Contributor:

We have an issue somewhere (status-im/nimbus-eth2#3043) about adding a "user data" field to the Message type, to let the app decode a single time and store the decoded version there.
That will require some trickery since libp2p doesn't know the type of the user data, but feel free to pick that up

Contributor:

Ok, very interesting about both possible approaches to avoiding double decoding. Agree with doing what is simple and supported in libp2p for now.

)

# see: https://rfc.vac.dev/spec/29/#gossipsub-v10-parameters
const gossipsubParams = GossipSubParams(
Contributor:

Nit: any chance to make this and the other const PascalCase? I know this may lead to symbol clash, so feel free to ignore.

Contributor Author:

missed the comment, done b1df0dd


@alrevuelta (Contributor Author):

> Thanks. Minor nits below, but LGTM! Not approving as I know you want to test the selected parameters first. One more uncertainty is what the performance/memory knock would be if we decode all messages twice, i.e. for both validation and handling. Assumption here is negligible/nothing, but may be worth double checking in simulation.

@jm-clius thanks for the comments. Replied to this here. Indeed, it seems negligible; I left an idea in that comment in case we want to optimize this in the future.

@Ivansete-status (Collaborator) left a comment:

LGTM! This PR is very cool! I just have a doubt re the unsubscribe change.

discard await nodes[0].wakuRelay.publish(topic, urandom(1*(10^3)))

# long wait, must be higher than the configured decayInterval (how often score is updated)
await sleepAsync(20.seconds)
Collaborator:

I wonder if we somehow could enforce a shorter decayInterval so that we can have a faster test

Contributor Author:

Yep, this is a long delay, but I would like to leave it as is, so that we don't end up testing something different from how it behaves in reality.
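For context on why the test must out-wait the interval: gossipsub only refreshes (and decays) peer scores once per decayInterval, so a score-based disconnection can only be observed after at least one full interval has elapsed. A rough Python sketch of that timing relationship, with an illustrative interval value (not the actual configured one):

```python
# Peer scores are only refreshed once per decay interval, so a test
# observing a score-based disconnection must sleep at least that long.
DECAY_INTERVAL_SECONDS = 12.0   # hypothetical decayInterval value

def score_updates_observed(wait_seconds: float) -> int:
    # Number of scoring passes that fire while the test sleeps.
    return int(wait_seconds // DECAY_INTERVAL_SECONDS)
```

With these numbers, a 20 s sleep guarantees at least one scoring pass, while a 5 s sleep could see none, which is why shortening the wait without also shortening decayInterval would make the test flaky.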

tests/v2/waku_relay/test_wakunode_relay.nim (thread resolved)

let wakuRelay = node.wakuRelay
wakuRelay.unsubscribe(@[(topic, handler)])
wakuRelay.unsubscribe(topic)
Collaborator:

Sorry but I don't quite see why we don't need handler now.

Contributor Author:

Good catch, this was a leftover. I have replaced unsubscribeAll with unsubscribe, so whenever we now call unsubscribe, we unsubscribe all handlers from the topic, meaning we fully unsubscribe from it.

Unsubscribing individual handlers seemed very specific, and since we now wrap the handler to only accept a WakuMessage, it wasn't trivial to adapt.

If in the future we need to unsubscribe individual handlers, we can revisit this, but imho it would be better to incentivize people to use a single handler that does everything.

fixed dc70df9
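The changed semantics can be pictured as a topic-to-handlers map where unsubscribe drops the whole entry rather than a single handler. A minimal Python sketch with illustrative names (not the nim-waku API):

```python
# Sketch of the new semantics: unsubscribe() removes *all* handlers
# for a topic (the old unsubscribeAll), not one specific handler.
class RelaySubscriptions:
    def __init__(self):
        self.handlers: dict[str, list] = {}   # topic -> registered handlers

    def subscribe(self, topic: str, handler) -> None:
        self.handlers.setdefault(topic, []).append(handler)

    def unsubscribe(self, topic: str) -> None:
        # fully unsubscribe from the topic: every handler is dropped
        self.handlers.pop(topic, None)
```

Since no handler reference is needed to unsubscribe, the call site only has to know the topic, which is why the `handler` argument disappeared from the diff above.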


fanoutTTL: chronos.minutes(1),
seenTTL: chronos.minutes(2),

# no gossip is send to peers below this score
Collaborator:

tiny typo

Suggested change
# no gossip is send to peers below this score
# no gossip is sent to peers below this score

Contributor Author:

thanks dc70df9


# p6: penalizes peers sharing more than threshold ips
ipColocationFactorWeight: -50.0,
ipColocationFactorThreshold: 5.0,
Collaborator:

Does that implicitly cover the action we perform to prune peers with a big number of IPs?

Contributor Author:

This indeed lowers the peer score based on the IP (it kicks in with more than 5 peers per IP). But note that if 6 peers are behind the same IP and they perform great on the other scores, we won't really prune any peer.

Also note that this doesn't replace our IP-based peer pruning, since this score only applies to peers participating in gossipsub.
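For reference, the P6 penalty in the gossipsub v1.1 scoring spec is quadratic in the number of peers above the threshold, which is why a handful of otherwise well-behaved peers behind one IP can survive. A hedged Python sketch, using the weight and threshold from this PR's config and the squaring rule from the gossipsub v1.1 spec:

```python
# P6 (IP colocation) penalty, per gossipsub v1.1 scoring:
# peers sharing an IP beyond the threshold get a quadratic penalty.
IP_COLOCATION_FACTOR_WEIGHT = -50.0    # from this PR's parameters
IP_COLOCATION_FACTOR_THRESHOLD = 5.0

def p6_penalty(peers_on_same_ip: int) -> float:
    surplus = peers_on_same_ip - IP_COLOCATION_FACTOR_THRESHOLD
    if surplus <= 0:
        return 0.0   # at or below threshold: no penalty
    return IP_COLOCATION_FACTOR_WEIGHT * surplus ** 2
```

So 5 peers behind one IP incur no penalty, a 6th adds -50 to each colocated peer's score, and the penalty grows quadratically from there, eventually dominating positive score components.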

let decMsg = WakuMessage.decode(data)
if decMsg.isErr():
# fine if triggerSelf enabled, since validators are bypassed
error "failed to decode WakuMessage, validator passed a wrong message"
Collaborator:

Suggested change
error "failed to decode WakuMessage, validator passed a wrong message"
error "failed to decode WakuMessage, validator passed a wrong message", error = decMsg.error

Contributor Author:

thanks dc70df9


ok(w)
Collaborator:

Not a big deal at all, but I personally prefer being explicit and always using return. Just for newcomers to Nim :)

Suggested change
ok(w)
return ok(w)

Contributor Author:

yep, i actually prefer to be explicit with this
dc70df9

@alrevuelta (Contributor Author):

Did some simulations and everything seems fine. No disconnections were triggered and everything was stable:

  • network with 150 nodes, with discv5.
  • injecting valid messages at a constant rate.

However, when creating 20 attackers (nodes publishing wrong messages, affecting the p4 score), disconnections from them were triggered (as expected), but these nodes were not entirely kicked out of the network. I guess that's expected, since afaik nim-libp2p doesn't check the score before a peer tries to connect (meaning a peer gets a low score, is kicked out, connects again, is kicked out again, etc.). It would be better to entirely kick these peers out; perhaps out of scope for this PR, but something to take into account.

see libp2p_gossipsub_bad_score_disconnection metric:


proc unsubscribeAll*(w: WakuRelay, pubsubTopic: PubsubTopic) =
debug "unsubscribeAll", pubsubTopic=pubsubTopic

procCall GossipSub(w).unsubscribeAll(pubsubTopic)


proc publish*(w: WakuRelay, pubsubTopic: PubsubTopic, message: WakuMessage|seq[byte]): Future[int] {.async.} =
Contributor:

If the idea is to no longer return a seq[byte] in the subscription handler, only WakuMessages, then to be consistent you should not allow publishing a seq[byte] either, only WakuMessages.

Contributor Author:

yup, done dc70df9

let decMsg = WakuMessage.decode(data)
if decMsg.isErr():
# fine if triggerSelf enabled, since validators are bypassed
error "failed to decode WakuMessage, validator passed a wrong message"
@LNSD (Contributor), Jun 3, 2023:

This error level can be extremely noisy. Think of a "DoS" scenario. It will blow up the logging system.

Use a debug or a trace log level instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really: since we now enforce that every subscribed topic only gets valid WakuMessages in its handler, we should never enter if decMsg.isErr(), even in a DoS scenario. That's what the validator is for.

@@ -120,6 +179,12 @@ method stop*(w: WakuRelay) {.async.} =
debug "stop"
await procCall GossipSub(w).stop()

# rejects messages that are not WakuMessage
proc validator(pubsubTopic: string, message: messages.Message): Future[ValidationResult] {.async.} =
let msg = WakuMessage.decode(message.data)
Contributor:

Optimization note:

You don't need to decode a message to know if the message is correct. Although typically Protocol buffer libraries perform both at the same time when decoding, validation and deserialization can be performed independently. For example, using nim-libp2p's protobuf library, you can check if the required fields are present without allocating memory.
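The suggestion above can be illustrated with a minimal sketch that walks the protobuf wire format and records which field numbers are present, without building the decoded message. This is Python and purely illustrative (a real implementation would use nim-libp2p's protobuf reader), and it assumes the WakuMessage field numbering where payload is field 1 and contentTopic is field 2:

```python
def present_fields(data: bytes) -> set[int]:
    # Walk the protobuf wire format, collecting field numbers
    # without materializing any decoded values.
    fields, i = set(), 0
    while i < len(data):
        # read the varint field key
        key, shift = 0, 0
        while True:
            b = data[i]; i += 1
            key |= (b & 0x7F) << shift
            shift += 7
            if not (b & 0x80):
                break
        field_no, wire_type = key >> 3, key & 0x07
        fields.add(field_no)
        if wire_type == 0:            # varint value: skip its bytes
            while data[i] & 0x80:
                i += 1
            i += 1
        elif wire_type == 2:          # length-delimited: read length, skip
            length, shift = 0, 0
            while True:
                b = data[i]; i += 1
                length |= (b & 0x7F) << shift
                shift += 7
                if not (b & 0x80):
                    break
            i += length
        elif wire_type == 5:          # fixed32
            i += 4
        elif wire_type == 1:          # fixed64
            i += 8
        else:
            raise ValueError("unsupported wire type")
    return fields

def looks_like_waku_message(data: bytes) -> bool:
    # hypothetical check: require payload (1) and contentTopic (2)
    try:
        return {1, 2} <= present_fields(data)
    except (IndexError, ValueError):
        return False
```

No per-field allocation happens here: values are skipped, not copied, which is the essence of separating validation from deserialization.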

Contributor Author:

interesting, added a note here: dc70df9 for future optimizations.

@jm-clius (Contributor) left a comment:

LGTM, thanks!

@alrevuelta merged commit 34a9263 into master on Jun 6, 2023
15 checks passed
@alrevuelta deleted the integrate-libp2p-scoring branch on June 6, 2023 17:28