chore(rln-relay): gracefully handle chain forks #1623

Merged 5 commits from reconcile-chain-forking into master on Mar 31, 2023

Conversation

@rymnc (Contributor) commented Mar 28, 2023

Fixes #1338
This PR handles events that have the "removed" field set to true, which indicates that the event was removed from the canonical chain (it was included in an uncle block).

To handle this appropriately, we have to maintain a buffer of roots apart from the existing window of roots, so that valid roots can be backfilled when an uncle block is detected. If this isn't done, messages constructed against a valid state of the tree would be considered invalid and dropped.

The length of this root buffer depends on the average reorg depth of the chain, and should be updated accordingly.
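To make the idea concrete, here is a minimal sketch of such a root window plus backfill buffer (illustrative only; `RootTracker`, `addRoot`, and `backfillAfterReorg` are hypothetical names and sizes, not the PR's actual API):

```nim
import std/deques

type
  MerkleRoot = array[32, byte]

  RootTracker = object
    window: Deque[MerkleRoot]   # roots accepted when validating incoming proofs
    buffer: Deque[MerkleRoot]   # older valid roots kept around for backfilling
    windowSize: int
    bufferSize: int             # sized for the expected average reorg depth

proc initRootTracker(windowSize, bufferSize: int): RootTracker =
  RootTracker(window: initDeque[MerkleRoot](), buffer: initDeque[MerkleRoot](),
              windowSize: windowSize, bufferSize: bufferSize)

proc addRoot(t: var RootTracker, root: MerkleRoot) =
  ## A new block changed the tree: push its root, and let the oldest windowed
  ## root fall back into the buffer instead of being discarded.
  if t.window.len >= t.windowSize:
    let evicted = t.window.popFirst()
    if t.buffer.len >= t.bufferSize:
      discard t.buffer.popFirst()
    t.buffer.addLast(evicted)
  t.window.addLast(root)

proc backfillAfterReorg(t: var RootTracker, invalidated: int) =
  ## An uncle block was detected: drop the `invalidated` newest roots and refill
  ## the window from the buffer, so proofs built against still-valid tree states
  ## keep verifying instead of being dropped.
  for _ in 0 ..< min(invalidated, t.window.len):
    discard t.window.popLast()
  while t.window.len < t.windowSize and t.buffer.len > 0:
    t.window.addFirst(t.buffer.popLast())
```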

@rymnc self-assigned this Mar 28, 2023
@rymnc added this to the Release 0.17.0 milestone Mar 28, 2023
Comment on lines +256 to +261
proc removeMembers*(rlnInstance: ptr RLN, indices: seq[MembershipIndex]): bool =
  ## Deletes each member at the given indices from the Merkle tree,
  ## returning false as soon as any deletion fails.
  for index in indices:
    let deletion_success = delete_member(rlnInstance, index)
    if not deletion_success:
      return false
  return true
@rymnc (Contributor Author)

This will be an atomic operation in zerokit soon

@rymnc marked this pull request as ready for review March 28, 2023 15:44
@jm-clius (Contributor) left a comment

LGTM. I think some procs that do a lot of different things (e.g. all the processing that happens per event when calling getEvents) are getting quite complex to follow. It may not be trivial, but I'd consider, in future, being very explicit about each processing step and extracting it into separate procs as much as possible.

@LNSD (Contributor) left a comment

LGTM ✅

@@ -246,17 +246,20 @@ suite "Onchain group manager":

  asyncTest "startGroupSync: should fetch history correctly":
    let manager = await setup()
    let credentials = generateCredentials(manager.rlnInstance, 5)
    const credentialCount = 6
Contributor

Nit: The const keyword should only be used at "package level". Here, a let is sufficient.

@rymnc (Contributor Author)

I used const since it can be evaluated at compile time; is that okay?

    var futures = [newFuture[void](), newFuture[void](), newFuture[void](), newFuture[void](), newFuture[void]()]

    proc generateCallback(futs: array[0..4, Future[system.void]], credentials: seq[IdentityCredential]): OnRegisterCallback =
    type VoidFuturesArray = array[0..credentialCount - 1, Future[void]]
Contributor

I would rather avoid defining types in the middle of a test. I think the right place for type aliases and new types is the "package level".

@rymnc (Contributor Author)

This type would be dependent on credentialCount, which is set at the top of the test. Would you prefer I just hardcode it, or not use a type alias altogether?
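For context, a small illustrative sketch of the const/let distinction being discussed (not from the PR; `CredentialSlots` is a hypothetical name): Nim array bounds must be compile-time constants, so a type alias sized by credentialCount needs a const rather than a let.

```nim
const credentialCount = 6        # evaluated at compile time

# Array bounds must be known at compile time, so the alias can be derived
# directly from the const:
type CredentialSlots = array[0 .. credentialCount - 1, int]

# A `let` is a runtime single-assignment binding; it could not be used as
# an array bound above.
let runtimeCount = credentialCount

var slots: CredentialSlots
doAssert slots.len == runtimeCount
```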

@alrevuelta (Contributor) left a comment

Sorry, I'm coming late to the party.

This PR handles events that have the "removed" field set to true, which indicates that the event was removed from the canonical chain (it was included in an uncle block)

Can we just rely on finalized blocks to avoid this kind of problem? That should also reduce the complexity. Finalized blocks are not reorganized or modified (unless 66% of the chain changes its mind, which is almost impossible and would require tons of slashings).

It's one of the cool things about PoS vs PoW: PoS has some finality guarantees, whereas in PoW anything could be reorged in theory.
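For illustration only, a rough sketch of the alternative being suggested here: only act on events whose block is at or below the latest finalized block, so logs from reorged (uncle) blocks never reach the tree. The `MembershipEvent` type and `isSafeToProcess` helper are assumptions for this sketch, not code from nwaku.

```nim
type
  MembershipEvent = object
    blockNumber: uint64
    removed: bool        # true when the log came from a block dropped in a reorg

proc isSafeToProcess(ev: MembershipEvent, finalizedBlock: uint64): bool =
  ## Under a "finalized blocks only" policy, an event is applied to the local
  ## Merkle tree only once its block can no longer be reorged.
  ev.blockNumber <= finalizedBlock and not ev.removed

when isMainModule:
  let ev = MembershipEvent(blockNumber: 100, removed: false)
  echo isSafeToProcess(ev, finalizedBlock = 96)  # false: block 100 not yet finalized
```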

@rymnc (Contributor Author) commented Mar 31, 2023

https://discord.com/channels/613988663034118151/616299964242460682/1038023309960433674

Waiting 12 minutes while you can be spammed is not good UX; this should serve better than waiting :)

edit: while you can remove the offender from your Merkle tree, that would lead to all other incoming messages being considered invalid, since you would have an indeterminate state of the tree (for 12 minutes)

@alrevuelta (Contributor)

@rymnc I agree that during these ~12 minutes you won't be able to slash on-chain, but you will be able to detect spamming, which allows us to i) reject the message (protecting the network) and ii) lower scoring, see P4 (protecting each node). Then, after these 12 minutes when everything is settled, that node can't even send messages since it's removed from the tree. Perhaps I'm missing something?

waiting 12 minutes while you can be spammed is not good ux, this should serve better than waiting :)
So I wouldn't say that you are spammed during these 12 minutes. Actually, RLN would work without slashing/on-chain stuff, since it's also a way of just detecting spam. Or?

@rymnc (Contributor Author) commented Mar 31, 2023

Yeah, we have to validate proofs based on the state of the tree, i.e. when a peer sends a message, they attach a RateLimitProof, which was generated with a set of elements that leads to their leaf in the tree. This constraint ensures that everyone maintains the same Merkle tree locally; if I had a different-looking tree, the proofs I generate would be invalid since they don't refer to a state the other peer considers "valid". To tolerate network delay, etc., we include a window of acceptable roots, which changes per block (if a member registration/slashing is detected within the block).
When we remove a leaf from the tree locally, this triggers a change in the window of acceptable roots: the oldest root is popped and the new root is pushed.
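As a minimal illustration of the acceptance check described above (assumed names, not nwaku's actual API): an incoming proof is only considered valid if the root it was generated against is still inside the window of acceptable roots.

```nim
type MerkleRoot = array[32, byte]

proc containsRoot(window: openArray[MerkleRoot], proofRoot: MerkleRoot): bool =
  ## A RateLimitProof is accepted only if it refers to one of the recently
  ## accepted roots; anything else points at a tree state we don't recognise.
  for root in window:
    if root == proofRoot:
      return true
  return false
```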

In your suggestion, if we slash them locally, this would lead to an invalid root being part of the window (albeit still allowing other peers to generate valid proofs from the last finalized block). This is a non-issue, since validating proofs will still work (sorry about the previous comment).

Also, someone wouldn't want to wait 12 minutes before they can start using Waku with RLN. That is the main driver for this feature.

@rymnc merged commit 00a3812 into master Mar 31, 2023
12 checks passed
@rymnc deleted the reconcile-chain-forking branch March 31, 2023 13:45
Successfully merging this pull request may close these issues.

chore(rln-relay): Gracefully handle chain forks
4 participants