groups: fix role deletion DDOS #3355

Merged: @arthyn merged 11 commits into staging from hm/fix-role-deletions-ddos on Mar 27, 2024

@arthyn commented Mar 19, 2024

This PR should fix LAND-1698 by doing all our role removals in one event rather than an event for each member of the group. It also prevents channels from attempting to resubscribe to channels that are no longer associated with the group. This is untested, but should fix the issues we've been seeing.
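
To illustrate the "one event rather than an event for each member" idea, here is a minimal sketch in Python rather than Hoon. The `Group` class, the function names, and the `"remove-roles"` event tag are all hypothetical and are not taken from the actual agent code; the sketch only shows how the number of emitted events changes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    # Hypothetical model: member ship name -> set of role names it holds.
    members: dict[str, set[str]] = field(default_factory=dict)


def remove_role_per_member(group: Group, role: str) -> list[tuple]:
    """Old shape: emit one event for every member that held the role."""
    events = []
    for ship, roles in group.members.items():
        if role in roles:
            roles.discard(role)
            events.append(("remove-roles", [ship], [role]))
    return events


def remove_role_batched(group: Group, role: str) -> list[tuple]:
    """New shape: emit a single event that names every affected member."""
    affected = [ship for ship, roles in group.members.items() if role in roles]
    for ship in affected:
        group.members[ship].discard(role)
    # No event at all if nobody held the role.
    return [("remove-roles", affected, [role])] if affected else []
```

With 5,000 members holding the deleted role, the first function returns 5,000 events and the second returns one, while both leave the group in the same final state.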

todos:

  • follow-up PR with proper tests
  • follow-up PR to rip out old agent logic (heap/diary) that isn't involved in migration

As far as we understand, the issue with role deletion was as follows:

host:
delete role in UI (1 event) -> verify group cabal (role) data -> check every member's roles for old roles -> poke self for each member that needs a role change (5k events) -> disseminate events -> each event goes to 5k subscribers ... eventually receive resubscribes for channels

subscribers:
receive role change event for each member (5k events) -> recheck permissions of each channel -> for each channel that we can read, try to resubscribe if subscription is missing (5-10 events)
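
The subscriber-side half of the fix (not resubscribing to channels that no longer belong to the group) can be sketched roughly like this. This is Python, not the Hoon in `desk/app/channels.hoon`, and the function name and its three set arguments are assumptions made for illustration only.

```python
def channels_to_resubscribe(
    group_channels: set[str],
    readable: set[str],
    subscribed: set[str],
) -> list[str]:
    """Return the channels that need a fresh subscription.

    group_channels: channels currently associated with the group
    readable:       channels this ship has permission to read
    subscribed:     channels that already have a live subscription
    """
    return sorted(
        ch
        for ch in readable
        # The membership check is the guard: a channel that was removed
        # from the group is skipped even if its subscription is missing.
        if ch in group_channels and ch not in subscribed
    )
```

Without the `ch in group_channels` check, every permission recheck would retry the deleted channels again, which is the 5-10 extra events per role change described above.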

We can estimate the totals by multiplying the counts at each step. Some of these events were also run multiple times:
5 tries * 5000 role change events * 5000 ships to distribute each event to = 125 million facts needing to be sent from nibset

Meanwhile, each time a ship hears any of those events, it sends back 5 more events trying to resubscribe to deleted channels, so 125m * 5 = 625 million

So, in total facts/messages:

  • distributed: 125m + 625m (watch-acks/nacks) = 750m
  • heard: 625m

which means nibset has been chewing through 750m + 625m = 1.375b events. This makes me think we should have diagrams for each event flow to make sure we never cascade like this.
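
The estimate above can be reproduced with a few lines of Python. The constant names are made up for this sketch; the numbers are the ones quoted in this description. The last line is an extra assumption: it supposes the same 5 tries still happen once the role removal is batched into a single event.

```python
TRIES = 5                     # times the role-change events were run
ROLE_CHANGE_EVENTS = 5_000    # one event per member needing a change
SUBSCRIBERS = 5_000           # ships each event is distributed to
RESUBSCRIBES_PER_EVENT = 5    # resubscribe attempts per event heard

# Facts the host has to send out for the role changes.
facts_sent = TRIES * ROLE_CHANGE_EVENTS * SUBSCRIBERS

# Resubscribe attempts the host hears back, and the acks/nacks it sends.
resubscribes_heard = facts_sent * RESUBSCRIBES_PER_EVENT
acks_sent = resubscribes_heard

distributed = facts_sent + acks_sent
total = distributed + resubscribes_heard

# Assumption: with one batched event and the same number of tries.
batched_facts_sent = TRIES * 1 * SUBSCRIBERS

print(f"{facts_sent:,} facts, {distributed:,} distributed, {total:,} total")
print(f"{batched_facts_sent:,} facts with a single batched event")
```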

PR Checklist

  • Includes changes to desk files
  • Describes how you tested the PR locally (test ship vs livenet)
  • If a new feature, includes automated tests
  • Comments added anywhere logic may be confusing without context

linear bot commented Mar 19, 2024

desk/app/channels.hoon (outdated, resolved)
desk/app/channels.hoon (outdated, resolved)
desk/app/groups.hoon (outdated, resolved)
desk/app/groups.hoon (outdated, resolved)
@arthyn requested a review from Fang- on March 22, 2024, 19:37
@Fang- left a comment

Better already! Some further comments on implementation details, if you'll forgive my nitpicking.

desk/app/groups.hoon (outdated, resolved)
desk/app/groups.hoon (outdated, resolved)
desk/app/groups.hoon (outdated, resolved)
@arthyn requested a review from Fang- on March 25, 2024, 17:02
@arthyn changed the base branch from develop to staging on March 25, 2024, 17:04
@Fang- left a comment

This will do!

@arthyn merged commit 823db5b into staging on Mar 27, 2024
1 check passed
@arthyn deleted the hm/fix-role-deletions-ddos branch on March 27, 2024, 15:49