ipn, tstime : add opt in rate limiting for netmap updates on the IPN bus #14119

Closed
barnstar wants to merge 1 commit into main from jonathan/netmap_ratelimiter

@barnstar

updates tailscale/corp#24553

Adds opt-in rate limiting that limits netmap updates to at most one every 3 seconds when the client includes the NotifyRateLimitNetmaps option in its IPN bus watcher opts. This should mitigate excessive memory and CPU usage in clients on large, busy tailnets.

This is not a complete or comprehensive fix, but it requires clients to add only a single flag and should mitigate runaway memory/CPU consumption while we rethink how we handle netmap updates on the IPN bus more generally.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
@barnstar force-pushed the jonathan/netmap_ratelimiter branch from 3aa809a to b24b36f on November 15, 2024 20:14

@bradfitz bradfitz left a comment

I like where this is going but I think this is a bit more complicated than it needs to be, and I can't convince myself it's correct as a result.

Comment on lines +2840 to +2845
if mask&ipn.NotifyRateLimitNetmaps != 0 {
	b.setNetmapRateLimit(ipn.DefaultNetmapRateLimit)
} else {
	b.setNetmapRateLimit(0)
}

this is kinda weird making a per-session WatchOpt bit affect the LocalBackend globally.

What about something like #14120 ?

Ack.. Yeah, I'll toss this and run some tests using your patch. You're correct on the out-of-order comment below. I just spotted a couple of cases where this gives odd results.

// or not the netmap was sent or cancelled respectively. A nil return value indicates that the netmap
// was sent immediately. The returned value is primarily useful for testing and you can safely ignore
// it and just call this method at will.
func (b *LocalBackend) sendNetmap(nm *netmap.NetworkMap) chan bool {

nothing uses this returned channel except for tests?

// deferredNetmapCancel is used to cancel deferred netmap updates which
// were initially blocked due to rate limiting. We always attempt to send the latest
// netmap once the rate limiter allows it, discarding any pending netmaps.
deferredNetmapCancel context.CancelFunc

// or nil

Comment on lines +1699 to +1700
// b.mu must be held
func (b *LocalBackend) setNetmapRateLimit(interval time.Duration) {

add "Locked" suffix to the method name


// setNetmapRateLimit Sets the minimum interval between netmap updates on the IPN Bus (in seconds)
// If interval is 0 or negative, the rate limiter is disabled. Netmap rate limiting is
// disabled by default

trailing period

	case <-ctx.Done():
		c <- false
	}
	close(c)

There's only one value ever sent, right? You don't need to close channels: a close is just a special type of send saying it's all done. If the contract is that there's only one value, you can just omit this.


var _ controlclient.NetmapDeltaUpdater = (*LocalBackend)(nil)

// UpdateNetmapDelta implements controlclient.NetmapDeltaUpdater.

restore this?

go func() {
	select {
	case <-time.After(delay):
		b.send(notify)

I can't convince myself that this goroutine's send can't get scheduled out of order with another send, and then push down IPN bus updates out of order.

Comment on lines +5086 to +5088
if b.deferredNetmapCancel != nil {
	b.deferredNetmapCancel()
}

I think all state changes need to flush any pending message and then send the State update.

Comment on lines +1724 to +1725
b.deferredNetmapCancel()
}

you need to wait for it to be done after this.

bradfitz added a commit that referenced this pull request Nov 15, 2024
Limit spamming GUIs with boring updates to once in 3 seconds, unless
the notification is relatively interesting and the GUI should update
immediately.

This is basically @barnstar's #14119 but with the logic moved to be
per-watch-session (since the bit is per session), rather than
globally. And this distinguishes notable Notify messages (such as
state changes) and makes them send immediately.

Updates tailscale/corp#24553

Change-Id: I79cac52cce85280ce351e65e76ea11e107b00b49
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
@barnstar

Closing in favour of #14120

@barnstar barnstar closed this Nov 15, 2024
@barnstar barnstar deleted the jonathan/netmap_ratelimiter branch April 1, 2025 13:15
thirdeyenick pushed a commit to ninech/tailscale that referenced this pull request Jul 2, 2025