ipn, tstime : add opt in rate limiting for netmap updates on the IPN bus#14119
ipn, tstime : add opt in rate limiting for netmap updates on the IPN bus#14119
Conversation
updates tailscale/corp#24553 Adds opt-in rate limiting to limit netmap updates to, at most, one every 3 seconds when the client includes the NotifyRateLimitNetmaps option in the ipn bus watcher opts. This should mitigate issues with excessive memory and CPU usage in clients on large, busy tailnets. Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
3aa809a to
b24b36f
Compare
bradfitz
left a comment
There was a problem hiding this comment.
I like where this is going but I think this a bit more complicated than it needs to be, and I can't convince myself it's correct as a result.
| if mask&ipn.NotifyRateLimitNetmaps != 0 { | ||
| b.setNetmapRateLimit(ipn.DefaultNetmapRateLimit) | ||
| } else { | ||
| b.setNetmapRateLimit(0) | ||
| } | ||
|
|
There was a problem hiding this comment.
this is kinda weird making a per-session WatchOpt bit affect the LocalBackend globally.
There was a problem hiding this comment.
Ack.. Yeah, I'll toss this and run some tests using your patch. You're correct on the out-of-order comment below. I just spotted a couple of cases where this gives odd results.
| // or not the netmap was sent or cancelled respectively. A nil return value indicates that the netmap | ||
| // was sent immediately. The returned value is primarily useful for testing and you can safely ignore | ||
| // it and just call this method at will. | ||
| func (b *LocalBackend) sendNetmap(nm *netmap.NetworkMap) chan bool { |
There was a problem hiding this comment.
nothing uses this returned channel except for tests?
| // deferredNetmapCancel is used to cancel deferred netmap updates which | ||
| // were initially blocked due to rate limiting. We always attempt to send the latest | ||
| // netmap once the rate limiter allows it, discarding any pending netmaps. | ||
| deferredNetmapCancel context.CancelFunc |
| // b.mu must be held | ||
| func (b *LocalBackend) setNetmapRateLimit(interval time.Duration) { |
There was a problem hiding this comment.
add "Locked" suffix to the method name
|
|
||
| // setNetmapRateLimit Sets the minimum interval between netmap updates on the IPN Bus (in seconds) | ||
| // If interval is 0 or negative, the rate limiter is disabled. Netmap rate limiting is | ||
| // disabled by default |
| case <-ctx.Done(): | ||
| c <- false | ||
| } | ||
| close(c) |
There was a problem hiding this comment.
there's only one value ever sent, right? you don't need to close channels. a close is just a special type of send saying it's all done. But if the contract is there's only one value, you can just omit this
|
|
||
| var _ controlclient.NetmapDeltaUpdater = (*LocalBackend)(nil) | ||
|
|
||
| // UpdateNetmapDelta implements controlclient.NetmapDeltaUpdater. |
| go func() { | ||
| select { | ||
| case <-time.After(delay): | ||
| b.send(notify) |
There was a problem hiding this comment.
I can't convince myself that this goroutine's send can't get scheduled out of order with another send, and then push down IPN bus updates out of order.
| if b.deferredNetmapCancel != nil { | ||
| b.deferredNetmapCancel() | ||
| } |
There was a problem hiding this comment.
I think all state changes need to flush any pending message and then send the State update.
| b.deferredNetmapCancel() | ||
| } |
There was a problem hiding this comment.
you need to wait for it to be done after this.
Limit spamming GUIs with boring updates to once in 3 seconds, unless the notification is relatively interesting and the GUI should update immediately. This is basically @barnstar's #14119 but with the logic moved to be per-watch-session (since the bit is per session), rather than globally. And this distinguishes notable Notify messages (such as state changes) and makes them send immediately. Updates tailscale/corp#24553 Change-Id: I79cac52cce85280ce351e65e76ea11e107b00b49 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
|
Closing in favour of #14120 |
Limit spamming GUIs with boring updates to once in 3 seconds, unless the notification is relatively interesting and the GUI should update immediately. This is basically @barnstar's #14119 but with the logic moved to be per-watch-session (since the bit is per session), rather than globally. And this distinguishes notable Notify messages (such as state changes) and makes them send immediately. Updates tailscale/corp#24553 Change-Id: I79cac52cce85280ce351e65e76ea11e107b00b49 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Limit spamming GUIs with boring updates to once in 3 seconds, unless the notification is relatively interesting and the GUI should update immediately. This is basically @barnstar's #14119 but with the logic moved to be per-watch-session (since the bit is per session), rather than globally. And this distinguishes notable Notify messages (such as state changes) and makes them send immediately. Updates tailscale/corp#24553 Change-Id: I79cac52cce85280ce351e65e76ea11e107b00b49 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Limit spamming GUIs with boring updates to once in 3 seconds, unless the notification is relatively interesting and the GUI should update immediately. This is basically @barnstar's #14119 but with the logic moved to be per-watch-session (since the bit is per session), rather than globally. And this distinguishes notable Notify messages (such as state changes) and makes them send immediately. Updates tailscale/corp#24553 Change-Id: I79cac52cce85280ce351e65e76ea11e107b00b49 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Limit spamming GUIs with boring updates to once in 3 seconds, unless the notification is relatively interesting and the GUI should update immediately. This is basically @barnstar's #14119 but with the logic moved to be per-watch-session (since the bit is per session), rather than globally. And this distinguishes notable Notify messages (such as state changes) and makes them send immediately. Updates tailscale/corp#24553 Change-Id: I79cac52cce85280ce351e65e76ea11e107b00b49 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Limit spamming GUIs with boring updates to once in 3 seconds, unless the notification is relatively interesting and the GUI should update immediately. This is basically @barnstar's tailscale#14119 but with the logic moved to be per-watch-session (since the bit is per session), rather than globally. And this distinguishes notable Notify messages (such as state changes) and makes them send immediately. Updates tailscale/corp#24553 Change-Id: I79cac52cce85280ce351e65e76ea11e107b00b49 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
updates tailscale/corp#24553
Adds opt-in rate limiting to limit netmap updates to, at most, one every 3 seconds when the client includes the NotifyRateLimitNetmaps option in the ipn bus watcher opts. This should mitigate issues with excessive
memory and CPU usage in clients on large, busy tailnets.
This is not a complete of comprehensive fix, but it requires the addition of only a single flag by clients and should mitigate the issue of runaway memory/CPU consumption while we rethink how we handle of netmap updates on the ipn bus more generally.