Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite netmap reload loop due to client-side expiry handling #7193

Closed
mihaip opened this issue Feb 6, 2023 · 1 comment · Fixed by #7194
Closed

Infinite netmap reload loop due to client-side expiry handling #7193

mihaip opened this issue Feb 6, 2023 · 1 comment · Fixed by #7194
Assignees

Comments

@mihaip
Copy link
Contributor

mihaip commented Feb 6, 2023

There's CPU profile from a user (ticket) that shows a lot of time spent doing netmap updates:

image

Based on logs, it appears to be related to client-side expiration handling (#6937), e.g. there so many logs that they're getting throttled:

2023-01-31 23:38:38.729611 +0000 UTC: [RATELIMIT] format("setClientStatus: netmap expiry timer triggered after %v") (48618 dropped)
andrew-d added a commit that referenced this issue Feb 6, 2023
We now handle the case where the NetworkMap.SelfNode has already expired
and do not return an expiry time in the past (which causes an ~infinite
loop of timers to fire).

Additionally, we now add an explicit check to ensure that the next
expiry time is never before the current local-to-the-system time, to
ensure that we don't end up in a similar situation due to clock skew.

Finally, we add more tests for this logic to ensure that we don't
regress on these edge cases.

Fixes #7193

Change-Id: Iaf8e3d83be1d133a7aab7f8d62939e508cc53f9c
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
andrew-d added a commit that referenced this issue Feb 6, 2023
We now handle the case where the NetworkMap.SelfNode has already expired
and do not return an expiry time in the past (which causes an ~infinite
loop of timers to fire).

Additionally, we now add an explicit check to ensure that the next
expiry time is never before the current local-to-the-system time, to
ensure that we don't end up in a similar situation due to clock skew.

Finally, we add more tests for this logic to ensure that we don't
regress on these edge cases.

Fixes #7193

Change-Id: Iaf8e3d83be1d133a7aab7f8d62939e508cc53f9c
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
andrew-d added a commit that referenced this issue Feb 7, 2023
We now handle the case where the NetworkMap.SelfNode has already expired
and do not return an expiry time in the past (which causes an ~infinite
loop of timers to fire).

Additionally, we now add an explicit check to ensure that the next
expiry time is never before the current local-to-the-system time, to
ensure that we don't end up in a similar situation due to clock skew.

Finally, we add more tests for this logic to ensure that we don't
regress on these edge cases.

Fixes #7193

Change-Id: Iaf8e3d83be1d133a7aab7f8d62939e508cc53f9c
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
andrew-d added a commit that referenced this issue Feb 7, 2023
We now handle the case where the NetworkMap.SelfNode has already expired
and do not return an expiry time in the past (which causes an ~infinite
loop of timers to fire).

Additionally, we now add an explicit check to ensure that the next
expiry time is never before the current local-to-the-system time, to
ensure that we don't end up in a similar situation due to clock skew.

Finally, we add more tests for this logic to ensure that we don't
regress on these edge cases.

Fixes #7193

Change-Id: Iaf8e3d83be1d133a7aab7f8d62939e508cc53f9c
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
@andrew-d
Copy link
Member

andrew-d commented Feb 7, 2023

This is now released in an unstable build of Tailscale - v1.37.74

andrew-d added a commit that referenced this issue Feb 8, 2023
We now handle the case where the NetworkMap.SelfNode has already expired
and do not return an expiry time in the past (which causes an ~infinite
loop of timers to fire).

Additionally, we now add an explicit check to ensure that the next
expiry time is never before the current local-to-the-system time, to
ensure that we don't end up in a similar situation due to clock skew.

Finally, we add more tests for this logic to ensure that we don't
regress on these edge cases.

Fixes #7193

Change-Id: Iaf8e3d83be1d133a7aab7f8d62939e508cc53f9c
Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
(cherry picked from commit 6d84f34)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants