-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
magicsock: if STUN failed to send before, rebind before STUNning again. #3014
Conversation
cfd952f
to
d530e6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks right, worth having @danderson look.
net/netcheck/netcheck.go
Outdated
IPv6 bool // IPv6 works (send & receive) | ||
IPv6CanSend bool // IPv6 works (send) | ||
IPv4 bool // IPv4 works (send & receive) | ||
IPv4CanSend bool // IPv4 works (send) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarify whether these new CanSend fields are any vs all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically all four of them are "any" and not "all".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated for more verbosity.
On iOS (and possibly other platforms), sometimes our UDP socket would get stuck in a state where it was bound to an invalid interface (or no interface) after a network reconfiguration. We can detect this by actually checking the error codes from sending our STUN packets. If we completely fail to send any STUN packets, we know something is very broken. So on the next STUN attempt, let's rebind the UDP socket to try to correct any problems. This fixes a problem where iOS would sometimes get stuck using DERP instead of direct connections until the backend was restarted. Fixes #2994 Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
d530e6d
to
70b9501
Compare
#3014 added a rebind on STUN failure, which means there can now be a call in tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr at the end of the test, waiting for a STUN response which will never arrive. The WG scaffolding to call bind.Close() is not present in this these tests, so ensure it gets called explicitly. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test, waiting for a STUN response which will never arrive. This causes a test failure due to the resource leak. For whatever reason, it mostly fails with Windows. The WG scaffolding to call bind.Close() is not present in this these tests, so ensure it gets called explicitly. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test, waiting for a STUN response which will never arrive. This causes a test failure due to the resource leak. For whatever reason, it mostly fails with Windows. The WG scaffolding to call bind.Close() is not present in this these tests, so ensure it gets called explicitly. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test, waiting for a STUN response which will never arrive. This causes a test failure due to the resource leak. For whatever reason, it mostly fails with Windows. The WG scaffolding to call bind.Close() is not present in this these tests, so ensure it gets called explicitly. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test, waiting for a STUN response which will never arrive. This causes a test failure due to the resource leak. For whatever reason, it mostly fails with Windows. The WG scaffolding to call bind.Close() is not present in this these tests, so ensure it gets called explicitly. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test, waiting for a STUN response which will never arrive. This causes a test failure due to the resource leak. For whatever reason, it mostly fails with Windows. The WG scaffolding to call bind.Close() is not present in this these tests, so ensure it gets called explicitly. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test, waiting for a STUN response which will never arrive. This causes a test failure due to the resource leak. For whatever reason, it mostly fails with Windows. The WG scaffolding to call bind.Close() is not present in this these tests, so ensure it gets called explicitly. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test, waiting for a STUN response which will never arrive. This causes a test failure due to the resource leak. For whatever reason, it mostly fails with Windows. The WG scaffolding to call bind.Close() is not present in this these tests, so ensure it gets called explicitly. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test waiting for a STUN response which will never arrive. This causes a test flake due to the resource leak in those cases where the Conn decided to rebind. For whatever reason, it mostly flakes with Windows. If the Conn is closed, don't Rebind after a send error. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test waiting for a STUN response which will never arrive. This causes a test flake due to the resource leak in those cases where the Conn decided to rebind. For whatever reason, it mostly flakes with Windows. If the Conn is closed, don't Rebind after a send error. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test waiting for a STUN response which will never arrive. This causes a test flake due to the resource leak in those cases where the Conn decided to rebind. For whatever reason, it mostly flakes with Windows. If the Conn is closed, don't Rebind after a send error. Signed-off-by: Denton Gentry <dgentry@tailscale.com>
#3014 added a rebind on STUN failure, which means there can now be a tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr in progress at the end of the test waiting for a STUN response which will never arrive. This causes a test flake due to the resource leak in those cases where the Conn decided to rebind. For whatever reason, it mostly flakes with Windows. If the Conn is closed, don't Rebind after a send error. Signed-off-by: Denton Gentry <dgentry@tailscale.com> (cherry picked from commit def650b)
On iOS (and possibly other platforms), sometimes our UDP socket would
get stuck in a state where it was bound to an invalid interface (or no
interface) after a network reconfiguration. We can detect this by
actually checking the error codes from sending our STUN packets.
If we completely fail to send any STUN packets, we know something is
very broken. So on the next STUN attempt, let's rebind the UDP socket
to try to correct any problems.
This fixes a problem where iOS would sometimes get stuck using DERP
instead of direct connections until the backend was restarted.
Fixes #2994
Signed-off-by: Avery Pennarun apenwarr@tailscale.com