[v2] Properly retry on hiccups on the SOCKS server #1598

LukeShu · 2021-03-23T17:54:50Z

Description

@esmet reports that he sees AES CI flakes where a hiccup in the connection to the cluster causes the whole connector to quit. It should retry! It turns out that while most of the connector's components had retries, the /connector/server-socks goroutine didn't, so it would error out and cause the rest of the connector to shut down.

As usual, I suggest a commit-by-commit review:

The first 3 commits are prep-work for the fix, with an eye toward auditing uses of tm.sshPort to make sure that every user of it properly retries.
The 4th commit is the actual fix.
The final 2 commits are generic cleanup tasks that are now-trivial and now-obvious because of the changes made above.

Checklist

I made sure to update ./CHANGELOG.md. - yes
I made sure to either submit a docs PR, or tell Matt about the necessary documentation changes. - no applicable changes
My change is adequately tested. - no
I updated DEVELOPING.md with any special dev tricks I had to use to work on this code efficiently. - no tricks

thallgren

LGTM!

LukeShu added 6 commits March 23, 2021 11:30

connector: Add a comment about SSH retry

f597e71

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>

connector: bridge.check() is really testing the tm kpf, not the bridge

b8df654

So move it to the traffic manager.

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>

connector: Factor out a dedicated tm.sshPortForward helper function

2e89023

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>

connector: bridge: Use that new tm.sshPortForward helper function

c90b1a3

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>

connector: Get rid of the now-pointless "bridge" wrapper object

ce30a0e

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>

connector: Move the kubectl version check; begone with BRIDGE_FAILED

9f4ad2e

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>

LukeShu force-pushed the lukeshu/for-jesmet branch from e27d02e to 9f4ad2e on March 23, 2021 18:39

thallgren approved these changes Mar 23, 2021

Merge remote-tracking branch 'origin/release/v2' into lukeshu/for-jesmet

bc8b7d2

Signed-off-by: Luke Shumaker <lukeshu@datawire.io>

LukeShu merged commit 1f299ce into release/v2 Mar 24, 2021

LukeShu deleted the lukeshu/for-jesmet branch March 24, 2021 18:47

khussey added this to the 2021 Cycle 3 milestone Mar 25, 2021
