Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load --grpc_auth_static_client_creds file once #15030

Merged

Conversation

timvaillancourt
Copy link
Contributor

@timvaillancourt timvaillancourt commented Jan 24, 2024

Description

This PR causes the --grpc_auth_static_client_creds file to be loaded once in order to avoid high OS-thread usage when vtgate starts up. In the right circumstances, the repeat loading of this file can contribute to vtgate crashing on runtime: program exceeds 10000-thread limit

This stack is seen using a significant amount of blocking syscalls on every spawned healthcheck stream:

(dlv) stack
 0  0x000000000047f120 in syscall.Syscall6
    at syscall/asm_linux_amd64.s:43
 1  0x000000000047bb05 in syscall.openat
    at syscall/zsyscall_linux_amd64.go:68
 2  0x00000000004f7c5b in syscall.Open
    at syscall/syscall_linux.go:155
 3  0x00000000004f7c5b in os.openFileNolog
    at os/file_unix.go:216
 4  0x00000000004f6585 in os.OpenFile
    at os/file.go:338
 5  0x00000000004f694a in os.Open
    at os/file.go:318
 6  0x00000000004f694a in os.ReadFile
    at os/file.go:670
 7  0x0000000000de2f7a in vitess.io/vitess/go/vt/grpcclient.AppendStaticAuth
    at vitess.io/vitess/go/vt/grpcclient/client_auth_static.go:62
 8  0x0000000000de21cc in vitess.io/vitess/go/vt/grpcclient.DialContext
    at vitess.io/vitess/go/vt/grpcclient/client.go:105
 9  0x0000000001150a45 in vitess.io/vitess/go/vt/grpcclient.Dial
    at vitess.io/vitess/go/vt/grpcclient/client.go:63
10  0x0000000001150a45 in vitess.io/vitess/go/vt/vttablet/grpctabletconn.DialTablet
    at vitess.io/vitess/go/vt/vttablet/grpctabletconn/conn.go:80
11  0x0000000000e5a54c in vitess.io/vitess/go/vt/discovery.(*tabletHealthCheck).connectionLocked
    at vitess.io/vitess/go/vt/discovery/tablet_health_check.go:148
12  0x0000000000e5a430 in vitess.io/vitess/go/vt/discovery.(*tabletHealthCheck).Connection
    at vitess.io/vitess/go/vt/discovery/tablet_health_check.go:143
13  0x0000000000e5a2f1 in vitess.io/vitess/go/vt/discovery.(*tabletHealthCheck).stream
    at vitess.io/vitess/go/vt/discovery/tablet_health_check.go:127
14  0x0000000000e5ae9d in vitess.io/vitess/go/vt/discovery.(*tabletHealthCheck).checkConn
    at vitess.io/vitess/go/vt/discovery/tablet_health_check.go:276
15  0x0000000000e508aa in vitess.io/vitess/go/vt/discovery.(*HealthCheckImpl).AddTablet.func2
    at vitess.io/vitess/go/vt/discovery/healthcheck.go:380
16  0x00000000004692c1 in runtime.goexit
    at runtime/asm_amd64.s:1571

At startup vtgate opens healthcheck streams to every tablet it is watching in parallel, and right now each of those goroutines are loading/unmarshalling the static client creds file

A consequence of this change is a process reload is required to change the file, whereas today you could change the file and it would be re-loaded (sadly by N x goroutines) on each loop of the topology watcher .Start(...) ticker, which to me isn't a dealbreaker (I was surprised it's reloaded actually), but I'm curious what everyone thought? I'm happy to adjust this to support SIGHUP and I'm curious if there are helpers for that sort of thing

Related Issue(s)

#15031

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

Copy link
Contributor

vitess-bot bot commented Jan 24, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jan 24, 2024
@github-actions github-actions bot added this to the v19.0.0 milestone Jan 24, 2024
@timvaillancourt timvaillancourt changed the title Load --grpc_auth_static_client_creds file once Load --grpc_auth_static_client_creds file once Jan 24, 2024
@mattlord mattlord added Type: Bug Component: Authn/z Authentication / Authorization / Certificates and removed NeedsWebsiteDocsUpdate What it says NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work labels Jan 24, 2024
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! This is a much better way forward than imposing a concurrency limit on vtgate's processing of tablet records. I can re-review and approve once the feedback is addressed. They are all small things, the basic approach is obviously correct and what everyone has agreed on.

go/vt/grpcclient/client_auth_static.go Outdated Show resolved Hide resolved
go/vt/grpcclient/client_auth_static.go Outdated Show resolved Hide resolved
go/vt/grpcclient/client_auth_static.go Outdated Show resolved Hide resolved
go/vt/grpcclient/client_auth_static.go Outdated Show resolved Hide resolved
@deepthi deepthi removed NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Jan 25, 2024
@timvaillancourt
Copy link
Contributor Author

timvaillancourt commented Jan 25, 2024

Nice work! This is a much better way forward than imposing a concurrency limit on vtgate's processing of tablet records. I can re-review and approve once the feedback is addressed. They are all small things, the basic approach is obviously correct and what everyone has agreed on.

Thanks @deepthi! I think I've addressed your helpful PR comments now 🙇

@timvaillancourt
Copy link
Contributor Author

This PR seems to be causing this test failure, but I can't spot why. More 👀 appreciated!

--- FAIL: TestVtctlAuthClient (0.00s)
    client.go:88: failed to get first line: rpc error: code = Unauthenticated desc = username and password must be provided
FAIL
FAIL	vitess.io/vitess/go/vt/vtctl/grpcvtctlclient	0.116s

@timvaillancourt
Copy link
Contributor Author

timvaillancourt commented Feb 6, 2024

@deepthi / @ajm188: I believe this is ready for review!

I unfortunately needed to move most of the logic out of init() because the CLI-library was setting credsFile after the init() was ran

@frouioui frouioui modified the milestones: v19.0.0, v20.0.0 Feb 6, 2024
Copy link

codecov bot commented Feb 6, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (1bf8dda) 47.36% compared to head (6b23362) 39.76%.
Report is 85 commits behind head on main.

Files Patch % Lines
go/vt/grpcclient/client_auth_static.go 88.37% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main   #15030       +/-   ##
===========================================
- Coverage   47.36%   39.76%    -7.60%     
===========================================
  Files        1144     1639      +495     
  Lines      238995   380084   +141089     
===========================================
+ Hits       113197   151155    +37958     
- Misses     117213   212721    +95508     
- Partials     8585    16208     +7623     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
@timvaillancourt
Copy link
Contributor Author

@deepthi / @ajm188: I believe this is ready for review!

@deepthi / @ajm188: I lied 😅, now it's ready for review!

Copy link
Contributor

@ajm188 ajm188 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor bug fix then lgtm

go/vt/grpcclient/client_auth_static.go Outdated Show resolved Hide resolved
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Copy link
Contributor

@ajm188 ajm188 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@deepthi deepthi merged commit acae954 into vitessio:main Feb 10, 2024
101 of 102 checks passed
@timvaillancourt timvaillancourt deleted the load-grpcclient-static-auth-once branch February 10, 2024 05:49
timvaillancourt added a commit to timvaillancourt/vitess that referenced this pull request Feb 27, 2024
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
timvaillancourt added a commit to slackhq/vitess that referenced this pull request Feb 27, 2024
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
frouioui added a commit that referenced this pull request Mar 7, 2024
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Florent Poinsard <florent.poinsard@outlook.fr>
Co-authored-by: Florent Poinsard <florent.poinsard@outlook.fr>
timvaillancourt added a commit to slackhq/vitess that referenced this pull request Mar 15, 2024
…izations (#227)

* Load `--grpc_auth_static_client_creds` file once (vitessio#15030)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Filter by keyspace earlier in `tabletgateway`s `WaitForTablets(...)` (vitessio#15347)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Limit concurrent creation of healthcheck gRPC connections (vitessio#15053)

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* go mod tidy

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

* Update MySQL apt package and GPG signature (vitessio#14785)

Signed-off-by: Matt Lord <mattalord@gmail.com>

* remove unrelated workflow files from v20

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>

---------

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Matt Lord <mattalord@gmail.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Authn/z Authentication / Authorization / Certificates Type: Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants