Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the issue of etcdserver crashing on receiving REST watch stream requests #19521

Merged
merged 4 commits into from
Mar 4, 2025

Conversation

ahrtr
Copy link
Member

@ahrtr ahrtr commented Mar 4, 2025

Reproduce #19509

Once grpc-gateway v2.26.3 is out, then I will bump it as the second commit in this PR.

cc @ivanvc @johanbrandhorst @Totktonada

/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): 2025/03/04 13:04:04 ERROR: Failed to decode request: invalid character 'v' looking for beginning of value
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): 2025/03/04 13:04:04 ERROR: Failed to decode request: invalid character 'v' looking for beginning of value
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): panic: Write called after Handler finished
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): 
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): goroutine 234 [running]:
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): golang.org/x/net/http2.(*responseWriter).write(0x1400096d1d0?, 0x140005a5a10?, {0x1400010ebd0?, 0x101524f7c?, 0x140000a81e0?}, {0x0?, 0x1400096d1d0?})
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): 	golang.org/x/net@v0.35.0/http2/server.go:3076 +0x15c
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): golang.org/x/net/http2.(*responseWriter).Write(0x14000076568?, {0x1400010ebd0?, 0x1400096d1d0?, 0x14000076570?})
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): 	golang.org/x/net@v0.35.0/http2/server.go:3065 +0x38
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): github.com/grpc-ecosystem/grpc-gateway/v2/runtime.HTTPStreamError({0x101b000d8?, 0x14000b348d0?}, 0x101af07a8?, {0x101b021e0, 0x140006885c0}, {0x101afb270, 0x14000076550}, 0x14000119f28?, {0x101af1e20?, 0x14000076570?})
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): 	github.com/grpc-ecosystem/grpc-gateway/v2@v2.26.1/runtime/errors.go:93 +0x11c
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): go.etcd.io/etcd/api/v3/etcdserverpb/gw.RegisterWatchHandler.RegisterWatchHandlerClient.func1.1()
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): 	go.etcd.io/etcd/api/v3@v3.6.0-alpha.0/etcdserverpb/gw/rpc.pb.gw.go:2234 +0xd8
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): created by go.etcd.io/etcd/api/v3/etcdserverpb/gw.RegisterWatchHandler.RegisterWatchHandlerClient.func1 in goroutine 172
/Users/wachao/go/src/github.com/ahrtr/etcd/bin/etcd (TestCurlWatchIssue19509-test-0) (72931): 	go.etcd.io/etcd/api/v3@v3.6.0-alpha.0/etcdserverpb/gw/rpc.pb.gw.go:2231 +0x1cc
/usr/bin/curl (curl_--cacert_/Users/wachao/go/src/github.com/ahrtr/etcd/tests/fixtures/ca.crt_--cert_/Users/wachao/go/src/github.com/ahrtr/etcd/tests/fixtures/server.crt_--key_/Users/wachao/go/src/github.com/ahrtr/etcd/tests/fixtures/server.key.insecure_-L_https://localhost:20000/v3/watch_-m_3_-X_POST_-d_value=) (72933): curl: (7) Failed to connect to localhost port 20000 after 0 ms: Couldn't connect to server
    logger.go:146: 2025-03-04T13:04:04.251Z	INFO	server exited	{"name": "TestCurlWatchIssue19509-test-0", "code": 2}
    v3_curl_watch_test.go:74: 
        	Error Trace:	/Users/wachao/go/src/github.com/ahrtr/etcd/tests/e2e/v3_curl_watch_test.go:74
        	Error:      	Should be true
        	Test:       	TestCurlWatchIssue19509
        	Messages:   	etcdserver already exited after 2 curl watch requests
    logger.go:146: 2025-03-04T13:04:04.252Z	INFO	closing test cluster...
    logger.go:146: 2025-03-04T13:04:04.252Z	INFO	closing server...	{"name": "TestCurlWatchIssue19509-test-0"}
    logger.go:146: 2025-03-04T13:04:04.252Z	INFO	removing directory	{"data-dir": "/var/folders/mf/_v3jxtyd0qgccvmc1z0_5klm0000gq/T/TestCurlWatchIssue195092804627941/002"}
    logger.go:146: 2025-03-04T13:04:04.255Z	INFO	closed test cluster.
--- FAIL: TestCurlWatchIssue19509 (0.72s)
FAIL
exit status 1
FAIL	go.etcd.io/etcd/tests/v3/e2e	1.117s

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@ahrtr
Copy link
Member Author

ahrtr commented Mar 4, 2025

cc @fuweid

Copy link

codecov bot commented Mar 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.87%. Comparing base (2f15f08) to head (d23b078).
Report is 5 commits behind head on main.

Additional details and impacted files

see 25 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19521      +/-   ##
==========================================
+ Coverage   68.77%   68.87%   +0.10%     
==========================================
  Files         421      421              
  Lines       35901    35901              
==========================================
+ Hits        24690    24727      +37     
+ Misses       9782     9750      -32     
+ Partials     1429     1424       -5     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f15f08...d23b078. Read the comment docs.

…EST watch requests

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
@ahrtr ahrtr force-pushed the curl_watch_20250304 branch from 1190df8 to 6569111 Compare March 4, 2025 14:00
Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
ahrtr added 2 commits March 4, 2025 18:08
Executed ./scripts/genproto.sh

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
@ahrtr ahrtr changed the title Create an e2e test to reproduce the issue of etcdserver crashing on REST watch requests Fix the issue of etcdserver crashing on receiving REST watch stream requests Mar 4, 2025
@ahrtr
Copy link
Member Author

ahrtr commented Mar 4, 2025

Copy link

@johanbrandhorst johanbrandhorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤞🏻

Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on green. Thanks, @ahrtr.

defer epc.Close()

curlCmdAndArgs := e2e.CURLPrefixArgsCluster(epc.Cfg, epc.Procs[0], "POST", e2e.CURLReq{Endpoint: "/v3/watch", Timeout: 3})
for i := 0; i < 10; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to

Suggested change
for i := 0; i < 10; i++ {
for i := range 10 {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a big deal, let's just keep it as it's. I really don't want to wait another 1 hour to backport this PR, thx

@ahrtr
Copy link
Member Author

ahrtr commented Mar 4, 2025

@jmhbnz @serathius we still need one approver to approve this PR.

@ahrtr
Copy link
Member Author

ahrtr commented Mar 4, 2025

also cc @spzala

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, ivanvc, johanbrandhorst, serathius, spzala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ahrtr,serathius,spzala]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@spzala
Copy link
Member

spzala commented Mar 4, 2025

Thanks @ahrtr !!

Comment on lines +69 to +70
_ = curlProc.Signal(syscall.SIGKILL)
_ = curlProc.Close()
Copy link
Member Author

@ahrtr ahrtr Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I intentionally do not check the error, because they may fail for known reason. The curlProc.Close() will most likely fail because the curl command is an invalid command, because there is NO request body.

The purpose of this test is to verify etcdserver don't crash when receiving even invalid requests.

@ahrtr
Copy link
Member Author

ahrtr commented Mar 4, 2025

Thanks all for the quick review.

@ahrtr ahrtr merged commit 3d38153 into etcd-io:main Mar 4, 2025
33 checks passed
@ahrtr
Copy link
Member Author

ahrtr commented Mar 4, 2025

/cherry-pick release-3.6

@k8s-infra-cherrypick-robot

@ahrtr: #19521 failed to apply on top of branch "release-3.6":

Applying: Create an e2e test to reproduce the issue of etcdserver crashing on REST watch requests
Applying: Bump grpc-gateway to v2.26.3
Using index info to reconstruct a base tree...
M	api/go.mod
M	api/go.sum
M	client/internal/v2/go.mod
M	client/v3/go.mod
M	client/v3/go.sum
M	etcdctl/go.mod
M	etcdctl/go.sum
M	etcdutl/go.mod
M	etcdutl/go.sum
M	go.mod
M	go.sum
M	server/go.mod
M	server/go.sum
M	tests/go.mod
M	tests/go.sum
M	tools/mod/go.mod
M	tools/mod/go.sum
Falling back to patching base and 3-way merge...
Auto-merging tools/mod/go.sum
CONFLICT (content): Merge conflict in tools/mod/go.sum
Auto-merging tools/mod/go.mod
CONFLICT (content): Merge conflict in tools/mod/go.mod
Auto-merging tests/go.sum
CONFLICT (content): Merge conflict in tests/go.sum
Auto-merging tests/go.mod
CONFLICT (content): Merge conflict in tests/go.mod
Auto-merging server/go.sum
CONFLICT (content): Merge conflict in server/go.sum
Auto-merging server/go.mod
CONFLICT (content): Merge conflict in server/go.mod
Auto-merging go.sum
CONFLICT (content): Merge conflict in go.sum
Auto-merging go.mod
CONFLICT (content): Merge conflict in go.mod
Auto-merging etcdutl/go.sum
CONFLICT (content): Merge conflict in etcdutl/go.sum
Auto-merging etcdutl/go.mod
CONFLICT (content): Merge conflict in etcdutl/go.mod
Auto-merging etcdctl/go.sum
CONFLICT (content): Merge conflict in etcdctl/go.sum
Auto-merging etcdctl/go.mod
CONFLICT (content): Merge conflict in etcdctl/go.mod
Auto-merging client/v3/go.sum
CONFLICT (content): Merge conflict in client/v3/go.sum
Auto-merging client/v3/go.mod
CONFLICT (content): Merge conflict in client/v3/go.mod
Auto-merging client/internal/v2/go.mod
Auto-merging api/go.sum
CONFLICT (content): Merge conflict in api/go.sum
Auto-merging api/go.mod
CONFLICT (content): Merge conflict in api/go.mod
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0002 Bump grpc-gateway to v2.26.3

In response to this:

/cherry-pick release-3.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ahrtr
Copy link
Member Author

ahrtr commented Mar 4, 2025

Let me manually backport this PR to release-3.6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants