TestExitWatch is unstable #6936

Closed
lhy1024 opened this issue Aug 10, 2023 · 5 comments
Assignees
Labels
type/ci The issue is related to CI.

Comments

lhy1024 commented Aug 10, 2023

Flaky Test

Which jobs are failing

2023-08-10T07:00:45.5252318Z     testutil.go:63: 
2023-08-10T07:00:45.5252820Z         	Error Trace:	/home/runner/work/pd/pd/pkg/utils/testutil/testutil.go:63
2023-08-10T07:00:45.5254044Z         	            				/home/runner/work/pd/pd/pkg/election/leadership_test.go:198
2023-08-10T07:00:45.5254866Z         	            				/home/runner/work/pd/pd/pkg/election/leadership_test.go:148
2023-08-10T07:00:45.5255253Z         	Error:      	Condition never satisfied
2023-08-10T07:00:45.5255549Z         	Test:       	TestExitWatch

CI link

https://github.com/tikv/pd/actions/runs/5817947111/job/15773440869

Reason for failure (if possible)

Anything else

lhy1024 added the type/ci label Aug 10, 2023
lhy1024 self-assigned this Aug 10, 2023

lhy1024 commented Aug 29, 2023

It hangs here:

	// Case4: close the server before the watch loop starts
	checkExitWatch(t, leaderKey, func(server *embed.Etcd, client *clientv3.Client) {
		re.NoError(failpoint.Enable("github.com/tikv/pd/server/delayWatcher", `pause`))
		server.Close()
		re.NoError(failpoint.Disable("github.com/tikv/pd/server/delayWatcher"))
	})

Maybe the server closes slowly, so the code passes the health check but then hangs at

watchChan := watcher.Watch(watchChanCtx, ls.leaderKey, clientv3.WithRev(revision))

If so, it has been fixed by #6961.

lhy1024 commented Aug 30, 2023

| function   | t0           | t1      | t2                  |
| ---------- | ------------ | ------- | ------------------- |
| server     | normal       | closing | closed              |
| leadership | health check | ----    | creating watch chan |

server test code

	checkExitWatch(t, leaderKey, func(server *embed.Etcd, client *clientv3.Client) {
		<-allowCloseServer
		server.Close()
		waitServerClosed <- struct{}{}
	})

leadership test code

	if !etcdutil.IsHealthy(serverCtx, ls.client) {
		......
	}
	allowCloseServer <- struct{}{}
	<-waitServerClosed
	watchChan := watcher.Watch(watchChanCtx, ls.leaderKey, clientv3.WithRev(revision))

With this synchronization in place, the test hangs before #6961 and passes after it. The failure before #6961:

        Error Trace:	/home/lhy1024/pd/pkg/utils/testutil/testutil.go:63
                    				/home/lhy1024/pd/pkg/election/leadership_test.go:200
                    				/home/lhy1024/pd/pkg/election/leadership_test.go:150
        Error:      	Condition never satisfied
        Test:       	TestExitWatch
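
The two-channel handshake used above can be tried in isolation. The following is a minimal sketch that assumes a fake server in place of `embed.Etcd`; `fakeServer`, `observation`, and `runTimeline` are hypothetical names introduced only for illustration. It forces the t0, t1, t2 ordering from the table: the health check runs while the server is still up, and the point where the watch would be created is reached only after the server is closed.

```go
package main

import "fmt"

// fakeServer is a hypothetical stand-in for embed.Etcd; only the closed flag
// matters for this sketch.
type fakeServer struct{ closed bool }

func (s *fakeServer) Close() { s.closed = true }

// observation records what the leadership side saw at t0 and t2.
type observation struct {
	healthyAtCheck bool // result of the health check at t0
	closedAtWatch  bool // server state when the watch would be created at t2
}

// runTimeline forces the ordering with two unbuffered channels. The channel
// operations also order the accesses to srv.closed, so no lock is needed.
func runTimeline() observation {
	srv := &fakeServer{}
	allowCloseServer := make(chan struct{})
	waitServerClosed := make(chan struct{})

	// Test side: wait for permission, close the server, then report back.
	go func() {
		<-allowCloseServer
		srv.Close()
		waitServerClosed <- struct{}{}
	}()

	var obs observation
	// t0: the health check runs while the server is still up.
	obs.healthyAtCheck = !srv.closed
	// t1: let the test side close the server and wait until it has finished.
	allowCloseServer <- struct{}{}
	<-waitServerClosed
	// t2: this is where the watch channel would be created.
	obs.closedAtWatch = srv.closed
	return obs
}

func main() {
	fmt.Printf("%+v\n", runTimeline())
}
```

Because both channels are unbuffered, each send blocks until the other side is ready, which is what makes the ordering deterministic rather than dependent on how slowly the server closes.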

lhy1024 closed this as completed Aug 30, 2023

rleungx commented Aug 31, 2023

rleungx reopened this Aug 31, 2023

lhy1024 commented Aug 31, 2023

https://github.com/tikv/pd/actions/runs/6035163111/job/16374979676?pr=7010

2023-08-31T08:58:41.0954663Z --- FAIL: TestExitWatch (30.14s)
2023-08-31T08:58:41.0955001Z     leadership_test.go:222: 
2023-08-31T08:58:41.0955611Z         	Error Trace:	/home/runner/work/pd/pd/pkg/election/leadership_test.go:222
2023-08-31T08:58:41.0956603Z         	            				/home/runner/work/pd/pd/pkg/election/leadership_test.go:271
2023-08-31T08:58:41.0957568Z         	            				/home/runner/work/pd/pd/pkg/election/leadership_test.go:194
2023-08-31T08:58:41.0958047Z         	Error:      	Received unexpected error:
2023-08-31T08:58:41.0961383Z         	            	error validating peerURLs {ClusterID:7f4ed0387b758466 Members:[&{ID:74e0fd7187919647 RaftAttributes:{PeerURLs:[http://127.0.0.1:38853] IsLearner:false} Attributes:{Name:test_etcd_9439 ClientURLs:[http://127.0.0.1:46565]}} &{ID:85a9b63401171586 RaftAttributes:{PeerURLs:[http://127.0.0.1:46513] IsLearner:false} Attributes:{Name:test_etcd_7350 ClientURLs:[http://127.0.0.1:38735]}}] RemovedMemberIDs:[]}: member count is unequal

This failure is different from the earlier one: it hit "error validating peerURLs" while preparing the cluster.

lhy1024 commented Sep 4, 2023

closed by #7007

lhy1024 closed this as completed Sep 4, 2023