Skip to content

Add growing member test scenario #20239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nwnt
Copy link
Member

@nwnt nwnt commented Jun 28, 2025

For #20137

Signed-off-by: Nont <nont@duck.com>
@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nwnt
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nwnt
Copy link
Member Author

nwnt commented Jun 28, 2025

@serathius Not sure if this is the same as what you have in mind. I probably missed a lot of things so definitely would love your feedback about it.

@k8s-ci-robot
Copy link

@nwnt: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-etcd-robustness-amd64 1d4a309 link true /test pull-etcd-robustness-amd64
pull-etcd-robustness-arm64 1d4a309 link true /test pull-etcd-robustness-arm64
pull-etcd-e2e-amd64 1d4a309 link true /test pull-etcd-e2e-amd64

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link

codecov bot commented Jun 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.00%. Comparing base (dc7fe2a) to head (1d4a309).
Report is 14 commits behind head on main.

Additional details and impacted files

see 37 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #20239      +/-   ##
==========================================
- Coverage   69.18%   69.00%   -0.18%     
==========================================
  Files         414      411       -3     
  Lines       34403    34273     -130     
==========================================
- Hits        23800    23651     -149     
- Misses       9209     9211       +2     
- Partials     1394     1411      +17     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dc7fe2a...1d4a309. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@@ -40,6 +40,10 @@ var (
MemberReplace Failpoint = memberReplace{}
MemberDowngrade Failpoint = memberDowngrade{}
MemberDowngradeUpgrade Failpoint = memberDowngradeUpgrade{}
GrowMember Failpoint = &growMember{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not growing members here :P

Usually talk about "cluster" and not "members". And we call the process "scale up". Maybe ClusterScaleUp?

@@ -304,3 +308,28 @@ func patchArgs(args []string, flag, newValue string) error {
}
return fmt.Errorf("--%s flag not found", flag)
}

type growMember struct {
currentSize int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which "current" you mean? Maybe initialSize.

}

func (f *growMember) Available(config e2e.EtcdProcessClusterConfig, member e2e.EtcdProcess, profile traffic.Profile) bool {
return f.currentSize < f.targetSize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the tricky part. If you start from cluster of size 1, then the traffic will talk to the 1 initial member. I would recommend instead to start to from 3 members, remove 2 and add back 2. This is still tricky because StartNewProc will allocate unique port numbers so we will need to work around that.

Profile: tp.Profile,
Cluster: *e2e.NewConfig(clusterOfSize1Options...),
},
TestScenario{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this. As long the failpoint is registered in allFailpoints it will be used. For testing I recommend just temporarily disabling other failpoints,.

@serathius
Copy link
Member

serathius commented Jun 30, 2025

Good work on the implementation, implementing a new failpoint is pretty tricky.

Got an error main_test.go:114: failed to read WAL, err: wal: slice bounds out of range, snapshot[Index: 0, Term: 0], current entry[Index: 4093, Term: 3], len(ents): 0

Problem we are hitting with PersistedRequestsCluster not being able to read WAL files from added members. Current implementation of PersistedRequestsCluster assumes that quorum members have all entries in the WAL. In this case only the initial member has the full WAL, so it fails. I think we could rewrite it to consider each index individually and accepting it as long as there is no conflict.

Unfortunately to merge this feature we will need to rewrite PersistedRequestsCluster first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants