member: avoid frequent campaign times #7301

Merged
5 commits merged into tikv:master on Nov 16, 2023

Conversation

HuSharp
Member

@HuSharp HuSharp commented Nov 2, 2023

What problem does this PR solve?

Issue Number: Close #7251, ref #7377

What is changed and how does it work?

When a PD leader campaigns for leadership frequently while the etcd leader has not changed,
we need to stop this PD leader from campaigning and have it resign so that another member can take over.
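
The decision described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: the `member` type, the `beforeCampaign` method, and the `maxCampaigns` threshold are all hypothetical names and values chosen for the example.

```go
package main

import "fmt"

// maxCampaigns is a hypothetical threshold; the PR thread does not state the real value.
const maxCampaigns = 3

// member is a hypothetical stand-in for a PD member taking part in leader election.
type member struct {
	name            string
	recentCampaigns int  // campaigns counted in the recent window
	resigned        bool // set once the member has stepped aside
}

// beforeCampaign reports whether the member may campaign again.
// When the member has campaigned too often, it resets the counter and
// resigns instead, so that another member can take over.
func (m *member) beforeCampaign() bool {
	if m.recentCampaigns >= maxCampaigns {
		m.recentCampaigns = 0
		m.resigned = true
		return false
	}
	m.recentCampaigns++
	return true
}

func main() {
	m := &member{name: "pd-1"}
	for attempt := 1; attempt <= 4; attempt++ {
		fmt.Printf("attempt %d: %s may campaign = %v\n", attempt, m.name, m.beforeCampaign())
	}
	fmt.Println("resigned:", m.resigned)
}
```

In this sketch the first three attempts are allowed and the fourth triggers the resign path. Resetting the counter on resign means the member starts from a clean slate if it later becomes a candidate again.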

Check List

Tests

  • Unit test

Release note

None.

Contributor

ti-chi-bot bot commented Nov 2, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • bufferflies
  • lhy1024

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed labels Nov 2, 2023
@ti-chi-bot ti-chi-bot bot requested a review from rleungx November 2, 2023 08:02
@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 2, 2023
@@ -152,7 +152,7 @@ func (s *Server) primaryElectionLoop() {

 func (s *Server) campaignLeader() {
 	log.Info("start to campaign the primary/leader", zap.String("campaign-resource-manager-primary-name", s.participant.Name()))
-	if err := s.participant.CampaignLeader(s.cfg.LeaderLease); err != nil {
+	if err := s.participant.CampaignLeader(s.Context(), s.cfg.LeaderLease); err != nil {
Member Author

@HuSharp HuSharp Nov 2, 2023

FYI: the context is passed in so that resign can be used inside the campaign path.
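
As a rough illustration of why the context matters here: a resign call typically needs a context of its own, and deriving it from the caller's context lets a server shutdown cancel it. The sketch below is not PD code. `resign`, `campaign`, the `tooFrequent` flag, and the one-second timeout are hypothetical stand-ins.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// resign is a hypothetical stand-in for a resign call that needs a context.
func resign(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err() // caller shut down or timed out
	case <-time.After(10 * time.Millisecond): // pretend round trip to etcd
		return nil
	}
}

// campaign takes ctx so that the resign path inherits the caller's lifetime.
// tooFrequent is a hypothetical flag standing in for the frequency check.
func campaign(ctx context.Context, tooFrequent bool) error {
	if tooFrequent {
		rctx, cancel := context.WithTimeout(ctx, time.Second)
		defer cancel()
		return resign(rctx)
	}
	return nil // the normal campaign path is elided in this sketch
}

// demo runs the resign path once with a live context and once with a cancelled one.
func demo() (liveOK, cancelledStops bool) {
	liveOK = campaign(context.Background(), true) == nil

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	cancelledStops = errors.Is(campaign(ctx, true), context.Canceled)
	return liveOK, cancelledStops
}

func main() {
	liveOK, cancelledStops := demo()
	fmt.Println("resign with live context succeeded:", liveOK)
	fmt.Println("resign with cancelled context stopped early:", cancelledStops)
}
```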

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 2, 2023
Signed-off-by: husharp <jinhao.hu@pingcap.com>

codecov bot commented Nov 2, 2023

Codecov Report

Merging #7301 (8f141b2) into master (181fdc9) will decrease coverage by 0.04%.
The diff coverage is 80.95%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7301      +/-   ##
==========================================
- Coverage   74.35%   74.31%   -0.04%     
==========================================
  Files         451      451              
  Lines       48863    48877      +14     
==========================================
- Hits        36330    36325       -5     
- Misses       9324     9336      +12     
- Partials     3209     3216       +7     
Flag Coverage Δ
unittests 74.31% <80.95%> (-0.04%) ⬇️


@@ -105,7 +109,8 @@ func (ls *Leadership) GetLeaderKey() string {
 }

 // Campaign is used to campaign the leader with given lease and returns a leadership
-func (ls *Leadership) Campaign(leaseTimeout int64, leaderData string, cmps ...clientv3.Cmp) error {
+func (ls *Leadership) Campaign(ctx context.Context, leaseTimeout int64, leaderData string, cmps ...clientv3.Cmp) error {
+	ls.CampaignTimes = append(ls.CampaignTimes, *cache.NewStringTTL(ctx, 5*time.Second, 5*time.Minute))
Contributor

Does its length always increase?

Member Author

Replaced with a plain slice without TTL.
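
One way a plain slice can stay bounded is to prune old timestamps each time a new one is recorded. The following is a minimal sketch under that assumption, not the code that landed in this PR. `campaignTimes`, `add`, `count`, and `demo` are hypothetical names, and the 5-minute window is taken from the diff comment only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// window follows the 5-minute figure in the diff comment; treat it as illustrative.
const window = 5 * time.Minute

// campaignTimes is a hypothetical record of campaign timestamps, oldest first.
type campaignTimes struct {
	times []time.Time
}

// add drops every entry that has aged out of the window, then appends now.
// Pruning on each add keeps the slice length bounded by the number of
// campaigns that fit in one window.
func (c *campaignTimes) add(now time.Time) {
	cutoff := now.Add(-window)
	i := 0
	for i < len(c.times) && !c.times[i].After(cutoff) {
		i++
	}
	c.times = append(c.times[i:], now)
}

// count returns how many campaigns are currently recorded.
func (c *campaignTimes) count() int { return len(c.times) }

// demo records three campaigns one minute apart, then a fourth six minutes
// after the first, and returns the count before and after that fourth add.
func demo() (before, after int) {
	base := time.Date(2023, time.November, 2, 0, 0, 0, 0, time.UTC)
	c := &campaignTimes{}
	c.add(base)
	c.add(base.Add(1 * time.Minute))
	c.add(base.Add(2 * time.Minute))
	before = c.count()
	c.add(base.Add(6 * time.Minute))
	return before, c.count()
}

func main() {
	before, after := demo()
	fmt.Println("before:", before, "after:", after)
}
```

Because pruning happens only in `add`, `count` can still include stale entries if no campaign has been recorded for a while. A caller that needs an exact figure would have to prune on read as well.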

pkg/encryption/key_manager_test.go (outdated, resolved)
@@ -62,6 +63,9 @@ type Leadership struct {
 	keepAliveCtx            context.Context
 	keepAliveCancelFunc     context.CancelFunc
 	keepAliveCancelFuncLock syncutil.Mutex
+	// CampaignTimes is used to record the campaign times of the leader in 5min.
+	// To avoid the leader campaign too frequently.
+	CampaignTimes []cache.TTLString
Contributor

TTLString is already a container, so there is no need to wrap it in an array. BTW, you could use just an array and drop the ttlcache.

Member Author

Good catch. Fixed.

@HuSharp HuSharp changed the title from "member: avoid frequency campaign times" to "member: avoid frequent campaign times" Nov 3, 2023
Signed-off-by: husharp <jinhao.hu@pingcap.com>
@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Nov 3, 2023
pkg/election/leadership.go (outdated, resolved)
Signed-off-by: husharp <jinhao.hu@pingcap.com>
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Nov 13, 2023
@HuSharp
Member Author

HuSharp commented Nov 13, 2023

This is being tested on tcms... please DO NOT merge it for now :)

@rleungx
Member

rleungx commented Nov 13, 2023

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 13, 2023
Signed-off-by: husharp <jinhao.hu@pingcap.com>
@HuSharp
Member Author

HuSharp commented Nov 15, 2023

After testing, this change reduces how long QPS stays at zero.

before: [screenshot not preserved]

after: [screenshot not preserved]

@HuSharp
Member Author

HuSharp commented Nov 15, 2023

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 15, 2023
@bufferflies
Contributor

/merge

Contributor

ti-chi-bot bot commented Nov 16, 2023

@bufferflies: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Contributor

ti-chi-bot bot commented Nov 16, 2023

This pull request has been accepted and is ready to merge.

Commit hash: bacc30a

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Nov 16, 2023
Contributor

ti-chi-bot bot commented Nov 16, 2023

@HuSharp: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot bot merged commit 0ebf4b2 into tikv:master Nov 16, 2023
26 checks passed
@HuSharp HuSharp deleted the avoid_campaign_leader branch November 16, 2023 02:38
@ti-chi-bot ti-chi-bot added the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Feb 1, 2024
@ti-chi-bot
Member

/cherry-pick release-7.5

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #7790.

@ti-chi-bot
Member

@ti-chi-bot: new pull request could not be created: failed to create pull request against tikv/pd#release-7.5 from head ti-chi-bot:cherry-pick-7301-to-release-7.5: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"A pull request already exists for ti-chi-bot:cherry-pick-7301-to-release-7.5."}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#create-a-pull-request"}

In response to this:

/cherry-pick release-7.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot removed the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Feb 2, 2024
ti-chi-bot bot added a commit that referenced this pull request Feb 2, 2024
close #7251, ref #7377

When a PD leader campaigns for leadership frequently while the etcd leader has not changed,
we need to stop this PD leader from campaigning and have it resign so that another member can take over.

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: husharp <jinhao.hu@pingcap.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd can not select new leader which lead qps drop to zero when inject pdleader io delay 500ms last for 5mins
5 participants