Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schedulers,test: avoid some test branches not being reached and remove schedulePeerPr #8087

Merged
merged 9 commits into from
Jul 4, 2024

Conversation

lhy1024
Copy link
Contributor

@lhy1024 lhy1024 commented Apr 17, 2024

What problem does this PR solve?

Issue Number: Close #8073

What is changed and how does it work?

schedulePeerPr always be 1.0, which makes some test cannot reach transfer-leader branch

Check List

Tests

  • Unit test

Release note

None.

Copy link
Contributor

ti-chi-bot bot commented Apr 17, 2024

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • HuSharp

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot requested review from JmPotato and rleungx April 17, 2024 14:02
@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 17, 2024
@@ -527,25 +534,12 @@ func checkHotWriteRegionScheduleByteRateOnly(re *require.Assertions, enablePlace
clearPendingInfluence(hb.(*hotScheduler))
tc.SetHotRegionScheduleLimit(int(opt.GetHotRegionScheduleLimit()))

for i := 0; i < 20; i++ {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is duplicated with L500-L522. But this branch, which tests the transfer leader, is not reached, so it is not updated before.

Comment on lines 714 to 716
pdServerCfg := tc.GetPDServerConfig()
pdServerCfg.FlowRoundByDigit = 8
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add config to avoid test branches not being reached

re.Equal("move-hot-write-leader", op.Desc())
operatorutil.CheckTransferLearner(re, op, operator.OpHotRegion, 8, 10)
re.Equal("move-hot-write-peer", op.Desc())
operatorutil.CheckTransferLearner(re, op, operator.OpHotRegion, 8, 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be scheduled to store 10 or store 11

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make the result deterministic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link

codecov bot commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 82.69231% with 9 lines in your changes missing coverage. Please review.

Project coverage is 77.12%. Comparing base (6b25787) to head (8d6f0cc).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8087      +/-   ##
==========================================
- Coverage   77.21%   77.12%   -0.10%     
==========================================
  Files         470      470              
  Lines       61671    61681      +10     
==========================================
- Hits        47622    47574      -48     
- Misses      10468    10516      +48     
- Partials     3581     3591      +10     
Flag Coverage Δ
unittests 77.12% <82.69%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label May 20, 2024
Comment on lines 63 to 69
// schedulePeerPr the probability of schedule the hot peer.
schedulePeerPr = 0.66
schedulePeerPr = defaultSchedulePeerPr
// pendingAmpFactor will amplify the impact of pending influence, making scheduling slower or even serial when two stores are close together
pendingAmpFactor = 2.0
pendingAmpFactor = defaultPendingAmpFactor
// If the distribution of a dimension is below the corresponding stddev threshold, then scheduling will no longer be based on this dimension,
// as it implies that this dimension is sufficiently uniform.
stddevThreshold = 0.1
stddevThreshold = defaultStddevThreshold
// topnPosition is the position of the topn peer in the hot peer list.
// We use it to judge whether to schedule the hot peer in some cases.
topnPosition = 10
topnPosition = defaultTopnPosition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variables are too complicated. Moreover, the current hot tests are hard to understand and debug.

Copy link
Contributor Author

@lhy1024 lhy1024 May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will try to reduce these variables and to make test clear in later prs.

Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 25, 2024
@lhy1024 lhy1024 changed the title tests: avoid some test branches not being reached schedulers,test: avoid some test branches not being reached and remove schedulePeerPr Jun 25, 2024
@lhy1024 lhy1024 requested review from rleungx and HuSharp June 25, 2024 12:55
Signed-off-by: lhy1024 <admin@liudos.us>
@lhy1024
Copy link
Contributor Author

lhy1024 commented Jun 27, 2024

@rleungx @HuSharp PTAL

ops, _ = hs.Schedule(tc, false)
re.Len(ops, 1)
operatorutil.CheckTransferPeer(re, ops[0], operator.OpHotRegion, 1, 4)
for i := 0; i < 100; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change it?

Copy link
Contributor Author

@lhy1024 lhy1024 Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func (h *hotScheduler) balanceHotWriteRegions(cluster sche.SchedulerCluster) []*operator.Operator {
// prefer to balance by peer
s := h.r.Intn(100)
switch {
case s < int(schedulePeerPr*100):
peerSolver := newBalanceSolver(h, cluster, utils.Write, movePeer)
ops := peerSolver.solve()
if len(ops) > 0 && peerSolver.tryAddPendingInfluence() {
return ops
}
default:
}
leaderSolver := newBalanceSolver(h, cluster, utils.Write, transferLeader)
ops := leaderSolver.solve()
if len(ops) > 0 && leaderSolver.tryAddPendingInfluence() {
return ops
}
hotSchedulerSkipCounter.Inc()
return nil
}

In master code, if a write peer was randomized but no create operator was created, another attempt was made to generate a write leader.
Now, to simplify the code, there is no such logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I added this check to avoid meeting panic when randomized write peer.

@@ -121,7 +122,7 @@ type baseHotScheduler struct {
// this records regionID which have pending Operator by operation type. During filterHotPeers, the hot peers won't
// be selected if its owner region is tracked in this attribute.
regionPendings map[uint64]*pendingInfluence
types []utils.RWType
types []resourceType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using a more clear name instead of resourceType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any good ideas about this name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can merge rwTy and resourceType in another PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can merge rwTy and resourceType in another PR?

+1

return h.balanceHotReadRegions(cluster)
case utils.Write:
return h.balanceHotWriteRegions(cluster)
case writePeer:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to distinguish write?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// while the Leader and Follower are under different loads (usually the Leader consumes more CPU).
So we schedule the write leader based on the query, and we schedule the write peer based on the byte and key at the moment.

re.Equal("move-hot-write-leader", op.Desc())
operatorutil.CheckTransferLearner(re, op, operator.OpHotRegion, 8, 10)
re.Equal("move-hot-write-peer", op.Desc())
operatorutil.CheckTransferLearner(re, op, operator.OpHotRegion, 8, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make the result deterministic?

Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
@lhy1024
Copy link
Contributor Author

lhy1024 commented Jul 3, 2024

/test pull-integration-realcluster-test

Copy link
Contributor

ti-chi-bot bot commented Jul 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HuSharp, rleungx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

ti-chi-bot bot commented Jul 4, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-03 03:35:59.004964433 +0000 UTC m=+1381885.490453264: ☑️ agreed by rleungx.
  • 2024-07-04 02:45:47.588184446 +0000 UTC m=+1465274.073673278: ☑️ agreed by HuSharp.

@ti-chi-bot ti-chi-bot bot merged commit 86f8030 into tikv:master Jul 4, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved dco-signoff: yes lgtm release-note-none size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/LGT1 Indicates that a PR has LGTM 1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tests: transfer hot region test not run
3 participants