Skip to content

schedule: add split-scatter for release-8.5-20251204-v8.5.4#10678

Merged
ti-chi-bot[bot] merged 9 commits into
tikv:release-8.5-20251204-v8.5.4from
lhy1024:backport/split-scatter-v8.5.4
May 20, 2026
Merged

schedule: add split-scatter for release-8.5-20251204-v8.5.4#10678
ti-chi-bot[bot] merged 9 commits into
tikv:release-8.5-20251204-v8.5.4from
lhy1024:backport/split-scatter-v8.5.4

Conversation

@lhy1024
Copy link
Copy Markdown
Member

@lhy1024 lhy1024 commented May 14, 2026

What problem does this PR solve?

Issue Number: ref #10621

What is changed and how does it work?

Cherry-pick #10621 to support load-based split-scatter on release-8.5-20251204-v8.5.4.

PD records split batches reported with SplitReason_LOAD after AskBatchSplit allocates split IDs, keeps pending split-scatter regions, and dispatches internal scatter operators with range-aware scatter groups.

This PR also adds the split-scatter schedule limit config and updates kvproto to pingcap/kvproto v0.0.0-20260518035033-ca8835cfa721, which includes pingcap/kvproto#1464.

Cherry-pick / dependency handling

Source PR: #10621, merge commit 38a0f9ab0a3e373c2da23fa7c556423b4a67a6db.
Target branch: release-8.5-20251204-v8.5.4.
Current head: 44949721a3483de54eec810979bbc5cba1532e38.

  • The source PR checker, scatter, server: add load-based split-scatter with range-aware group baseline #10621 does not change Go module files; this backport changes them only because the release branch needs the split-reason kvproto backport.
  • Updated root, tools/, and tests/integrations/ modules to github.com/pingcap/kvproto v0.0.0-20260518035033-ca8835cfa721.
  • Removed the temporary github.com/lhy1024/kvproto@07f3062a replace and stale checksums after pd: support split reason for release-8.5-20251208-v8.5.4 pingcap/kvproto#1464 was merged.
  • Restored the split-scatter metric tests to the source PR's prometheus/testutil style; root github.com/prometheus/client_model v0.6.1 remains indirect after go mod tidy.
  • No master-only affinity checker/operator code or checker metrics refactor was brought into this release backport.
  • MCS/NEXT_GEN split-scatter behavior remains out of scope; the existing TODOs are preserved and only the shared config getter is added for interface compatibility.

Check List

Tests

  • Unit test
  • Manual test

Manual test commands:

PATH="$PWD/.tools/bin:$PATH" GOTOOLCHAIN=go1.23.12 make check
.tools/bin/failpoint-ctl enable . && go test ./pkg/schedule/checker -run SplitScatter -count=1; rc=$?; .tools/bin/failpoint-ctl disable .; exit $rc
.tools/bin/failpoint-ctl enable . && (cd tests/integrations && GOTOOLCHAIN=go1.23.12 go test ./mcs/resourcemanager -run 'TestResourceManagerClientTestSuite/TestSwitchBurst' -count=1); rc=$?; .tools/bin/failpoint-ctl disable .; exit $rc
git diff --check

Code changes

  • Has the configuration change

Side effects

  • Increased code complexity

Related changes

Release note

PD now supports load-based split-scatter for regions split by TiKV load-based splitting.

(cherry picked from commit 38a0f9a)
Signed-off-by: lhy1024 <admin@liudos.us>
@ti-chi-bot ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has signed the dco. labels May 14, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8ac03583-9c34-4163-8a7e-db9fc671d5e4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 14, 2026
@lhy1024
Copy link
Copy Markdown
Member Author

lhy1024 commented May 14, 2026

Cherry-pick conflict / adaptation notes

Source PR: #10621, merge commit 38a0f9ab0a3e373c2da23fa7c556423b4a67a6db.
Target branch: release-8.5-20251204-v8.5.4.
Current head: 54db6ee8407fbaecacd7cee60552b25c4401ef52.

I rechecked the actual cherry-pick conflicts by applying the source merge commit on the release branch with:

git cherry-pick -m 1 --no-commit 38a0f9ab0a3e373c2da23fa7c556423b4a67a6db

The actual content-conflict files are:

  • pkg/mcs/scheduling/server/config/config.go
  • pkg/schedule/checker/checker_controller.go
  • pkg/schedule/checker/metrics.go
  • pkg/schedule/config/config.go
  • pkg/schedule/config/config_provider.go
  • pkg/schedule/operator/kind.go
  • pkg/schedule/operator/operator_test.go
  • pkg/schedule/scatter/region_scatterer.go
  • pkg/schedule/scatter/region_scatterer_test.go
  • pkg/schedule/types/type.go
  • pkg/statistics/store_collection.go
  • server/config/persist_options.go

File-level parity was checked against source PR #10621. The business file set matches the source PR semantically. The only extra files in this backport are Go module dependency files required by the release branch kvproto update:

  • go.mod, go.sum
  • tools/go.mod, tools/go.sum
  • tests/integrations/go.mod, tests/integrations/go.sum

Risk classification for review

High / should review carefully:

  1. pkg/schedule/scatter/region_scatterer.go, pkg/schedule/scatter/region_scatterer_test.go: this is the most behavior-heavy conflict. The source PR is on top of master scatterer changes and affinity code, while release-8.5 has older scatterer APIs and no affinity scheduling. The resolution preserves internal split-scatter behavior, range-aware group baseline, running internal-op deltas, non-admin scatter operator creation, placement/store/leader constraints, and intentionally omits affinity-only logic. This is the main place to check for behavioral drift.

  2. pkg/schedule/checker/checker_controller.go, pkg/schedule/checker/metrics.go, and the added pkg/schedule/checker/split_scatter.go: the first two are actual conflict files; split_scatter.go is not a conflict file but is the core dispatch implementation. Review focus should be patrol insertion point, pending lifecycle, disabled behavior, retry/backoff, metrics, and whether the backport avoided master-only checker metrics / affinity scaffolding.

  3. server/cluster/cluster_worker.go: not a content-conflict file, but it is the integration boundary that records split-scatter batches only for pdpb.SplitReason_LOAD. Review focus should be that SIZE/admin/default splits remain unchanged, and that sourceWaitVersion / new region IDs are recorded after split ID allocation.

Medium / worth checking but mostly mechanical:

  1. Config surface: pkg/schedule/config/config.go, pkg/schedule/config/config_provider.go, server/config/persist_options.go, pkg/mcs/scheduling/server/config/config.go, pkg/statistics/store_collection.go, and config tests. The conflict hunks include both affinity config and split-scatter config from source/master. The resolution keeps release-8.5 config layout and adds only split-scatter-schedule-limit default/key/field/getter/setter/stat entry. Review focus should be default value, TTL getter behavior, and confirming affinity config did not slip in.

  2. Operator kind/type surface: pkg/schedule/operator/kind.go, pkg/schedule/operator/operator_test.go, pkg/schedule/operator/operator.go, pkg/schedule/operator/create_operator.go, and pkg/schedule/types/type.go. The conflict hunks include both affinity operator/checker additions and split-scatter additions. The resolution adds only OpSplitScatter, its string mappings, SplitScatterChecker, and non-admin split-scatter operator creation. Review focus should be scheduler kind precedence and that admin scatter behavior is unchanged.

  3. Dependency files: root, tools/, and tests/integrations modules were all updated to github.com/pingcap/kvproto v0.0.0-20260518035033-ca8835cfa721, which includes pd: support split reason for release-8.5-20251208-v8.5.4 pingcap/kvproto#1464. Review focus should be that the dependency update is consistent across modules and that the root github.com/prometheus/client_model v0.6.1 direct requirement is expected because the test helper imports github.com/prometheus/client_model/go directly.

Low / should not need deep review:

  1. pkg/mcs/scheduling/server/grpc_service.go and server/grpc_service.go: only TODOs are preserved for the MCS/NEXT_GEN path. No split-scatter behavior is enabled on the MCS forwarding path in this backport.

  2. Release API adaptations: release-8.5 NewController still accepts ruleManager and labeler, Operator.GetAdditionalInfo returns only string, and filter.NewCandidates requires an explicit *rand.Rand. These are compatibility adaptations to the release branch API shape; they should be compile/test covered and do not intentionally change split-scatter behavior.

  3. Test-only adaptations: pkg/schedule/checker/split_scatter_test.go uses a local metricValue helper instead of source PR's prometheus/testutil to avoid unrelated dependency churn. This is test-only.

  4. Follow-up review fixes in this backport reset the process-global pending gauge on split-scatter controller creation, skip recording when split-scatter is disabled, clear existing pending entries when disabled, throttle blocked dispatch by nextDispatchAt, and count/delay missing pending regions. These are local to split-scatter behavior and covered by split-scatter tests.

Conflict resolution details

  1. pkg/schedule/checker/checker_controller.go, pkg/schedule/checker/metrics.go: keep the release-8.5 controller flow, initialize the split-scatter controller, and insert only c.splitScatter.dispatchSplitScatterRegions() after pending processed regions. Split-scatter metrics were added in the release-8.5 metrics style without bringing in master-only checker/affinity scaffolding.

  2. pkg/schedule/config/config.go, pkg/schedule/config/config_provider.go, server/config/persist_options.go, pkg/mcs/scheduling/server/config/config.go, pkg/statistics/store_collection.go: keep release-8.5 config layout and add only split-scatter-schedule-limit default/key/field/getter/setter/stat entry. Affinity config was intentionally not backported.

  3. pkg/schedule/operator/kind.go, pkg/schedule/operator/operator_test.go, pkg/schedule/types/type.go: add only OpSplitScatter, its string mappings, split-scatter scheduler kind tests, and SplitScatterChecker. Affinity operator/checker additions were intentionally not backported.

  4. pkg/schedule/scatter/region_scatterer.go, pkg/schedule/scatter/region_scatterer_test.go: preserve source split-scatter behavior, keep release-8.5 API shape, and omit affinity-only logic.

Dependency handling

  1. release-8.5's original kvproto does not contain pdpb.SplitReason, so this backport updates root, tools/, and tests/integrations modules to github.com/pingcap/kvproto v0.0.0-20260518035033-ca8835cfa721, which includes pd: support split reason for release-8.5-20251208-v8.5.4 pingcap/kvproto#1464.

  2. The temporary github.com/lhy1024/kvproto@07f3062a replace and stale checksums were removed after pd: support split reason for release-8.5-20251208-v8.5.4 pingcap/kvproto#1464 was merged.

  3. The root github.com/prometheus/client_model v0.6.1 requirement is kept as direct because the backported split-scatter test uses github.com/prometheus/client_model/go directly through a local metricValue helper.

Unrelated master prerequisites intentionally not included

  • Affinity checker/operator/config changes.
  • Master-only checker metrics refactors.

Verification after conflict resolution

rg -n '<<<<<<<|=======|>>>>>>>' $(gh pr diff 10678 --repo tikv/pd --name-only)
PATH="$PWD/.tools/bin:$PATH" GOTOOLCHAIN=go1.23.12 make check
.tools/bin/failpoint-ctl enable . && go test ./pkg/schedule/checker -run SplitScatter -count=1; rc=$?; .tools/bin/failpoint-ctl disable .; exit $rc
.tools/bin/failpoint-ctl enable . && (cd tests/integrations && GOTOOLCHAIN=go1.23.12 go test ./mcs/resourcemanager -run 'TestResourceManagerClientTestSuite/TestSwitchBurst' -count=1); rc=$?; .tools/bin/failpoint-ctl disable .; exit $rc
git diff --check

Signed-off-by: lhy1024 <admin@liudos.us>
@lhy1024
Copy link
Copy Markdown
Member Author

lhy1024 commented May 15, 2026

image image

Signed-off-by: lhy1024 <admin@liudos.us>
opController *operator.Controller,
addPendingProcessedRegions func(needCheckLen bool, ids ...uint64),
) *splitScatterController {
return &splitScatterController{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gauge is process-global. Could we reset it when the split-scatter controller is created so a previous coordinator/leader term does not leave a stale nonzero pending count exposed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

// Keep pendingMu short: cluster reads below can be slower and do not need
// to block pending updates. Stale snapshots are safe because candidates
// are rechecked before dispatch and delay/delete recheck the pending identity.
pendingSnapshot := make([]splitScatterPendingItem, 0, len(c.pending))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This snapshots all pending entries and later sorts candidates on every dispatch pass. With the 4096 pending cap, can we avoid the full scan/sort by tracking retry/expire order or throttling blocked batches?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled the throttling side in 4ab32c3 by adding a nextDispatchAt gate when dispatch is blocked or no candidate is ready. I kept the pending scan/sort structure unchanged because replacing it with retry/expire ordering is a larger source/master-level optimization, not specific to this release backport.

ordinaryPeer := make(map[uint64]uint64)
ordinaryLeader := make(map[uint64]uint64)
specialPeer := make(map[string]map[uint64]uint64)
for _, store := range r.cluster.GetStores() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rebuilds the range baseline by iterating all stores for each internal scatter. For regions in the same table/index group, can we reuse the scatter state within one dispatch pass to avoid repeated range count queries?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inherited from the source PR/current master, not introduced by this backport. Reusing scatter state across regions in one dispatch pass would require broader RegionScatterer API/state changes, so I prefer leaving it to a master follow-up rather than expanding this release backport.

// Check pending processed regions first.
c.checkPendingProcessedRegions()

c.splitScatter.dispatchSplitScatterRegions()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runs on every patrol tick (10ms by default). Could we add a cheap backoff/next-retry gate when split-scatter is blocked so it does not add steady overhead to patrol?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in 4ab32c3: split-scatter now has a cheap nextDispatchAt gate. When schedule limit blocks dispatch or no pending item is ready, patrol returns before another full collect pass until the retry backoff expires.

for _, pending := range pendingSnapshot {
regionID := pending.regionID
region := c.cluster.GetRegion(regionID)
if region == nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This skip makes most missing-region cases invisible to splitScatterDispatchRegionMissingCounter below; the pending item will just wait until TTL. Could we count or clean these missing entries here so the metric reflects the actual drop/delay reason?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing pending target/source regions are now counted with splitScatterDispatchRegionMissingCounter and delayed with retryAt, so they no longer stay invisible until TTL and also avoid repeated counting during backoff.

return
}
limit := c.cluster.GetCheckerConfig().GetSplitScatterScheduleLimit()
if limit == 0 {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the limit is set to 0, split-scatter is documented as disabled, but new split batches are still recorded and kept until TTL. Could we skip recording pending entries as well, or otherwise avoid accumulating pending work and counter noise while the feature is disabled?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RecordSplitScatterBatch skips recording when split-scatter is disabled, and dispatch clears any existing pending entries if the limit is later set to 0, updating the pending gauge.

Reset the process-global pending gauge when creating a split-scatter controller.

Avoid recording or retaining pending entries while split-scatter is disabled, add a cheap dispatch backoff for blocked pending work, and count missing pending regions when they are delayed.

Signed-off-by: lhy1024 <admin@liudos.us>
@lhy1024 lhy1024 force-pushed the backport/split-scatter-v8.5.4 branch from 681c716 to 4ab32c3 Compare May 18, 2026 07:34
lhy1024 added 3 commits May 18, 2026 15:48
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label May 19, 2026
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
@ti-chi-bot ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 20, 2026
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 20, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-19 04:06:33.833452822 +0000 UTC m=+237123.337583498: ☑️ agreed by rleungx.
  • 2026-05-20 02:01:18.663474232 +0000 UTC m=+316008.167604908: ☑️ agreed by bufferflies.

@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot Bot commented May 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bufferflies, niubell, rleungx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the approved label May 20, 2026
@ti-chi-bot ti-chi-bot Bot merged commit 399f778 into tikv:release-8.5-20251204-v8.5.4 May 20, 2026
18 of 19 checks passed
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 84.30610% with 121 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-8.5-20251204-v8.5.4@34c6617). Learn more about missing BASE report.

Additional details and impacted files
@@                      Coverage Diff                       @@
##             release-8.5-20251204-v8.5.4   #10678   +/-   ##
==============================================================
  Coverage                               ?   77.82%           
==============================================================
  Files                                  ?      469           
  Lines                                  ?    63081           
  Branches                               ?        0           
==============================================================
  Hits                                   ?    49090           
  Misses                                 ?    10391           
  Partials                               ?     3600           
Flag Coverage Δ
unittests 77.82% <84.30%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants