Skip to content

Migrate Slots & Keys Optimizations #1217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Jul 16, 2025

Conversation

vazois
Copy link
Collaborator

@vazois vazois commented May 22, 2025

This PR adds following optimizations and fixes to migrate operations

  • For both KEYS and SLOTS option, we now use a sketch to avoid keeping track of the actual keys in a dictionary which is expensive.
  • For SLOTS option enable support parallel scan tasks to speedup migration.
  • For SLOTS option ensure no keys are lost because of copy-update during the scan.
  • Expose api to suspend reviv.

NOTES:
When using the SLOTS option, we do not know in advance which keys need to be migrated.
We have to scan the store log and make sure any relevant keys are captured and transmitted to the target node.
Before starting the scan, the migration driver captures the BeginAddress (BA) and CurrentTailAddress (CTA).
This is done after the slot transitions to MIGRATING state and is coordinated under epoch protection.
The migration driver will perform a scan between (BA,CTA). For this reason, we need to ensure that we scan all records created within that region, including those that were copy-updated or deleted.
After gathering the associated keys, we lock them from write operations and perform an additional random read to ensure they are still live.
We transmit every live key to the target node and continue the scan from where we left off otherwise break.

@vazois vazois force-pushed the vazois/migrate-optimization branch 4 times, most recently from d2432f6 to be0fc3b Compare May 29, 2025 03:47
@vazois vazois force-pushed the vazois/migrate-optimization branch 4 times, most recently from 7024307 to 1d90f7f Compare June 7, 2025 02:25
@vazois vazois force-pushed the vazois/migrate-optimization branch 5 times, most recently from f6154e5 to 52887c8 Compare June 17, 2025 21:51
@vazois vazois force-pushed the vazois/migrate-optimization branch from ae926a0 to 9020d52 Compare June 19, 2025 00:34
@vazois vazois marked this pull request as ready for review June 19, 2025 03:00
@Copilot Copilot AI review requested due to automatic review settings June 19, 2025 03:00
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a bloom-filter–based sketch for key tracking during both KEYS and SLOTS migrations, adds parallel slot-scan tasks to accelerate slot migrations, and extends scan APIs to support cursors, max addresses, and tombstone visibility.

  • Replace MigratingKeysWorkingSet with a space-efficient Sketch and update migrate commands to use it
  • Enable parallel scan tasks in MigrationDriver for SLOTS option
  • Extend IterateLookup and related APIs with cursor, maxAddress, resetCursor, and returnTombstoned parameters

Reviewed Changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/Garnet.test.cluster/ClusterTestUtils.cs Added MigrateSlotsIndex helper; updated loop variable types
test/Garnet.test.cluster/ClusterTestContext.cs Added GenerateKeysWithPrefix and GenerateIncreasingSizeValues
test/Garnet.test.cluster/ClusterMigrateTests.cs Updated migrate calls to use new collection expressions
libs/storage/Tsavorite/cs/src/core/ClientSession/ClientSession.cs Extended IterateLookup and ScanCursor signatures
libs/storage/Tsavorite/cs/src/core/Allocator/*.cs Propagated returnTombstoned through scan iterator hierarchy
libs/server/Storage/Session/Common/ArrayKeyIterationFunctions.cs Extended client APIs for cursor, maxAddress, and tombstones
libs/server/Servers/GarnetServerOptions.cs Added ParallelMigrateTasks option
libs/host/* Added defaults and CLI support for ParallelMigrateTasks
libs/common/HashUtils.cs Added MurmurHash2x64A(Span<byte>) overload
libs/cluster/Session/RespClusterMigrateCommands.cs Switched non-importing key logic from continue to break
libs/cluster/Session/MigrateCommand.cs Replaced working set with Sketch in MIGRATE command parsing
libs/cluster/Server/Migration/* Major refactor to use Sketch, parallelize slot scans
libs/client/ClientSession/GarnetClientSessionIncremental.cs Removed default param for InitializeIterationBuffer
Comments suppressed due to low confidence (3)

libs/storage/Tsavorite/cs/src/core/Allocator/BlittableAllocatorImpl.cs:279

  • Parameter name returneTombstoned has a typo; it should be returnTombstoned to match the rest of the API.
        internal override bool ScanCursor<TScanFunctions>(TsavoriteKV<TKey, TValue, TStoreFunctions, BlittableAllocator<TKey, TValue, TStoreFunctions>> store,

libs/client/ClientSession/GarnetClientSessionIncremental.cs:48

  • Removing the = default default parameter breaks existing callers; consider restoring a default value or updating all call sites.
        public void InitializeIterationBuffer(TimeSpan iterationProgressFreq)

test/Garnet.test.cluster/ClusterMigrateTests.cs:602

  • The literal [slot] does not create a List<int> and will not match the List<int> parameter; consider using new List<int> { slot } or adjusting the method to accept an array.
            context.clusterTestUtils.MigrateSlots(sourcePort, targetPort, [slot], logger: context.logger);

@vazois vazois requested review from TedHartMS and badrishc June 19, 2025 03:13
@vazois vazois force-pushed the vazois/migrate-optimization branch 5 times, most recently from ba9ebb2 to 0d74a97 Compare June 27, 2025 16:57
@vazois vazois force-pushed the vazois/migrate-optimization branch from e96a7db to 17969c0 Compare July 8, 2025 22:51
@vazois vazois merged commit 50433a0 into microsoft:main Jul 16, 2025
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants