Skip to content

Cluster Test Stability Fixes #1261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Jul 10, 2025
Merged

Cluster Test Stability Fixes #1261

merged 30 commits into from
Jul 10, 2025

Conversation

vazois
Copy link
Collaborator

@vazois vazois commented Jun 20, 2025

Misc fixes for cluster tests.

  • Ensure callback error is logged correctly across different OS.
  • Validate metadata at insertion and continuous retry before sending checkpoint.
  • Simplify and cleanup migrate tests.

@vazois vazois force-pushed the vazois/cluster-test-stability branch from 4a6403b to dca600f Compare June 20, 2025 18:06
@vazois vazois changed the title Vazois/cluster test stability Cluster Test Stability Jun 20, 2025
@vazois vazois changed the title Cluster Test Stability Cluster Test Stability Fixes Jun 20, 2025
@vazois vazois force-pushed the vazois/cluster-test-stability branch 2 times, most recently from 3ae8876 to 1db9294 Compare June 24, 2025 18:58
@vazois vazois force-pushed the vazois/cluster-test-stability branch from 3c9096f to 8559874 Compare July 2, 2025 20:27
@vazois vazois marked this pull request as ready for review July 8, 2025 18:04
@Copilot Copilot AI review requested due to automatic review settings July 8, 2025 18:04
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves cluster test stability by:

  • Adding OS-specific error logging and metadata validation with retries
  • Refactoring and simplifying cluster migration tests and utilities
  • Exposing a toggle (useNativeDeviceLinux) for Linux native device usage in tests

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/Garnet.test/TestUtils.cs Added useNativeDeviceLinux parameter to cluster setup
test/Garnet.test.cluster/ReplicationTests/ClusterReplicationBaseTests.cs Refactored primary/replica indexing and extended timeouts
test/Garnet.test.cluster/RedirectTests/ClusterSlotVerificationTests.cs Introduced AssertSlotsNotAssigned and ClusterState helper
test/Garnet.test.cluster/RedirectTests/BaseCommand.cs Updated single‐slot request setup to use MSET
test/Garnet.test.cluster/ClusterTestUtils.cs Enhanced TLS support, new session creation, added failover wait
test/Garnet.test.cluster/ClusterTestContext.cs Propagated useNativeDeviceLinux through instance creation
test/Garnet.test.cluster/ClusterMigrateTests.cs Simplified migrate tests to continuous read/write tasks
test/Garnet.test.cluster/ClusterMigrateTLSTests.cs Adapted TLS migration tests for async continuous model
playground/TstRunner/Program.cs Updated RunTest signature and slicing test cases
libs/storage/Tsavorite/.../DeviceLogCommitCheckpointManager.cs Added Unix strerror fallback for overlapped IO logging
libs/cluster/Server/.../ReceiveCheckpointHandler.cs Added Unix strerror fallback for overlapped IO logging
libs/cluster/Server/.../ReplicaSyncSession.cs Retries checkpoint‐metadata send until success
libs/cluster/Server/.../GarnetClusterCheckpointManager.cs Changed metadata‐read warnings to errors
libs/cluster/Server/.../CheckpointStore.cs Validates metadata at insertion and logs deletion warnings
libs/cluster/Server/ClusterUtils.cs Added Unix strerror fallback for overlapped IO logging
Comments suppressed due to low confidence (1)

test/Garnet.test.cluster/ClusterTestUtils.cs:2939

  • The variable context is not defined in this scope. It looks like this method resides in ClusterTestUtils and should directly call GetReplicationInfo, BackOff, and accept the logger parameter instead of referencing an undefined context.
                var infoItem = context.clusterTestUtils.GetReplicationInfo(nodeIndex, [ReplicationInfoItem.LAST_FAILOVER_STATE], logger: context.logger);

@vazois vazois force-pushed the vazois/cluster-test-stability branch from 43ec73c to fbc9f24 Compare July 8, 2025 18:19
@vazois vazois force-pushed the vazois/cluster-test-stability branch from fbc9f24 to 0383b53 Compare July 8, 2025 18:19
@TalZaccai TalZaccai self-requested a review July 8, 2025 22:42
Copy link
Contributor

@TalZaccai TalZaccai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with a few minor suggestions

TalZaccai and others added 4 commits July 9, 2025 20:37
* Updating Azure.Identity version & bumping Garnet version

* Empty commit
* Save large System.Linq.Async depedency by implementing FirstAsync ourselves.

* Add .ConfgureAwait(false) matching the other calls.

---------

Co-authored-by: Tal Zaccai <talzacc@microsoft.com>
* ZDIFF of a key against nonexisting keys should return the original contents.
ZDIFF 1 zset should return contents of zset.

* ZINTERSTORE should return empty array, not WRONGTYPE error, when given nonexsting keys in input.

* ZRANGEBYLEX assertion when the minimum was 'larger' than the maximum value in the sorted set.

* it's faster to check the BYLEX condition in the assertion.
* enable support for HEXPIRE variants with txn

* add txn hexpire test
@vazois vazois force-pushed the vazois/cluster-test-stability branch from a3fd5a8 to 2a5c004 Compare July 10, 2025 03:37
@vazois vazois merged commit 0aaa7c7 into main Jul 10, 2025
33 checks passed
@vazois vazois deleted the vazois/cluster-test-stability branch July 10, 2025 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants