Description
This test has been identified as flaky in the CI pipeline.
Test:
Failure Rate: 2 failures out of 87 runs (2.3% failure rate, 97.7% success rate)
Average Duration: 48.32s
Test Details
- Test Class:
- Test Method:
- Test Categories: Functional, AzureStorage, Lease
Test Description
This test verifies lease-based queue balancer behavior during unexpected node failures by:
- Starting with 6 queues and 4 silos (expecting 1-2 queues per silo)
- Killing one silo and verifying rebalancing (3 silos, 2 queues each)
- Killing another silo and verifying rebalancing (2 silos, 3 queues each)
- Starting a new silo and verifying rebalancing (3 silos, 2 queues each)
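The per-silo counts in the steps above follow from dividing the queues as evenly as possible across the live silos. A minimal sketch of that arithmetic, assuming an even distribution is the balancer's goal; `expected_queue_range` and `is_balanced` are hypothetical helper names and not part of the test code:

```python
def expected_queue_range(queue_count: int, silo_count: int) -> tuple[int, int]:
    """Return the (min, max) queues a silo should own under an even split."""
    if silo_count <= 0:
        raise ValueError("silo_count must be positive")
    low, remainder = divmod(queue_count, silo_count)
    # If the queues do not divide evenly, some silos own one extra queue.
    high = low + 1 if remainder else low
    return low, high


def is_balanced(owned_counts: list[int], queue_count: int) -> bool:
    """Check that every queue is owned and each silo is within the expected range."""
    low, high = expected_queue_range(queue_count, len(owned_counts))
    all_owned = sum(owned_counts) == queue_count
    return all_owned and all(low <= count <= high for count in owned_counts)


if __name__ == "__main__":
    for silos in (4, 3, 2, 3):
        print(f"6 queues, {silos} silos -> {expected_queue_range(6, silos)}")
```

A check of this shape would treat a state such as 3/2/1 across three silos as not yet rebalanced, which is the kind of intermediate state the test has to wait out.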
Test Configuration
- Lease length: 15 seconds
- Lease renew period: 10 seconds
- Lease acquisition period: 10 seconds
- Test timeout: 2 minutes
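A rough back-of-envelope for how these values interact, under assumptions that are not confirmed by the issue: a killed silo's leases expire at most one lease length after its last renewal, survivors pick up freed leases on their next acquisition tick, and all three rebalancing steps cost the same (which may not hold for the scale-up step). All names below are illustrative:

```python
LEASE_LENGTH_S = 15
LEASE_RENEW_PERIOD_S = 10
LEASE_ACQUISITION_PERIOD_S = 10
TEST_TIMEOUT_S = 120
REBALANCE_PHASES = 3  # kill silo, kill silo, start silo


def worst_case_phase_s(lease_length_s: int, acquisition_period_s: int) -> int:
    """Assumed worst case: full lease expiry plus one full acquisition period."""
    return lease_length_s + acquisition_period_s


def timing_budget() -> dict[str, int]:
    per_phase = worst_case_phase_s(LEASE_LENGTH_S, LEASE_ACQUISITION_PERIOD_S)
    total = per_phase * REBALANCE_PHASES
    return {
        # Slack between a scheduled renewal and the lease lapsing.
        "renew_margin_s": LEASE_LENGTH_S - LEASE_RENEW_PERIOD_S,
        "per_phase_s": per_phase,
        "all_phases_s": total,
        # What remains for silo startup, membership propagation, and storage latency.
        "headroom_s": TEST_TIMEOUT_S - total,
    }


if __name__ == "__main__":
    for name, value in timing_budget().items():
        print(f"{name}: {value}")
```

Under these assumptions a renewal delayed by more than 5 seconds could let a lease lapse, and the three phases could consume up to 75 of the 120 seconds before any silo startup or membership propagation cost is counted.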
Failure Pattern
The test averages 48.32s per run, about 40% of its 2-minute timeout, and involves:
- Azure Blob lease operations
- Cluster membership changes through silo kills
- Waiting for lease rebalancing to occur
- Multiple timeout-based waits for agent ownership verification
Failures may be related to:
- Azure Storage transient failures or throttling
- Lease acquisition/renewal timing issues
- Cluster membership propagation delays
- Test environment performance affecting timing assumptions
Next Steps
- Investigate Azure Blob lease operation reliability in test environment
- Review timeout values and consider increasing them for CI environments
- Add diagnostic logging for lease state transitions
- Consider retry logic for transient Azure Storage failures
- Analyze whether the 15-second lease length is too aggressive for tests
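For the retry suggestion above, one common shape is exponential backoff with jitter around the storage call. The sketch below is language-neutral pseudologic written in Python, not the project's code; `TransientStorageError` and `retry_transient` are hypothetical names, and which real exceptions count as transient would need to be decided against the actual Azure Storage client:

```python
import random
import time


class TransientStorageError(Exception):
    """Placeholder for whatever the storage client raises on throttling or timeouts."""


def retry_transient(operation, max_attempts=4, base_delay_s=0.5,
                    max_delay_s=8.0, sleep=time.sleep):
    """Call operation(), retrying on TransientStorageError with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientStorageError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last failure to the caller.
            # Double the delay ceiling each attempt, capped at max_delay_s.
            ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Full jitter keeps silos from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
```

Any retry budget has to fit inside the lease timing: with a 15-second lease and a 10-second renew period, retries that together take longer than about 5 seconds could cause the lease to lapse anyway.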
Related
- Test file: