Fix expired task handing in fairTaskReader by dnr · Pull Request #9775 · temporalio/temporal

dnr · 2026-04-01T21:40:48Z

What changed?

Instead of dropping expired tasks immediately, pass them through mergeTasksLocked and treat them as already-acked, so that the read level and ack level can be updated appropriately.

Why?

As the previous comment suggests, reading a batch of entirely expired tasks would lead to incorrect behavior: since the acks were not placed in outstandingTasks, the read level wouldn't be updated, and we'd re-read the expired tasks in a loop, i.e. the reader would be stuck. (Unless a task was written within that range, that would unstick it.) Even if we read a batch that wasn't all expired, but ended in a long stretch of expired tasks, we wouldn't update the read level as much as we could.

This adds expired tasks as acks, and then adds a call to advanceAckLevelLocked, so the read level can be pushed up, and we can advance the ack level over a stretch of expired tasks as well.

How did you test it?

built
run locally and tested manually
covered by existing tests
added new unit test(s) – and tested it fails without the fix
added new functional test(s)

rkannan82 · 2026-04-01T22:54:00Z

service/matching/backlog_manager_test.go

 		"backlog count should not be incremented")
 }

+func (s *BacklogManagerTestSuite) TestSkipExpiredTasks_AllExpiredThenValid() {


Suggest restructuring this test so its a bit easier to read.

type taskBlock struct {
count int
expired bool
}

func expiredBlock(n int) taskBlock { return taskBlock{count: n, expired: true} }
func validBlock(n int) taskBlock { return taskBlock{count: n, expired: false} }

func (s *BacklogManagerTestSuite) TestSkipExpiredTasks_AllExpiredThenValid() {
s.testSkipExpiredTasks(/batchSize/ 10, expiredBlock(33), validBlock(3))
}

func (s *BacklogManagerTestSuite) TestSkipExpiredTasks_ValidExpiredValid() {
s.testSkipExpiredTasks(/batchSize/ 10, validBlock(3), expiredBlock(33), validBlock(3))
}

func (s *BacklogManagerTestSuite) TestSkipExpiredTasks_AllExpired() {
s.testSkipExpiredTasks(/batchSize/ 10, expiredBlock(20))
}

The call sites now read like English: "batch size 10, 33 expired then 3 valid."

Inside the helper, replace the pattern expansion with a flat loop over the runs:

func (s *BacklogManagerTestSuite) testSkipExpiredTasks(batchSize int, runs ...taskRun) {
// ...setup...

var dbTasks []*persistencespb.AllocatedTaskInfo var numValid int id := int64(1) for _, run := range runs { for range run.count { task := &persistencespb.AllocatedTaskInfo{ TaskId: id, Data: &persistencespb.TaskInfo{ CreateTime: timestamp.TimeNowPtrUtcAddSeconds(-3600), }, } if run.expired { task.Data.ExpiryTime = timestamp.TimeNowPtrUtcAddSeconds(-60) } else { task.Data.ExpiryTime = timestamp.TimeNowPtrUtcAddSeconds(3600) numValid++ } if s.fairness { task.TaskPass = id * passMultiplier } dbTasks = append(dbTasks, task) id++ } } // ...rest unchanged...

}

## What changed? Instead of dropping expired tasks immediately, pass them through mergeTasksLocked and treat them as already-acked, so that the read level and ack level can be updated appropriately. ## Why? As the previous comment suggests, reading a batch of entirely expired tasks would lead to incorrect behavior: since the acks were not placed in outstandingTasks, the read level wouldn't be updated, and we'd re-read the expired tasks in a loop, i.e. the reader would be stuck. (Unless a task was written within that range, that would unstick it.) Even if we read a batch that wasn't all expired, but ended in a long stretch of expired tasks, we wouldn't update the read level as much as we could. This adds expired tasks as acks, and then adds a call to advanceAckLevelLocked, so the read level can be pushed up, and we can advance the ack level over a stretch of expired tasks as well. ## How did you test it? - [ ] built - [ ] run locally and tested manually - [ ] covered by existing tests - [x] added new unit test(s) – and tested it fails without the fix - [ ] added new functional test(s)

## What changed? Instead of dropping expired tasks immediately, pass them through mergeTasksLocked and treat them as already-acked, so that the read level and ack level can be updated appropriately. ## Why? As the previous comment suggests, reading a batch of entirely expired tasks would lead to incorrect behavior: since the acks were not placed in outstandingTasks, the read level wouldn't be updated, and we'd re-read the expired tasks in a loop, i.e. the reader would be stuck. (Unless a task was written within that range, that would unstick it.) Even if we read a batch that wasn't all expired, but ended in a long stretch of expired tasks, we wouldn't update the read level as much as we could. This adds expired tasks as acks, and then adds a call to advanceAckLevelLocked, so the read level can be pushed up, and we can advance the ack level over a stretch of expired tasks as well. ## How did you test it? - [ ] built - [ ] run locally and tested manually - [ ] covered by existing tests - [x] added new unit test(s) – and tested it fails without the fix - [ ] added new functional test(s) (cherry picked from commit 0b372d5)

) ## What changed? Instead of dropping expired tasks immediately, pass them through mergeTasksLocked and treat them as already-acked, so that the read level and ack level can be updated appropriately. ## Why? As the previous comment suggests, reading a batch of entirely expired tasks would lead to incorrect behavior: since the acks were not placed in outstandingTasks, the read level wouldn't be updated, and we'd re-read the expired tasks in a loop, i.e. the reader would be stuck. (Unless a task was written within that range, that would unstick it.) Even if we read a batch that wasn't all expired, but ended in a long stretch of expired tasks, we wouldn't update the read level as much as we could. This adds expired tasks as acks, and then adds a call to advanceAckLevelLocked, so the read level can be pushed up, and we can advance the ack level over a stretch of expired tasks as well. ## How did you test it? - [ ] built - [ ] run locally and tested manually - [ ] covered by existing tests - [x] added new unit test(s) – and tested it fails without the fix - [ ] added new functional test(s) Co-authored-by: David Reiss <david@temporal.io>

Fix expired task handing in fairTaskReader

b3b48de

dnr requested a review from a team as a code owner April 1, 2026 21:40

lint

66e202c

rkannan82 approved these changes Apr 1, 2026

View reviewed changes

dnr merged commit 0b372d5 into temporalio:main Apr 1, 2026
46 checks passed

dnr deleted the fairexpired branch April 1, 2026 22:50

rkannan82 reviewed Apr 1, 2026

View reviewed changes

pashafateev mentioned this pull request Apr 1, 2026

Patch cloud/v1.31.0-152 with security, scheduler, and fairTaskReader fixes #9777

Closed

temporal-cicd bot mentioned this pull request Apr 2, 2026

Patch v1.31.0-152.3 #9791

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix expired task handing in fairTaskReader#9775

Fix expired task handing in fairTaskReader#9775
dnr merged 2 commits intotemporalio:mainfrom
dnr:fairexpired

dnr commented Apr 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

rkannan82 Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dnr commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed?

Why?

How did you test it?

Uh oh!

Uh oh!

rkannan82 Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dnr commented Apr 1, 2026 •

edited

Loading