Fix possibly incorrect `is_replaying` values when processing empty WFTs by Sushisource · Pull Request #910 · temporalio/sdk-rust

Sushisource · 2025-04-29T22:16:22Z

Currently, everything passes except some UTs that specifically test WFT boundaries involving histories with empty WFTs, which this fix changes.

I tried this change with Python, and everything passes. My intuition is that this is probably a compatible change, but we could always flag it to be sure. If anyone can come up with a counterexample, please do. This change can potentially move jobs from one activation to another (without changing their ordering), but I don't think that affects how user code is woken up in any way that matters, besides making `is_replaying` correct in places where it previously wasn't, which would have led to incorrect behavior like in the bug, or to NDEs anyway.

Sushisource · 2025-04-29T22:17:14Z

+                        if !saw_command
+                            && next_next_event.event_type() == EventType::WorkflowTaskScheduled


This is the fix.

Before this change, if the history looked something like this (newest event at the bottom):

WF started

Full workflow task <- this is previous WFT started

Full activity sched/start/compl

Full workflow task

WFT Scheduled

WFT Started

I'd end up sending an activation to lang with the activity resolution but also with replaying = false. That's because this function here, which decides the task boundaries, was not treating the span from the end of the last WFT, through the activity events, to the next WFT as its own sequence. Instead it also included the next (partial) WFT, and since that WFT was the end of history, it decided that replaying was false when processing the events.

It did so because WFT "heartbeats" normally don't count as a "real" WFT and are skipped over, since they shouldn't cause spurious wakeups (ex: when LAs are running).

However, in this case it really should count as a separate WFT, because the activity resolution happened in that sequence and should be considered replaying. Only then should we move on to the new (partial) WFT and set replaying to false.
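The boundary change described above can be sketched with a toy model. This is not the actual function from the PR: `EventType` here is a cut-down stand-in, `next_wft_seq_end`, `is_replaying`, and `example_history` are hypothetical names, "command event" is reduced to activity scheduling only, and replaying is approximated as "the sequence does not end at the end of history".

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EventType {
    WorkflowExecutionStarted,
    WorkflowTaskScheduled,
    WorkflowTaskStarted,
    WorkflowTaskCompleted,
    ActivityTaskScheduled,
    ActivityTaskStarted,
    ActivityTaskCompleted,
}

/// The example history from the comment above, oldest event first.
fn example_history() -> Vec<EventType> {
    use EventType::*;
    vec![
        WorkflowExecutionStarted,
        // Full workflow task; index 2 is the "previous WFT started".
        WorkflowTaskScheduled, WorkflowTaskStarted, WorkflowTaskCompleted,
        // Full activity sched/start/compl.
        ActivityTaskScheduled, ActivityTaskStarted, ActivityTaskCompleted,
        // Full workflow task that issued no commands.
        WorkflowTaskScheduled, WorkflowTaskStarted, WorkflowTaskCompleted,
        // Partial workflow task at the end of history.
        WorkflowTaskScheduled, WorkflowTaskStarted,
    ]
}

/// Scans from index `from` and returns the index of the `WorkflowTaskStarted`
/// event that ends the next WFT sequence (toy model, assumptions as stated).
fn next_wft_seq_end(events: &[EventType], from: usize, apply_fix: bool) -> usize {
    use EventType::*;
    let mut saw_command = false;
    for (ix, &e) in events.iter().enumerate().skip(from) {
        // Assumption: activity scheduling is the only command event modeled.
        if e == ActivityTaskScheduled {
            saw_command = true;
        }
        if e != WorkflowTaskStarted {
            continue;
        }
        // A WFT that completes and is immediately followed by another
        // scheduled WFT looks like a heartbeat.
        let completed_then_scheduled = events.get(ix + 1) == Some(&WorkflowTaskCompleted)
            && events.get(ix + 2) == Some(&WorkflowTaskScheduled);
        // With the fix, it is only skipped if no command was seen so far.
        let skip_as_heartbeat = completed_then_scheduled && (!apply_fix || !saw_command);
        if skip_as_heartbeat {
            continue;
        }
        return ix;
    }
    events.len().saturating_sub(1)
}

/// Assumption: a sequence is "replaying" unless it ends at the end of history.
fn is_replaying(events: &[EventType], seq_end: usize) -> bool {
    seq_end + 1 < events.len()
}

fn main() {
    let history = example_history();
    for apply_fix in [false, true] {
        // Start scanning right after the previous WFT started (index 2).
        let end = next_wft_seq_end(&history, 3, apply_fix);
        let replaying = is_replaying(&history, end);
        println!("apply_fix={apply_fix}: ends at index {end}, replaying={replaying}");
    }
}
```

In this model, the old behavior runs the sequence through to the partial WFT at the end of history (index 11), so the activity resolution is processed with replaying = false. With the extra `!saw_command` condition the sequence stops at index 8, and the activity resolution is treated as replayed.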

Sushisource · 2025-04-29T22:17:40Z

        assert_eq!(seq.len(), 6);
        let seq = next_check_peek(&mut update, 6);
-        assert_eq!(seq.len(), 13);
+        assert_eq!(seq.len(), 4);


This is the test that is updated to work with the new boundaries.

cretz

LGTM. I can't think of an obvious way to break existing workflows that may inadvertently rely on the existing task-end-boundary expectation. We may want to consider an environment variable to be able to flip back for a release or two just in case.


Sushisource · 2025-05-06T21:02:13Z

Added emergency env var
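For readers unfamiliar with this kind of escape hatch, here is a minimal sketch of how an environment variable can gate a fallback to old behavior. The variable name `EXAMPLE_LEGACY_WFT_BOUNDARIES` and the helpers `parse_flag` and `use_legacy_boundaries` are hypothetical and are not the names used in this PR; the accepted values are also an assumption.

```rust
use std::env;

// Hypothetical variable name, chosen only for this illustration.
const LEGACY_BOUNDARIES_ENV_VAR: &str = "EXAMPLE_LEGACY_WFT_BOUNDARIES";

/// Interprets a raw environment value as a boolean flag.
/// Assumption: only "1" and "true" (case-insensitive) enable the flag.
fn parse_flag(raw: Option<&str>) -> bool {
    match raw {
        Some(v) => {
            let v = v.trim().to_ascii_lowercase();
            v == "1" || v == "true"
        }
        None => false,
    }
}

/// Reads the flag from the process environment; unset means "use the new behavior".
fn use_legacy_boundaries() -> bool {
    parse_flag(env::var(LEGACY_BOUNDARIES_ENV_VAR).ok().as_deref())
}

fn main() {
    if use_legacy_boundaries() {
        println!("falling back to the old WFT boundary logic");
    } else {
        println!("using the new WFT boundary logic");
    }
}
```

Keeping the parsing in a pure function makes the flag easy to test without mutating the process environment.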


Sushisource force-pushed the fix-update-replay-nde branch 3 times, most recently from bb31a3c to fc0daae Compare April 29, 2025 22:34

Sushisource marked this pull request as ready for review May 5, 2025 21:29

Sushisource requested a review from a team as a code owner May 5, 2025 21:29

Sushisource force-pushed the fix-update-replay-nde branch 2 times, most recently from 86b8a84 to db9ecc1 Compare May 5, 2025 21:45

cretz approved these changes May 6, 2025


Sushisource added 8 commits May 6, 2025 13:59

Add repro test

6de8a9b

Implements fix & changes WFT boundaries

9f58c20

* One UT fixed to show, others fail

Fix remaining unit tests

293c59f

Add test with long la & update replay result fetching

258d295

Add check-in histories with old version to test

ee41305

Add wakeup count test

3d25489

Add some notes about this

ba8ae86

Add emergency flag

deec018

Sushisource force-pushed the fix-update-replay-nde branch from d7d371d to deec018 Compare May 6, 2025 21:02

Sushisource enabled auto-merge (squash) May 6, 2025 21:02

Sushisource merged commit 9af3cb5 into master May 6, 2025
17 checks passed

Sushisource deleted the fix-update-replay-nde branch May 6, 2025 21:10

Sushisource mentioned this pull request May 7, 2025

[Bug] Updates conflict with replay temporalio/sdk-python#848

Closed
