Ensure heartbeats aren't flushed on successful activity completion #1019

Merged
Sushisource merged 4 commits into master from no-hb-flush-on-success on Sep 26, 2025

Conversation

@Sushisource
Member

@Sushisource Sushisource commented Sep 24, 2025

What was changed

See title: heartbeats are no longer flushed when an activity completes successfully.

Why?

Flushing here is pointless, since the heartbeat details are erased once the activity completes successfully, and the extra call imposes cost on cloud users for no reason.

Checklist

  1. Closes [Feature Request] Do not send heartbeats on activity success #1017

  2. How was this tested:
    Added a unit test. This can't be integration tested because I can't prove a negative, but in the process I seem to have discovered a server bug in the case where flushing should happen.

  3. Any docs updates needed?

@Sushisource Sushisource requested a review from a team as a code owner September 24, 2025 22:25
Comment thread core/src/worker/activities/activity_heartbeat_manager.rs Outdated
@Sushisource Sushisource force-pushed the no-hb-flush-on-success branch from 85f9c23 to 798aec8 Compare September 25, 2025 00:12
Comment on lines +336 to +340
let should_flush = !known_not_found
    && !matches!(
        &status,
        aer::Status::Completed(_) | aer::Status::WillCompleteAsync(_)
    );
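
For readers without the surrounding file, the predicate in this hunk can be sketched in isolation. This is a minimal stand-in, not the code in activity_heartbeat_manager.rs: the `Status` enum below is a hypothetical, payload-free simplification of `aer::Status`, and `should_flush` is an illustrative free function rather than the actual method.

```rust
/// Hypothetical, payload-free stand-in for `aer::Status` (assumption:
/// the real variants carry result payloads, which this sketch ignores).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Completed,
    Failed,
    Cancelled,
    WillCompleteAsync,
}

/// Decide whether pending heartbeat details should be flushed when an
/// activity finishes. Mirrors the shape of the predicate in the diff:
/// skip the flush if the task is already known to be gone, or if the
/// activity completed (or will complete asynchronously).
fn should_flush(known_not_found: bool, status: Status) -> bool {
    !known_not_found && !matches!(status, Status::Completed | Status::WillCompleteAsync)
}

fn main() {
    // Successful completion: heartbeat details would be discarded anyway.
    assert!(!should_flush(false, Status::Completed));
    assert!(!should_flush(false, Status::WillCompleteAsync));

    // Failure or cancellation: the last recorded details may still matter.
    assert!(should_flush(false, Status::Failed));
    assert!(should_flush(false, Status::Cancelled));

    // Task already known to be gone: never flush.
    assert!(!should_flush(true, Status::Failed));
}
```

Under these assumptions, only failures and cancellations of a still-known task trigger a final flush.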
Contributor

Hrmm, digging into known_not_found, I'm thinking we also should not send a heartbeat if this was a cancel with a cancel reason of "timed out" or "reset", or if the cancel reason was "cancelled" and we're bubbling it out as such.

May be worth confirming if Go and Java do the same with regards to how they choose whether to flush heartbeat.

Member Author

Go's logic is the same as this. IMO it's not worth trying to be extra clever here. I agree that flushing on timeout is probably just wasted, but even the reset case might have some edge situation where you'd want to keep it (like resetting just attempts, not progress; I don't know if that's possible, but it could be at some point).

Contributor

I can see canceled somehow wanting to send since the RPC would be a success (even if it is dropped a few ms later when we respond with canceled). For timed out and reset, those are known-not-found IIUC (meaning the activity task is gone) and the heartbeat RPC will always fail IIUC. I think we should consider updating SDKs to not call the heartbeat RPC in situations they know will fail on the server, but it is technically harmless IIUC. Won't block the PR on it, but it's inefficient.

Member Author

In the timeout case, known_not_found should already end up being true here.
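
The stricter rule floated in this thread could look roughly like the sketch below. It illustrates the suggestion only and is not merged behavior: `CancelReason`, `Outcome`, and `should_flush_strict` are hypothetical names invented here, and the variants are limited to the reasons named in the discussion. It follows the reviewer's later position that a plain cancel may still send, while "timed out" and "reset" skip the call.

```rust
/// Hypothetical cancel reasons, limited to those named in the thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CancelReason {
    Cancelled,
    TimedOut,
    Reset,
}

/// Hypothetical activity outcome; not the real `aer::Status` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Completed,
    WillCompleteAsync,
    Failed,
    Cancelled(CancelReason),
}

/// Stricter variant of the flush decision: in addition to successful
/// completion, skip the heartbeat when the cancel reason implies the
/// activity task no longer exists on the server.
fn should_flush_strict(known_not_found: bool, outcome: Outcome) -> bool {
    if known_not_found {
        return false;
    }
    match outcome {
        // Details are discarded on success, so flushing is wasted.
        Outcome::Completed | Outcome::WillCompleteAsync => false,
        // Assumed to mean the task is gone, so the RPC would fail.
        Outcome::Cancelled(CancelReason::TimedOut | CancelReason::Reset) => false,
        // A plain cancel can still be recorded before the response.
        Outcome::Cancelled(CancelReason::Cancelled) => true,
        Outcome::Failed => true,
    }
}

fn main() {
    assert!(!should_flush_strict(false, Outcome::Completed));
    assert!(!should_flush_strict(false, Outcome::WillCompleteAsync));
    assert!(!should_flush_strict(false, Outcome::Cancelled(CancelReason::TimedOut)));
    assert!(!should_flush_strict(false, Outcome::Cancelled(CancelReason::Reset)));
    assert!(should_flush_strict(false, Outcome::Cancelled(CancelReason::Cancelled)));
    assert!(should_flush_strict(false, Outcome::Failed));
    assert!(!should_flush_strict(true, Outcome::Failed));
}
```

As noted above, the timeout case may already be covered by known_not_found, and the author argues that matching Go's simpler rule is preferable, so this remains a possible follow-up rather than part of this PR.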

Contributor

@cretz cretz left a comment

I still think we're sending too many unnecessary heartbeats for users, but it's non-blocking.

@Sushisource Sushisource enabled auto-merge (squash) September 26, 2025 16:09
@Sushisource Sushisource merged commit 0bfde18 into master Sep 26, 2025
18 of 19 checks passed
@Sushisource Sushisource deleted the no-hb-flush-on-success branch September 26, 2025 16:19
Development

Successfully merging this pull request may close these issues.

[Feature Request] Do not send heartbeats on activity success

3 participants