Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resend history for pending standby activity workflow task #2796

Merged
merged 1 commit into from
May 12, 2022

Conversation

yycptt
Copy link
Member

@yycptt yycptt commented May 3, 2022

What changed?

  • Resend history instead of pushing to matching when pending standby activity/workflow task is between resend delay and discard delay.

Why?

  • Resend history instead of pushing to matching to avoid persisting too many activity/workflow tasks in matching when replication latency is high or during ns migration.

How did you test it?

  • Added unit test

Potential risks

  • More refresh & resend history call on history service

Is hotfix candidate?

  • no

@yycptt yycptt requested review from yux0 and wxing1292 May 3, 2022 00:05
@yycptt yycptt requested a review from a team as a code owner May 3, 2022 00:05
@@ -302,7 +302,7 @@ func (t *timerQueueStandbyTaskExecutor) executeActivityRetryTimerTask(
t.getCurrentTime,
t.config.StandbyTaskMissingEventsResendDelay(),
t.config.StandbyTaskMissingEventsDiscardDelay(),
t.pushActivity,
t.fetchHistoryFromRemote,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or only refresh? or even simply just no-op?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just arguing from load perspective - keep it no-op if possible.
In case of replication delay, it seems to me that fetchHistoryFromRemote will amplify load.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on what is the common reason we need to fetch from remote? Is it replication delay or missing replication task?
If it is replication delay, we probably should keep waiting, which is no-op.
If it is missing task, then we should fetch, but we should only fetch once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced offline with @yiminc and @wxing1292 last week. We think resend is still needed here and for majority of the case after resending once, the task will be unblocked.

Using no-op is probably not a good idea as it will likely block the queue until the discard time and still send the task to matching.

@yycptt yycptt requested a review from meiliang86 May 3, 2022 00:19
@yycptt yycptt merged commit 2b86009 into temporalio:master May 12, 2022
@yycptt yycptt deleted the resend-activity-workflow-task branch May 12, 2022 05:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants