Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add random delay to ArchiveExecutionTask #3565

Merged
merged 1 commit into from Dec 8, 2022

Conversation

MichaelSnowden
Copy link
Contributor

What changed?

  1. I made the archival queue a scheduled queue instead of a timer queue.
  2. I added a random delay to each task we produce

Why?
I did this to prevent us from having a ton of archive execution tasks arrive concurrently if many workflows close simultaneously.

How did you test it?
I added unit tests that verify edge cases like setting the dynamic config upper bound to 0.

Potential risks
Does not run via any production entrypoints.

Is hotfix candidate?
No.

@MichaelSnowden MichaelSnowden marked this pull request as ready for review November 7, 2022 22:05
@MichaelSnowden MichaelSnowden requested a review from a team as a code owner November 7, 2022 22:05
@MichaelSnowden MichaelSnowden marked this pull request as draft November 8, 2022 23:54
@MichaelSnowden MichaelSnowden force-pushed the durable-archival/scheduled branch 2 times, most recently from 68982ee to 5400383 Compare December 2, 2022 00:17
@MichaelSnowden MichaelSnowden marked this pull request as ready for review December 2, 2022 00:17
Copy link
Member

@yycptt yycptt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Archival queue is an immediate queue, there's no delay by definition & task visibility timestamp for immediate tasks (which is only for metrics purpose) will be overwritten to time.Now() by shard context.

@@ -102,6 +103,7 @@ type (
namespaceRegistry namespace.Registry
mutableState MutableState
config *configs.Config
logger log.Logger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do it need logger here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably from an older revision. Removed, thanks for spotting!

service/history/configs/config.go Outdated Show resolved Hide resolved
@@ -198,10 +202,20 @@ func (r *TaskGeneratorImpl) GenerateWorkflowCloseTasks(
},
)
if r.config.DurableArchivalEnabled() {
delay := backoff.JitDuration(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backoff.JitDuration(delay, 1) / 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@yux0
Copy link
Contributor

yux0 commented Dec 5, 2022

Archival queue is an immediate queue, there's no delay by definition & task visibility timestamp for immediate tasks (which is only for metrics purpose) will be overwritten to time.Now() by shard context.

With task processing rate limiter, do we still need the jitter here? I think this is the main reason we want to change it to scheduled queue.

@MichaelSnowden MichaelSnowden force-pushed the durable-archival/scheduled branch 2 times, most recently from 8cc6fb8 to 02ccc81 Compare December 5, 2022 21:44
ArchiveRequestRPS dynamicconfig.IntPropertyFn
ArchiveSignalTimeout dynamicconfig.DurationPropertyFn
DurableArchivalEnabled dynamicconfig.BoolPropertyFn
RandomArchiveExecutionDelayUpperBound dynamicconfig.DurationPropertyFn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used? Remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!

@yycptt
Copy link
Member

yycptt commented Dec 7, 2022

Archival queue is an immediate queue, there's no delay by definition & task visibility timestamp for immediate tasks (which is only for metrics purpose) will be overwritten to time.Now() by shard context.

With task processing rate limiter, do we still need the jitter here? I think this is the main reason we want to change it to scheduled queue.

If we can avoid the spiky load in the first place, I think it's better than relying on the rate limiter to handle it after it occurs, which exists a possibility that the rate limiter can't handle it well (especially for tasks that needs to read history.) Plus, I don't think scheduled queue is a lot more expensive than immediate queues (especially when load is high).

@MichaelSnowden MichaelSnowden merged commit eb88af4 into master Dec 8, 2022
@MichaelSnowden MichaelSnowden deleted the durable-archival/scheduled branch December 8, 2022 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants