Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sending a task to dlq after a number of attempts #5367

Merged
merged 19 commits into from Feb 1, 2024
Merged

Conversation

prathyushpv
Copy link
Contributor

What changed?

Adding code to send a task to DLQ after a number of attempts.
This number can be configured through dynamic config.

Why?

Repeatedly failing task can be moved to DLQ and inspected.

How did you test it?

Unit tests

Potential risks

None

Is hotfix candidate?

No

@prathyushpv prathyushpv marked this pull request as ready for review January 29, 2024 23:02
@prathyushpv prathyushpv requested a review from a team as a code owner January 29, 2024 23:02
Copy link
Member

@yycptt yycptt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attempts right now is increased whenever an error is encountered during task processing, which can be expected errors like resource exhausted. I think those errors/attempts should not be taken into accounting when deciding if a task should be DLQed.

I'd propose maintaining a separate unexpected error count and use that to decide if task should be put into DLQ. We should also log the error count in the logs.

@prathyushpv
Copy link
Contributor Author

Attempts right now is increased whenever an error is encountered during task processing, which can be expected errors like resource exhausted. I think those errors/attempts should not be taken into accounting when deciding if a task should be DLQed.

I'd propose maintaining a separate unexpected error count and use that to decide if task should be put into DLQ. We should also log the error count in the logs.

That makes sense! I have made the changes.

service/history/queues/executable.go Outdated Show resolved Hide resolved
common/dynamicconfig/constants.go Outdated Show resolved Hide resolved
common/log/tag/tags.go Outdated Show resolved Hide resolved
service/history/configs/config.go Outdated Show resolved Hide resolved
service/history/queues/executable.go Outdated Show resolved Hide resolved
service/history/queues/queue_base_test.go Outdated Show resolved Hide resolved
service/history/queues/executable.go Outdated Show resolved Hide resolved
service/history/configs/config.go Outdated Show resolved Hide resolved
@@ -398,9 +406,17 @@ func (e *executableImpl) HandleErr(err error) (retErr error) {
return nil
}

e.attemptsWithUnexpectedError++
if e.attemptsWithUnexpectedError >= e.attemptsBeforeSendingToDlq() && e.dlqEnabled() {
e.logger.Error("Marking task as terminally failed after maximum number of attempts with unexpected errors, will send to DLQ",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you extract a constant for the log message used above "Marking task as terminally failed, will send to DLQ" and reuse it here, and then add a comment that this message should be kept in sync with the one in the operational guide in develop/docs/dlq.md? If this is different, people might miss it because we tell users to search for that exact string when looking for DLQ reasons.

service/history/queues/executable.go Outdated Show resolved Hide resolved
@prathyushpv prathyushpv merged commit 251778c into main Feb 1, 2024
57 checks passed
@prathyushpv prathyushpv deleted the ppv/executionCount branch February 1, 2024 01:18
ShahabT pushed a commit that referenced this pull request Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants