Backport code to drop internal errors encountered during task processing #5385

tdeebswihart · 2024-02-01T21:25:49Z

What changed?

Internal errors encountered during task processing will be dropped when this new config is enabled.

Why?

These errors represent unprocessable tasks, so they should not block our
task queues.

How did you test it?

Potential risks

We're not 100% certain that we only return internal errors when a task
is unprocessable, so this will be enabled by dynamicconfig for now.
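A minimal sketch of the gating described above, assuming a `func() bool` getter backed by dynamic config. The `internalError` type, `isInternalError`, and the `executable` struct here are hypothetical stand-ins written for illustration, not the server's actual types; only the overall shape (log first, then drop only when the flag is on) follows the diff in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// internalError is a hypothetical stand-in for the server's internal error type.
type internalError struct{ msg string }

func (e *internalError) Error() string { return e.msg }

// isInternalError plays the role of common.IsInternalError in the diff.
// This implementation is an assumption made for the sketch.
func isInternalError(err error) bool {
	var ie *internalError
	return errors.As(err, &ie)
}

// executable is a trimmed-down, hypothetical version of the task executable.
type executable struct {
	// dropInternalErrors is read on every call so a dynamic config change
	// takes effect without a restart.
	dropInternalErrors func() bool
	// logged records messages in place of a real logger.
	logged []string
}

// handleErr logs every internal error, but only swallows it when the flag is on.
func (e *executable) handleErr(err error) error {
	if err == nil {
		return nil
	}
	if isInternalError(err) {
		e.logged = append(e.logged, "Encountered internal error processing tasks: "+err.Error())
		if e.dropInternalErrors() {
			return nil // task is dropped and no longer blocks the queue
		}
	}
	return err // flag off (or a different error): normal handling continues
}

func main() {
	err := fmt.Errorf("processing task: %w", &internalError{msg: "corrupt state"})
	for _, enabled := range []bool{false, true} {
		enabled := enabled
		e := &executable{dropInternalErrors: func() bool { return enabled }}
		retErr := e.handleErr(err)
		fmt.Printf("drop=%v returned=%v logged=%d\n", enabled, retErr, len(e.logged))
	}
}
```

With the flag off, the error is still logged but returned to the caller, which is what lets us audit internal errors before turning the flag on.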

Is hotfix candidate?

…ed (#5382) Internal errors encountered during task processing will be dropped (or sent to the DLQ) when this new config is enabled. These errors represent unprocessable tasks, so should not block our task queues. We're not 100% certain that we only return internal errors when a task is unprocessable, so this will be enabled by dynamicconfig for now.

This is more accurate for the behavior in 1.22.x but I want to use the same DC property

ast2023 · 2024-02-01T23:00:26Z

service/history/queues/executable.go

@@ -341,6 +348,13 @@ func (e *executableImpl) HandleErr(err error) (retErr error) {
 		e.logger.Error("Drop task due to serialization error", tag.Error(err))
 		return nil
 	}
+	if common.IsInternalError(err) {
+		e.logger.Error("Encountered internal error processing tasks", tag.Error(err))


Shouldn't this log.Error be inside the if e.dropInternalErrors() block? If you are not returning, the error is logged anyway on line 361.

Good catch! Looks like we missed this in my patch to v1.23 as well. I'll follow up with 1.23 to fix that when I next have a chance

I think the idea here is that even when dropInternalErrors is false, we still log and emit a metric for internal errors, so that we can become confident they are safe to drop and then enable the flag. It's basically a shadow mode.

Having this log outside dropInternalErrors makes it easier to filter logs and only look at internal error logs. With only the log on L361, I don't think there's a good way to audit just the internal errors. The Error() method of an InternalError doesn't say it's an internal error 🤦.

We can add extra tags, if double logging is a concern. The volume of InternalError should be ~0 though.

We could add a tags.ErrorType tag to the main log message here. That'd give us a single log that handles both cases.

I'm going to do so in a separate PR and port over the code from https://github.com/temporalio/temporal/pull/5234/files#diff-3045f1928c46472037eb67e844c90f8745e01cd321e058ba212e3ea1a6b5147fR51 to unwrap things like fmt.wrapErr

Put up a PR at #5400
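For reference, a rough sketch of what an error-type tag with unwrapping could look like. This is not the code from the linked PR; `errorType`, `wrapWithContext`, and `internalError` are hypothetical names, and walking all the way to the innermost error is an assumption (a real version might want to stop at the first non-wrapper type instead).

```go
package main

import (
	"errors"
	"fmt"
)

// internalError is a hypothetical stand-in for the server's internal error type.
type internalError struct{ msg string }

func (e *internalError) Error() string { return e.msg }

// wrapWithContext wraps err the way fmt.Errorf with %w does, which hides
// the concrete type behind an unexported fmt wrapper type.
func wrapWithContext(err error) error {
	return fmt.Errorf("processing task: %w", err)
}

// errorType returns the Go type name of the innermost error in the chain,
// suitable for use as a log tag value. It returns "" for a nil error.
func errorType(err error) string {
	if err == nil {
		return ""
	}
	for {
		inner := errors.Unwrap(err)
		if inner == nil {
			// No further wrapping: this is the type worth tagging.
			return fmt.Sprintf("%T", err)
		}
		err = inner
	}
}

func main() {
	wrapped := wrapWithContext(&internalError{msg: "corrupt state"})
	// Without unwrapping, %T would report the fmt wrapper type instead.
	fmt.Printf("raw type: %T\n", wrapped)
	fmt.Printf("tag value: %s\n", errorType(wrapped))
}
```

A tag like this on the main log message would let us filter for internal errors with a single log line, whether or not the task ends up being dropped.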

ast2023 · 2024-02-01T23:05:38Z

service/history/queues/executable_test.go

@@ -297,6 +303,34 @@ func (s *executableSuite) TestExecuteHandleErr_Corrupted() {
 	s.NoError(executable.HandleErr(err))
 }

+func (s *executableSuite) TestExecute_DropsInternalErrors_WhenEnabled() {
+	executable := s.newTestExecutable(func(p *params) {


I would call it dropErrorsExecutable

ast2023 · 2024-02-01T23:06:55Z

service/history/queues/executable_test.go

+		},
+	)
+
+	s.NoError(executable.HandleErr(executable.Execute()))


I would introduce a variable for the executable.HandleErr result. This way the action is separated from the assertion.

ast2023 · 2024-02-01T23:08:43Z

service/history/queues/memory_scheduled_queue_test.go

@@ -184,6 +184,7 @@ func (s *memoryScheduledQueueSuite) newSpeculativeWorkflowTaskTimeoutTestExecuta
 			nil,
 			nil,
 			nil,
+			func() bool { return false },


this is "drop internal errors", right?

ast2023 · 2024-02-01T23:11:10Z

service/history/queues/slice_test.go

@@ -69,7 +69,7 @@ func (s *sliceSuite) SetupTest() {
 	s.controller = gomock.NewController(s.T())

 	s.executableInitializer = func(readerID int64, t tasks.Task) Executable {
-		return NewExecutable(readerID, t, nil, nil, nil, NewNoopPriorityAssigner(), clock.NewRealTimeSource(), nil, nil, nil, metrics.NoopMetricsHandler)
+		return NewExecutable(readerID, t, nil, nil, nil, NewNoopPriorityAssigner(), clock.NewRealTimeSource(), nil, nil, nil, metrics.NoopMetricsHandler, func() bool { return false })


You can create a function doDropInternalErrors or something like this.
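Something along these lines, as a sketch. The helper names and the two-argument `newExecutable` are hypothetical and only stand in for the real `NewExecutable` call with its long parameter list; the point is that a named function documents what the trailing `func() bool` argument means.

```go
package main

import "fmt"

// dropInternalErrors and keepInternalErrors are hypothetical named helpers
// that replace anonymous func() bool { return ... } literals at call sites.
func dropInternalErrors() bool { return true }

func keepInternalErrors() bool { return false }

// executable is a trimmed-down, hypothetical stand-in for the real type.
type executable struct {
	readerID           int64
	dropInternalErrors func() bool
}

// newExecutable is a hypothetical constructor with most parameters omitted.
func newExecutable(readerID int64, dropInternalErrors func() bool) *executable {
	return &executable{readerID: readerID, dropInternalErrors: dropInternalErrors}
}

func main() {
	// The call site now says what the flag does without a trailing comment.
	e := newExecutable(1, keepInternalErrors)
	fmt.Println("drops internal errors:", e.dropInternalErrors())
}
```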

tdeebswihart and others added 2 commits February 1, 2024 13:22

Use drop instead of DLQ everywhere but the public config value

58b6e6f

This is more accurate for the behavior in 1.22.x but I want to use the same DC property

tdeebswihart requested a review from yycptt February 1, 2024 21:25

tdeebswihart requested a review from a team as a code owner February 1, 2024 21:25

Grab the config from the shard object instead

25eef11

yycptt approved these changes Feb 1, 2024

tdeebswihart merged commit 64fe53c into release/v1.22.x Feb 1, 2024
4 checks passed

tdeebswihart deleted the tds/v1.22.x-backport-drop-internalerror branch February 1, 2024 23:00

ast2023 reviewed Feb 1, 2024

ast2023 approved these changes Feb 1, 2024

rodrigozhou added the release/1.22.5 label Feb 2, 2024
