
feat(workflow): Handle unhandled rejections in workflow code #415

Merged: 6 commits merged into main on Dec 11, 2021

Conversation

bergundy (Member)

This PR makes a best-effort attempt to associate unhandled rejections from workflow code with a specific runId.
It also makes the unhandled rejection behavior consistent between Node 14 and Node 16 and propagates the failure back to the user.
Previously, in Node 16 the process would crash, and in Node 14 we would incorrectly ignore rejections, leading to unexpected workflow behavior (see the correct behavior in the added tests).

@bergundy bergundy self-assigned this Nov 30, 2021
bergundy (Member Author):

The solution here is a bit complex and might not cover enough cases in the wild.
I'm looking into alternatives; this is the best I could come up with for now.

bergundy (Member Author) commented Dec 1, 2021:

V8 has Object::GetCreationContext(), but I don't see it exposed in Node; we might need native code to access this method.

})();

await p1;
await p2;
Member:

Does this not just rethrow here? Before this PR, is this treated any differently than if I just replaced this line with throw new Error('whatever')?

bergundy (Member Author):

The problem is that the second async function throws without anything catching it.
The way V8 deals with this is by providing a hook for unhandled rejections: https://v8.github.io/api/head/classv8_1_1Isolate.html#a702f0ba4e5dee8a98aeb92239d58784e.
This is exposed in Node with process.on('unhandledRejection').
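For reference, a minimal sketch (not code from this PR) of that hook as seen from a plain Node process:

// Node surfaces V8's unhandled-rejection callback as a process event.
// The listener receives both the rejection reason and the Promise that
// rejected; the Promise is what can later be used to try to locate the
// owning workflow context.
process.on('unhandledRejection', (reason: unknown, promise: Promise<unknown>) => {
  console.error('unhandled rejection', reason, promise);
});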

Member:

But since that promise is awaited on here, it throws here, right? If you removed await p2 I think it'd be unhandled, but the promise is explicitly awaited on.

bergundy (Member Author):

It's awaited too late, only after the activity resolves, which causes an unhandled rejection.
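To make the timing concrete, here is a rough sketch of the shape under discussion (my own example, not the test from this PR; someActivity is a hypothetical stand-in for a scheduled activity):

// Hypothetical activity stub; in the real test this would be a Temporal activity call.
declare function someActivity(): Promise<void>;

async function example(): Promise<void> {
  const p1 = (async () => {
    await someActivity(); // settles much later
  })();
  const p2 = (async () => {
    throw new Error('whatever'); // rejects immediately, with no handler attached yet
  })();

  // The function suspends here waiting for the activity. Since nothing is
  // attached to p2 before the microtask queue drains, V8 flags p2 as an
  // unhandled rejection long before `await p2` is reached.
  await p1;
  await p2;
}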

this.requestIdToCompletion = new Map();
for (const completion of completions) {
  completion.reject(
    new UnexpectedError(
Member:

Sorry I am unfamiliar with how the threads work here. Is there no way we can capture an error thrown from an async workflow instead of having it terminate a worker thread? Can the workflow function not be wrapped in a try+await+catch inside the thread? If concerned about top-level code throwing, can the entire require be wrapped in a try+await+catch?

bergundy (Member Author):

We terminate the thread if we can't determine the context that threw the unhandled error, to avoid accidentally completing the activation successfully.
There's no way to catch these errors, unfortunately.

Member:

> We terminate the thread if we can't determine the context that threw the unhandled error to avoid accidentally completing the activation successfully.

I am not following, sorry. In my head, I don't see why we'd ever allow user code to terminate a thread unless it's a very fatal error. Can't we wrap everything a user may do in a recoverable scenario? Maybe have one error path that completes the activation if you can determine the context, and another that logs/swallows or whatever without doing something that could fail the whole worker.

Any workflow that can fail the worker or mess with the worker thread pool or whatever in any SDK, especially for something as trivial as a thrown exception in a promise not awaited on, seems bad. Maybe I'm misunderstanding the use here and overthinking "fail the worker".
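To illustrate the point being debated (my own example, not code from this PR): a rejection from a promise that is never awaited bypasses any try/catch around the call and only surfaces through the unhandledRejection event.

async function workflowLikeCode(): Promise<void> {
  // A promise created but never awaited or .catch()-ed.
  void (async () => {
    throw new Error('boom');
  })();
}

async function caller(): Promise<void> {
  try {
    await workflowLikeCode(); // resolves normally
  } catch (err) {
    // Never reached: the inner rejection does not propagate here; it is only
    // observable via process.on('unhandledRejection').
    console.error('caught', err);
  }
}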

import * as activities from './activities';

async function main() {
  if (['1', 'y', 'yes', 't', 'true'].includes((process.env.DEBUG ?? '').toLowerCase())) {
Member:

IMO just checking for existence is much better than this dance. It's easy enough to clear an env var if you want to set it to false.

Also, just DEBUG is too generic, I think. Something more Temporal-specific might be good.

cretz (Member) commented Dec 10, 2021:

FWIW, don't use TEMPORAL_DEBUG. That env var is used in the Java and Go SDKs to remove the workflow deadlock timer so code can be stepped through slowly (and maybe we will have the same here if/when such a time check comes about, if it's not already there).
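A tiny sketch of the existence-only check being suggested (the variable name here is a placeholder, not an agreed-upon name):

// Treat the variable as a flag: present means debug logging on, absent means off.
const debugWorkflowLogging = process.env.TEMPORAL_TS_WORKFLOW_DEBUG !== undefined;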

const runId = match[1];
const workflow = workflowByRunId.get(runId);
if (workflow !== undefined) {
  console.log('found workflow', runId);
Member:

Leftover console log?

// Apparently nextTick does not trigger it, so we use setTimeout here.
await new Promise((resolve) => setTimeout(resolve, 0));
Member:

Seems odd. Maybe we could set a flag in the handler, and run next tick until we see the flag set (then unset it again)?

bergundy (Member Author):

I think it might not be enough.
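For context, a rough sketch of the flag idea suggested above (my reading of it, with an assumed cap on the number of turns since a rejection may never arrive at all):

let sawUnhandledRejection = false;

process.on('unhandledRejection', () => {
  sawUnhandledRejection = true;
});

// Yield to the event loop for a bounded number of turns so the
// unhandledRejection handler gets a chance to run before the activation
// is completed.
async function drainPendingRejections(maxTurns = 10): Promise<void> {
  for (let turn = 0; turn < maxTurns && !sawUnhandledRejection; turn++) {
    await new Promise((resolve) => setImmediate(resolve));
  }
  sawUnhandledRejection = false; // reset for the next activation
}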

bergundy (Member Author) commented Dec 7, 2021:

At this point my proposed solution seems both incomplete and flaky.
I'm going to bench this for now because I can't find a reliable way to handle unhandled rejections in time.
It looks like even setTimeout doesn't work if many workflows are run concurrently.

For now I think the best thing to do is crash the worker in Node 14 so the behavior is at least consistent with Node 16 and we don't run into issues where workflows are stuck and cannot be replayed.
That would be the case if errors were turned into workflow task failures instead of failing the entire workflow (which is how they are treated in the other SDKs).

const runId = ctor('return __TEMPORAL__.runId')();
if (runId !== undefined) {
  const workflow = workflowByRunId.get(runId);
  if (workflow !== undefined) {
Member:

The else here seems like a weird situation that we might want to log or something.

bergundy (Member Author):

I logged the runId below.
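Putting the thread together, a hedged sketch of the association trick (assuming every workflow context exposes a global __TEMPORAL__ object carrying the runId; failActivation is a placeholder, not an actual SDK method):

const workflowByRunId = new Map<string, { failActivation(reason: unknown): void }>();

process.on('unhandledRejection', (reason, promise) => {
  try {
    // promise.constructor is the Promise constructor of the realm that created
    // the promise; its .constructor is that realm's Function constructor, so a
    // function built from it sees that realm's globals.
    const ctor = promise.constructor.constructor as (code: string) => () => unknown;
    const runId = ctor('return __TEMPORAL__.runId')() as string | undefined;
    const workflow = runId !== undefined ? workflowByRunId.get(runId) : undefined;
    if (workflow !== undefined) {
      // Fail only this run's activation instead of the whole worker.
      workflow.failActivation(reason);
    } else {
      // Could not associate the rejection with a run; treat as fatal.
      console.error('unhandled rejection from unknown workflow context', runId, reason);
    }
  } catch (err) {
    console.error('failed to associate unhandled rejection with a workflow', err);
  }
});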

@bergundy bergundy enabled auto-merge (squash) December 10, 2021 21:51
@bergundy bergundy enabled auto-merge (squash) December 10, 2021 21:53
@bergundy bergundy merged commit 27e9fcd into main Dec 11, 2021
@bergundy bergundy deleted the unhandled-rejections branch December 11, 2021 00:47