How to report failure #21

jordaaash · 2017-03-24T18:00:23Z

If a job callback fails asynchronously, done() doesn't appear to take an argument in typical node style. I read through #3 and my question is the same as OP's first:

(1) How does a job consumer report failure, such as to requeue the job for another attempt later?

I'm not necessarily looking to retry the failed job, but I'd like to tell Boss that the job failed as soon as it does fail, rather than waiting for the timeout. Is this possible?

If not, would you be open to a PR accepting an error argument to the done() call? I would think this would be non-breaking.

The text was updated successfully, but these errors were encountered:

timgit · 2017-03-24T19:05:15Z

tell Boss that the job failed as soon as it does fail, rather than waiting for the timeout.

Can you elaborate on what problem this would solve? Is it a concern over pg-boss "waiting"?

jordaaash · 2017-03-24T20:07:23Z

If the job has already failed and stopped running, several things. If the job is a singleton, a new job can't be run until the job is known to have failed. Additionally, since the process running the jobs may be interested in logging errors, and optionally retrying certain jobs, with or without delay/backoff, it's useful to know accurately when (and why) they've failed. The reality is simply that the job didn't time out after 30 seconds if it failed and stopped running after 1 second. It also makes debugging simpler (as failed jobs right now could have failed for any reason at any time prior to their timeout expiring).

jordaaash · 2017-03-24T20:08:42Z

I've only read through about half the source and tests so far, so if there's a way that I'm not seeing to do this through events or something else, please let me know!

timgit · 2017-03-24T20:14:55Z

If the job is a singleton, a new job can't be run until the job is known to have failed.

Good point on the singleton jobs. Agreed.

since the process running the jobs may be interested in logging errors

Would you clarify the intent here? Which process? Do you want a done(error) callback to then round trip back into the same process via the 'error' event? I would think that if you have the error with which to pass to done(), you'd just log/report it right there.

jordaaash · 2017-03-24T20:38:12Z

So in my case, I largely treat error logging as a cross-cutting concern that the script running in node is responsible for at a high level, rather than having error logging handled closer to where the error is generated.

This means that the runner that's calling done() when a job completes could pass an error without needing to log that error itself (or be aware if/how the error is being logged).

I realize this may not be a general use case, but it seems that there isn't a way to do it currently, so I have logging located at a lower level now.

On the plus side, using done(error) would also allow me to have my jobs (which return Bluebird promises) just call .asCallback(done) :)

timgit · 2017-03-24T21:33:06Z

Thanks for the clarification. In your case, would you want pg-boss to emit 'error' if you pass done(error)?

jordaaash · 2017-03-24T21:35:33Z

I think I'd want to mark the job as failed, rather than expired, and emit the error on the manager (and thus on the boss instance). Does this sound right?

timgit · 2017-03-25T01:59:08Z

Yes, that sounds right. I am considering the emitting of 'error' on the instance via a done(error) should be an opt-in config in the constructor options, though. That coolio?

jordaaash · 2017-03-25T17:32:54Z

Yeah, I don't see any practical problem with it being opt-in. Would you mind sharing your thought process with me on that?

timgit · 2017-03-26T00:30:10Z

The current behavior of 'error' is unexpected or unhandled errors and should rarely occur. Leaving this behavior as-is maintains the existing semantics of 'error' with node EventEmitters (they will crash hard if not handled).

If an EventEmitter does not have at least one listener registered for the 'error' event, and an 'error' event is emitted, the error is thrown, a stack trace is printed, and the Node.js process exits.

Even though I'm pushing for peeps to handle 'error', it's quite possible that some are not going to do it. This leaves open a normative and common use case of an error in a handler that could crash the entire process. In fact, the more I think about this, the more I'm convinced that manager should emit a distinct event instead of 'error' to maintain the semantic difference between "oh no something unexpected occurred that is worthy of crashing your process" vs. "this happened and I'd like you to log it". And this would even apply to the current error handling I recently shipped with 1.0 which I should migrate off of 'error'.

By the way, using done() in this fashion is a convenience batching feature. Since it needs to update the database then emit an event, it could throw if the database becomes unreachable for any reason. You can always add a catch() on done() to mitigate this, but just keep that in mind since you may lose your original error that you wanted logged.

I was thinking of naming this other event 'warn' along the lines of popular error severity levels but I'm open to other names since I feel like I can't ever name things properly the first time around. 😱

jordaaash · 2017-03-26T00:38:09Z

Got it, I think that makes a lot of sense. This way, error is a problem with the library, or something else serious, and the new event is for something perhaps more ordinary or recoverable.

So, what about a reference to the job itself? A failed event strikes me as the most natural option given the current event API (and could have a matching job state). It also matches the semantics of several other popular Node job queues (kue, bull, bee-queue).

timgit · 2017-03-27T04:24:20Z

Added this in 1.1.0

jordaaash · 2017-03-27T04:54:39Z

Woah, very nice!

timgit closed this as completed Mar 27, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to report failure #21

How to report failure #21

jordaaash commented Mar 24, 2017

timgit commented Mar 24, 2017

jordaaash commented Mar 24, 2017 •

edited

Loading

jordaaash commented Mar 24, 2017

timgit commented Mar 24, 2017 •

edited

Loading

jordaaash commented Mar 24, 2017

timgit commented Mar 24, 2017

jordaaash commented Mar 24, 2017

timgit commented Mar 25, 2017

jordaaash commented Mar 25, 2017

timgit commented Mar 26, 2017 •

edited

Loading

jordaaash commented Mar 26, 2017

timgit commented Mar 27, 2017

jordaaash commented Mar 27, 2017

How to report failure #21

How to report failure #21

Comments

jordaaash commented Mar 24, 2017

timgit commented Mar 24, 2017

jordaaash commented Mar 24, 2017 • edited Loading

jordaaash commented Mar 24, 2017

timgit commented Mar 24, 2017 • edited Loading

jordaaash commented Mar 24, 2017

timgit commented Mar 24, 2017

jordaaash commented Mar 24, 2017

timgit commented Mar 25, 2017

jordaaash commented Mar 25, 2017

timgit commented Mar 26, 2017 • edited Loading

jordaaash commented Mar 26, 2017

timgit commented Mar 27, 2017

jordaaash commented Mar 27, 2017

jordaaash commented Mar 24, 2017 •

edited

Loading

timgit commented Mar 24, 2017 •

edited

Loading

timgit commented Mar 26, 2017 •

edited

Loading