How to report failure #21
Comments
Can you elaborate on what problem this would solve? Is it a concern over pg-boss "waiting"?
If the job has already failed and stopped running, several things. If the job is a singleton, a new job can't be run until the job is known to have failed. Additionally, since the process running the jobs may be interested in logging errors, and optionally retrying certain jobs, with or without delay/backoff, it's useful to know accurately when (and why) they've failed. The reality is simply that the job didn't time out after 30 seconds if it failed and stopped running after 1 second. It also makes debugging simpler (as failed jobs right now could have failed for any reason at any time prior to their timeout expiring).
I've only read through about half the source and tests so far, so if there's a way that I'm not seeing to do this through events or something else, please let me know!
Good point on the singleton jobs. Agreed.
Would you clarify the intent here? Which process? Do you want a done(error) callback to then round trip back into the same process via the 'error' event? I would think that if you have the error with which to pass to done(), you'd just log/report it right there.
So in my case, I largely treat error logging as a cross-cutting concern that the script running in node is responsible for at a high level, rather than having error logging handled closer to where the error is generated. This means that the runner that's calling I realize this may not be a general use case, but it seems that there isn't a way to do it currently, so I have logging located at a lower level now. On the plus side, using
Thanks for the clarification. In your case, would you want pg-boss to emit 'error' if you pass done(error)?
I think I'd want to mark the job as failed, rather than expired, and emit the error on the manager (and thus on the boss instance). Does this sound right?
Yes, that sounds right. I am considering the emitting of 'error' on the instance via a done(error) should be an opt-in config in the constructor options, though. That coolio?
Yeah, I don't see any practical problem with it being opt-in. Would you mind sharing your thought process with me on that?
The current behavior of 'error' is unexpected or unhandled errors and should rarely occur. Leaving this behavior as-is maintains the existing semantics of 'error' with node EventEmitters (they will crash hard if not handled).
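For anyone following along, here is a minimal sketch of the EventEmitter semantics mentioned above, using only Node's built-in events module. The helper name emitErrorSafely is made up for illustration and has nothing to do with pg-boss itself.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper: emits 'error' on a fresh emitter and reports
// whether the emit call threw.
function emitErrorSafely(attachListener) {
  const emitter = new EventEmitter();
  const seen = [];
  if (attachListener) {
    emitter.on('error', (err) => seen.push(err.message));
  }
  try {
    emitter.emit('error', new Error('job failed'));
    return { threw: false, seen };
  } catch (err) {
    // With no 'error' listener, Node throws the emitted Error itself.
    return { threw: true, seen, message: err.message };
  }
}

const unhandled = emitErrorSafely(false);
const handled = emitErrorSafely(true);
```

Without the try/catch, the unhandled case would surface as an uncaught exception and take the process down, which is the behavior being preserved for truly unexpected errors.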
Even though I'm pushing for peeps to handle 'error', it's quite possible that some are not going to do it. This leaves open a normative and common use case of an error in a handler that could crash the entire process. In fact, the more I think about this, the more I'm convinced that manager should emit a distinct event instead of 'error' to maintain the semantic difference between "oh no something unexpected occurred that is worthy of crashing your process" vs. "this happened and I'd like you to log it". And this would even apply to the current error handling I recently shipped with 1.0 which I should migrate off of 'error'. By the way, using I was thinking of naming this other event 'warn' along the lines of popular error severity levels but I'm open to other names since I feel like I can't ever name things properly the first time around. 😱
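A rough sketch of the distinct-event idea, assuming the 'warn' name floated above. SketchManager and reportJobFailure are hypothetical names for illustration, not pg-boss's actual manager or API. The point is only that an event other than 'error' is safe to emit when nobody is listening.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for a job manager; not pg-boss's actual class.
class SketchManager extends EventEmitter {
  // Report a handler failure on a non-'error' event so that a missing
  // listener cannot crash the process.
  reportJobFailure(jobId, error) {
    // emit() returns true only if at least one listener was called.
    return this.emit('warn', { jobId, message: error.message });
  }
}

// No listener attached: the emit is a harmless no-op.
const quietManager = new SketchManager();
const deliveredWithoutListener = quietManager.reportJobFailure('job-1', new Error('boom'));

// Listener attached: the failure can be logged at a high level.
const loggedWarnings = [];
const watchedManager = new SketchManager();
watchedManager.on('warn', (warning) => loggedWarnings.push(warning));
const deliveredWithListener = watchedManager.reportJobFailure('job-2', new Error('boom'));
```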
Got it, I think that makes a lot of sense. This way, So, what about a reference to the job itself? A
Added this in 1.1.0
Woah, very nice!
If a job callback fails asynchronously, done() doesn't appear to take an argument in typical node style. I read through #3 and my question is the same as OP's first: I'm not necessarily looking to retry the failed job, but I'd like to tell Boss that the job failed as soon as it does fail, rather than waiting for the timeout. Is this possible?
If not, would you be open to a PR accepting an error argument to the done() call? I would think this would be non-breaking.
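To make the request concrete, here is a minimal in-memory sketch of what a node-style done(err) could look like. Everything here (runJob, the state names, the onSettled callback) is an assumption for illustration and does not reflect how pg-boss stores or transitions jobs.

```javascript
// Hypothetical in-memory runner; the names and states are illustrative only
// and do not reflect pg-boss's implementation.
function runJob(handler, timeoutMs, onSettled) {
  const job = { state: 'active', error: null };
  let settled = false;

  function settle(state, error) {
    if (settled) return; // ignore late or duplicate calls
    settled = true;
    clearTimeout(timer);
    job.state = state;
    job.error = error || null;
    onSettled(job);
  }

  // If the handler never reports back, the job expires after the timeout.
  const timer = setTimeout(() => settle('expired'), timeoutMs);

  // Node-style callback: done(err) marks failure, done() marks success.
  const done = (err) => settle(err ? 'failed' : 'completed', err);
  handler(done);
  return job;
}

const settledJobs = [];
const record = (job) => settledJobs.push(job);

// Fails immediately instead of waiting out the 30 second timeout.
runJob((done) => done(new Error('disk full')), 30000, record);
// Existing handlers that call done() with no argument keep working.
runJob((done) => done(), 30000, record);
```

Because done() with no argument still means success, handlers written against the current signature would behave the same, which is why this looks non-breaking.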