Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve waitUntilFinished timeout #1558

Closed
ckorherr opened this issue Nov 30, 2022 · 2 comments
Closed

Improve waitUntilFinished timeout #1558

ckorherr opened this issue Nov 30, 2022 · 2 comments

Comments

@ckorherr
Copy link

We are using waitUntilFinished to return the job result from an API endpoint. Jobs have to be executed within 30 seconds, if that doesn't happen we don't care about the result anymore.

So this is what we do:

    return await job.waitUntilFinished(this.events, 30 * 1000).catch(async e => {
      if (e.message?.startsWith(`Job wait ${name} timed out before finishing`)) {
        try {
          await job.remove();
        } catch { }
      }
      throw e;
    });

It would be great to have an option to move the job automatically to removed, but not sure if this might only be interesting for our use case. Anyway, it would also be great if we would not have to check e.message to be a specific string.

I would propose to create a custom error class implementation and make it available to check for a timeout this way:

    return await job.waitUntilFinished(this.events, 30 * 1000).catch(async e => {
      if (e instanceof WaitUntilFinishedTimeoutError) {
        try {
          await job.remove();
        } catch { }
      }
      throw e;
    });
@manast
Copy link
Contributor

manast commented Nov 30, 2022

In general, we cannot remove a job as the job may have already been started by a worker. Also, we do not recommend using "waitUntilFinished" for production code, here there is a blog post with some background information: https://blog.taskforce.sh/do-not-wait-for-your-jobs-to-complete/

@ckorherr
Copy link
Author

Yes, I know that blog post, but we still have CPU intensive jobs that need to be handled by multiple worker instances.
To return just the job id to the caller of our API is no option, it must be synchronous.

Another queue for completion would not change anything, because we still need to keep the client connection open and synchronize the result.

We scale the worker instances automatically based on the queue length, this way we can ensure that most of the time jobs are handled within the timeout. Usually our jobs take up to 3 sec.

Might be that Bull is not the perfect solution for our problem, but it works pretty good.

I've found a better way to cancel the job inside the Worker:

   ...
   new Worker('...', handleJob, ...);

   function handleJob(job: Job<{ request: any; context: ContextData; }>, token?: string): Promise<any> {
     if (job.timestamp + JOB_TIMEOUT < Date.now()) {
       throw new Error('Timeout!');
     }
     return wrapWithCorrelationContext(processor, deserializeContext(job.data.context))(job.name, job.data.request);
   }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants