Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repeat Jobs not Rescheduling #1345

Closed
lukepolo opened this issue Aug 1, 2022 · 9 comments
Closed

Repeat Jobs not Rescheduling #1345

lukepolo opened this issue Aug 1, 2022 · 9 comments

Comments

@lukepolo
Copy link
Contributor

lukepolo commented Aug 1, 2022

We've seen this in production, sometimes our repeat jobs never get rescheduled, even though they are completed.

image

This happens randomly , but happens pretty often after restarts / deployments.

We do gracefully shutdown our queues as well.

@lukepolo lukepolo closed this as completed Aug 1, 2022
@lukepolo
Copy link
Contributor Author

lukepolo commented Aug 1, 2022

Resolved, we had duplicate prefixes that caused deleting of the jobs

I lied, ran into the issue again .

@lukepolo lukepolo reopened this Aug 1, 2022
@lukepolo
Copy link
Contributor Author

lukepolo commented Aug 1, 2022

well apparently something when it gets marked as completed , and I must be shutting down the queues, so it stops doing whatever its suppose todo.

Im unable to find the job via getRepeatableJobs either .

Way to reproduce:

  1. Create a repeatable job once per hour
  2. move the job into active (early)
  3. let job complete
  4. Delete delayed job
  5. try to reschedule cron

@lukepolo
Copy link
Contributor Author

lukepolo commented Aug 1, 2022

just added a check to see if we add a job, and its already completed, just re add it

      if (_job.opts.repeat.cron && (await _job.isCompleted())) {
        await _job.remove();
        return await this.push(queuedJob);
      }

@manast
Copy link
Contributor

manast commented Aug 2, 2022

If you delete the delayed job then you are effectively breaking the iterations, since it is when the delayed job gets promoted when the next iteration is scheduled. What I find strange is that you do not find the job when calling "getRepeatableJobs". Why are you deleting the delayed job in the first place?

@lukepolo
Copy link
Contributor Author

lukepolo commented Aug 2, 2022

For the cron jobs we call delete repeatable jobs on boot (because they may have changed schedules) and re add them .

We actually don’t call delete job it was a way to reproduce the issue (we’re not sure how it gets in the situation where it gets completed and the delayed job disappears)

@lukepolo
Copy link
Contributor Author

lukepolo commented Aug 2, 2022

it may be related to #1344 , they were failing as well (we only keep 500 failed jobs) notice same issue with failed jobs / completed

@lukepolo
Copy link
Contributor Author

lukepolo commented Aug 2, 2022

Ok so dug way deeper on this, its all because of how we are trying to make sure we have updated cron schedules.

We currently are remove all repeatable jobs, then re-adding them. This allows us to make sure we have all updated cron schedules (we could check 1 by 1).

The problem is when we do this , if the old cron had completed / failed , it would reuse the same repeat key, 1 ,2 3, 4,5.

So the problem is that we need to remove all jobs related to that cron rather than just the repeatable job. Although im not sure this is adventurous since we may remove failed jobs that need to be looked at.

So , our current plan is to look at the repeatable jobs one by one, and dont remove them they exist in the schedule.

if they don't exist in the schedule any longer, we will remove the cron and all of its failed / completed as well.

@lukepolo lukepolo closed this as completed Aug 2, 2022
@tylercote
Copy link

Hey Luke, do you mind sharing a code snippet of what this looks like? I think I am facing the same issue right now.

@lukepolo
Copy link
Contributor Author

Im grabbing these from multiple classes so it may not make 100% sense but ill post what i can.

We first get the list of cron jobs that we are going to schedule . Then remove them.


 protected async removeCurrentCronJobs() {
    const queueManager: QueueManager = this.app.make(Bindings.Queue);
    for (const availableQueue of this.queueConfig.availableQueues) {
      const queue = queueManager.getQueue(availableQueue);
      const runningCronJobs = await queue.getCronJobs();
      for (const runningCronJob of runningCronJobs) {
        await queue.removeCronJob(runningCronJob);
      }
    }
  }

 public async getCronJobs(): Promise<Array<JobInterface>> {
    return (await this.queue.getRepeatableJobs()).map((job) => {
      return {
        id: job.id,
        _job: job,
        name: job.name,
        data: undefined,
      };
    });
  }

 public async removeCronJob(job: string | JobInterface): Promise<boolean> {
    let removed = false;
    if (typeof job === "string") {
      const jobs = await this.queue.getRepeatableJobs();
      for (const _job of jobs) {
        if (_job.id === job) {
          removed = await this.queue.removeRepeatableByKey(_job.key);
          break;
        }
      }
      return removed;
    }

    return await this.queue.removeRepeatableByKey(job._job.key);
  }


Then we re-add them but we also add a timestamp to combat some of the issues above

 if (jobOptions.repeat?.cron) {
      /**
       * To deal with keeping cron jobs up to date we go through a 2 step process
       *
       * 1. Delete all current CronJobs
       * 2. Add all cron jobs back
       *
       * If a old job had failed / completed, the new cron job would not be added,
       * to combat this we add a timestamp to the job id
       */
      jobOptions.jobId = `${jobOptions.jobId}-${new Date().getTime()}`;
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants