[TRI-1152] chore: upgrade webapp to use graphile-worker 0.14.0-rc.0 #396

Closed
ericallam opened this issue Aug 25, 2023 · 1 comment · Fixed by #1097 · May be fixed by #524

ericallam commented Aug 25, 2023

We're currently on 0.13.0, but 0.14.0 comes with automatic queue management, which should make #392 easier. Other changes in the release:

  • New "batch jobs" feature for merging payloads with a job_key (see README)
  • Significantly improved 'large jobs table' performance (e.g. when a large queue is locked, or there's a lot of jobs queued for task identifiers your worker instance doesn't support, or a lot of failed jobs). Around 20x improvement in this 'worst case' performance for real user workloads.
  • Added new (experimental) much faster add_jobs batch API.
  • Fix error handling of cron issues in 'run' method.
  • CronItem.match can now accept either a pattern string or a matcher function
  • Jobs that were locked more than 4 hours will be reattempted as before, however they are slightly de-prioritised by virtue of having their run_at updated, giving interim jobs a chance to be executed (and lessening the impact of queue stalling through hanging tasks).

The full release notes are here:

https://github.com/graphile/worker/blob/main/RELEASE_NOTES.md#v0140

As mentioned in the release notes, this is a breaking change, meaning workers running 0.13 cannot run against the 0.14 schema. So we'll need to make this clear in any release notes for this feature.

From SyncLinear.com | TRI-1152

@maige-app maige-app bot added enhancement New feature or request area/integrations labels Aug 25, 2023
@ericallam ericallam changed the title chore: upgrade webapp to use graphile-worker 0.14.0-rc.0 [TRI-1152] chore: upgrade webapp to use graphile-worker 0.14.0-rc.0 Aug 25, 2023

nicktrn commented Oct 2, 2023

Just ticking some boxes to know what needs actioning.

Breaking changes

  • Bump minimum Node version to 14
  • Bump minimum PG version to 12
  • priority, attempts and max_attempts are now smallint, i.e. values have to be between -32768 and 32767
    • Using current max of 100
  • CronItem.pattern has been renamed to CronItem.match
    • TODO: Rename this in recurring tasks and ZodWorker
  • Database error codes have been removed because we've moved to CHECK constraints
    • We don't rely on specific error codes
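Two of the breaking changes above (the smallint range and the pattern to match rename) can be guarded in code. The following is a minimal sketch, not the webapp's actual implementation: `assertSmallint`, `renamePatternToMatch`, and the `LegacyCronItem` / `RenamedCronItem` shapes are hypothetical names introduced for illustration, and the matcher type is a simplified stand-in rather than the library's own definition.

```typescript
// PostgreSQL smallint is a 2-byte signed integer.
const SMALLINT_MIN = -32768;
const SMALLINT_MAX = 32767;

// Throws if a priority / attempts / max_attempts value would not fit in smallint.
function assertSmallint(name: string, value: number): number {
  if (!Number.isInteger(value) || value < SMALLINT_MIN || value > SMALLINT_MAX) {
    throw new RangeError(
      `${name} must be an integer between ${SMALLINT_MIN} and ${SMALLINT_MAX}, got ${value}`
    );
  }
  return value;
}

// Hypothetical local shapes for illustration only (assumption: not the
// library's exported types).
type CronMatcher = (digest: unknown) => boolean;

interface LegacyCronItem {
  task: string;
  pattern: string; // 0.13 field name
}

interface RenamedCronItem {
  task: string;
  match: string | CronMatcher; // 0.14 field name
}

// Copies the old `pattern` value into `match` and drops the old key.
function renamePatternToMatch(item: LegacyCronItem): RenamedCronItem {
  const { pattern, ...rest } = item;
  return { ...rest, match: pattern };
}
```

With the current max of 100, `assertSmallint("max_attempts", 100)` passes, while a value of 32768 would be rejected before it ever reaches the database.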

Internal changes

  • The 'jobs' table no longer has queue_name and task_identifier columns
    • We don't rely on specific column names (see below for raw query returns)
  • Internal SQL functions (get_job, fail_job, complete_job) have been moved to JS
  • Most triggers removed, could cause problems if directly inserting into jobs table
    • We don't do this

Migration

  • Will NOT run if there are any locked jobs, i.e. jobs with a locked_at within the last 4 hours
    • TODO: Graceful shutdown should usually take care of this. May want to provide a simple script to clear all locks.
  • Ensure the jobs table is not referenced directly in a custom function
    • We don't do this (yet)
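The "clear all locks" script mentioned above could look roughly like the sketch below. It rests on assumptions that should be checked against the real 0.13 schema before use: that the schema is named `graphile_worker`, and that both the `jobs` and `job_queues` tables carry `locked_at` / `locked_by` columns. `Queryable`, `buildUnlockStatements`, and `clearAllLocks` are hypothetical names; the client only needs a `query(sql)` method. Note that this clears locks rather than marking the jobs as failed.

```typescript
// Minimal client interface so the sketch works with anything exposing query(sql).
interface Queryable {
  query(sql: string): Promise<unknown>;
}

// Builds the UPDATE statements that clear every lock.
// Assumption: 0.13 tables `jobs` and `job_queues` both have locked_at / locked_by.
function buildUnlockStatements(schema: string = "graphile_worker"): string[] {
  // The schema name is interpolated into SQL, so only accept a plain identifier.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(schema)) {
    throw new Error(`Unexpected schema name: ${schema}`);
  }
  return [
    `update ${schema}.jobs set locked_at = null, locked_by = null where locked_at is not null`,
    `update ${schema}.job_queues set locked_at = null, locked_by = null where locked_at is not null`,
  ];
}

// Runs the statements in one transaction; rolls back if any of them fails.
async function clearAllLocks(
  client: Queryable,
  schema: string = "graphile_worker"
): Promise<void> {
  await client.query("begin");
  try {
    for (const sql of buildUnlockStatements(schema)) {
      await client.query(sql);
    }
    await client.query("commit");
  } catch (err) {
    await client.query("rollback");
    throw err;
  }
}
```

This should only be run once every worker has been stopped; clearing a lock held by a live worker could let the same job run twice.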

Raw queries

  • Ensure SQL function signatures still match
    • Integer types were only changed internally
  • Ensure SQL function returns still match
    • TODO: Check GraphileJobSchema, particularly queue_name and task_identifier
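One way to approach the TODO above is a parser that tolerates `queue_name` and `task_identifier` being absent from a returned row. The sketch below is plain TypeScript and is not the project's GraphileJobSchema; `ParsedJob` and `parseJobRow` are hypothetical names. Whether the 0.14 SQL functions still return these two columns is something to verify, so here they are simply treated as optional.

```typescript
// Hypothetical normalized shape for a job row returned from a raw query.
interface ParsedJob {
  id: string;
  queueName: string | null;
  taskIdentifier: string | null;
  attempts: number;
  maxAttempts: number;
}

// Parses a raw row, treating queue_name and task_identifier as optional
// (assumption: they may be missing after the upgrade).
function parseJobRow(row: Record<string, unknown>): ParsedJob {
  const id = row["id"];
  if (typeof id !== "string" && typeof id !== "number") {
    throw new Error("job row is missing id");
  }

  const optionalString = (value: unknown): string | null =>
    typeof value === "string" ? value : null;

  const requiredInt = (key: string): number => {
    const raw = row[key];
    const parsed = raw === null || raw === undefined ? NaN : Number(raw);
    if (!Number.isInteger(parsed)) {
      throw new Error(`job row has a non-integer ${key}`);
    }
    return parsed;
  };

  return {
    id: String(id),
    queueName: optionalString(row["queue_name"]),
    taskIdentifier: optionalString(row["task_identifier"]),
    attempts: requiredInt("attempts"),
    maxAttempts: requiredInt("max_attempts"),
  };
}
```

Rows that still include both columns parse exactly as before, so the same parser can be used on either side of the migration.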

Summary

Seems a smooth migration will come down to three things:

  1. Ensuring no jobs are currently locked, possibly creating an "unlock" script
  2. Renaming CronItem.pattern to CronItem.match
  3. Adjusting parsers for SQL function call returns

Optimal upgrade steps once this is taken care of:

  1. Shut down running webapp
  2. Launch new version
  3. (optional - only if a graceful shutdown was not possible) Shut the webapp down again, run the helper script to fail all currently locked jobs, then launch the new version again.
