-
-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow queries from pg-boss #126
Comments
Would you share the number of records in archive and how long it’s taking? I think it’s fine to add an index, and it would be great if you could add it and show the diff in perf |
Here are a few examples of slow query logs we are seeing. All are related to the archiving or eviction queries: 27 seconds {
"data": {
"parameters": [
"1 hour"
],
"query": "\n WITH archived_rows AS (\n DELETE FROM pgboss.job\n WHERE\n (completedOn + CAST($1 as interval) < now())\n OR (\n state = 'created'\n AND name LIKE '__state__completed__%'\n AND createdOn + CAST($1 as interval) < now()\n )\n RETURNING *\n )\n INSERT INTO pgboss.archive (\n id, name, priority, data, state, retryLimit, retryCount, retryDelay, retryBackoff, startAfter, startedOn, singletonKey, singletonOn, expireIn, createdOn, completedOn\n )\n SELECT \n id, name, priority, data, state, retryLimit, retryCount, retryDelay, retryBackoff, startAfter, startedOn, singletonKey, singletonOn, expireIn, createdOn, completedOn\n FROM archived_rows\n ",
"time": 27139
},
"message": "Slow query execution time: 27139"
} 219 seconds {
"data": {
"parameters": [
"30 days"
],
"query": "\n DELETE FROM pgboss.archive\n WHERE (archivedOn + CAST($1 as interval) < now())\n ",
"time": 219570
},
"message": "Slow query execution time: 219570"
} 145 seconds {
"data": {
"parameters": [
"30 days"
],
"query": "\n DELETE FROM pgboss.archive\n WHERE (archivedOn + CAST($1 as interval) < now())\n ",
"time": 145607
},
"message": "Slow query execution time: 145607"
} 50 seconds {
"data": {
"parameters": [
"1 hour"
],
"query": "\n WITH archived_rows AS (\n DELETE FROM pgboss.job\n WHERE\n (completedOn + CAST($1 as interval) < now())\n OR (\n state = 'created'\n AND name LIKE '__state__completed__%'\n AND createdOn + CAST($1 as interval) < now()\n )\n RETURNING *\n )\n INSERT INTO pgboss.archive (\n id, name, priority, data, state, retryLimit, retryCount, retryDelay, retryBackoff, startAfter, startedOn, singletonKey, singletonOn, expireIn, createdOn, completedOn\n )\n SELECT \n id, name, priority, data, state, retryLimit, retryCount, retryDelay, retryBackoff, startAfter, startedOn, singletonKey, singletonOn, expireIn, createdOn, completedOn\n FROM archived_rows\n ",
"time": 50476
},
"message": "Slow query execution time: 50476"
} There are many of these in the logs each day. I can get you more details later on how many records, etc., and the result of adding an index on the archive table. |
~24M rows at the moment (which is about 10x more than expected), though I see this is in part because pgboss also keeps a row representing the job state transition like |
@timgit another thought – would it be a reasonable idea to allow individual queues to define their own archive retention? I realize that might be a big shift in the inner workings of the lib, but in our case we have some jobs that we want to keep around for awhile, and others that can be evicted within a day. |
Looks like the index can't help because of how the comparison is written. We need to isolate the index column on the left hand side of the comparison: Before: WHERE (archivedOn + CAST($1 as interval) < now()) After: WHERE (archivedOn < now() - CAST($1 as interval)) |
With a small enough table this doesn't seem to help either – it needs to do a I was able to get the index used by using BEGIN;
EXPLAIN ANALYZE DELETE FROM pgboss.archive WHERE archivedOn BETWEEN (now() - CAST('2 days' as interval)) AND (now() - CAST('1 day' as interval));
ROLLBACK;
|
Thanks for looking into this and playing around the options. There are a lot of options I can use to address this, some even advanced such as partitioning the archive by date so archiving becomes a partition deletion instead of costly query. Until that happens, I'm thinking the best course of action here would be to create your own retention system to pull the jobs you need out of the archive so you won't need to rely on it for anything of importance. If you were to have your own process copying the meaningful jobs out of the archive on an interval lower than the archive interval, then you could reduce the archive retention period low enough that the execution time is resolved. |
i added an index to archivedOn in 3.2.1, which should help out in most cases, although I agree it's not a perfect solution for very large volumes. |
We have an archive retention policy of 30 days before we evict from the
archive
table, but it seems that this produces some very poor query times as the number of archived jobs increases...Should pgboss create an index on
archivedOn
since it heavily depends on that for it's eviction query?The text was updated successfully, but these errors were encountered: