PG: Add wait events counts from pg_stat_activity #20588
Force-pushed from 4cb43e3 to 4fd8fc5.
```
@@ -557,6 +557,23 @@ def trim_leading_set_stmts(sql):
""".strip(),
}


QUERY_PG_WAIT_EVENT_METRICS = {
```
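The excerpt above only shows the opening line of QUERY_PG_WAIT_EVENT_METRICS. Here is a minimal sketch of what the underlying SQL might look like, assuming the metric is a count of pg_stat_activity rows grouped by the dimensions named in the PR description. The query text, the `'none'` label for NULL wait events, the helper `run_against_fake_activity`, and the sample rows are all hypothetical, not taken from the PR. So that it runs without a PostgreSQL server, the sketch executes the query against an in-memory SQLite table that mimics a few pg_stat_activity columns.

```python
import sqlite3

# Hypothetical query text (not the one from the PR): count sessions in
# pg_stat_activity per (user, db, app, backend_type, wait_event) combination.
# wait_event is NULL when a backend is not waiting, so it is mapped to 'none'.
WAIT_EVENT_COUNT_QUERY = """
SELECT usename,
       datname,
       application_name,
       backend_type,
       COALESCE(wait_event, 'none') AS wait_event_name,
       count(*) AS sessions
FROM pg_stat_activity
GROUP BY usename, datname, application_name, backend_type, COALESCE(wait_event, 'none')
""".strip()


def run_against_fake_activity(rows):
    """Run the query on an in-memory SQLite table shaped like pg_stat_activity."""
    conn = sqlite3.connect(":memory:")
    try:
        # Only the columns the query reads; the real view has many more.
        conn.execute(
            "CREATE TABLE pg_stat_activity ("
            "usename TEXT, datname TEXT, application_name TEXT, "
            "backend_type TEXT, wait_event TEXT)"
        )
        conn.executemany("INSERT INTO pg_stat_activity VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(WAIT_EVENT_COUNT_QUERY).fetchall()
    finally:
        conn.close()


# Made-up sample rows: three client sessions and two background processes.
SAMPLE_ROWS = [
    ("app", "shop", "web", "client backend", "ClientRead"),
    ("app", "shop", "web", "client backend", "ClientRead"),
    ("app", "shop", "web", "client backend", None),
    (None, "shop", "", "autovacuum worker", None),
    (None, None, "", "startup", "RecoveryWalStream"),
]

for row in run_against_fake_activity(SAMPLE_ROWS):
    print(row)
```

Grouping in SQL means the check receives one row per tag combination rather than one row per session. That is likely the cheaper option on servers with many connections, and it is also what makes background processes (such as the startup process during recovery) show up as their own series.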
We can do the same aggregations (and more) with samples in the UI. The issue is that we’re not collecting samples for background processes. I assume the reason for adding these metrics is to get visibility into them. Would it be better to include background processes in sampling instead?
Yeah, the system processes are not visible, and having visibility into the recovery process would have helped a lot during recent incidents. Also, wait events from sampling are not metrics, and the 15-day retention makes them hard to use.
If that can be done with samples, that would be great, but AFAIK the last attempt failed years ago? And that wouldn't solve the 15-day retention issue.
> If that can be done with samples, that would be great, but AFAIK the last attempt failed years ago?

That's feasible without too much effort (we'd need some additional filtering), but we haven't prioritized it yet. Do you have a strong need for background-process samples?

> And that wouldn't solve the 15-day retention issue.

Understood. Let's implement this regardless of the background-sampling initiative.
Force-pushed from 4fd8fc5 to f63ce79.
Codecov Report: All modified and coverable lines are covered by tests ✅
What does this PR do?
Report `wait_event` in a `postgresql.activity.wait_event` metric.

Motivation
This metric is aggregated by user, db, app, and backend_type. It provides more visibility into system processes: the number of active autovacuum or parallel workers, the state of the recovery process, and so on.
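On the check side, each aggregated row would become one `postgresql.activity.wait_event` point tagged with its dimensions. The sketch below is illustrative only, not the integration's actual submission code. It assumes the tag keys `user`, `db`, `app`, `backend_type`, and `wait_event`, an `'unknown'` placeholder for NULL or empty values, and the hypothetical helpers `to_metric_points` and `sessions_by_backend_type`. Summing over the `backend_type` tag shows how a figure like the number of active autovacuum workers could be read off the metric.

```python
from collections import Counter

# Metric name as stated in the PR description; the tag keys are assumptions
# for illustration, not necessarily the ones the check emits.
METRIC_NAME = "postgresql.activity.wait_event"
TAG_KEYS = ("user", "db", "app", "backend_type", "wait_event")


def to_metric_points(rows):
    """Turn aggregated (usename, datname, app, backend_type, wait_event, count)
    rows into (metric, value, tags) tuples.

    Background processes often have a NULL user/db and an empty application
    name, so missing or empty values are reported as 'unknown'.
    """
    points = []
    for *labels, count in rows:
        tags = [f"{key}:{value or 'unknown'}" for key, value in zip(TAG_KEYS, labels)]
        points.append((METRIC_NAME, count, tags))
    return points


def sessions_by_backend_type(points):
    """Sum the counts across all wait events for each backend_type tag value."""
    totals = Counter()
    for _, count, tags in points:
        backend_type = next(t.split(":", 1)[1] for t in tags if t.startswith("backend_type:"))
        totals[backend_type] += count
    return totals


# Made-up aggregated rows, e.g. as a grouped pg_stat_activity query might return.
ROWS = [
    ("app", "shop", "web", "client backend", "ClientRead", 2),
    (None, "shop", "", "autovacuum worker", "none", 1),
    (None, "orders", "", "autovacuum worker", "DataFileRead", 1),
    (None, None, "", "startup", "RecoveryWalStream", 1),
]

points = to_metric_points(ROWS)
print(sessions_by_backend_type(points)["autovacuum worker"])  # 2
```

Keeping `wait_event` as a tag rather than summing it away lets the same series answer both questions: how many workers of a given type exist, and what they are waiting on.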
Review checklist (to be filled by reviewers)
- Add the `qa/skip-qa` label if the PR doesn't need to be tested during QA.
- Add the `backport/<branch-name>` label to the PR, and a backport PR will be opened automatically once this one is merged.