PG: Add wait events counts from pg_stat_activity #20588

Merged 1 commit on Jun 26, 2025

Conversation

bonnefoa
Contributor

What does this PR do?

Report wait event counts from pg_stat_activity in a new postgresql.activity.wait_event metric.

Motivation

The metric is aggregated by user, db, app and backend_type. It gives more visibility into system processes, for example the number of active autovacuum workers or parallel workers, or the state of the recovery process.
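
As a rough illustration of the aggregation described above, here is a minimal sketch that counts activity rows per (user, db, app, backend_type, wait_event) group. It is not the query added by this PR: `count_wait_events`, `GROUP_COLUMNS`, the `"none"` placeholder and the sample rows are all hypothetical. Only the column names are taken from pg_stat_activity.

```python
from collections import Counter

# Hypothetical grouping key: the column names exist in pg_stat_activity,
# but this tuple is an illustration, not the query merged in this PR.
GROUP_COLUMNS = ("usename", "datname", "application_name", "backend_type", "wait_event")


def count_wait_events(rows):
    """Count rows per (user, db, app, backend_type, wait_event) group.

    NULL columns (None here) are mapped to the placeholder "none" so that
    every group has a usable, sortable key. The placeholder is an assumption.
    """
    counts = Counter()
    for row in rows:
        key = tuple("none" if row.get(col) is None else row[col] for col in GROUP_COLUMNS)
        counts[key] += 1
    return counts


# Made-up rows shaped like pg_stat_activity output.
sample_rows = [
    {"usename": None, "datname": "app_db", "application_name": "",
     "backend_type": "autovacuum worker", "wait_event": "VacuumDelay"},
    {"usename": None, "datname": "app_db", "application_name": "",
     "backend_type": "autovacuum worker", "wait_event": "VacuumDelay"},
    {"usename": None, "datname": None, "application_name": "",
     "backend_type": "startup", "wait_event": "RecoveryWalStream"},
    {"usename": "app", "datname": "app_db", "application_name": "api",
     "backend_type": "client backend", "wait_event": None},
]

wait_event_counts = count_wait_events(sample_rows)
for key, value in sorted(wait_event_counts.items()):
    # One line per group: the grouping columns plus the number of backends.
    print(dict(zip(GROUP_COLUMNS, key)), value)
```

Each group would map to one metric point, with the grouping columns as tags and the count as the value. System processes such as the autovacuum workers and the startup (recovery) process show up alongside client backends.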

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@@ -557,6 +557,23 @@ def trim_leading_set_stmts(sql):
""".strip(),
}

QUERY_PG_WAIT_EVENT_METRICS = {
Contributor

We can do the same aggregations (and more) with samples in the UI. The issue is that we’re not collecting samples for background processes. I assume the reason for adding these metrics is to get visibility into them. Would it be better to include background processes in sampling instead?

Contributor Author

Yeah, the system processes are not visible, and visibility into the recovery process would have helped a lot during recent incidents. Also, wait events from sampling are not a metric, and the 15-day retention makes them hard to use.
If that can be done with samples, that would be great but AFAIK, the last attempt failed years ago? And that wouldn't solve the 15 days retention issues.

Contributor

> If that can be done with samples, that would be great but AFAIK, the last attempt failed years ago?

That's feasible without too much effort (we have to do some additional filtering), but we haven't prioritized it yet. Do you have a strong need for bg samples?

> And that wouldn't solve the 15 days retention issues.

Understood - let's implement this regardless of the bg samples initiative.

Report wait_event in a postgresql.activity.wait_event metric. This
metric is aggregated by user, db, app and backend_type.
This metric will provide more visibility on system processes: Number of
active autovacuum workers or parallel workers, the state of recovery
process...
@bonnefoa bonnefoa force-pushed the bonnefoa/pg-wait-events branch from 4fd8fc5 to f63ce79 Compare June 25, 2025 15:02

codecov bot commented Jun 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.22%. Comparing base (077bcac) to head (f63ce79).
Report is 5 commits behind head on master.

Additional details and impacted files
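| Flag | Coverage Δ |
|---|---|
| activemq | ? |
| cassandra | ? |
| confluent_platform | ? |
| hive | ? |
| hivemq | ? |
| hudi | ? |
| ignite | ? |
| jboss_wildfly | ? |
| kafka | ? |
| postgres | 93.17% <100.00%> (+3.50%) ⬆️ |
| presto | ? |
| solr | ? |
| tomcat | ? |
| weblogic | ? |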

@nenadnoveljic nenadnoveljic self-requested a review June 26, 2025 08:12
@nenadnoveljic nenadnoveljic added this pull request to the merge queue Jun 26, 2025
Merged via the queue into master with commit 83c44e4 Jun 26, 2025
51 checks passed
@nenadnoveljic nenadnoveljic deleted the bonnefoa/pg-wait-events branch June 26, 2025 08:32