Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more columns/dimensions to pg_wait_sampling #97

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Medvecrab
Copy link
Contributor

Following the discussion in #94, I decided to add more columns to pg_wait_sampling_current/history/profile

I've added most of the columns from request in #94, but some fields from "pg_stat_activity" (= localBackendStatusTable) like query_text, query_start, query_id (if we actually took it from "pg_stat_activity") work in the following way: they save values at the start of the query and DO NOT clear those fields when the query has ended. So after executing "select 1;" in some client backend we will always sample its query_text and query_start until we execute other query. We specifically avoided this kind of sampling when we were making our way of tracking query_id with Executor hooks.

All new columns are added to new views with suffix "_extended". Those views COULD be changed between extension versions, while existing views without those suffixes SHOULD NOT be changed.

One more thing to highlight - since we have to look into localBackendStatusTable when we sample some columns, we have reduced perfomance. This is unavoidable and is noted in updated README. BUT, in PostgreSQL 13-16 we can't reliably link ProcGlobal and localBackendStatusTable arrays and from this was born get_beentry_by_procpid. For each interesting process from ProcGlobal we have to iterate through localBackendStatusTable. This has O(n^2) time complexity, where n is a number of all backends. Very inefficient. I probably could remake the collector code to iterate through localBackendStatusTable first and then find corresponding entries in ProcGlobal but I have not investigated it and am not sure it would be faster (it may be faster, depending on ProcGlobal access/iteration)

There could some sloppy code, including possible "you should copy the struct here, not use pointer" moments

Everyone is welcome to review the code and share their thoughts.

Oleg Tselebrovskiy added 2 commits March 26, 2025 16:21
…sions

Sometimes it can be useful to have additional info collected with wait_events,
so we add two new views/functions that include more information. The structure of
those views could be changed in new versions of pg_wait_sampling extension
Also fix some typos/reword some sentences
@DmitryNFomin
Copy link

@Medvecrab
many thanks for this.

Performance penalty is exact risk that I would like to avoid while still want to have some extra dimensions.
If we add fields from PGROC isBackgroundWorker, databaseId and roleId (like here #95) we do not get increased observer effect.

So my suggestion to add fields from PGPROC to the "main" views and fields from localBackendStatusTable to _extended views

@Medvecrab
Copy link
Contributor Author

Performance penalty is exact risk that I would like to avoid while still want to have some extra dimensions. If we add fields from PGROC isBackgroundWorker, databaseId and roleId (like here #95) we do not get increased observer effect.

So my suggestion to add fields from PGPROC to the "main" views and fields from localBackendStatusTable to _extended views

The idea of *_extended views is to add ALL additional fields to them (also any that could be added in the future) so the original views remain the same for backwards compatibility. I agree that if we were rewriting pg_wait_sampling anew, it would make some sense to distribute columns differently, but alas. This is also the reason that with my patch we still set profiling per profile/query_id with only the old GUC variables, not with the new ones

When all fields from localBackendStatusTable are turned off (well, none of them are listed in either history_dimensions or profile_dimensions), the performance shouldn't take a hit (there are specific guards for those cases in PR)

@DmitryNFomin
Copy link

agree with your point, just one more comment, can we add isBackgroundWorker from PGPROC? I understand that backend_type is more detailed, but it we can get it without performance penalty while we can distinct regular backends in performance analysis already

@Medvecrab
Copy link
Contributor Author

Seems like a fair request, will probably add after/during review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants