Skip to content

docs(merge-queue): add mq_pr_time_in_queue_1h_seconds to Prometheus metrics reference#531

Merged
trunk-staging-io[bot] merged 4 commits into
mainfrom
sam-gutentag/prometheus-total-time-in-queue
May 7, 2026
Merged

docs(merge-queue): add mq_pr_time_in_queue_1h_seconds to Prometheus metrics reference#531
trunk-staging-io[bot] merged 4 commits into
mainfrom
sam-gutentag/prometheus-total-time-in-queue

Conversation

@samgutentag
Copy link
Copy Markdown
Member

Summary

Adds the new mq_pr_time_in_queue_1h_seconds histogram metric to the Prometheus metrics reference table. This metric measures total end-to-end time a PR spends in the queue (entry to exit), covering all phases — not just wait time or test time individually.

Source

Files changed

  • merge-queue/administration/metrics.md — added one row to the rolling 1-hour window metrics table

Open questions

  • Confirm the exact metric name (mq_pr_time_in_queue_1h_seconds) matches what's emitted in production
  • Should we add an example PromQL query for this metric in the "Duration analysis" section?

Test plan

  • Preview in GitBook change request
  • Verify metric name against trunk2 PR diff (trunk-io/trunk2#3439)
  • Check for broken links

…theus docs

Documents the new total time in queue histogram metric, which measures
end-to-end queue duration from entry to exit across all phases. Useful
for teams using batching, bisection testing, or pending failure modes
where wait+test time alone understates total latency.

Pairs with trunk-io/trunk2#3439.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@trunk-staging-io
Copy link
Copy Markdown

trunk-staging-io Bot commented Apr 7, 2026

😎 Merged successfully - details.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 5, 2026

Claude finished @samgutentag's task in 53s —— View job


PR Review

  • Read CLAUDE.md for repo guidelines
  • Read the changed file (merge-queue/administration/metrics.md)
  • Review the git diff
  • Provide review feedback

The change is clean and minimal — one row added to the rolling 1-hour window metrics table. The metric is correctly typed as Histogram with the le label, and its placement at the end of the table is logical (it's the aggregate metric that encompasses the more specific _wait_duration and _test_duration entries above it).

Minor suggestion (non-blocking):

The description is noticeably more verbose than the other entries in the table:

Metric Description
mq_pr_wait_duration_1h_seconds Distribution of time PRs spent waiting before testing starts
mq_pr_test_duration_1h_seconds Distribution of time PRs spent in the testing phase
mq_pr_time_in_queue_1h_seconds Distribution of total time PRs spent in the queue, from entry to exit (includes waiting, testing, and any other phases)

The parenthetical (includes waiting, testing, and any other phases) adds useful context but could be trimmed to maintain style consistency. Consider:

Distribution of total end-to-end time PRs spent in the queue, from entry to exit

Or if you want to keep the explicit breakdown:

Distribution of total time PRs spent in the queue from entry to exit, covering all phases

This is a non-blocking style suggestion — the current wording is clear and accurate.


Result: ✅ No blocking issues. The metric name, type, labels, and placement are all correct.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-approved: Claude code review passed.

@samgutentag
Copy link
Copy Markdown
Member Author

Triage status (2026-05-05): up to date with main. Needs review before merge.

Copy link
Copy Markdown
Contributor

@blx blx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is accurate as already written; I added some clarifications for further detail but please feel free to reword or edit those as you like

Comment thread merge-queue/administration/metrics.md Outdated
Comment thread merge-queue/administration/metrics.md
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 6, 2026

Claude finished @samgutentag's task in 51s —— View job


PR Review

  • Read CLAUDE.md for repo guidelines (no CLAUDE.md present in repo)
  • Read the changed file (merge-queue/administration/metrics.md)
  • Review the git diff
  • Provide review feedback

Review summary

The change is a clean one-row addition to the rolling 1-hour window metrics table. Metric name, type (Histogram), and label (le) are all consistent with the surrounding rows. The description is grammatically correct and accurate.

Typos / grammar / formatting

None found. The new row matches the markdown table syntax of the surrounding rows.

Documentation best practices

1. Take @blx's suggested rewording. It improves the entry on two fronts:

  • Cross-links pending failure — inline links to related concepts is consistent with how other docs in this repo handle terminology references
  • Adds a clarifying paragraph below the table that disambiguates "Waiting to Enter Queue" from "in the queue" — important context that prevents users from misinterpreting their own metrics

The engineering author's clarifications are exactly the kind of conceptual scaffolding that belongs alongside a metric definition. I'd accept those suggestions as-is.

2. Pre-existing minor issue (not introduced by this PR): merge-queue/administration/metrics.md:205-206 — there is no blank line between the table and the Each histogram emits… paragraph that follows. Most markdown renderers handle this, but a blank line is the safer/more conventional separator. Worth fixing while you're in this section, but not blocking.

3. Re: the open question about adding a PromQL example. The "Duration analysis" section likely already covers mq_pr_wait_duration_1h_seconds and mq_pr_test_duration_1h_seconds patterns — adding a parallel example for mq_pr_time_in_queue_1h_seconds would be a natural follow-up but isn't required to ship this row. Suggest filing it as a separate small change so this PR stays scoped to the metric reference.


Result: ✅ No blocking issues. Recommend accepting @blx's code suggestions before merging since they materially improve the entry.
| Branch

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-approved: Claude code review passed.

@samgutentag
Copy link
Copy Markdown
Member Author

samgutentag commented May 6, 2026

Verification status (2026-05-07, re-verified): live

Verified: customers can use this. Ready to publish.

  • Eng PR: trunk2#3439 ("Add total time in queue metric for merge queue prometheus metrics", merged 2026-04-06, merge commit still on main)
  • Linear ticket: none referenced
  • Flag: none (no LaunchDarkly gate found in PR body or diff; pure metric addition)
  • Signals: pure Prometheus metric addition, ungated. The merge commit is intact on trunk2 main.

Review feedback applied in commit 7e33806:

  • @blx's first inline suggestion (line 205): metric row description rewritten to "Distribution of total time PRs spent in the queue, from entry to exit (includes waiting, testing, and any other phases, such as pending failure)." Sam added the .md extension on the link target to match the convention used in terraform.md and advanced-settings.md.
  • @blx's second inline suggestion (line 209): clarifying paragraph about the "Waiting to Enter Queue" state added between the histogram bucket detail and the gauge-semantics warning hint, so the clarification sits next to the metric definitions rather than after the warning.
  • Pre-existing nit (blank line between table and following paragraph at line 206) is also resolved in the same commit.

Merge readiness: GitHub reports MERGEABLE/CLEAN. Local commit 7e33806 is one ahead of origin and ready to push.

Suggested next action: push 7e33806, mark out of draft, then merge.

Address the two review comments on PR #531:
- Enrich the mq_pr_time_in_queue_1h_seconds row description with a link
  to pending-failure-depth (added .md to match the repo-wide convention
  for file links to that page)
- Add a clarifying paragraph that "Waiting to Enter Queue" time is not
  counted in either Wait Duration or Time in Queue metrics; placed
  between the histogram bucket detail and the warning hint so it sits
  next to the metric definitions it clarifies

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 7, 2026

Claude finished @samgutentag's task in 1m 25s —— View job


PR Review

  • Read CLAUDE.md for repo guidelines (no CLAUDE.md present in repo)
  • Read the changed file (merge-queue/administration/metrics.md)
  • Review the git diff
  • Provide review feedback

Review summary

The latest commit (7e33806) cleanly applies @blx's two inline suggestions and resolves the pre-existing blank-line nit I flagged in my prior review. Diff is 4 additions, 0 deletions, all in merge-queue/administration/metrics.md.

Verifications

  • Link target: ../optimizations/pending-failure-depth.md resolves to merge-queue/optimizations/pending-failure-depth.md, which exists. The .md suffix matches the convention in terraform.md:116 and advanced-settings.md:288, both of which link to the same target with the explicit extension. The author's annotation about this is correct.
  • Metric row: Name (mq_pr_time_in_queue_1h_seconds), type (Histogram), and label (le (bucket boundary)) are consistent with the surrounding rows. Placement (last row in the rolling-window table, after the more granular _wait_duration and _test_duration entries) is logical — it's the aggregate.
  • Clarifying paragraph placement: Sitting between the histogram bucket detail (line 207) and the gauge-semantics warning (line 211) is the right call. It keeps the conceptual clarification next to the metric definitions, where a reader looking up "what does this metric measure?" will find it. Posting it inside the {% hint %} block, as a verbatim apply would have done, would have broken the hint and buried the clarification.

Typos / grammar / formatting

None. The new content scans cleanly.

Minor style observation (non-blocking)

The new row's description ends with a period:

...such as [pending failure](../optimizations/pending-failure-depth.md)).

The other three rows in the same table do not:

...waiting before testing starts
...in the testing phase
...in the last hour

This is a small inconsistency, but the new description is the only one in the table that contains a parenthetical and an inline link, so a terminal period reads more naturally here. Either keep it or drop it for strict consistency — I'd leave it as-is.

Open questions from PR body

  • Metric name confirmation — the author's verification note (2026-05-07) confirms trunk2#3439 is merged on main, so the name is real and shipped. ✅
  • PromQL example in "Duration analysis" — agree with prior review: better as a separate, scoped follow-up. Not required to ship this row.

Result: ✅ No blocking issues. The PR is in good shape to merge. The single minor style observation above is genuinely optional. | Branch

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-approved: Claude code review passed.

@samgutentag samgutentag marked this pull request as ready for review May 7, 2026 07:40
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 7, 2026

Claude finished @samgutentag's task in 53s —— View job


PR Review

  • Read repo guidelines (no CLAUDE.md present)
  • Read the changed file (merge-queue/administration/metrics.md)
  • Review the git diff against origin/main
  • Verify link target resolves
  • Provide review feedback

Review summary

This PR is in great shape and ready to merge. The diff is exactly what was advertised: 4 additions, 0 deletions, all in merge-queue/administration/metrics.md. The latest commit (7e33806) cleanly applies @blx's two inline suggestions and resolves the pre-existing blank-line nit from prior review rounds.

Verifications

  • Metric row (metrics.md:205): name (mq_pr_time_in_queue_1h_seconds), type (Histogram), and label (le (bucket boundary)) match the surrounding rows. Placement at the end of the table is correct — it's the aggregate that subsumes _wait_duration and _test_duration.
  • Link target: ../optimizations/pending-failure-depth.md resolves to merge-queue/optimizations/pending-failure-depth.md, which exists. The .md suffix is consistent with how the same target is linked from terraform.md and advanced-settings.md.
  • Clarifying paragraph (metrics.md:209): placed between the histogram bucket detail and the {% hint style="warning" %} block, which is the right call — it sits next to the metric definitions rather than after the gauge-semantics warning. The author's note explains why a verbatim apply would have broken the hint block; the manual placement is the correct fix.
  • Pre-existing blank-line nit (metrics.md:206): now resolved by the blank line between the table and the Each histogram emits… paragraph.

Typos / grammar / formatting

None. The new content reads cleanly and matches the surrounding style.

Minor style observation (non-blocking)

The new row's description ends with a period (...such as [pending failure](...).), while the other three histogram/gauge rows in the same table do not. The new description is the only one containing a parenthetical and an inline link, so a terminal period reads more naturally — leaving it as-is is fine. Calling it out only because it's the single style asymmetry I can find in the diff.

Open questions from PR body

  • Metric name confirmation — verified by the author against trunk2#3439 (merged 2026-04-06, on main). ✅
  • PromQL example in "Duration analysis" — better as a separate, scoped follow-up. Not required to ship this row.

Result: ✅ No blocking issues. Ready to merge.
| Branch

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-approved: Claude code review passed.

@trunk-staging-io trunk-staging-io Bot merged commit 204a08d into main May 7, 2026
8 checks passed
@trunk-staging-io trunk-staging-io Bot deleted the sam-gutentag/prometheus-total-time-in-queue branch May 7, 2026 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants