Skip to content

refactor(metrics): route all controllers through core/metrics helpers#162

Merged
behinddwalls merged 2 commits into
mainfrom
preetam/metrics-fix
May 29, 2026
Merged

refactor(metrics): route all controllers through core/metrics helpers#162
behinddwalls merged 2 commits into
mainfrom
preetam/metrics-fix

Conversation

@behinddwalls
Copy link
Copy Markdown
Collaborator

@behinddwalls behinddwalls commented May 28, 2026

Summary

Route every controller and the consumer framework through the core/metrics helpers so metric emission is consistent across the codebase.

What changed

Migrated to metrics.Begin / op.Complete + metrics.NamedCounter:

  • RPC controllers: gateway/controller/land.go, orchestrator/controller/ping.go, stovepipe/controller/ping.go
  • Queue controllers: orchestrator/controller/{batch,build,start,score,speculate,buildsignal,log}/*.go
  • Extension: extension/mergechecker/github/checker.go

Each migrated method now wraps its work with op := metrics.Begin(scope, opName); defer func() { op.Complete(retErr) }(), which emits {scope}.{op}.called, {op}.succeeded / {op}.failed, {op}.latency, and {op}.latency_histogram with automatic error_origin, retryable, and dependency classification tags via core/errs. Sub-event counters (deserialize_errors, storage_errors, publish_errors, …) go through metrics.NamedCounter so they share the same sub-scope prefix.

Consumer framework — core/consumer/consumer.go:
Migrated to metrics.NamedCounter / metrics.NamedTimer rather than Begin/Complete because the framework needs custom success=true|false|cancel and operation=ack|nack tags that the standard Op lifecycle can't express.

Op-name de-duplication:
Each migrated method declares const opName = "..." once and references it from every emit (file-level in speculate.go where emits span six helper methods).

Wire-format change

Metric names move from flat keys to {controller_subscope}.{op}.{event}:

Before After
land_request_count land_controller.land.called
land_request_latency land_controller.land.latency
received (per controller) {controller}_controller.process.called
processed {controller}_controller.process.succeeded
deserialize_errors {controller}_controller.process.deserialize_errors
messages_received (consumer) consumer.process.messages_received
controller_latency (consumer) consumer.process.controller_latency
ack_count, nack_count, … consumer.process.ack_count, consumer.process.nack_count, …

Dashboards keyed on the old flat names will need updating; the success=true|false|cancel, operation=ack|nack, and per-controller tags on consumer metrics are preserved.

Test Plan

  • make build — all 135 targets build clean.
  • make test — all 34 test targets pass, including:
    • //core/consumer:consumer_test — verifies the migrated processDelivery still emits controller_latency (with success=true|false) and ack_count (substring matches survive the process. prefix).
    • //extension/mergechecker/github:github_test
    • //gateway/controller:controller_test, //orchestrator/controller:controller_test, //stovepipe/controller:controller_test
    • Per-stage controllers: batch, build, buildsignal, log, score, speculate, start
  • make gazelle, make fmt, make mocks, make tidy — re-running on top of the diff produces zero further changes.
  • make lint-license — clean.
  • No test assertions on the renamed metrics needed updating (consumer_test.go uses strings.Contains, which still matches the new fully-qualified names).

Issues

None.

Migrate every controller and the consumer framework off raw tally.Scope
calls onto the core/metrics helpers (Begin/Complete, NamedCounter,
NamedTimer). RPC controllers (land, ping) and queue controllers (batch,
build, start, score, speculate, buildsignal, log) now follow the same
Begin/Complete pattern as merge, so each Process call emits a
consistent {scope}.process.{called,succeeded,failed,latency,latency_histogram}
set with automatic error classification tags. The mergechecker's Check
gets the same treatment.

core/consumer/consumer.go is migrated to NamedCounter/NamedTimer rather
than Begin/Complete to preserve its custom success=true|false|cancel
and operation=ack|nack tagging.

Wire-format note: metric names move from flat keys (e.g. "received",
"land_request_count") to the {sub-scope}.{op}.{event} pattern (e.g.
"batch_controller.process.deserialize_errors"). Existing dashboards
keyed on the old names will need updating.
Each controller's metric emissions repeated the operation literal
("process", "land", "ping", "check") on every call. Hoist it to a
method-level `const opName = "..."` (file-level in speculate.go where
emissions span six helper methods), so the name is defined once per
file and every emit reads as opName.
@behinddwalls behinddwalls marked this pull request as ready for review May 28, 2026 19:48
@behinddwalls behinddwalls requested review from a team and sbalabanov as code owners May 28, 2026 19:48
@behinddwalls behinddwalls merged commit dcf9f6d into main May 29, 2026
13 checks passed
@github-actions github-actions Bot deleted the preetam/metrics-fix branch May 29, 2026 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants