feat(metrics): worker resource dashboard + pprof#58

Merged
setkyar merged 2 commits into main from feat/worker-dashboard on Jun 5, 2026

setkyar commented Jun 5, 2026

Summary

  • Add a standalone, auth-gated worker metrics dashboard for inspecting running pi --mode rpc workers and overall process health — useful for spotting resource hogs, zombie/leaked workers, and diagnosing slowness.
  • GET /metrics serves a self-contained HTML page (polls every 2s); GET /api/metrics returns a non-blocking JSON snapshot: process goroutines/heap/SSE-clients/watched-files plus per-worker CPU/RSS via gopsutil.
  • CPU% is derived server-side from cumulative-CPU deltas (with a pruned per-PID baseline cache) to avoid gopsutil's blocking Percent(interval). Idle workers alive past the reap TTL are flagged as zombies.
  • Mount Go's profiler under GET /api/debug/pprof/ (auth-gated) for deep dives; the dashboard footer links to it.

Design notes

  • workers.Manager.Snapshot() returns one WorkerSnapshot per live worker. PID/uptime/idle come from an optional inspector interface implemented by the real rpc worker; test fakes that don't implement it report zeros.
  • The OS sampler sits behind a processSampler interface (gopsutil-backed in prod, swappable for tests via Server.SetMetricsSampler).
  • pprof.Index hard-codes the /debug/pprof/ prefix, so the index handler is mounted with /api stripped; cmdline/profile/symbol/trace are registered directly.
  • gopsutil adds only small, pure-Go (no cgo) deps on macOS/Linux. See docs/dev/metrics-dashboard.md.

Test plan

  • make check (vitest + go test + vet + build) green
  • Go unit tests: Manager.Snapshot (inspector / non-inspector / empty); /api/metrics JSON shape, auth gating, zombie flag, running-never-zombie, sampler-error degradation, CPU%-delta math, cache pruning; pprof index + named profile + auth gating
  • Live smoke (isolated agent dir): /api/metrics valid JSON, /api/debug/pprof/ → 200, /api/debug/pprof/heap → real profile, /metrics footer links to pprof
  • Reviewer: open /metrics in a browser with a couple of active chat workers and confirm CPU%/RSS/zombie rendering

setkyar added 2 commits June 5, 2026 15:43

Add a standalone, auth-gated dashboard for inspecting running
`pi --mode rpc` workers and process health. /api/metrics returns a
non-blocking JSON snapshot (process goroutines/heap/SSE clients plus
per-worker CPU/RSS via gopsutil); /metrics serves a self-contained
polling page. CPU% is derived server-side from cumulative-CPU deltas to
avoid gopsutil's blocking sampler, and idle workers past the reap TTL are
flagged as zombies.

Expose Go's runtime profiler for deep "why is the app slow" dives. The
index handler is mounted with the /api segment stripped because
pprof.Index hard-codes the /debug/pprof/ prefix when routing to named
profiles; cmdline/profile/symbol/trace are registered directly. All
endpoints go through the existing auth middleware. The dashboard footer
links to the profiler.
setkyar merged commit ecf42be into main on Jun 5, 2026
2 checks passed
setkyar deleted the feat/worker-dashboard branch on June 5, 2026 12:35