Skip to content

feat(observability): tag every router log with session_key + add turn-flow I/O logs #244

Merged: steventohme merged 2 commits into main from steven/router-session-tag-logs on May 25, 2026.

Summary

Every router log line in `internal/proxy/` is now tagged with `session_key`, `request_id`, `api_key_id`, and `ingress`. A single Cloud Logging filter on `jsonPayload.session_key=…` surfaces a session's full trace through the router (planner, scorer, pin lookup, handover, and dispatch) instead of requiring a grep across unrelated lines.

How it works

  • New `observability.WithLogger(ctx, log)` / `observability.FromContext(ctx)` in `internal/observability/logger.go`. `WithLogger` stores a `*slog.Logger` on the request context; `FromContext` returns it, or the global default if none is set. Existing `Get()` and `FromGin()` still work.
  • New `bindRequestLogger` helper (`internal/proxy/session_key.go`) derives the session key once and returns a context carrying a logger pre-bound with `session_key`, `request_id`, `api_key_id`, and `ingress`.
  • It is called at the top of `ProxyMessages`, `ProxyOpenAIChatCompletion`, and `ProxyGeminiGenerateContent`, right after the envelope is parsed and before any routing logic runs.
  • Migrated `internal/proxy/*` from `observability.Get()` to `observability.FromContext(ctx)` wherever a `ctx` was already in scope. A few helpers (`writeNewPin`, `refreshPin`, `enqueuePinUpsert`, `logPlannerOutcome`, `handleForceModelCommand`, `handleToolCallLoopBreak`, `handleNoProgressBreak`) gained a leading `ctx` parameter so they inherit the bound logger.

Turn-flow I/O logs (Debug)

Added structured Debug logs at the points where turn flow makes decisions, so a session's path can be reconstructed from logs alone:

  • Each proxy entry point (`ProxyMessages`, `ProxyOpenAIChatCompletion`, `ProxyGeminiGenerateContent`) logs a start line with the requested model, stream flag, message count, `has_tools`, token estimate, and a 200-char prompt preview.
  • `runTurnLoop` logs the turn-type classification, pin lookup hit/miss (with model, provider, reason, and age), the scorer decision, and tier-clamp events.
  • `logInboundToolTraffic`, when tools are present, logs the names of the last 5 assistant `tool_use` blocks with 160-char argument previews, so a misbehaving turn can be correlated with the preceding `tool_use`/`tool_result` shape without dumping the full body.
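
The preview and trailing-tool_use logic can be sketched as follows. This is a hypothetical illustration, not the PR's code: `message` and `contentBlock` are simplified stand-ins for the real request shapes, "char" is interpreted as Unicode code points, and the `...` suffix is an illustrative choice.

```go
package main

import (
	"log/slog"
	"os"
)

// contentBlock and message are hypothetical, simplified request shapes.
type contentBlock struct {
	Type  string // "text", "tool_use", "tool_result", ...
	Name  string // tool name (tool_use only)
	Input string // raw JSON arguments (tool_use only)
}

type message struct {
	Role    string
	Content []contentBlock
}

// preview returns at most n runes of s, appending "..." when it truncates.
// Counting runes rather than bytes avoids splitting a multi-byte character.
func preview(s string, n int) string {
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n]) + "..."
}

type toolUse struct {
	Name        string `json:"name"`
	ArgsPreview string `json:"args_preview"`
}

// trailingToolUses collects up to limit assistant tool_use blocks, scanning
// from the newest message backwards, and returns them oldest-first.
func trailingToolUses(msgs []message, limit, argChars int) []toolUse {
	var out []toolUse
	for i := len(msgs) - 1; i >= 0 && len(out) < limit; i-- {
		if msgs[i].Role != "assistant" {
			continue
		}
		blocks := msgs[i].Content
		for j := len(blocks) - 1; j >= 0 && len(out) < limit; j-- {
			if blocks[j].Type == "tool_use" {
				out = append(out, toolUse{Name: blocks[j].Name, ArgsPreview: preview(blocks[j].Input, argChars)})
			}
		}
	}
	// Reverse into chronological order.
	for l, r := 0, len(out)-1; l < r; l, r = l+1, r-1 {
		out[l], out[r] = out[r], out[l]
	}
	return out
}

func main() {
	log := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: slog.LevelDebug}))
	prompt := "Refactor the pin cache so stale entries expire."
	msgs := []message{
		{Role: "user", Content: []contentBlock{{Type: "text"}}},
		{Role: "assistant", Content: []contentBlock{{Type: "tool_use", Name: "read_file", Input: `{"path":"main.go"}`}}},
	}
	log.Debug("request start", "prompt_preview", preview(prompt, 200))
	log.Debug("inbound tool traffic", "tool_uses", trailingToolUses(msgs, 5, 160))
}
```

Scanning backwards stops as soon as the limit is reached, so cost stays bounded even on long conversations.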

Full upstream bodies remain behind `LOG_LEVEL=debug` via the existing `logUpstreamBody`.

What this does NOT do (follow-ups)

  • Outbound assistant `tool_use` blocks aren't logged yet. `log_io.go` has `logAssistantOutputSummary` ready, but wiring it into streaming SSE responses needs work in the SSE chain; that is left for a follow-up.
  • Mapping LOG_LEVEL to the GCP `severity` field isn't included. Today every line still lands as `severity=DEFAULT` in Cloud Logging. Filtering on `jsonPayload.session_key` works either way; severity-based filtering needs a separate JSON-handler change.

Test plan

  • `go build ./...` clean
  • `go test ./...` clean across all packages
  • Verify in staging that a single session_key appears across planner, pin, dispatch, and complete log lines for one turn

steventohme merged commit 4e8cbed into main on May 25, 2026.
7 checks passed