feat(0.12.0): production trace-sink — close the data-leak #18

Merged
tangletools merged 2 commits into main from feat/production-trace-sink on May 20, 2026
@drewstone
Contributor

Summary

Every production chat session has been emitting zero replayable trace data. Eval runs capture everything; production captures nothing. RL training, research analyses, and the self-improvement loop all run on synthetic personas only. This primitive turns every real user conversation into data the downstream channels (Prime Intellect, GEPA, research, canaries, analyst loop) can consume.

API

`createProductionTraceSink(opts)` returns:

  • `traceStore` — the in-memory store the agent's `TraceEmitter` writes to during a chat session (built on agent-eval's existing `InMemoryTraceStore`; no reinvention)
  • `onRunComplete` — `RunCompleteHook` the agent registers; on `endRun` composes a canonical `ProductionRunRecord`, persists to a durable store, and POSTs the run as OTLP to a configured collector (Langfuse, etc.)
  • `recordFeedback(input)` — appends a `FeedbackLabel` to the run's `FeedbackTrajectory`; creates the trajectory anchored to `runId` on first feedback
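
The create-then-append behaviour of `recordFeedback` can be sketched as below. This is a minimal in-memory illustration, not the PR's implementation: the field names on `FeedbackLabel`, `FeedbackTrajectory`, and `FeedbackInput`, the `InMemoryFeedbackStore` class, and the `traj-` id scheme are all assumptions made for the example. The real function also returns `null` on store failure, which this synchronous sketch leaves out.

```ts
// Hypothetical shapes: every field name here is an assumption for illustration.
interface FeedbackLabel {
  kind: string;
  createdAt: number;
}

interface FeedbackTrajectory {
  id: string;
  runId: string; // the run this trajectory is anchored to
  labels: FeedbackLabel[];
}

interface FeedbackInput {
  runId: string;
  label: FeedbackLabel;
  trajectoryId?: string; // used as the id when a trajectory is first created
}

// In-memory stand-in for a durable feedbackStore (hypothetical).
class InMemoryFeedbackStore {
  private byRun = new Map<string, FeedbackTrajectory>();

  get(runId: string): FeedbackTrajectory | undefined {
    return this.byRun.get(runId);
  }

  put(trajectory: FeedbackTrajectory): void {
    this.byRun.set(trajectory.runId, trajectory);
  }
}

function recordFeedback(
  store: InMemoryFeedbackStore,
  input: FeedbackInput,
): FeedbackTrajectory {
  const existing = store.get(input.runId);
  if (existing) {
    // Later feedback: append to the trajectory already anchored to this run.
    existing.labels.push(input.label);
    store.put(existing);
    return existing;
  }
  // First feedback: create the trajectory anchored to runId.
  const created: FeedbackTrajectory = {
    id: input.trajectoryId ?? `traj-${input.runId}`,
    runId: input.runId,
    labels: [input.label],
  };
  store.put(created);
  return created;
}
```

Keying the store by `runId` means the caller never has to know whether a trajectory already exists; the first call creates it and every later call for the same run appends.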

Per-agent wiring (~10 lines)

```ts
// `env`, `db`, `sessionId`, `pass`, and `score` come from the surrounding chat handler.
const sink = createProductionTraceSink({
  projectId: 'tax-agent',
  otlp: { endpoint: env.LANGFUSE_OTEL_ENDPOINT, authHeader: env.LANGFUSE_OTEL_AUTH },
  runRecordStore: drizzleRunRecordStore(db),
  feedbackStore: drizzleFeedbackStore(db),
})

// Register the sink's hook so it fires on endRun.
const emitter = new TraceEmitter(sink.traceStore, {
  onRunComplete: [sink.onRunComplete],
})
await emitter.startRun({ scenarioId: sessionId, projectId: 'tax-agent', layer: 'app-runtime' })
// ... existing chat flow ...
await emitter.endRun({ pass, score })
```

Cloudflare Worker semantics: `ctx.waitUntil` the hook from the chat handler so the worker stays alive long enough for the OTLP POST and DB write to flush.
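
One possible reading of that note, sketched with stand-in types: the handler passes the promise for the end-of-run work to `ctx.waitUntil` instead of blocking the reply on it. `ExecutionContextLike`, `EmitterLike`, and `handleChat` are hypothetical names for this example; only `waitUntil(promise)` mirrors the actual Workers runtime method.

```ts
// Minimal structural stand-in for the Workers execution context;
// only waitUntil is needed for this sketch.
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical emitter surface, reduced to the one call used here.
interface EmitterLike {
  endRun(result: { pass: boolean; score: number }): Promise<void>;
}

function handleChat(
  ctx: ExecutionContextLike,
  emitter: EmitterLike,
  reply: string,
): string {
  // Start endRun (which runs the onRunComplete hooks) without awaiting it...
  const flush = emitter.endRun({ pass: true, score: 1 });
  // ...and ask the runtime to keep the worker alive until it settles.
  ctx.waitUntil(flush);
  return reply;
}
```

The trade-off is that the user gets the reply without waiting on the DB write or the collector round trip, while the runtime still gives the flush time to complete.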

Fail-loud where it matters; fail-quiet only at the IO boundary

  • runRecordStore failures → logged, not thrown
  • OTLP POST failures (network / non-2xx) → logged, not thrown
  • feedbackStore failures → `null` returned, logged
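
The "logged, not thrown" pattern above can be sketched as a small wrapper around each IO step. This is an illustration under assumptions, not the PR's code: `failQuiet`, the `Logger` type, and the `deps` shape are hypothetical, and running the OTLP export even after a failed save is a choice made for this sketch that the PR does not confirm.

```ts
type Logger = (message: string, error: unknown) => void;

// Runs an IO step; a rejection is logged and swallowed, and null is returned.
async function failQuiet<T>(
  label: string,
  step: () => Promise<T>,
  log: Logger,
): Promise<T | null> {
  try {
    return await step();
  } catch (error) {
    log(`${label} failed`, error);
    return null;
  }
}

interface RunRecord {
  runId: string;
}

// Hypothetical dependencies standing in for the durable store and the OTLP exporter.
interface HookDeps {
  save: (record: RunRecord) => Promise<void>;
  exportOtlp: (record: RunRecord) => Promise<void>;
  log: Logger;
}

// Hypothetical hook body: each IO step is isolated, so the caller never sees a throw.
async function onRunComplete(record: RunRecord, deps: HookDeps): Promise<void> {
  await failQuiet('runRecordStore.save', () => deps.save(record), deps.log);
  await failQuiet('otlp export', () => deps.exportOtlp(record), deps.log);
}
```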

Test plan

  • `pnpm test` — 144/144 pass (13 new under `tests/production-trace-sink.test.ts`)
  • `pnpm typecheck`
  • Bumps to 0.12.0

Next steps (separate PRs per agent)

Each of the tax, legal, gtm, and creative agents wires this into `packages/api-worker/src/services/agent-runtime/chat.ts`, adds the matching `runRecordStore`/`feedbackStore` Drizzle adapters, and sets the Langfuse env vars.

drewstone added 2 commits May 20, 2026 15:07
Every production chat session has been emitting zero replayable trace data.
Eval runs capture everything; production captures nothing. RL training,
research analyses, and the self-improvement loop all run on synthetic
personas. This primitive turns every real user conversation into data the
downstream channels (Prime Intellect, GEPA, research, canaries, analyst
loop) can consume.

`createProductionTraceSink(opts)` returns:
  - `traceStore` — the in-memory store the agent's TraceEmitter writes to
    during a chat session (built on agent-eval's existing InMemoryTraceStore;
    no reinvention)
  - `onRunComplete` — RunCompleteHook the agent registers; on endRun
    composes a canonical ProductionRunRecord, persists to a durable store,
    and POSTs the run as OTLP to a configured collector (Langfuse, etc.)
  - `recordFeedback(input)` — appends a FeedbackLabel to the run's
    FeedbackTrajectory; creates the trajectory anchored to runId on
    first feedback

Wiring is ~10 lines in each agent's production chat handler:

  const sink = createProductionTraceSink({
    projectId: 'tax-agent',
    otlp: { endpoint: env.LANGFUSE_OTEL_ENDPOINT, authHeader: env.LANGFUSE_OTEL_AUTH },
    runRecordStore: drizzleRunRecordStore(db),
    feedbackStore: drizzleFeedbackStore(db),
  })

  const emitter = new TraceEmitter(sink.traceStore, {
    onRunComplete: [sink.onRunComplete],
  })

Fail-loud everywhere it matters; fail-quiet only at the IO boundary:
  - runRecordStore failures → logged, not thrown (chat handler stays up)
  - OTLP POST failures (network/non-2xx) → logged, not thrown
  - feedbackStore failures → null returned, logged

13 new tests in `tests/production-trace-sink.test.ts` cover:
  - RunRecord composition for completed / failed / aborted
  - failureClass + notes propagation
  - runRecordStore throwing (hook stays alive)
  - OTLP POST shape (service.name in resource attrs, authorization header)
  - OTLP failure modes (network throw, non-2xx)
  - omitted otlp / omitted authHeader paths
  - recordFeedback create-then-append semantics
  - explicit trajectoryId honored
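
The OTLP behaviours listed above (service.name in the resource attributes, an optional authorization header, failures logged rather than thrown, export skipped when otlp is omitted) can be sketched as follows. This is an illustration only: `postOtlp`, `buildOtlpBody`, `FetchLike`, and the scope name are hypothetical, the span list is left empty, and the endpoint is assumed to be the collector's full traces URL. The body follows the standard OTLP/HTTP JSON layout of `resourceSpans` / `resource.attributes` / `scopeSpans`.

```ts
// Narrow fetch-like signature so the sketch does not depend on runtime globals.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

interface OtlpOptions {
  endpoint: string; // assumed to be the full collector URL
  authHeader?: string;
}

function buildOtlpBody(serviceName: string) {
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
        },
        // Spans omitted: a real export would map the run's trace events here.
        scopeSpans: [{ scope: { name: 'production-trace-sink' }, spans: [] }],
      },
    ],
  };
}

// Returns true when the collector accepted the POST, false otherwise. Never throws.
async function postOtlp(
  opts: OtlpOptions | undefined,
  serviceName: string,
  fetchImpl: FetchLike,
  log: (message: string) => void,
): Promise<boolean> {
  if (!opts) return false; // otlp omitted: nothing to export
  const headers: Record<string, string> = { 'content-type': 'application/json' };
  if (opts.authHeader) headers['authorization'] = opts.authHeader;
  try {
    const res = await fetchImpl(opts.endpoint, {
      method: 'POST',
      headers,
      body: JSON.stringify(buildOtlpBody(serviceName)),
    });
    if (!res.ok) {
      log(`OTLP POST returned ${res.status}`);
      return false;
    }
    return true;
  } catch (error) {
    log(`OTLP POST threw: ${String(error)}`);
    return false;
  }
}
```

Injecting `fetchImpl` is what makes each of these paths easy to exercise in a unit test without a live collector.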

144/144 pass. Cloudflare Worker semantics intended: `ctx.waitUntil` the
hook from the chat handler so the worker stays alive long enough for
the OTLP POST + DB write to flush.

Bumps agent-runtime to 0.12.0.
@tangletools tangletools merged commit 0ed1406 into main May 20, 2026
1 check passed
@tangletools tangletools deleted the feat/production-trace-sink branch May 20, 2026 12:28