
feat: streaming-event telemetry collector + task.intent directive (0.5.6)#7

Merged
drewstone merged 2 commits into main from feat/stream-event-collector on May 10, 2026

@drewstone
Contributor

Why

createRuntimeEventCollector only accepts AgentRuntimeEvent (the
sink-style events emitted by runAgentTask). It does not handle
RuntimeStreamEvent (the events yielded by runAgentTaskStream).
Every product agent that needs sanitized telemetry on top of streaming
has the same workaround in front of it: call
sanitizeRuntimeStreamEvent(event, options) event-by-event inside the
for await loop, then re-implement summary/aggregation by hand.

The gtm-agent reference migration (@tangle-network/agent-runtime is
already a dep at ^0.5.3, but no streaming integration has been wired
yet — it would hit this exact gap when it does) and any future product
agent moving from runAgentTask to runAgentTaskStream would all
re-derive the same boilerplate. Closing the gap in core now keeps the
opt-in redaction story consistent across both entry points.

API change

Adds createRuntimeStreamEventCollector(options?: RuntimeTelemetryOptions)
as a sibling factory to createRuntimeEventCollector. It returns:

{
  events: Array<Record<string, unknown>>
  onEvent(event: RuntimeStreamEvent): void
  summary(): RuntimeStreamEventSummary
}

Honors the same RuntimeTelemetryOptions redaction flags
(includeInputs, includeUserAnswers, includeControlPayloads,
includeEvidenceIds, includeMetadata,
includeRequirementDescriptions, includeEvalDetails). The
summary() rollup gives eventCount, eventCountsByType,
firstSessionId, finalStatus, finalReason, and concatenated
finalText from text_delta events.
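
To make the returned shape concrete, here is a minimal sketch of a collector with the same { events, onEvent, summary } surface. It is not the package's implementation: SketchStreamEvent, SketchOptions, createSketchStreamCollector, the 'completed' status value, and the choice of which flag gates tool_call args are all simplified stand-ins assumed for illustration.

```typescript
// Hypothetical, simplified stand-ins. The real RuntimeStreamEvent union and
// RuntimeTelemetryOptions in the package have more variants and flags.
type SketchStreamEvent =
  | { type: 'task_start'; sessionId: string; inputs?: unknown }
  | { type: 'text_delta'; text: string }
  | { type: 'tool_call'; name: string; args?: unknown }
  | { type: 'task_end'; status: string; reason?: string }

interface SketchOptions {
  includeInputs?: boolean
  includeControlPayloads?: boolean
}

interface SketchSummary {
  eventCount: number
  eventCountsByType: Record<string, number>
  firstSessionId: string | undefined
  finalStatus: string | undefined
  finalReason: string | undefined
  finalText: string
}

function createSketchStreamCollector(options: SketchOptions = {}) {
  const events: Array<Record<string, unknown>> = []
  const eventCountsByType: Record<string, number> = {}
  let firstSessionId: string | undefined
  let finalStatus: string | undefined
  let finalReason: string | undefined
  let finalText = ''

  function onEvent(event: SketchStreamEvent): void {
    eventCountsByType[event.type] = (eventCountsByType[event.type] ?? 0) + 1
    const record: Record<string, unknown> = { type: event.type }
    switch (event.type) {
      case 'task_start':
        if (firstSessionId === undefined) firstSessionId = event.sessionId
        record.sessionId = event.sessionId
        // Inputs are replaced by a marker unless explicitly opted in.
        record.inputs = options.includeInputs ? event.inputs : '[redacted]'
        break
      case 'text_delta':
        finalText += event.text
        break
      case 'tool_call':
        record.name = event.name
        // Assumption: args are omitted unless a payload flag is set.
        if (options.includeControlPayloads) record.args = event.args
        break
      case 'task_end':
        finalStatus = event.status
        finalReason = event.reason
        record.status = event.status
        break
    }
    events.push(record)
  }

  function summary(): SketchSummary {
    return {
      eventCount: events.length,
      eventCountsByType: { ...eventCountsByType },
      firstSessionId,
      finalStatus,
      finalReason,
      finalText,
    }
  }

  return { events, onEvent, summary }
}

// Usage: feed a synthetic script through the collector, then roll it up.
const sketchCollector = createSketchStreamCollector()
const sketchScript: SketchStreamEvent[] = [
  { type: 'task_start', sessionId: 's-1', inputs: { email: 'redact@example.com' } },
  { type: 'text_delta', text: 'Hello, ' },
  { type: 'tool_call', name: 'shell', args: { cmd: 'rm -rf /tmp/x' } },
  { type: 'text_delta', text: 'world' },
  { type: 'task_end', status: 'completed', reason: 'done' },
]
for (const event of sketchScript) sketchCollector.onEvent(event)
const sketchRollup = sketchCollector.summary()
```

In a real consumer, onEvent would be called inside the for await loop over runAgentTaskStream, and the options object would be the package's RuntimeTelemetryOptions.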

Sibling factory vs unified union

I considered three shapes:

  1. Unified factory with a discriminated union over both event
    types.
    Rejected: the stream and non-stream events share type
    literals (task_start, readiness_end, task_end) but with
    different field shapes (timestamp, session, and knowledge decision
    on the stream side, etc.). A unified
    dispatcher would have to discriminate on the presence of optional
    fields, which is brittle and silently misroutes events. Consumers
    would also lose precise types at the callsite.
  2. Refactor the two event types into one shape and migrate both
    callers.
    Rejected: a much larger blast radius for a packaging-level
    improvement. Backward-incompatible for every existing
    runAgentTask consumer.
  3. Sibling factory. Picked. Clear typing, zero impact on existing
    createRuntimeEventCollector consumers, identical opt-in semantics.
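
The misrouting risk in option 1 can be illustrated with a small sketch. The two task_start shapes below are hypothetical simplifications (the real event types carry more fields, and which fields are optional is assumed here); the point is only that a shared type literal forces a dispatcher to probe for optional fields.

```typescript
// Hypothetical shapes: both share the literal 'task_start', so `type`
// alone cannot tell them apart.
type SinkTaskStart = { type: 'task_start'; taskId: string }
type StreamTaskStart = {
  type: 'task_start'
  taskId: string
  timestamp?: string // assumed optional for illustration
  sessionId?: string // assumed optional for illustration
}

type EitherTaskStart = SinkTaskStart | StreamTaskStart

// A unified dispatcher has nothing to switch on except field presence.
function classifyByFieldPresence(event: EitherTaskStart): 'sink' | 'stream' {
  return 'timestamp' in event || 'sessionId' in event ? 'stream' : 'sink'
}

// A stream event that happens to omit both optional fields...
const bareStreamEvent: StreamTaskStart = { type: 'task_start', taskId: 't1' }
// ...is classified as a sink event, with no compile-time or runtime error.
const misroutedAs = classifyByFieldPresence(bareStreamEvent)

const fullStreamEvent: StreamTaskStart = {
  type: 'task_start',
  taskId: 't2',
  sessionId: 's-1',
}
const routedAs = classifyByFieldPresence(fullStreamEvent)
```

With sibling factories, each onEvent accepts exactly one event type, so this classification step never exists.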

README directive on task.intent

task.intent flows through sanitized telemetry by default (it's the
stable operation label, not redactable by includeInputs). The README
"Sanitized telemetry" section now states explicitly:

Never set task.intent to user input — use a fixed string
describing the operation kind (e.g. "Run a chat turn", "Score a tax return"). If you need to log user-visible intent, route it
through inputs (which are redacted by default) instead.

Same directive is repeated in the new example's README.
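
As a rough illustration of the directive, the sketch below builds a task whose intent is a fixed label and whose user text travels only through inputs. SketchTask and buildChatTask are hypothetical names used for this example, not part of the package's API.

```typescript
// Hypothetical minimal task shape for illustration only.
type SketchTask = {
  intent: string
  inputs: Record<string, unknown>
}

// Fixed operation label: safe to flow through sanitized telemetry.
const CHAT_TURN_INTENT = 'Run a chat turn'

function buildChatTask(userMessage: string): SketchTask {
  return {
    // Do not interpolate userMessage here; intent is not redacted.
    intent: CHAT_TURN_INTENT,
    // User-provided text goes through inputs, which are redacted by default.
    inputs: { message: userMessage },
  }
}

const chatTask = buildChatTask('my SSN is 000-00-0000')
```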

New example

examples/sanitized-telemetry-streaming/ mirrors
examples/sanitized-telemetry/ for streaming. Uses
createIterableBackend to yield a synthetic script (text_delta,
tool_call with sensitive args, tool_result with a secret token,
artifact with an internal s3 uri), runs cleanly with no creds, prints
both the default-redacted and verbose opt-in views plus the
summary() rollup. Wired into examples/README.md.

Test plan

Added three vitest cases in tests/runtime.test.ts:

  • Redaction contract (load-bearing). Runs a stream through the
    collector with sensitive tool_call.args, tool_result.result,
    artifact.uri, artifact.metadata, task.inputs, and
    task.metadata. Asserts the serialized events contain none of:
    rm -rf, sk-leaked, cat /etc/secret.txt, secret-bucket,
    cust-99, redact@example.com. This is the test that fails if we
    ever leak user input.
  • Opt-in surfaces specific fields. Same setup, with
    includeInputs/includeControlPayloads/includeEvidenceIds/includeMetadata
    on. Asserts the previously-redacted fields are now present.
  • Summary rollup. Asserts firstSessionId, finalStatus,
    finalReason, finalText, and reconciles
    eventCount === sum(eventCountsByType).
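
The redaction contract and the count reconciliation both reduce to small checks. The sketch below shows one way to express them; findLeaks, sumCounts, and the sample event arrays are hypothetical and are not copied from tests/runtime.test.ts.

```typescript
// Returns every forbidden substring that appears in the serialized events.
function findLeaks(
  events: ReadonlyArray<Record<string, unknown>>,
  forbidden: readonly string[],
): string[] {
  const serialized = JSON.stringify(events)
  return forbidden.filter((needle) => serialized.includes(needle))
}

// Sums per-type counts so they can be reconciled against eventCount.
function sumCounts(counts: Record<string, number>): number {
  return Object.values(counts).reduce((total, n) => total + n, 0)
}

const forbiddenStrings = [
  'rm -rf',
  'sk-leaked',
  'cat /etc/secret.txt',
  'secret-bucket',
  'cust-99',
  'redact@example.com',
]

// Hypothetical default-redacted output: payload fields are absent or masked.
const redactedEvents: Array<Record<string, unknown>> = [
  { type: 'task_start', inputs: '[redacted]', metadata: '[redacted]' },
  { type: 'tool_call', name: 'shell' },
  { type: 'tool_result', name: 'shell' },
]

// Hypothetical regression: a tool result slips through unredacted.
const leakyEvents: Array<Record<string, unknown>> = [
  { type: 'tool_result', name: 'shell', result: 'token=sk-leaked' },
]

const cleanLeaks = findLeaks(redactedEvents, forbiddenStrings)
const foundLeaks = findLeaks(leakyEvents, forbiddenStrings)
const reconciledTotal = sumCounts({ task_start: 1, tool_call: 1, tool_result: 1 })
```

Checking the serialized form rather than individual fields means a leak through any nested or newly added field is still caught.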

All 19 tests pass (16 existing + 3 new):

Test Files  1 passed (1)
     Tests  19 passed (19)

pnpm typecheck and pnpm build also clean.

The new example runs cleanly:

pnpm dlx tsx examples/sanitized-telemetry-streaming/sanitized-telemetry-streaming.ts

Default view: "inputs":"[redacted]", "metadata":"[redacted]",
tool_call events have no args field, tool_result events have no
result field, artifact events omit uri/metadata. Verbose view:
all fields visible. (Note: tsx isn't a local bin, so pnpm dlx tsx
matches the invocation used by every other example in the repo.)

Versioning note

This PR ships 0.5.4 → 0.5.6, intentionally skipping 0.5.5. PR #6
currently holds the 0.5.5 bump for the agent-eval / agent-knowledge
dep tree unification. If PR #6 lands first, the version delta becomes
0.5.5 → 0.5.6 (the file already reads 0.5.6 so no rebase action
needed). If this PR lands first, PR #6's 0.5.4 → 0.5.5 bump becomes
a no-op vs. main and that PR's author should rebase / collapse the
version bump. Coordinated with the PR #6 author via this note.

drewstone added 2 commits May 10, 2026 16:01
…5.6)

Add createRuntimeStreamEventCollector — a sibling of
createRuntimeEventCollector typed for RuntimeStreamEvent. Honors the
same RuntimeTelemetryOptions redaction flags (includeInputs,
includeUserAnswers, includeControlPayloads, includeEvidenceIds,
includeMetadata, includeRequirementDescriptions, includeEvalDetails)
and returns the same {events, onEvent} interface plus a summary()
function that rolls up event counts, session id, final status, and
concatenated text_delta.text.

Sibling factory rather than overload because stream and non-stream
events have different field shapes (timestamps, sessions, text/tool
deltas) and overlapping type literals (task_start, readiness_end, …) —
a unified dispatcher would silently misroute events.

Adds the streaming-collector example mirror at
examples/sanitized-telemetry-streaming/. Documents in README that
task.intent flows through sanitized telemetry by default and must
never carry user input; route user-visible intent through inputs
(redacted by default) instead.

Bumps 0.5.4 → 0.5.6 (intentionally skipping 0.5.5; PR #6 currently
holds 0.5.5 and is expected to land in series).
@drewstone drewstone merged commit 4f5181a into main May 10, 2026