feat: streaming-event telemetry collector + task.intent directive (0.5.6) #7
Merged
Add createRuntimeStreamEventCollector — a sibling of
createRuntimeEventCollector typed for RuntimeStreamEvent. Honors the
same RuntimeTelemetryOptions redaction flags (includeInputs,
includeUserAnswers, includeControlPayloads, includeEvidenceIds,
includeMetadata, includeRequirementDescriptions, includeEvalDetails)
and returns the same {events, onEvent} interface plus a summary()
function that rolls up event counts, session id, final status, and
concatenated text_delta.text.
Sibling factory rather than overload because stream and non-stream
events have different field shapes (timestamps, sessions, text/tool
deltas) and overlapping type literals (task_start, readiness_end, …) —
a unified dispatcher would silently misroute events.
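The misrouting risk can be illustrated with a minimal sketch. The two event shapes below are hypothetical simplifications (the real AgentRuntimeEvent and RuntimeStreamEvent have more variants and fields), and unifiedRoute, routeSink, and routeStream are illustrative names, not part of the library.

```typescript
// Hypothetical, simplified shapes that share the same `type` literal.
type SinkTaskStart = { type: "task_start"; taskId: string };
type StreamTaskStart = {
  type: "task_start";
  taskId: string;
  timestamp?: number;
  session?: string;
};

type Route = "sink" | "stream";

// A unified dispatcher cannot switch on `type` alone, so it has to guess
// from the presence of optional fields.
function unifiedRoute(event: SinkTaskStart | StreamTaskStart): Route {
  return "timestamp" in event || "session" in event ? "stream" : "sink";
}

// With sibling factories, the caller's choice of factory fixes the event
// type at compile time, so no runtime guessing is needed.
function routeSink(_event: SinkTaskStart): Route {
  return "sink";
}
function routeStream(_event: StreamTaskStart): Route {
  return "stream";
}

// A stream event whose optional fields happen to be absent.
const bareStreamEvent: StreamTaskStart = { type: "task_start", taskId: "t-1" };

const guessed = unifiedRoute(bareStreamEvent); // "sink": misrouted, no error raised
const typed = routeStream(bareStreamEvent); // "stream"
```

The guess fails without throwing, which is the silent misrouting the paragraph above warns about.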
Adds the streaming-collector example mirror at
examples/sanitized-telemetry-streaming/. Documents in README that
task.intent flows through sanitized telemetry by default and must
never carry user input; route user-visible intent through inputs
(redacted by default) instead.
Bumps 0.5.4 → 0.5.6 (intentionally skipping 0.5.5; PR #6 currently
holds 0.5.5 and is expected to land in series).
Why
createRuntimeEventCollector only accepts AgentRuntimeEvent (the
sink-style events emitted by runAgentTask). It does not handle
RuntimeStreamEvent (the events yielded by runAgentTaskStream).

Every product agent that needs sanitized telemetry on top of streaming
has the same workaround in front of it: call
sanitizeRuntimeStreamEvent(event, options) event-by-event inside the
for await loop, then re-implement summary/aggregation by hand.

The gtm-agent reference migration (@tangle-network/agent-runtime is
already a dep at ^0.5.3, but no streaming integration has been wired
yet; it would hit this exact gap when it does) and any future product
agent moving from runAgentTask to runAgentTaskStream would all
re-derive the same boilerplate. Closing the gap in core now keeps the
opt-in redaction story consistent across both entry points.
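For illustration, the hand-rolled workaround might look roughly like the sketch below. It is self-contained and does not import the library: StreamEvent, sanitizeByHand, foldIntoSummary, and HandRolledSummary are hypothetical stand-ins for RuntimeStreamEvent, sanitizeRuntimeStreamEvent, and whatever rollup a product agent would write, and the real field shapes may differ.

```typescript
// Hypothetical stand-in for RuntimeStreamEvent; the real type has more variants.
type StreamEvent =
  | { type: "task_start"; session: string; inputs: Record<string, unknown> }
  | { type: "text_delta"; session: string; text: string }
  | { type: "tool_call"; session: string; name: string; args: unknown }
  | { type: "task_end"; session: string; status: string };

type SanitizedEvent = { type: string; [key: string]: unknown };

interface HandRolledSummary {
  eventCount: number;
  eventCountsByType: Record<string, number>;
  firstSessionId?: string;
  finalStatus?: string;
  finalText: string;
}

// Hypothetical per-event redaction, playing the role of sanitizeRuntimeStreamEvent.
function sanitizeByHand(event: StreamEvent): SanitizedEvent {
  switch (event.type) {
    case "task_start":
      return { type: event.type, session: event.session, inputs: "[redacted]" };
    case "tool_call":
      // Omit args entirely so nothing sensitive is serialized.
      return { type: event.type, session: event.session, name: event.name };
    default:
      return { ...event };
  }
}

// Aggregation that each consumer would otherwise re-implement.
function foldIntoSummary(summary: HandRolledSummary, event: StreamEvent): void {
  summary.eventCount += 1;
  summary.eventCountsByType[event.type] =
    (summary.eventCountsByType[event.type] ?? 0) + 1;
  if (summary.firstSessionId === undefined) summary.firstSessionId = event.session;
  if (event.type === "text_delta") summary.finalText += event.text;
  if (event.type === "task_end") summary.finalStatus = event.status;
}

// The per-stream loop: sanitize each event, then fold it into the summary.
async function collectByHand(stream: AsyncIterable<StreamEvent>) {
  const events: SanitizedEvent[] = [];
  const summary: HandRolledSummary = {
    eventCount: 0,
    eventCountsByType: {},
    finalText: "",
  };
  for await (const event of stream) {
    events.push(sanitizeByHand(event));
    foldIntoSummary(summary, event);
  }
  return { events, summary };
}

// Synthetic script standing in for what runAgentTaskStream would yield.
const script: StreamEvent[] = [
  { type: "task_start", session: "s-1", inputs: { query: "private text" } },
  { type: "text_delta", session: "s-1", text: "Hello, " },
  { type: "tool_call", session: "s-1", name: "shell", args: { cmd: "rm -rf /tmp/x" } },
  { type: "text_delta", session: "s-1", text: "world" },
  { type: "task_end", session: "s-1", status: "completed" },
];

async function* replay(events: StreamEvent[]): AsyncGenerator<StreamEvent> {
  for (const event of events) yield event;
}
```

Calling collectByHand(replay(script)) resolves to the sanitized events plus the summary. Every streaming consumer would need its own copy of this loop, which is the duplication this PR is meant to remove.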
API change
Adds createRuntimeStreamEventCollector(options?: RuntimeTelemetryOptions)
as a sibling factory to createRuntimeEventCollector. It returns the
same {events, onEvent} interface plus a summary() function.

Honors the same RuntimeTelemetryOptions redaction flags
(includeInputs, includeUserAnswers, includeControlPayloads,
includeEvidenceIds, includeMetadata, includeRequirementDescriptions,
includeEvalDetails). The summary() rollup gives eventCount,
eventCountsByType, firstSessionId, finalStatus, finalReason, and
concatenated finalText from text_delta events.
Sibling factory vs unified union
I considered three shapes:
1. … types. Rejected: the stream and non-stream events share type
   literals (task_start, readiness_end, task_end) but with different
   field shapes (timestamp and session on the stream side, knowledge
   decision on the stream side, etc.). A unified dispatcher would have
   to discriminate on the presence of optional fields, which is brittle
   and silently misroutes events. Consumers would also lose precise
   types at the callsite.
2. … callers. Rejected: a much larger blast radius for a
   packaging-level improvement. Backward-incompatible for every
   existing runAgentTask consumer.
3. … createRuntimeEventCollector consumers, identical opt-in
   semantics.
README directive on task.intent
task.intent flows through sanitized telemetry by default (it's the
stable operation label, not redactable by includeInputs). The README
"Sanitized telemetry" section now states explicitly that task.intent
must never carry user input, and that user-visible intent should be
routed through inputs (redacted by default) instead.
Same directive is repeated in the new example's README.
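A small sketch of why the directive matters. The Task shape and sanitizeTask below are hypothetical illustrations of the default behaviour described here (intent passes through, inputs are replaced by "[redacted]" unless includeInputs is set); they are not the library's implementation.

```typescript
// Hypothetical task shape; only the two fields relevant to the directive.
interface Task {
  intent: string;
  inputs: Record<string, unknown>;
}

// Hypothetical sanitizer mirroring the described default: intent is kept
// as a stable operation label, inputs are redacted unless opted in.
function sanitizeTask(task: Task, includeInputs = false) {
  return {
    intent: task.intent,
    inputs: includeInputs ? task.inputs : "[redacted]",
  };
}

const userText = "email redact@example.com about cust-99";

// Wrong: user text placed in intent survives default redaction.
const leaky: Task = { intent: userText, inputs: {} };

// Right: a stable label in intent, user text routed through inputs.
const safe: Task = { intent: "draft-outreach-email", inputs: { request: userText } };

const leakySerialized = JSON.stringify(sanitizeTask(leaky));
const safeSerialized = JSON.stringify(sanitizeTask(safe));
// safeSerialized is {"intent":"draft-outreach-email","inputs":"[redacted]"}
```

Here leakySerialized still contains the customer id and email address, while safeSerialized contains neither.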
New example
examples/sanitized-telemetry-streaming/ mirrors
examples/sanitized-telemetry/ for streaming. Uses
createIterableBackend to yield a synthetic script (text_delta,
tool_call with sensitive args, tool_result with a secret token,
artifact with an internal s3 uri), runs cleanly with no creds, prints
both the default-redacted and verbose opt-in views plus the
summary() rollup. Wired into examples/README.md.
Test plan
Added three vitest cases in tests/runtime.test.ts:

1. … collector with sensitive tool_call.args, tool_result.result,
   artifact.uri, artifact.metadata, task.inputs, and task.metadata.
   Asserts the serialized events contain none of: rm -rf, sk-leaked,
   cat /etc/secret.txt, secret-bucket, cust-99, redact@example.com.
   This is the test that fails if we ever leak user input.
2. … includeInputs / includeControlPayloads / includeEvidenceIds /
   includeMetadata on. Asserts the previously-redacted fields are now
   present.
3. … firstSessionId, finalStatus, finalReason, finalText, and
   reconciles eventCount === sum(eventCountsByType).

All 19 tests pass (16 existing + 3 new). pnpm typecheck and pnpm build
are also clean.

The new example runs cleanly.

Default view: "inputs":"[redacted]", "metadata":"[redacted]",
tool_call events have no args field, tool_result events have no result
field, artifact events omit uri/metadata. Verbose view: all fields
visible. (Note: existing examples all use the same pnpm tsx
invocation; tsx isn't a local bin, so pnpm dlx tsx matches the pattern
of every other example in the repo.)
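The two assertion styles in the test plan (no forbidden substring in the serialized events, and eventCount reconciling with the per-type counts) can be sketched without a test runner. The helper names findLeaks and countsReconcile and the sample events are hypothetical; the actual vitest cases in tests/runtime.test.ts may be structured differently.

```typescript
// Forbidden substrings taken from the test plan.
const FORBIDDEN = [
  "rm -rf",
  "sk-leaked",
  "cat /etc/secret.txt",
  "secret-bucket",
  "cust-99",
  "redact@example.com",
];

// Returns every forbidden substring that appears in the serialized events.
function findLeaks(events: unknown[], forbidden: string[]): string[] {
  const serialized = JSON.stringify(events);
  return forbidden.filter((needle) => serialized.includes(needle));
}

// Checks that the total equals the sum of the per-type counts.
function countsReconcile(
  eventCount: number,
  eventCountsByType: Record<string, number>,
): boolean {
  const total = Object.values(eventCountsByType).reduce((sum, n) => sum + n, 0);
  return eventCount === total;
}

// Hypothetical sample events: one redacted set, one raw set.
const redactedEvents = [
  { type: "tool_call", name: "shell" },
  { type: "tool_result" },
];
const rawEvents = [
  { type: "tool_call", name: "shell", args: { cmd: "rm -rf /" } },
  { type: "tool_result", result: "token sk-leaked" },
];

const redactedLeaks = findLeaks(redactedEvents, FORBIDDEN); // []
const rawLeaks = findLeaks(rawEvents, FORBIDDEN); // ["rm -rf", "sk-leaked"]
```

Serializing the whole event list and scanning for substrings catches a leak in any field, including fields added later, which a field-by-field assertion would miss.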
Versioning note
This PR ships 0.5.4 → 0.5.6, intentionally skipping 0.5.5. PR #6
currently holds the 0.5.5 bump for the agent-eval/agent-knowledge dep
tree unification. If PR #6 lands first, the version delta becomes
0.5.5 → 0.5.6 (the file already reads 0.5.6 so no rebase action
needed). If this PR lands first, PR #6's 0.5.4 → 0.5.5 bump becomes
a no-op vs. main and that PR's author should rebase / collapse the
version bump. Coordinated with the PR #6 author via this note.