feat(otel-export): exportEvalRuns — ship self-improvement provenance to Tangle Intelligence #73
Merged
tangletools pushed a commit that referenced this pull request on May 29, 2026:

…to Tangle Intelligence (0.31.0)

Adds a reusable client for Intelligence's first-class self-improvement record (POST /v1/ingest/eval-runs, 'Mode D'), alongside the existing OTLP span exporter. A consumer's RSI loop emits one EvalRunEvent per proposal generation (surfaceHash = proposed-change identity, surface = arbitrary provenance, labels.measured flags it as unmeasured); a later gate-decided event re-emits the same runId (idempotent upsert) with a real gateDecision + holdoutLift, so proposal→verdict is one diffable record. Unlike the best-effort span exporter, exportEvalRuns RESOLVES with the ingest verdict (accepted/rejected per event), so a loop can assert its provenance landed. Reads TANGLE_API_KEY + INTELLIGENCE_BASE from env; the tenant is resolved server-side from the key. The wire version and the X-Tangle-Wire-Version header are handled. +4 tests (payload/header shape, 400 rejection passthrough, empty no-op). Makes Intelligence the de-facto provenance store for any agent-runtime consumer's self-improvement loop, not just one benchmark.
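The two-phase record described in this commit can be sketched as follows. This is a hypothetical model, not the package's own types: the field names come from the PR, while the `GateDecision` values, the value types, and `InMemoryRunStore` (a toy stand-in for the server's idempotent upsert keyed by `runId`) are assumptions.

```typescript
// Hypothetical types: field names come from this PR, the value types are assumptions.
type GateDecision = "promote" | "reject"; // assumed verdict values

interface EvalRunEvent {
  runId: string;                    // shared by the proposal and its verdict
  surfaceHash: string;              // identity of the proposed change
  surface: Record<string, unknown>; // arbitrary provenance
  labels: { measured: boolean };    // false until the gate has measured it
  gateDecision?: GateDecision;      // set only on the gate-decided event
  holdoutLift?: number;             // set only on the gate-decided event
}

// Phase 1: emitted when the loop generates a proposal, before any measurement.
function proposalEvent(runId: string, surfaceHash: string, surface: Record<string, unknown>): EvalRunEvent {
  return { runId, surfaceHash, surface, labels: { measured: false } };
}

// Phase 2: re-emits the same runId with the gate's verdict attached.
function gateDecidedEvent(proposal: EvalRunEvent, gateDecision: GateDecision, holdoutLift: number): EvalRunEvent {
  return { ...proposal, labels: { measured: true }, gateDecision, holdoutLift };
}

// Toy stand-in for the server side: an upsert keyed by runId, so a retry or a
// later verdict replaces the record instead of creating a second one.
class InMemoryRunStore {
  private readonly runs = new Map<string, EvalRunEvent>();

  upsert(event: EvalRunEvent): void {
    this.runs.set(event.runId, event);
  }

  get(runId: string): EvalRunEvent | undefined {
    return this.runs.get(runId);
  }

  get size(): number {
    return this.runs.size;
  }
}

const store = new InMemoryRunStore();
const proposal = proposalEvent("run-1", "sha256:abc123", { file: "prompt.md", generation: 1 });
store.upsert(proposal);
store.upsert(proposal); // retried send: still one record
store.upsert(gateDecidedEvent(proposal, "promote", 0.03));
```

Because both phases share `runId`, the store ends up holding a single record whose fields moved from "unmeasured proposal" to "measured verdict", which is what makes the proposal→verdict history diffable.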
What
Adds `exportEvalRuns(events, config)` to `@tangle-network/agent-runtime`, a reusable client for Tangle Intelligence's first-class self-improvement record, `POST /v1/ingest/eval-runs` ('Mode D'), alongside the existing OTLP span exporter (`createOtelExporter`).

Why
Intelligence is becoming the de-facto store for agentic optimization and self-improvement. Until now, only OTLP traces had a first-class exporter; self-improvement runs (propose → gate → promote, with provenance) had no reusable client, so each consumer hand-rolled a fetch. This makes shipping a self-improvement loop's provenance a one-import capability for every consumer.
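For contrast with a hand-rolled fetch, here is a minimal sketch of what a reusable eval-run client has to take care of. `exportEvalRunsSketch`, `FetchLike`, the `{ events }` body wrapper, Bearer auth, the wire-version value `"1"`, and the per-event `rejected` shape are all hypothetical; only the endpoint, the env variable names, the `X-Tangle-Wire-Version` header, and the `{ ok, status, accepted, rejected }` field names come from this PR.

```typescript
// Minimal sketch, NOT the package's implementation. Assumed (not stated in the PR):
// the { events } body wrapper, Bearer auth, the wire-version value, and the response JSON.

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface ExportConfig {
  env: Record<string, string | undefined>; // e.g. process.env
  fetch: FetchLike;                        // injectable so tests need no network
}

interface IngestVerdict {
  ok: boolean;
  status: number;
  accepted: number;                                   // assumed: count of accepted events
  rejected: Array<{ index: number; reason: string }>; // assumed: per-event rejections
}

const WIRE_VERSION = "1"; // hypothetical value for X-Tangle-Wire-Version

async function exportEvalRunsSketch(events: readonly object[], config: ExportConfig): Promise<IngestVerdict> {
  // Empty batch: resolve immediately without a request (status 0 = nothing sent).
  if (events.length === 0) return { ok: true, status: 0, accepted: 0, rejected: [] };

  const apiKey = config.env["TANGLE_API_KEY"];
  const base = config.env["INTELLIGENCE_BASE"];
  if (!apiKey || !base) throw new Error("TANGLE_API_KEY and INTELLIGENCE_BASE must be set");

  const res = await config.fetch(`${base.replace(/\/+$/, "")}/v1/ingest/eval-runs`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // no tenant field: the server derives it from the key
      "X-Tangle-Wire-Version": WIRE_VERSION,
    },
    body: JSON.stringify({ events }),
  });

  // Resolve with the verdict (including 4xx) instead of swallowing it like a best-effort exporter.
  const payload = (await res.json().catch(() => ({}))) as Partial<IngestVerdict>;
  return {
    ok: res.ok,
    status: res.status,
    accepted: payload.accepted ?? 0,
    rejected: payload.rejected ?? [],
  };
}
```

Injecting `fetch` keeps the sketch testable offline; in a Node 18+ loop one would pass `{ env: process.env, fetch }`. Resolving rather than throwing on a 400 is what lets the caller inspect per-event rejections.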
Shape
- `surfaceHash` = proposed-change identity; `surface` = arbitrary provenance; `labels.measured` flags unmeasured proposals honestly.
- A later `gate-decided` event re-emits the same `runId` (idempotent upsert) with a real `gateDecision` + `holdoutLift`, so proposal→verdict is one diffable record (`/v1/runs/diff`).
- Unlike the best-effort span exporter, `exportEvalRuns` resolves with the ingest verdict (`{ ok, status, accepted, rejected }`), so a loop can assert its provenance landed.
- Reads `TANGLE_API_KEY` + `INTELLIGENCE_BASE` from env; the tenant is resolved server-side from the key; the `X-Tangle-Wire-Version` header is handled.

Tests
+4 in `tests/otel-export.test.ts` (extended, not forked): payload/header/wire-version shape, idempotency key, 400 per-event rejection passthrough, and empty no-op. Full file 12/12 green. Typecheck clean. Additive: no change to existing exports. Version 0.31.0.

Verified end-to-end against prod from a downstream consumer: a benchmark's RSI loop shipped 2 proposal-provenance records → 200 OK, queryable at `/v1/runs/<id>`.
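A loop can turn the resolved verdict into a hard check, so missing provenance fails the run instead of passing silently. A minimal sketch, assuming `accepted` is a count and `rejected` lists per-event reasons (the PR names these fields but not their types); `assertProvenanceLanded` is a hypothetical helper, not part of the package.

```typescript
// Assumed verdict shape: the PR names { ok, status, accepted, rejected } but not their types.
interface IngestVerdict {
  ok: boolean;
  status: number;
  accepted: number;                                   // assumed: count of accepted events
  rejected: Array<{ index: number; reason: string }>; // assumed: per-event rejections
}

// Fail loudly if any provenance record did not land, instead of logging and moving on.
function assertProvenanceLanded(verdict: IngestVerdict, expectedCount: number): void {
  // Per-event rejections first: they carry the most useful diagnostics.
  if (verdict.rejected.length > 0) {
    const reasons = verdict.rejected.map((r) => `event ${r.index}: ${r.reason}`).join("; ");
    throw new Error(`ingest rejected ${verdict.rejected.length} event(s) (HTTP ${verdict.status}): ${reasons}`);
  }
  if (!verdict.ok) {
    throw new Error(`eval-run ingest failed with HTTP ${verdict.status}`);
  }
  if (verdict.accepted !== expectedCount) {
    throw new Error(`expected ${expectedCount} accepted event(s), got ${verdict.accepted}`);
  }
}
```

Checking the count as well as `ok` guards against a 200 response that accepted fewer events than the loop sent.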