feat(examples): self-improving-loop — 4-package composition demo (the 100x post artifact)#58

Merged: drewstone merged 1 commit into main from feat/example-self-improving-loop on May 25, 2026
@drewstone
Contributor

Summary

Single runnable file that wires all four substrate packages into one self-improving loop:

  • @tangle-network/sandbox: AgentProfile substrate type (flows unwrapped through everything)
  • @tangle-network/agent-eval/multishot: runMultishot + runJudge
  • @tangle-network/agent-runtime/analyst-loop: pattern referenced in the analyst phase
  • @tangle-network/agent-knowledge: referenced in the README cross-walk for the researcher path

The staff audit (/.evolve/audits/2026-05-25-claude-staff-audit.md) flagged the missing composition demo as the 100x-post-worthy artifact. This is it.

What it shows

Six phases in ~180 LOC:

  1. baseline AgentProfile v0
  2. runMultishot across 3 personas + 1 conversation judge
  3. analyst reads transcripts + scores, proposes a systemPrompt mutation
  4. applyMutation → AgentProfile v1
  5. re-run multishot with v1
  6. gate compares v0 vs v1 means → ship / hold
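
The six phases above can be sketched as one function. This is a minimal illustration, not the example file itself: every name ending in `Stub`, the `Transcript` and `Mutation` shapes, the toy judge scores, and the persona names are hypothetical stand-ins, and the real `runMultishot`, `runJudge`, and `applyMutation` signatures are not shown in this PR description. The stubs are synchronous to keep the sketch short.

```typescript
// Hypothetical stand-ins; none of these are the real package APIs.
type AgentProfile = { version: number; systemPrompt: string };
type Transcript = { persona: string; text: string; score: number };
type Mutation = { systemPrompt: string };
type LoopResult = {
  shipped: AgentProfile;
  v0Mean: number;
  v1Mean: number;
  decision: "ship" | "hold";
};

// Phases 2 and 5: one scripted conversation per persona, scored by a toy judge.
function runMultishotStub(profile: AgentProfile, personas: string[]): Transcript[] {
  return personas.map((persona) => {
    const asksFirst = profile.systemPrompt.includes("clarifying question");
    return {
      persona,
      text: asksFirst ? "agent asks before answering" : "agent answers immediately",
      score: asksFirst ? 8 : 3, // toy score on a 0-10 scale
    };
  });
}

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Phase 3: read transcripts + scores, propose a systemPrompt mutation (or none).
function analystStub(profile: AgentProfile, transcripts: Transcript[]): Mutation | null {
  const weak = transcripts.filter((t) => t.score < 5);
  if (weak.length === 0) return null;
  return { systemPrompt: `${profile.systemPrompt} Ask one clarifying question first.` };
}

// Phase 4: apply the mutation to produce the next profile version.
function applyMutationStub(profile: AgentProfile, m: Mutation): AgentProfile {
  return { version: profile.version + 1, systemPrompt: m.systemPrompt };
}

function selfImprovingLoop(v0: AgentProfile, personas: string[]): LoopResult {
  const v0Mean = mean(runMultishotStub(v0, personas).map((t) => t.score));
  const mutation = analystStub(v0, runMultishotStub(v0, personas));
  if (!mutation) return { shipped: v0, v0Mean, v1Mean: v0Mean, decision: "hold" };

  const v1 = applyMutationStub(v0, mutation);
  const v1Mean = mean(runMultishotStub(v1, personas).map((t) => t.score));

  // Phase 6: ship v1 only if its mean beats the baseline.
  const decision = v1Mean > v0Mean ? "ship" : "hold";
  return { shipped: decision === "ship" ? v1 : v0, v0Mean, v1Mean, decision };
}

// Phase 1: baseline profile v0.
const result = selfImprovingLoop(
  { version: 0, systemPrompt: "You are a helpful assistant." },
  ["novice", "expert", "skeptic"],
);
console.log(result.decision, result.v0Mean, result.v1Mean);
```

The point of the shape is that the same `AgentProfile` value passes through evaluation, analysis, and mutation without being wrapped, which is what the Summary means by "flows unwrapped through everything".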

Default = offline + reproducible

Scripted LLM responses keep the demo deterministic. TANGLE_API_KEY=... MOCK=0 runs it live.
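
A rough sketch of how an offline-by-default switch like this could look. The helper names `resolveMode` and `scriptedLlm` are hypothetical, and two behaviours here are assumptions rather than facts from this PR: that any value other than `MOCK=0` stays offline, and that live mode refuses to start without `TANGLE_API_KEY`.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: only an explicit MOCK=0 leaves offline mode.
function resolveMode(env: Env): "mock" | "live" {
  if (env.MOCK !== "0") return "mock";
  if (!env.TANGLE_API_KEY) {
    throw new Error("MOCK=0 requires TANGLE_API_KEY to be set");
  }
  return "live";
}

// Replays canned responses in order, then repeats the last one,
// so every run of the demo sees identical "LLM" output.
function scriptedLlm(responses: string[]): (prompt: string) => string {
  if (responses.length === 0) throw new Error("need at least one scripted response");
  let i = 0;
  return (_prompt: string) => responses[Math.min(i++, responses.length - 1)];
}

const llm = scriptedLlm(["first scripted reply", "second scripted reply"]);
console.log(resolveMode({}), llm("hello"));
```

In a real script the argument to `resolveMode` would be `process.env`. Taking the environment as a parameter keeps the switch testable without touching global state.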

Smoke output

v0 mean: 3.17 → v1 mean: 8.50 (delta +5.33) → gate ships v1
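
The arithmetic behind that line is 8.50 − 3.17 = 5.33. A minimal sketch of a gate that would print it is below. The function name `gate`, the `minDelta` threshold, and the "holds v0" wording are assumptions for illustration; the PR only states that the gate compares the v0 and v1 means.

```typescript
type GateResult = { delta: number; decision: "ship" | "hold"; summary: string };

// Ship v1 only when its mean exceeds v0's by more than minDelta (hypothetical knob).
function gate(v0Mean: number, v1Mean: number, minDelta = 0): GateResult {
  const delta = v1Mean - v0Mean;
  const decision = delta > minDelta ? "ship" : "hold";
  const sign = delta >= 0 ? "+" : ""; // negative deltas already carry their sign
  const verdict = decision === "ship" ? "ships v1" : "holds v0";
  const summary =
    `v0 mean: ${v0Mean.toFixed(2)} → v1 mean: ${v1Mean.toFixed(2)} ` +
    `(delta ${sign}${delta.toFixed(2)}) → gate ${verdict}`;
  return { delta, decision, summary };
}

console.log(gate(3.17, 8.5).summary);
```

A non-zero `minDelta` would be one way to keep judge noise from shipping a change that is not a real improvement.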

Verified

  • pnpm typecheck ✓
  • pnpm test ✓ (284 tests)
  • pnpm tsx examples/self-improving-loop/self-improving-loop.ts ✓

Other changes

  • Bumped dev dep @tangle-network/agent-eval ^0.33.1 → ^0.38.0 so the example resolves /multishot.

Single runnable file that wires @tangle-network/agent-runtime +
@tangle-network/agent-eval + @tangle-network/agent-knowledge +
@tangle-network/sandbox into one self-improving loop.

What it shows:
1. baseline AgentProfile v0 (sandbox substrate type)
2. runMultishot across 3 personas + 1 judge (agent-eval/multishot)
3. analyst phase reads transcripts → proposes systemPrompt mutation
4. applyMutation → AgentProfile v1
5. re-run multishot with v1
6. gate compares v0 vs v1 means → ship / hold

Default mode runs offline with scripted LLM responses (reproducible demo);
TANGLE_API_KEY=... MOCK=0 runs against the real router.

Verified live:
- pnpm typecheck clean (after dev-dep bump agent-eval ^0.33.1 → ^0.38.0)
- pnpm test — 284 tests pass
- pnpm tsx examples/self-improving-loop/ produces:
    v0 mean: 3.17 → v1 mean: 8.50 (delta +5.33) → gate ships v1

README diagrams the substrate composition + maps each phase to its
substrate primitive. Cross-links to agent-stack-adoption skill for the
end-to-end 10-phase production runbook.
@drewstone drewstone merged commit ada4074 into main May 25, 2026
1 check failed