v1.6.0 — Jingle Bills: Usage Tracking & Stream Cancellation
This release focuses on usage + cost tracking and stream cancellation (FastAPI + terminal demo), with improved hot reload, more reliable event/history handling (timestamps, orphan cleanup, LiteLLM ID normalization), and docs/test stabilization.
If you’re new to Agency Swarm, here’s what that means:
- Streaming cancellation: When you stream a response (token-by-token), you can stop it early—useful when you already have the answer or want to avoid spending extra tokens.
- Hot reload (terminal demo): Like FastAPI, the terminal demo now auto-restarts on local .py or .md file changes, so you can iterate faster.
- Usage + cost tracking: FastAPI responses and the terminal demo now include token totals and an estimated USD cost (multi-agent runs included). We use the real model prices, including models supported by LiteLLM.
- More reliable history: Messages keep the timestamp from when they were emitted, orphaned tool calls/outputs are removed, and LiteLLM placeholder IDs are normalized.
Features
- Add Token Usage + Cost Tracking (FastAPI + terminal demo; multi-agent aware) by @ArtemShatokhin in #489
- What you get: A
usagesummary with request count, token breakdown, and estimated cost. - Where you’ll see it: Returned by FastAPI endpoints and tracked in the terminal demo (use
/cost, and it’s also printed on exit). - Multi-agent aware: Includes tokens/cost from the main agent and any sub-agent calls made during the run.
- What you get: A
- Add Stream Cancellation support for FastAPI and terminal demo hot reload by @ArtemShatokhin in #469
- FastAPI streaming: The stream emits a
run_idearly; clients can cancel an in-flight run via a cancel endpoint. If the client disconnects, the run is cancelled automatically. - Cancel modes:
immediate(stop right away) andafter_turn(finish the current turn, then stop). - Terminal demo hot reload: Watches for
.pyand.mdchanges and restarts automatically during local development.
- FastAPI streaming: The stream emits a
- Add Cancel Feature to the Terminal Demo by @ArtemShatokhin in #474
- How to use it: Press
ESCduring streaming to cancel the current response. - Cleaner history after cancel: The demo filters out duplicate/orphaned tool call messages so the saved chat stays valid and replayable.
- How to use it: Press
Improvements
- Improve Terminal Tool Formatting by @ArtemShatokhin in #471
- Capture and assign timestamps at event emission time by @ArtemShatokhin in #485
- What changed: Each event/message now gets its timestamp when it’s emitted during streaming (not later during persistence).
- Why it matters: Sorting and replaying history is more consistent across streaming, saved threads, and UIs.
- Tool timelines: Hosted tool result preservation messages inherit the tool call’s timestamp for more accurate “when did this happen?” debugging.
- Orphan cleanup: Before saving history, the framework drops orphaned tool calls/outputs (e.g. a tool call without its matching output) to avoid replay errors.
- Add user_context forwarding to FastAPI endpoints by @bonk1t in #470
- Added include_search_results assignment to automatically created tools by @ArtemShatokhin in #468
- Add nest asyncio patch to IPython tool to simplify async function execution by @ArtemShatokhin in #494
Fixes
- LiteLLM fake id handling by @bonk1t in #479
- Problem: Some LiteLLM / Chat Completions streams reuse a placeholder id (
__fake_id__) across many output items, which breaks consumers that key by item id. - Fix: Placeholder ids are rewritten into stable per-item ids (tool calls prefer
call_id), so deduplication and message linking behave predictably.
- Problem: Some LiteLLM / Chat Completions streams reuse a placeholder id (
- fix: normalize LiteLLM placeholder ids by @bonk1t in #491
- fix: normalize agent template names and harden file cleanup by @bonk1t in #480
- Updated mcp error handling to match default function tool behavior by @ArtemShatokhin in #487
- Fix copilot demo packaging by @bonk1t in #476
- fix: restore copilot demo streaming by @bonk1t in #484
- fix: make sync APIs safe inside running event loop by @bonk1t in #496
Refactors / Chores
- refactor: enforce codebase consistency across imports, types, and structure by @bonk1t in #483
- Refactor: Consolidate File Utilities and Cleanup Imports by @bonk1t in #482
- set default truncation="auto" and update openai-agents to 0.6.2 by @bonk1t in #475
- chore: update default model to gpt-5.1 by @bonk1t in #477
- chore: deprecate agent send_message_tool_class by @bonk1t in #472
- Update model references to gpt-5.2 by @bonk1t in #495
Documentation
- Improve docs by @VRSEN in #467
- updated pricing section with new plans and credits system by @MykhailoShchuka in #486
Tests
These test changes focus on stability and determinism to reduce intermittent failures in CI.
- Fix context sharing integration test determinism by @bonk1t in #481
- Tests: stabilization and fixes by @bonk1t in #488
Full Changelog: v1.5.0...v1.6.0