feat(openai-agents): emit structured handoffs + output_type on agent spans #4066
hansmire wants to merge 1 commit into traceloop:main
…spans

The Agents SDK's `AgentSpanData` documents its `handoffs` field as ``list[str]`` (the names of handoff target agents). The current instrumentor assumed each entry was an ``Agent`` object and read a `.name` attribute, which silently produced ``"unknown"`` for every handoff when the SDK follows its documented contract. It also stored each handoff under a separately-indexed attribute name (``openai.agent.handoff0``, ``...1``, ``...2``, ...) nested as a JSON blob, which downstream UIs can't easily aggregate across.

This patch:

1. Normalises `handoffs` extraction: strings pass through, Agent-like objects fall back to `.name`, unknown types produce ``"unknown"``.
2. Emits a unified ``gen_ai.agent.handoffs`` JSON-list attribute so backends can show the full handoff target list at a glance. The legacy per-index ``openai.agent.handoffN`` attributes are kept for back-compat with existing dashboards.
3. Emits ``gen_ai.agent.output_type`` capturing `AgentSpanData.output_type`, matching what Braintrust's native Agents-SDK processor logs via `_agent_log_data` and that this instrumentor was previously dropping.

Both emissions are defensive: absent/empty values skip the attribute entirely (no stringified ``None`` / empty-list snuck into metadata).

Tests extend the existing VCR-backed `test_agent_with_function_tool_spans` to pin `gen_ai.agent.output_type == "str"` on the WeatherAgent span and assert the absent-handoffs regression guard; a new direct unit test pins the handoff-name normalisation for both string and Agent-object inputs.
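The normalisation rule in step 1 can be sketched as follows. This is a minimal illustration, not the instrumentor's actual code: the helper name `normalise_handoff_names` is hypothetical, and `SimpleNamespace` stands in for the SDK's `Agent` class so the example runs without the Agents SDK installed.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional


def normalise_handoff_names(handoffs: Optional[Iterable[Any]]) -> List[str]:
    """Return handoff target names as plain strings (hypothetical helper)."""
    names: List[str] = []
    for entry in handoffs or []:
        if isinstance(entry, str):
            # Documented contract: entries are already agent names.
            names.append(entry)
            continue
        # Legacy shape: an Agent-like object exposing a `.name` attribute.
        name = getattr(entry, "name", None)
        # Anything without a usable string name is reported as "unknown".
        names.append(name if isinstance(name, str) and name else "unknown")
    return names


# Stand-in for an Agent object (assumption: only `.name` matters here).
legacy_agent = SimpleNamespace(name="Super Agent")

print(normalise_handoff_names(["Super Agent"]))     # documented list[str] input
print(normalise_handoff_names([legacy_agent, 42]))  # legacy object plus an unknown type
print(normalise_handoff_names(None))                # absent handoffs
```

Checking `isinstance(entry, str)` first keeps the documented `list[str]` contract on the fast path; the `getattr` fallback only exists for callers that still pass Agent-like objects.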
Summary

The Agents SDK's `AgentSpanData` documents its `handoffs` field as `list[str]` (the names of handoff target agents). The current instrumentor assumed each entry was an `Agent` object and read `.name`, which silently produced `"unknown"` for every handoff when the SDK follows its documented contract. It also stored each handoff under a separately-indexed attribute (`openai.agent.handoff0`, `...1`, `...2`, ...) nested as a JSON blob, which trace UIs can't easily aggregate across.

It also dropped `AgentSpanData.output_type` entirely, which Braintrust's native Agents-SDK processor surfaces via `_agent_log_data`.

This PR:

1. Normalises `handoffs` extraction: strings pass through, Agent-like objects fall back to `.name`, unknown types produce `"unknown"`.
2. Emits a unified `gen_ai.agent.handoffs` JSON-list attribute so backends can show the full handoff target list at a glance. The legacy per-index `openai.agent.handoffN` attributes are kept for back-compat with existing dashboards.
3. Emits `gen_ai.agent.output_type` from `AgentSpanData.output_type`.

Both emissions are defensive: absent/empty values skip the attribute entirely (no stringified `"None"` / empty-list snuck into metadata).

Before / After
In the Braintrust UI

Side-by-side shots of the agent-span Metadata panel for the same Super Agent run, before vs. after this PR:

Before: `openai.agent.handoff0` renders as `{name: "unknown", instructions: "No instructions"}`. Neither `gen_ai.agent.handoffs` nor `gen_ai.agent.output_type` appears.

After: `openai.agent.handoff0.name` is the real agent name (`"Super Agent"` for this self-handoff case), `gen_ai.agent.handoffs` shows the full target list, and `gen_ai.agent.output_type` is captured.

Attribute-level diff
Same trace, attribute keys present on the `Super Agent.agent` span: 5 keys → 7 keys.
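The two added keys come from the defensive emission described in the Summary. A rough sketch of that logic, under stated assumptions: `build_agent_attributes` is a hypothetical helper (not the instrumentor's real function), the handoff names are assumed to be already normalised to strings, the unified attribute is assumed to be a JSON-encoded string, and the legacy blob shape follows the `{"name": ...}` form shown in the end-to-end verification.

```python
import json
from typing import Dict, List, Optional


def build_agent_attributes(
    handoff_names: List[str], output_type: Optional[str]
) -> Dict[str, str]:
    """Build span attributes, skipping absent or empty values (hypothetical helper)."""
    attributes: Dict[str, str] = {}

    if handoff_names:
        # Unified attribute: the full target list in one JSON-encoded value.
        attributes["gen_ai.agent.handoffs"] = json.dumps(handoff_names)
        # Legacy per-index attributes, kept for existing dashboards.
        for index, name in enumerate(handoff_names):
            attributes[f"openai.agent.handoff{index}"] = json.dumps({"name": name})

    if output_type:
        # Only emitted when the SDK reports a value; never a stringified None.
        attributes["gen_ai.agent.output_type"] = str(output_type)

    return attributes


# Self-handoff case from the end-to-end run.
print(build_agent_attributes(["Super Agent"], "str"))
# Absent handoffs and output type: no keys are added at all.
print(build_agent_attributes([], None))
```

Guarding each emission on truthiness means a span with no handoffs and no output type keeps its original key set, which is the regression the tests pin.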
Tests

1. Extends `test_agent_with_function_tool_spans` to pin `gen_ai.agent.output_type == "str"` and assert the absent-handoffs regression guard (no stringified `None` on an empty list).
2. Adds `test_agent_span_attributes_handoffs_from_agent_objects`, which pins the handoff-name normalisation for both `list[str]` (documented) and `list[Agent]` (legacy).

All 11 tests in `tests/test_openai_agents.py` pass locally. `uv run ruff check` is clean.

End-to-end verification
Installed this branch into a downstream agent repo and re-ran an agent with a self-handoff (`activate_code_interpreter` handoff-to-self). Before the patch: `openai.agent.handoff0 = {"name": "unknown", "instructions": "No instructions"}`. After: `gen_ai.agent.handoffs = ["Super Agent"]`, `openai.agent.handoff0 = {"name": "Super Agent"}`, `gen_ai.agent.output_type = "str"`.

Notes
Part of a small series of `openai-agents` parity fixes (#4061 cached_tokens + reasoning_tokens, #4062 tool span type + duration, #4063 tool span input + output, #4065 LLM span response metadata). Each stands alone off `main` and can be merged in any order.

Scope check: `AgentSpanData.tools` is NOT currently populated by the Agents SDK's Runner (verified live: the field reads `None` even for agents with non-empty tool lists), so it's not a parity delta worth closing downstream. Both Braintrust's native processor and this instrumentor would benefit from an upstream SDK fix that plumbs the registered tool names into `AgentSpanData.tools`; I can file that separately if useful.