feat(openai-agents): emit structured handoffs + output_type on agent spans #4066

Draft
hansmire wants to merge 1 commit into traceloop:main from hansmire:fix/openai-agents-agent-metadata

Conversation

@hansmire hansmire commented Apr 29, 2026

Summary

The Agents SDK's AgentSpanData documents its handoffs field as list[str] (the names of handoff target agents). The current instrumentor assumed each entry was an Agent object and read .name, which silently produced "unknown" for every handoff whenever the SDK followed its documented contract. The instrumentor also stored each handoff as a JSON blob under a separately indexed attribute (openai.agent.handoff0, ...1, ...2, ...), so trace UIs can't easily aggregate the handoff targets of a span.

Finally, the instrumentor dropped AgentSpanData.output_type entirely, a field that Braintrust's native Agents-SDK processor surfaces via _agent_log_data.

This PR:

  1. Normalises handoffs extraction — strings pass through, Agent-like objects fall back to .name, unknown types produce "unknown".
  2. Emits a unified gen_ai.agent.handoffs JSON-list attribute so backends can show the full handoff target list at a glance. The legacy per-index openai.agent.handoffN attributes are kept for back-compat with existing dashboards.
  3. Emits gen_ai.agent.output_type from AgentSpanData.output_type.

Both emissions are defensive: an absent or empty value skips the attribute entirely, so no stringified "None" or empty list ends up in the metadata.
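
The normalisation rule in item 1 can be sketched as follows. This is a minimal illustration, not the code in this PR: `normalise_handoffs` is a hypothetical helper name, and `FakeAgent` is a stand-in for an SDK Agent object that exposes only `.name`.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


@dataclass
class FakeAgent:
    """Hypothetical stand-in for an Agent-like object exposing .name."""

    name: str


def normalise_handoffs(handoffs: Optional[Iterable[Any]]) -> List[str]:
    """Map each handoff entry to a display name (hypothetical helper)."""
    names: List[str] = []
    for entry in handoffs or []:
        if isinstance(entry, str):
            # Documented contract: entries are already agent names.
            names.append(entry)
        else:
            # Legacy shape: read .name from an Agent-like object;
            # anything without a usable name becomes "unknown".
            name = getattr(entry, "name", None)
            names.append(name if isinstance(name, str) and name else "unknown")
    return names


print(normalise_handoffs(["Super Agent", FakeAgent("Billing"), 42]))
# → ['Super Agent', 'Billing', 'unknown']
```

Returning an empty list for `None` lets the caller apply a single truthiness check before emitting any attribute.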

Before / After

In the Braintrust UI

Side-by-side shots of the agent-span Metadata panel for the same Super Agent run, before vs. after this PR:

Agent-span metadata UI before/after

Before: openai.agent.handoff0 renders as {name: "unknown", instructions: "No instructions"}. Neither gen_ai.agent.handoffs nor gen_ai.agent.output_type appears.

After: openai.agent.handoff0.name is the real agent name ("Super Agent" for this self-handoff case), gen_ai.agent.handoffs shows the full target list, and gen_ai.agent.output_type is captured.

Attribute-level diff

Same trace, attribute keys present on the Super Agent.agent span:

Agent-span metadata before/after

5 keys → 7 keys.

Tests

  • Extended the existing VCR-backed test_agent_with_function_tool_spans to pin gen_ai.agent.output_type == "str" and assert the absent-handoffs regression guard (no stringified None on an empty list).
  • Added a direct unit test test_agent_span_attributes_handoffs_from_agent_objects that pins the handoff-name normalisation for both list[str] (documented) and list[Agent] (legacy).
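
The regression guard from the first bullet can be sketched without the SDK or a VCR cassette. Everything here is an assumption made for illustration: `RecordingSpan` mimics only the `set_attribute` method of a span, and `set_agent_attributes` is a hypothetical emitter that follows the behaviour described in the summary, not the instrumentor's real function.

```python
import json
from typing import Any, Dict, List, Optional


class RecordingSpan:
    """Hypothetical span stand-in that records set_attribute calls."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def set_agent_attributes(
    span: RecordingSpan,
    handoff_names: Optional[List[str]],
    output_type: Optional[str],
) -> None:
    """Emit agent attributes, skipping absent or empty values (hypothetical)."""
    if handoff_names:
        # Unified JSON list, plus the legacy per-index keys for back-compat.
        span.set_attribute("gen_ai.agent.handoffs", json.dumps(handoff_names))
        for i, name in enumerate(handoff_names):
            span.set_attribute(f"openai.agent.handoff{i}", json.dumps({"name": name}))
    if output_type:
        span.set_attribute("gen_ai.agent.output_type", output_type)


def test_absent_handoffs_emit_nothing() -> None:
    span = RecordingSpan()
    set_agent_attributes(span, handoff_names=[], output_type="str")
    # No handoff key at all: neither "[]" nor a stringified None.
    assert span.attributes == {"gen_ai.agent.output_type": "str"}


def test_handoffs_emit_unified_and_legacy_keys() -> None:
    span = RecordingSpan()
    set_agent_attributes(span, handoff_names=["Super Agent"], output_type=None)
    assert span.attributes == {
        "gen_ai.agent.handoffs": '["Super Agent"]',
        "openai.agent.handoff0": '{"name": "Super Agent"}',
    }


test_absent_handoffs_emit_nothing()
test_handoffs_emit_unified_and_legacy_keys()
```

Comparing the whole attribute dict, rather than checking individual keys, also catches any unexpected extra attribute.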

All 11 tests in tests/test_openai_agents.py pass locally, and uv run ruff check is clean.

End-to-end verification

Installed this branch into a downstream agent repo and re-ran an agent with a self-handoff (activate_code_interpreter handoff-to-self). Before the patch: openai.agent.handoff0 = {"name": "unknown", "instructions": "No instructions"}. After: gen_ai.agent.handoffs = ["Super Agent"], openai.agent.handoff0 = {"name": "Super Agent"}, gen_ai.agent.output_type = "str".

Notes

Part of a small series of openai-agents parity fixes (#4061 cached_tokens + reasoning_tokens, #4062 tool span type + duration, #4063 tool span input + output, #4065 LLM span response metadata). Each stands alone off main and can be merged in any order.

Scope check: AgentSpanData.tools is NOT currently populated by the Agents SDK's Runner (verified live — the field reads None even for agents with non-empty tool lists), so it's not a parity delta worth closing downstream. Both Braintrust's native processor and this instrumentor would benefit from an upstream SDK fix that plumbs the registered tool names into AgentSpanData.tools; I can file that separately if useful.

…spans

The Agents SDK's `AgentSpanData` documents its `handoffs` field as
``list[str]`` (the names of handoff target agents).  The current
instrumentor assumed each entry was an ``Agent`` object and read a
`.name` attribute, which silently produced ``"unknown"`` for every
handoff when the SDK follows its documented contract.  It also stored
each handoff under a separately-indexed attribute name
(``openai.agent.handoff0``, ``...1``, ``...2``, ...) nested as a JSON
blob, which downstream UIs can't easily aggregate across.

This patch:

1. Normalises `handoffs` extraction — strings pass through, Agent-like
   objects fall back to `.name`, unknown types produce ``"unknown"``.
2. Emits a unified ``gen_ai.agent.handoffs`` JSON-list attribute so
   backends can show the full handoff target list at a glance.  The
   legacy per-index ``openai.agent.handoffN`` attributes are kept for
   back-compat with existing dashboards.
3. Emits ``gen_ai.agent.output_type`` capturing
   `AgentSpanData.output_type`, matching what Braintrust's native
   Agents-SDK processor logs via `_agent_log_data` and that this
   instrumentor was previously dropping.

Both emissions are defensive: absent/empty values skip the attribute
entirely (no stringified ``None`` / empty-list snuck into metadata).

Tests extend the existing VCR-backed `test_agent_with_function_tool_spans`
to pin `gen_ai.agent.output_type == "str"` on the WeatherAgent span and
assert the absent-handoffs regression guard; a new direct unit test
pins the handoff-name normalisation for both string and Agent-object
inputs.
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Max Hansmire seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

coderabbitai Bot commented Apr 29, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

hansmire pushed a commit to hansmire/openllmetry that referenced this pull request Apr 29, 2026