
feat(openai-agents): record response-identification metadata on LLM spans#4065

Draft
hansmire wants to merge 1 commit into traceloop:main from hansmire:fix/openai-agents-response-metadata

Conversation


hansmire commented Apr 29, 2026

Summary

The Responses API `Response` object carries several fields that downstream trace backends rely on for turn chaining, model-version debugging, and reasoning / service-tier visibility, but that this instrumentor dropped on the floor. This PR plumbs them through to OTel span attributes on every LLM span.

| Response field | Span attribute |
| --- | --- |
| `response.id` | `gen_ai.response.id` |
| `response.model` | `gen_ai.response.model` (kept `gen_ai.request.model` for back-compat) |
| `response.status` | `gen_ai.response.status` |
| `response.previous_response_id` | `gen_ai.request.previous_response_id` |
| `response.service_tier` | `gen_ai.openai.request.service_tier` |
| `response.reasoning.effort` | `gen_ai.request.reasoning_effort` |
| `response.reasoning.summary` | `gen_ai.request.reasoning_summary` |

All additions are defensive: when a field is missing / None on the response, no attribute is emitted (no stringified "None" values — pinned by a regression test).
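
A minimal sketch of that defensive shape (illustrative only: the real helper uses the semconv constants listed under Constants below, and the `_set_if_present` name is hypothetical):

```python
# Sketch of a defensive extractor: emit an attribute only when the
# corresponding field is present and non-None on the response.
def _extract_response_attributes(response):
    attributes = {}

    def _set_if_present(key, value):
        # Skipping None here is what guarantees no stringified "None"
        # ever reaches the span.
        if value is not None:
            attributes[key] = value

    _set_if_present("gen_ai.response.id", getattr(response, "id", None))
    _set_if_present("gen_ai.response.model", getattr(response, "model", None))
    _set_if_present("gen_ai.response.status", getattr(response, "status", None))
    _set_if_present(
        "gen_ai.request.previous_response_id",
        getattr(response, "previous_response_id", None),
    )
    _set_if_present(
        "gen_ai.openai.request.service_tier",
        getattr(response, "service_tier", None),
    )
    reasoning = getattr(response, "reasoning", None)
    if reasoning is not None:
        _set_if_present(
            "gen_ai.request.reasoning_effort", getattr(reasoning, "effort", None)
        )
        _set_if_present(
            "gen_ai.request.reasoning_summary", getattr(reasoning, "summary", None)
        )
    return attributes
```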

Before / After

Same 529 eval, same agent, same LLM-span detail panel in the Braintrust trace UI:

[Screenshot: PR #4065 before/after]

Before: the LLM span's Metadata panel contains only `gen_ai.request.model`, `temperature`, `top_p`, and `usage.*` — no response identification, no service tier, no reasoning config.
After: the panel additionally renders `gen_ai.response.id`, `gen_ai.response.model`, `gen_ai.response.status`, `gen_ai.openai.request.service_tier`, and `gen_ai.request.reasoning_effort` — exposing the turn chain, the served-model version, and the reasoning/service-tier settings for debugging.

Why

Braintrust's own native Agents SDK processor surfaces the full `response.model_dump(exclude={"input","output","metadata","usage"})` on every LLM span — which is how its UI shows turn-by-turn chains and the exact model version that served each request. Previously, openllmetry's equivalent `openai.response` span carried only `temperature`, `top_p`, `max_tokens`, and a conflated `gen_ai.request.model`. Trace backends couldn't:

  • Chain turns (no `response.id` / `previous_response_id`) — critical for debugging agents that use `auto_previous_response_id=True`; a sketch of backend-side chaining follows this list.
  • Tell what actually ran (only `request.model` was set; the specific served version like `gpt-5.4-2026-03-05` was lost).
  • See request config (service tier, reasoning effort / summary all absent).
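
For illustration, here is roughly what backend-side turn chaining looks like once these attributes land. This is a sketch, not any backend's actual code; `spans` is assumed to be a list of exported span-attribute dicts:

```python
# Rebuild turn chains from exported LLM-span attributes by following
# previous_response_id -> response.id links.
def chain_turns(spans):
    by_id = {
        s["gen_ai.response.id"]: s for s in spans if "gen_ai.response.id" in s
    }
    # Map each response id to the id of the span that continued it.
    successor = {}
    for rid, s in by_id.items():
        prev = s.get("gen_ai.request.previous_response_id")
        if prev in by_id:
            successor[prev] = rid
    # Roots are spans whose previous_response_id is absent or unknown.
    roots = [
        rid for rid, s in by_id.items()
        if s.get("gen_ai.request.previous_response_id") not in by_id
    ]
    chains = []
    for rid in roots:
        chain, cur = [], rid
        while cur is not None:
            chain.append(by_id[cur])
            cur = successor.get(cur)
        chains.append(chain)
    return chains
```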

Constants

OTel semconv constants are used where available:

  • `GenAIAttributes.GEN_AI_RESPONSE_ID`
  • `GenAIAttributes.GEN_AI_RESPONSE_MODEL`
  • `GenAIAttributes.GEN_AI_OPENAI_REQUEST_SERVICE_TIER`
  • `SpanAttributes.LLM_REQUEST_REASONING_EFFORT`
  • `SpanAttributes.LLM_REQUEST_REASONING_SUMMARY`

Two fields fall back to string literals because `semconv_ai` doesn't publish a constant yet:

  • `gen_ai.response.status`
  • `gen_ai.request.previous_response_id`

Happy to add them upstream in semconv_ai in a follow-up if preferred.
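
For reference, the mapping ends up looking roughly like this (a sketch: the import paths are assumptions based on how openllmetry imports these modules elsewhere, and the `RESPONSE_FIELD_TO_ATTRIBUTE` name is illustrative):

```python
from opentelemetry.semconv._incubating.attributes import (
    gen_ai_attributes as GenAIAttributes,
)
from opentelemetry.semconv_ai import SpanAttributes

# Semconv constants where published; string literals where not.
RESPONSE_FIELD_TO_ATTRIBUTE = {
    "id": GenAIAttributes.GEN_AI_RESPONSE_ID,
    "model": GenAIAttributes.GEN_AI_RESPONSE_MODEL,
    "service_tier": GenAIAttributes.GEN_AI_OPENAI_REQUEST_SERVICE_TIER,
    "reasoning.effort": SpanAttributes.LLM_REQUEST_REASONING_EFFORT,
    "reasoning.summary": SpanAttributes.LLM_REQUEST_REASONING_SUMMARY,
    "status": "gen_ai.response.status",  # no published constant yet
    "previous_response_id": "gen_ai.request.previous_response_id",  # no constant yet
}
```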

Tests

Two new direct unit tests on `_extract_response_attributes` (no VCR needed), sketched below:

  • `test_extract_response_captures_response_identification_fields` — feeds a `SimpleNamespace` with every field set, asserts each maps to the correct span attribute with the expected value.
  • `test_extract_response_absent_fields_dont_set_attributes` — regression guard for the None-passthrough branches.
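
In shape, the tests look something like this (test names are from the PR; the bodies and example values are illustrative, and they assume the extractor returns a plain dict as in the sketch above):

```python
from types import SimpleNamespace


def test_extract_response_captures_response_identification_fields():
    response = SimpleNamespace(
        id="resp_abc",
        model="gpt-5.4-2026-03-05",
        status="completed",
        previous_response_id="resp_prev",
        service_tier="default",
        reasoning=SimpleNamespace(effort="medium", summary="auto"),
    )
    attrs = _extract_response_attributes(response)
    assert attrs["gen_ai.response.id"] == "resp_abc"
    assert attrs["gen_ai.response.model"] == "gpt-5.4-2026-03-05"
    assert attrs["gen_ai.request.previous_response_id"] == "resp_prev"


def test_extract_response_absent_fields_dont_set_attributes():
    # A response with nothing set must produce no attributes at all,
    # and in particular no stringified "None" values.
    attrs = _extract_response_attributes(SimpleNamespace(reasoning=None))
    assert attrs == {}
```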

All 12 tests in `tests/test_openai_agents.py` pass locally; `uv run ruff check` comes back clean.

Notes

Part of a small series of openai-agents parity fixes (#4061 `cached_tokens` + `reasoning_tokens`, #4062 tool span type + duration, #4063 tool span input + output). Each stands alone off `main` and can be merged in any order.

feat(openai-agents): record response-identification metadata on LLM spans

The Responses API `Response` object carries several fields that downstream
trace backends rely on for turn chaining, model-version debugging and
reasoning/service-tier visibility, but that this instrumentor dropped on
the floor.  Concretely:

  * `response.id`                 — `gen_ai.response.id`
  * `response.model`              — `gen_ai.response.model`
    (kept existing `gen_ai.request.model` for back-compat)
  * `response.status`             — `gen_ai.response.status`
  * `response.previous_response_id`
                                  — `gen_ai.request.previous_response_id`
  * `response.service_tier`       — `gen_ai.openai.request.service_tier`
  * `response.reasoning.effort`   — `gen_ai.request.reasoning_effort`
  * `response.reasoning.summary`  — `gen_ai.request.reasoning_summary`

For comparison, Braintrust's native Agents SDK processor surfaces the
full `response.model_dump(exclude={"input","output","metadata","usage"})`
on every LLM span — which is how its UI shows turn-by-turn chains and
the exact model version that served each request.  Previously
openllmetry's equivalent span carried only `temperature`, `top_p`,
`max_tokens` and a conflated `gen_ai.request.model`.

All additions are defensive: when a field is missing / None on the
response, no attribute is emitted (no stringified "None" values).
Fields are set via existing OTel semconv constants where available
(`GEN_AI_RESPONSE_ID`, `GEN_AI_RESPONSE_MODEL`,
`GEN_AI_OPENAI_REQUEST_SERVICE_TIER`, `LLM_REQUEST_REASONING_EFFORT`,
`LLM_REQUEST_REASONING_SUMMARY`) and as string literals for the two
fields without published constants yet (`gen_ai.response.status`,
`gen_ai.request.previous_response_id`).

Includes two direct unit tests on `_extract_response_attributes` that
pin the attribute mapping contract and guard against regressions when
a field is absent.
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Max Hansmire does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Have you already signed the CLA but the status is still pending? Let us recheck it.


coderabbitai Bot commented Apr 29, 2026

Important

Review skipped: draft detected.

Please check the settings in the CodeRabbit UI or the `.coderabbit.yaml` file in this repository. To trigger a single review, invoke the `@coderabbitai review` command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 808c43c0-7175-4a50-a8c6-8926e993703d


hansmire pushed a commit to hansmire/openllmetry that referenced this pull request Apr 29, 2026