Skip to content

fix(openai): instrument responses.parse() for structured-output tracing#4198

Merged
galzilber merged 3 commits into
mainfrom
gz/instrument-openai-responses-parse
May 28, 2026
Merged

fix(openai): instrument responses.parse() for structured-output tracing#4198
galzilber merged 3 commits into
mainfrom
gz/instrument-openai-responses-parse

Conversation

@galzilber
Copy link
Copy Markdown
Contributor

@galzilber galzilber commented May 28, 2026

Summary

  • openai.responses.parse() (the canonical structured-output method for the Responses API + Pydantic models) was silently uninstrumented — calls produced no LLM span, token usage, or cost.
  • Only Responses.create / retrieve / cancel were patched in v1/__init__.py. This wires .parse (sync + async) through the existing responses_get_or_create_wrapper and adds matching unwrap entries.
  • ParsedResponse subclasses Response, so the existing wrapper handles the response shape without changes.

responses.stream() is similarly uninstrumented but has a context-manager shape — tracked as a follow-up.

Test plan

  • New test_responses_parse_emits_span — uses httpx.MockTransport so it runs in CI without an OPENAI_API_KEY or VCR cassette, asserts the span is emitted with gen_ai.request.model + gen_ai.usage.input_tokens + gen_ai.usage.output_tokens.
  • Verified TDD red→green: test fails (got 0 spans) without the _try_wrap lines, passes with them.
  • Full test_responses.py (24 tests) passes — no regressions.
  • Ruff clean.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Instrumentation added for OpenAI response parsing (sync and async), so parsing operations are traced and uninstrumented consistently, improving telemetry for structured response handling.
  • Tests

    • Added integration tests and recorded fixtures (sync and async) verifying parsing instrumentation, asserting generated tracing spans, provider/model attributes, and input/output usage and parsed output.

Review Change Stack

Uploading image.png…

Calls to openai.responses.parse() (the canonical structured-output entry
point for the Responses API) produced no LLM span, token usage, or cost,
because only Responses.create / retrieve / cancel were patched. Wire
.parse (sync + async) through the existing responses_get_or_create_wrapper
and add matching unwrap entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 336af51c-4e08-4f7f-8849-0d5e3c02b511

📥 Commits

Reviewing files that changed from the base of the PR and between 99c3aa8 and db22494.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py

📝 Walkthrough

Walkthrough

Wraps OpenAI v1 Responses.parse and AsyncResponses.parse with existing response wrappers to emit OpenTelemetry spans, and adds sync/async tests plus cassette fixtures that validate structured-output parsing (text_format=Person) produces a single openai.response span with usage attributes.

Changes

Responses.parse() Instrumentation

Layer / File(s) Summary
Wrap parse() methods in instrumentor
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
Responses.parse and AsyncResponses.parse are wrapped with responses_get_or_create_wrapper and async_responses_get_or_create_wrapper during instrumentation, and unwrapped in uninstrumentation.
Test parse() instrumentation with structured output + cassettes
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py, packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/*
Adds a Pydantic Person model and sync/async tests calling responses.parse(text_format=Person), plus two cassette YAML fixtures recording the /v1/responses request/response used by those tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • traceloop/openllmetry#4078: Adjusts response wrapper functions to safely handle APIResponse/AsyncAPIResponse unwrapping for parse() results, complementing this PR's instrumentation wiring.

Suggested reviewers

  • nina-kollman
  • doronkopit5
  • max-deygin-traceloop
  • netanel-tl

Poem

🐰 I hopped through code to wrap each parse,
Sync and async now leave a bright trace,
Pydantic names and tokens counted,
Tests and cassettes snug in place,
Hooray — spans dance in the telemetry space!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: instrumenting responses.parse() for structured-output tracing in the OpenAI integration.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch gz/instrument-openai-responses-parse

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py`:
- Around line 471-474: Replace the hardcoded API key passed to the OpenAI
constructor in the test (the OpenAI(...) call) with a value read from the
environment; update the instantiation to source api_key from an environment
variable (e.g., via os.getenv("OPENAI_API_KEY") or os.environ["OPENAI_API_KEY"])
and, if needed for CI, provide a non-secret fallback via test configuration
rather than a literal string in code so no API key is committed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 731a76c4-4162-44fd-bc26-8f5b5e2d001e

📥 Commits

Reviewing files that changed from the base of the PR and between b39151b and 50b21ab.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py

Comment thread packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a cassete

Replace MockTransport-based test with VCR-recorded sync + async tests
that exercise the real OpenAI Responses API. Cassettes scrub auth
headers via the existing conftest filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

spans = span_exporter.get_finished_spans()
assert len(spans) == 1, f"expected one openai.response span, got {len(spans)}"
span = spans[0]
assert span.name == "openai.response"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess we want to assert something about the instrumentation in matter of span input messages/output messages to make the test stronger
same about the async test

Copy link
Copy Markdown
Member

@doronkopit5 doronkopit5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just add more assertions

Strengthen the parse tests to verify the user prompt is captured under
gen_ai.input.messages and the structured JSON response under
gen_ai.output.messages, and that the captured text round-trips back to
the Pydantic model and matches response.output_parsed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@galzilber galzilber merged commit 375a4d5 into main May 28, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants