fix(openai): instrument responses.parse() for structured-output tracing by galzilber · Pull Request #4198 · traceloop/openllmetry

galzilber · 2026-05-28T09:37:40Z

Summary

openai.responses.parse() (the canonical structured-output method for the Responses API + Pydantic models) was silently uninstrumented — calls produced no LLM span, token usage, or cost.
Only Responses.create / retrieve / cancel were patched in v1/__init__.py. This wires .parse (sync + async) through the existing responses_get_or_create_wrapper and adds matching unwrap entries.
ParsedResponse subclasses Response, so the existing wrapper handles the response shape without changes.

responses.stream() is similarly uninstrumented but has a context-manager shape — tracked as a follow-up.

Test plan

New test_responses_parse_emits_span — uses httpx.MockTransport so it runs in CI without an OPENAI_API_KEY or VCR cassette, asserts the span is emitted with gen_ai.request.model + gen_ai.usage.input_tokens + gen_ai.usage.output_tokens.
Verified TDD red→green: test fails (got 0 spans) without the _try_wrap lines, passes with them.
Full test_responses.py (24 tests) passes — no regressions.
Ruff clean.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Instrumentation added for OpenAI response parsing (sync and async), so parsing operations are traced and uninstrumented consistently, improving telemetry for structured response handling.
Tests
- Added integration tests and recorded fixtures (sync and async) verifying parsing instrumentation, asserting generated tracing spans, provider/model attributes, and input/output usage and parsed output.

Calls to openai.responses.parse() (the canonical structured-output entry point for the Responses API) produced no LLM span, token usage, or cost, because only Responses.create / retrieve / cancel were patched. Wire .parse (sync + async) through the existing responses_get_or_create_wrapper and add matching unwrap entries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-28T09:37:58Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 336af51c-4e08-4f7f-8849-0d5e3c02b511

📥 Commits

Reviewing files that changed from the base of the PR and between 99c3aa8 and db22494.

📒 Files selected for processing (1)

packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py

🚧 Files skipped from review as they are similar to previous changes (1)

packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py

📝 Walkthrough

Walkthrough

Wraps OpenAI v1 Responses.parse and AsyncResponses.parse with existing response wrappers to emit OpenTelemetry spans, and adds sync/async tests plus cassette fixtures that validate structured-output parsing (text_format=Person) produces a single openai.response span with usage attributes.

Changes

Responses.parse() Instrumentation

Layer / File(s)	Summary
Wrap parse() methods in instrumentor `packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py`	`Responses.parse` and `AsyncResponses.parse` are wrapped with `responses_get_or_create_wrapper` and `async_responses_get_or_create_wrapper` during instrumentation, and unwrapped in uninstrumentation.
Test parse() instrumentation with structured output + cassettes `packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py`, `packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses/*`	Adds a Pydantic `Person` model and sync/async tests calling `responses.parse(text_format=Person)`, plus two cassette YAML fixtures recording the `/v1/responses` request/response used by those tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

traceloop/openllmetry#4078: Adjusts response wrapper functions to safely handle APIResponse/AsyncAPIResponse unwrapping for parse() results, complementing this PR's instrumentation wiring.

Suggested reviewers

nina-kollman
doronkopit5
max-deygin-traceloop
netanel-tl

Poem

🐰 I hopped through code to wrap each parse,
Sync and async now leave a bright trace,
Pydantic names and tokens counted,
Tests and cassettes snug in place,
Hooray — spans dance in the telemetry space!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: instrumenting responses.parse() for structured-output tracing in the OpenAI integration.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch gz/instrument-openai-responses-parse

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py`:
- Around line 471-474: Replace the hardcoded API key passed to the OpenAI
constructor in the test (the OpenAI(...) call) with a value read from the
environment; update the instantiation to source api_key from an environment
variable (e.g., via os.getenv("OPENAI_API_KEY") or os.environ["OPENAI_API_KEY"])
and, if needed for CI, provide a non-secret fallback via test configuration
rather than a literal string in code so no API key is committed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 731a76c4-4162-44fd-bc26-8f5b5e2d001e

📥 Commits

Reviewing files that changed from the base of the PR and between b39151b and 50b21ab.

📒 Files selected for processing (2)

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py

doronkopit5 · 2026-05-28T10:01:34Z

use a cassete

Replace MockTransport-based test with VCR-recorded sync + async tests that exercise the real OpenAI Responses API. Cassettes scrub auth headers via the existing conftest filter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-28T14:13:47Z

Actionable comments posted: 0

doronkopit5 · 2026-05-28T14:31:13Z

+    spans = span_exporter.get_finished_spans()
+    assert len(spans) == 1, f"expected one openai.response span, got {len(spans)}"
+    span = spans[0]
+    assert span.name == "openai.response"


i guess we want to assert something about the instrumentation in matter of span input messages/output messages to make the test stronger
same about the async test

doronkopit5

just add more assertions

Strengthen the parse tests to verify the user prompt is captured under gen_ai.input.messages and the structured JSON response under gen_ai.output.messages, and that the captured text round-trips back to the Pydantic model and matches response.output_parsed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-28T14:39:10Z

Actionable comments posted: 0

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Comment thread packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py Outdated

doronkopit5 reviewed May 28, 2026

View reviewed changes

Comment thread packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py

Copy link
Copy Markdown

Member

doronkopit5 May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a cassete

doronkopit5 reviewed May 28, 2026

View reviewed changes

doronkopit5 approved these changes May 28, 2026

View reviewed changes

galzilber merged commit 375a4d5 into main May 28, 2026
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(openai): instrument responses.parse() for structured-output tracing#4198

fix(openai): instrument responses.parse() for structured-output tracing#4198
galzilber merged 3 commits into
mainfrom
gz/instrument-openai-responses-parse

galzilber commented May 28, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 28, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

doronkopit5 May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

doronkopit5 May 28, 2026

Uh oh!

doronkopit5 left a comment

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

galzilber commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

doronkopit5 May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

doronkopit5 May 28, 2026

Choose a reason for hiding this comment

Uh oh!

doronkopit5 left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

galzilber commented May 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading