fix(openai): instrument responses.parse() for structured-output tracing#4198
Conversation
Calls to openai.responses.parse() (the canonical structured-output entry point for the Responses API) produced no LLM span, token usage, or cost, because only Responses.create / retrieve / cancel were patched. Wire .parse (sync + async) through the existing responses_get_or_create_wrapper and add matching unwrap entries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughWraps OpenAI v1 ChangesResponses.parse() Instrumentation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py`:
- Around line 471-474: Replace the hardcoded API key passed to the OpenAI
constructor in the test (the OpenAI(...) call) with a value read from the
environment; update the instantiation to source api_key from an environment
variable (e.g., via os.getenv("OPENAI_API_KEY") or os.environ["OPENAI_API_KEY"])
and, if needed for CI, provide a non-secret fallback via test configuration
rather than a literal string in code so no API key is committed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 731a76c4-4162-44fd-bc26-8f5b5e2d001e
📒 Files selected for processing (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.pypackages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
Replace MockTransport-based test with VCR-recorded sync + async tests that exercise the real OpenAI Responses API. Cassettes scrub auth headers via the existing conftest filter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Actionable comments posted: 0 |
| spans = span_exporter.get_finished_spans() | ||
| assert len(spans) == 1, f"expected one openai.response span, got {len(spans)}" | ||
| span = spans[0] | ||
| assert span.name == "openai.response" |
There was a problem hiding this comment.
i guess we want to assert something about the instrumentation in matter of span input messages/output messages to make the test stronger
same about the async test
Strengthen the parse tests to verify the user prompt is captured under gen_ai.input.messages and the structured JSON response under gen_ai.output.messages, and that the captured text round-trips back to the Pydantic model and matches response.output_parsed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Actionable comments posted: 0 |
Summary
openai.responses.parse()(the canonical structured-output method for the Responses API + Pydantic models) was silently uninstrumented — calls produced no LLM span, token usage, or cost.Responses.create/retrieve/cancelwere patched inv1/__init__.py. This wires.parse(sync + async) through the existingresponses_get_or_create_wrapperand adds matchingunwrapentries.ParsedResponsesubclassesResponse, so the existing wrapper handles the response shape without changes.responses.stream()is similarly uninstrumented but has a context-manager shape — tracked as a follow-up.Test plan
test_responses_parse_emits_span— useshttpx.MockTransportso it runs in CI without anOPENAI_API_KEYor VCR cassette, asserts the span is emitted withgen_ai.request.model+gen_ai.usage.input_tokens+gen_ai.usage.output_tokens.got 0 spans) without the_try_wraplines, passes with them.test_responses.py(24 tests) passes — no regressions.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests