feat(openai-agents): emit tool spans with correct type + durations#4062
feat(openai-agents): emit tool spans with correct type + durations#4062hansmire wants to merge 1 commit intotraceloop:mainfrom
Conversation
|
Max Hansmire seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds ISO‑8601 timestamp parsing to convert OpenAI Agents SDK Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)
506-519: Add a regression assertion for the actual span behavior (not only the helper).This test validates parsing, but the PR’s externally visible behavior is “tool span gets
execute_tool+ non-null duration.” Consider asserting that directly intest_agent_with_function_tool_spansto prevent regressions in wiring.Possible assertion additions
@@ def test_agent_with_function_tool_spans(exporter, function_tool_agent): assert tool_span.attributes[GenAIAttributes.GEN_AI_TOOL_NAME] == "get_weather" assert tool_span.attributes[GenAIAttributes.GEN_AI_TOOL_TYPE] == "function" + assert tool_span.attributes[GenAIAttributes.GEN_AI_OPERATION_NAME] == "execute_tool" + + tool_duration_ms = (tool_span.end_time - tool_span.start_time) / 1_000_000 + assert tool_duration_ms > 0, f"Tool span should have positive duration, got {tool_duration_ms}ms"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py` around lines 506 - 519, Add a regression assertion in test_agent_with_function_tool_spans to verify the actual exported tool span has the expected name and a non-null/non-zero duration: after invoking the agent, locate the finished/exported spans (the same collection the test already inspects), find the tool span and assert span.name == "execute_tool" and that the span has a non-null/non-zero duration (e.g., duration or end_time - start_time > 0 depending on span object fields). This ensures the integration (not just _iso_to_nanoseconds) produces a tool span with a real duration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 48-63: The _iso_to_nanoseconds function currently uses
datetime.fromisoformat(...) and then dt.timestamp(), which treats timezone-naive
datetimes as local time; modify _iso_to_nanoseconds so after parsing (dt =
_dt.datetime.fromisoformat(s)) you explicitly handle tzinfo: if dt.tzinfo is
None, set dt = dt.replace(tzinfo=_dt.timezone.utc) (or return None if you prefer
rejecting naive values), then compute int(dt.timestamp() * 1_000_000_000);
reference symbols: _iso_to_nanoseconds, dt, fromisoformat, timestamp, and
_dt.timezone.utc. Ensure imports remain local to the function and preserve the
existing Z -> +00:00 normalization.
---
Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`:
- Around line 506-519: Add a regression assertion in
test_agent_with_function_tool_spans to verify the actual exported tool span has
the expected name and a non-null/non-zero duration: after invoking the agent,
locate the finished/exported spans (the same collection the test already
inspects), find the tool span and assert span.name == "execute_tool" and that
the span has a non-null/non-zero duration (e.g., duration or end_time -
start_time > 0 depending on span object fields). This ensures the integration
(not just _iso_to_nanoseconds) produces a tool span with a real duration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3229d4ff-8539-45b5-94b2-3f5286add91b
📒 Files selected for processing (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.pypackages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
Two related fixes to FunctionSpanData handling: 1. Add `gen_ai.operation.name = "execute_tool"` to tool span attributes. This is the OTel GenAI semantic-convention value for tool-execution spans. Backends that derive their own span-type taxonomy from this attribute (e.g. Braintrust's OTel ingestion) currently classify openai-agents tool spans as generic `task` because nothing signals "this is a tool". After this change Braintrust renders them as `tool`, matching the anthropic/langchain/etc. instrumentors here. 2. Plumb `Span.started_at` / `Span.ended_at` from the OpenAI Agents SDK through to OTel's span `start_time` / `end_time` for tool spans. Previously the instrumentor used wall-clock defaults inside the `on_span_start` / `on_span_end` callbacks, which lose the actual tool invocation window and produce null / zero durations downstream. Adds a small ISO-8601 → nanoseconds-since-epoch helper (`_iso_to_nanoseconds`) since the Agents SDK sets `started_at`/`ended_at` as ISO strings and OTel expects integer ns. Helper returns None on unparseable / missing input; callers fall back to the OTel default. Includes unit tests for the helper.
b1c5247 to
ebc1930
Compare
Summary
Two small, related fixes in
FunctionSpanDatahandling:Tool spans weren't being recognised as tools by backends that derive their own span-type taxonomy from the OTel GenAI semantic conventions. Adding
gen_ai.operation.name = \"execute_tool\"(the OTel semconv value for tool-execution spans) lets ingestion layers classify them correctly. Before this change, Braintrust's OTel ingestion classified our openai-agents tool spans as generictaskbecause nothing signalled "tool"; the anthropic/langchain/pydantic-ai instrumentors all already produce tool-recognised spans.Tool spans had null / zero duration downstream. The instrumentor used OTel's default wall-clock inside
on_span_start/on_span_endinstead of theSpan.started_at/Span.ended_atISO timestamps the OpenAI Agents SDK already records on the span. Piping them through preserves the actual tool-invocation window.Changes
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:_iso_to_nanoseconds(iso_str)— converts Agents-SDK ISO-8601 timestamps to integer ns for OTel; returns None on missing/unparseable.FunctionSpanDatabranch inon_span_start: addGenAIAttributes.GEN_AI_OPERATION_NAME: \"execute_tool\"to the tool attribute dict and passstart_time=_iso_to_nanoseconds(span.started_at).on_span_end: passend_time=_iso_to_nanoseconds(span.ended_at)tootel_span.end(...).Tests
Added
test_iso_to_nanosecondsintests/test_openai_agents.pycovering None, empty, unparseable, epoch 0, trailing-Z, and microsecond-precision inputs.End-to-end verification
Installed this branch into a downstream agent repo and re-ran a tool-calling Agents SDK run. The resulting Braintrust trace:
Before:
After:
Non-tool span types (agent
task, LLMllm) remain unchanged.Related
Independent from but complementary to #4061 (cached_tokens / reasoning_tokens recording); happy to combine if preferred.
Summary by CodeRabbit
New Features
Tests