Skip to content

feat(openai-agents): emit tool spans with correct type + durations#4062

Open
hansmire wants to merge 1 commit intotraceloop:mainfrom
hansmire:fix/openai-agents-tool-span-type-duration
Open

feat(openai-agents): emit tool spans with correct type + durations#4062
hansmire wants to merge 1 commit intotraceloop:mainfrom
hansmire:fix/openai-agents-tool-span-type-duration

Conversation

@hansmire
Copy link
Copy Markdown

@hansmire hansmire commented Apr 29, 2026

Summary

Two small, related fixes in FunctionSpanData handling:

  1. Tool spans weren't being recognised as tools by backends that derive their own span-type taxonomy from the OTel GenAI semantic conventions. Adding gen_ai.operation.name = \"execute_tool\" (the OTel semconv value for tool-execution spans) lets ingestion layers classify them correctly. Before this change, Braintrust's OTel ingestion classified our openai-agents tool spans as generic task because nothing signalled "tool"; the anthropic/langchain/pydantic-ai instrumentors all already produce tool-recognised spans.

  2. Tool spans had null / zero duration downstream. The instrumentor used OTel's default wall-clock inside on_span_start / on_span_end instead of the Span.started_at / Span.ended_at ISO timestamps the OpenAI Agents SDK already records on the span. Piping them through preserves the actual tool-invocation window.

Changes

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:

  • New helper _iso_to_nanoseconds(iso_str) — converts Agents-SDK ISO-8601 timestamps to integer ns for OTel; returns None on missing/unparseable.
  • FunctionSpanData branch in on_span_start: add GenAIAttributes.GEN_AI_OPERATION_NAME: \"execute_tool\" to the tool attribute dict and pass start_time=_iso_to_nanoseconds(span.started_at).
  • on_span_end: pass end_time=_iso_to_nanoseconds(span.ended_at) to otel_span.end(...).

Tests

Added test_iso_to_nanoseconds in tests/test_openai_agents.py covering None, empty, unparseable, epoch 0, trailing-Z, and microsecond-precision inputs.

uv run pytest tests/test_openai_agents.py::test_iso_to_nanoseconds -v
======================== 1 passed in 0.23s ========================
uv run ruff check opentelemetry/instrumentation/openai_agents/_hooks.py tests/test_openai_agents.py
All checks passed!

End-to-end verification

Installed this branch into a downstream agent repo and re-ran a tool-calling Agents SDK run. The resulting Braintrust trace:

Before:

[task]  get_linked_accounts.tool   dur=null   gen_ai.operation.name=null

After:

[tool]  get_linked_accounts.tool   dur=0.006s  gen_ai.operation.name=execute_tool

Non-tool span types (agent task, LLM llm) remain unchanged.

Related

Independent from but complementary to #4061 (cached_tokens / reasoning_tokens recording); happy to combine if preferred.

Summary by CodeRabbit

  • New Features

    • Improved tool span timing by parsing ISO-8601 timestamps to provide nanosecond-precision start/end times for spans.
    • Tool spans now include an explicit operation name attribute ("execute_tool") for clearer observability.
  • Tests

    • Added tests validating timestamp parsing (including edge cases) and verifying non-null, monotonic span timing.

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Max Hansmire seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e667cacd-daf9-46b0-82ab-f9149735a45e

📥 Commits

Reviewing files that changed from the base of the PR and between b1c5247 and ebc1930.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py

📝 Walkthrough

Walkthrough

Adds ISO‑8601 timestamp parsing to convert OpenAI Agents SDK Span.started_at/Span.ended_at into nanoseconds and use them for OpenTelemetry start_time/end_time on tool/function spans; sets GenAIAttributes.GEN_AI_OPERATION_NAME to "execute_tool". Parsing failures leave times unset.

Changes

Cohort / File(s) Summary
Instrumentation hook
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
Adds private helper _iso_to_nanoseconds to parse ISO‑8601 timestamps to ns. Uses parsed started_at when starting tool/function spans and uses parsed ended_at to end tracked spans with an explicit end_time. Sets GenAIAttributes.GEN_AI_OPERATION_NAME = "execute_tool" for tool spans. Gracefully falls back to leaving start/end times None on parse failure or missing values.
Tests
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
Extends tests to assert gen_ai.operation.name == "execute_tool" and that exported span start_time/end_time are present, positive, and non-decreasing. Adds test_iso_to_nanoseconds() covering null/unparseable inputs and multiple ISO‑8601 variants (Z, offsets, naive treated as UTC) with correct ns conversion.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble timestamps, parse each string with care,
From ISO fields to nanoseconds fair.
Tool spans start and finish with a hop and a tune,
I plant little traces beneath the moon.
Hooray — I mapped the seconds, now off for a nap 🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(openai-agents): emit tool spans with correct type + durations' accurately describes the main changes: adding GenAI semantic convention metadata for tool classification and implementing proper timestamp parsing for span duration tracking.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

506-519: Add a regression assertion for the actual span behavior (not only the helper).

This test validates parsing, but the PR’s externally visible behavior is “tool span gets execute_tool + non-null duration.” Consider asserting that directly in test_agent_with_function_tool_spans to prevent regressions in wiring.

Possible assertion additions
@@ def test_agent_with_function_tool_spans(exporter, function_tool_agent):
     assert tool_span.attributes[GenAIAttributes.GEN_AI_TOOL_NAME] == "get_weather"
     assert tool_span.attributes[GenAIAttributes.GEN_AI_TOOL_TYPE] == "function"
+    assert tool_span.attributes[GenAIAttributes.GEN_AI_OPERATION_NAME] == "execute_tool"
+
+    tool_duration_ms = (tool_span.end_time - tool_span.start_time) / 1_000_000
+    assert tool_duration_ms > 0, f"Tool span should have positive duration, got {tool_duration_ms}ms"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`
around lines 506 - 519, Add a regression assertion in
test_agent_with_function_tool_spans to verify the actual exported tool span has
the expected name and a non-null/non-zero duration: after invoking the agent,
locate the finished/exported spans (the same collection the test already
inspects), find the tool span and assert span.name == "execute_tool" and that
the span has a non-null/non-zero duration (e.g., duration or end_time -
start_time > 0 depending on span object fields). This ensures the integration
(not just _iso_to_nanoseconds) produces a tool span with a real duration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 48-63: The _iso_to_nanoseconds function currently uses
datetime.fromisoformat(...) and then dt.timestamp(), which treats timezone-naive
datetimes as local time; modify _iso_to_nanoseconds so after parsing (dt =
_dt.datetime.fromisoformat(s)) you explicitly handle tzinfo: if dt.tzinfo is
None, set dt = dt.replace(tzinfo=_dt.timezone.utc) (or return None if you prefer
rejecting naive values), then compute int(dt.timestamp() * 1_000_000_000);
reference symbols: _iso_to_nanoseconds, dt, fromisoformat, timestamp, and
_dt.timezone.utc. Ensure imports remain local to the function and preserve the
existing Z -> +00:00 normalization.

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`:
- Around line 506-519: Add a regression assertion in
test_agent_with_function_tool_spans to verify the actual exported tool span has
the expected name and a non-null/non-zero duration: after invoking the agent,
locate the finished/exported spans (the same collection the test already
inspects), find the tool span and assert span.name == "execute_tool" and that
the span has a non-null/non-zero duration (e.g., duration or end_time -
start_time > 0 depending on span object fields). This ensures the integration
(not just _iso_to_nanoseconds) produces a tool span with a real duration.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3229d4ff-8539-45b5-94b2-3f5286add91b

📥 Commits

Reviewing files that changed from the base of the PR and between 3efa0eb and b1c5247.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py

Two related fixes to FunctionSpanData handling:

1. Add `gen_ai.operation.name = "execute_tool"` to tool span attributes.
   This is the OTel GenAI semantic-convention value for tool-execution
   spans. Backends that derive their own span-type taxonomy from this
   attribute (e.g. Braintrust's OTel ingestion) currently classify
   openai-agents tool spans as generic `task` because nothing signals
   "this is a tool". After this change Braintrust renders them as
   `tool`, matching the anthropic/langchain/etc. instrumentors here.

2. Plumb `Span.started_at` / `Span.ended_at` from the OpenAI Agents SDK
   through to OTel's span `start_time` / `end_time` for tool spans.
   Previously the instrumentor used wall-clock defaults inside the
   `on_span_start` / `on_span_end` callbacks, which lose the actual
   tool invocation window and produce null / zero durations downstream.

Adds a small ISO-8601 → nanoseconds-since-epoch helper
(`_iso_to_nanoseconds`) since the Agents SDK sets `started_at`/`ended_at`
as ISO strings and OTel expects integer ns. Helper returns None on
unparseable / missing input; callers fall back to the OTel default.

Includes unit tests for the helper.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants