feat(openai-agents): emit tool spans with correct type + durations by hansmire · Pull Request #4062 · traceloop/openllmetry

hansmire · 2026-04-29T00:57:31Z

Summary

Two small, related fixes in FunctionSpanData handling:

Tool spans weren't being recognised as tools by backends that derive their own span-type taxonomy from the OTel GenAI semantic conventions. Adding gen_ai.operation.name = \"execute_tool\" (the OTel semconv value for tool-execution spans) lets ingestion layers classify them correctly. Before this change, Braintrust's OTel ingestion classified our openai-agents tool spans as generic task because nothing signalled "tool"; the anthropic/langchain/pydantic-ai instrumentors all already produce tool-recognised spans.
Tool spans had null / zero duration downstream. The instrumentor used OTel's default wall-clock inside on_span_start / on_span_end instead of the Span.started_at / Span.ended_at ISO timestamps the OpenAI Agents SDK already records on the span. Piping them through preserves the actual tool-invocation window.

Changes

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:

New helper _iso_to_nanoseconds(iso_str) — converts Agents-SDK ISO-8601 timestamps to integer ns for OTel; returns None on missing/unparseable.
FunctionSpanData branch in on_span_start: add GenAIAttributes.GEN_AI_OPERATION_NAME: \"execute_tool\" to the tool attribute dict and pass start_time=_iso_to_nanoseconds(span.started_at).
on_span_end: pass end_time=_iso_to_nanoseconds(span.ended_at) to otel_span.end(...).

Tests

Added test_iso_to_nanoseconds in tests/test_openai_agents.py covering None, empty, unparseable, epoch 0, trailing-Z, and microsecond-precision inputs.

uv run pytest tests/test_openai_agents.py::test_iso_to_nanoseconds -v
======================== 1 passed in 0.23s ========================
uv run ruff check opentelemetry/instrumentation/openai_agents/_hooks.py tests/test_openai_agents.py
All checks passed!

End-to-end verification

Installed this branch into a downstream agent repo and re-ran a tool-calling Agents SDK run. The resulting Braintrust trace:

Before:

[task]  get_linked_accounts.tool   dur=null   gen_ai.operation.name=null

After:

[tool]  get_linked_accounts.tool   dur=0.006s  gen_ai.operation.name=execute_tool

Non-tool span types (agent task, LLM llm) remain unchanged.

Summary by CodeRabbit

New Features
- Improved tool span timing by parsing ISO-8601 timestamps to provide nanosecond-precision start/end times for spans.
- Tool spans now include an explicit operation name attribute ("execute_tool") for clearer observability.
Tests
- Added tests validating timestamp parsing (including edge cases) and verifying non-null, monotonic span timing.

CLAassistant · 2026-04-29T00:57:37Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Max Hansmire seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

coderabbitai · 2026-04-29T00:57:44Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e667cacd-daf9-46b0-82ab-f9149735a45e

📥 Commits

Reviewing files that changed from the base of the PR and between b1c5247 and ebc1930.

📒 Files selected for processing (2)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py

📝 Walkthrough

Walkthrough

Adds ISO‑8601 timestamp parsing to convert OpenAI Agents SDK Span.started_at/Span.ended_at into nanoseconds and use them for OpenTelemetry start_time/end_time on tool/function spans; sets GenAIAttributes.GEN_AI_OPERATION_NAME to "execute_tool". Parsing failures leave times unset.

Changes

Cohort / File(s)	Summary
Instrumentation hook `packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`	Adds private helper `_iso_to_nanoseconds` to parse ISO‑8601 timestamps to ns. Uses parsed `started_at` when starting tool/function spans and uses parsed `ended_at` to end tracked spans with an explicit `end_time`. Sets `GenAIAttributes.GEN_AI_OPERATION_NAME = "execute_tool"` for tool spans. Gracefully falls back to leaving start/end times `None` on parse failure or missing values.
Tests `packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`	Extends tests to assert `gen_ai.operation.name == "execute_tool"` and that exported span `start_time`/`end_time` are present, positive, and non-decreasing. Adds `test_iso_to_nanoseconds()` covering null/unparseable inputs and multiple ISO‑8601 variants (Z, offsets, naive treated as UTC) with correct ns conversion.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble timestamps, parse each string with care,
From ISO fields to nanoseconds fair.
Tool spans start and finish with a hop and a tune,
I plant little traces beneath the moon.
Hooray — I mapped the seconds, now off for a nap 🥕✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(openai-agents): emit tool spans with correct type + durations' accurately describes the main changes: adding GenAI semantic convention metadata for tool classification and implementing proper timestamp parsing for span duration tracking.
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

506-519: Add a regression assertion for the actual span behavior (not only the helper).

This test validates parsing, but the PR’s externally visible behavior is “tool span gets execute_tool + non-null duration.” Consider asserting that directly in test_agent_with_function_tool_spans to prevent regressions in wiring.

Possible assertion additions

@@ def test_agent_with_function_tool_spans(exporter, function_tool_agent):
     assert tool_span.attributes[GenAIAttributes.GEN_AI_TOOL_NAME] == "get_weather"
     assert tool_span.attributes[GenAIAttributes.GEN_AI_TOOL_TYPE] == "function"
+    assert tool_span.attributes[GenAIAttributes.GEN_AI_OPERATION_NAME] == "execute_tool"
+
+    tool_duration_ms = (tool_span.end_time - tool_span.start_time) / 1_000_000
+    assert tool_duration_ms > 0, f"Tool span should have positive duration, got {tool_duration_ms}ms"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`
around lines 506 - 519, Add a regression assertion in
test_agent_with_function_tool_spans to verify the actual exported tool span has
the expected name and a non-null/non-zero duration: after invoking the agent,
locate the finished/exported spans (the same collection the test already
inspects), find the tool span and assert span.name == "execute_tool" and that
the span has a non-null/non-zero duration (e.g., duration or end_time -
start_time > 0 depending on span object fields). This ensures the integration
(not just _iso_to_nanoseconds) produces a tool span with a real duration.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 48-63: The _iso_to_nanoseconds function currently uses
datetime.fromisoformat(...) and then dt.timestamp(), which treats timezone-naive
datetimes as local time; modify _iso_to_nanoseconds so after parsing (dt =
_dt.datetime.fromisoformat(s)) you explicitly handle tzinfo: if dt.tzinfo is
None, set dt = dt.replace(tzinfo=_dt.timezone.utc) (or return None if you prefer
rejecting naive values), then compute int(dt.timestamp() * 1_000_000_000);
reference symbols: _iso_to_nanoseconds, dt, fromisoformat, timestamp, and
_dt.timezone.utc. Ensure imports remain local to the function and preserve the
existing Z -> +00:00 normalization.

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`:
- Around line 506-519: Add a regression assertion in
test_agent_with_function_tool_spans to verify the actual exported tool span has
the expected name and a non-null/non-zero duration: after invoking the agent,
locate the finished/exported spans (the same collection the test already
inspects), find the tool span and assert span.name == "execute_tool" and that
the span has a non-null/non-zero duration (e.g., duration or end_time -
start_time > 0 depending on span object fields). This ensures the integration
(not just _iso_to_nanoseconds) produces a tool span with a real duration.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3229d4ff-8539-45b5-94b2-3f5286add91b

📥 Commits

Reviewing files that changed from the base of the PR and between 3efa0eb and b1c5247.

📒 Files selected for processing (2)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py

Two related fixes to FunctionSpanData handling: 1. Add `gen_ai.operation.name = "execute_tool"` to tool span attributes. This is the OTel GenAI semantic-convention value for tool-execution spans. Backends that derive their own span-type taxonomy from this attribute (e.g. Braintrust's OTel ingestion) currently classify openai-agents tool spans as generic `task` because nothing signals "this is a tool". After this change Braintrust renders them as `tool`, matching the anthropic/langchain/etc. instrumentors here. 2. Plumb `Span.started_at` / `Span.ended_at` from the OpenAI Agents SDK through to OTel's span `start_time` / `end_time` for tool spans. Previously the instrumentor used wall-clock defaults inside the `on_span_start` / `on_span_end` callbacks, which lose the actual tool invocation window and produce null / zero durations downstream. Adds a small ISO-8601 → nanoseconds-since-epoch helper (`_iso_to_nanoseconds`) since the Agents SDK sets `started_at`/`ended_at` as ISO strings and OTel expects integer ns. Helper returns None on unparseable / missing input; callers fall back to the OTel default. Includes unit tests for the helper.

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread ...elemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py

hansmire force-pushed the fix/openai-agents-tool-span-type-duration branch from b1c5247 to ebc1930 Compare April 29, 2026 01:12

This was referenced Apr 29, 2026

feat(openai-agents): capture tool call arguments + result on tool spans #4063

Draft

feat(openai-agents): record response-identification metadata on LLM spans #4065

Draft

feat(openai-agents): emit structured handoffs + output_type on agent spans #4066

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(openai-agents): emit tool spans with correct type + durations#4062

feat(openai-agents): emit tool spans with correct type + durations#4062
hansmire wants to merge 1 commit intotraceloop:mainfrom
hansmire:fix/openai-agents-tool-span-type-duration

hansmire commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

CLAassistant commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hansmire commented Apr 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Tests

End-to-end verification

Related

Summary by CodeRabbit

Uh oh!

CLAassistant commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hansmire commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading