fix(anthropic): streaming base64 images inflate input token count#3975
fix(anthropic): streaming base64 images inflate input token count#3975karthikbolla wants to merge 3 commits intotraceloop:mainfrom
Conversation
The streaming _set_token_usage() function in streaming.py was computing
input_tokens as:
input_tokens = prompt_tokens + cache_read_tokens + cache_creation_tokens
where prompt_tokens = usage["input_tokens"] from the Anthropic API's
message_start SSE event. However, the API's input_tokens field already
represents the total billed input token count, which includes image
tokens and cached tokens. Adding the cache sub-fields on top caused
double-counting that inflated the span's GEN_AI_USAGE_INPUT_TOKENS
attribute.
For a 500x500 image request, this produced 1,633 in the OTel span
(343 text tokens + ~1,290 image tokens counted twice) instead of the
correct 343 that the API reports.
Fixed by using usage["input_tokens"] directly as input_tokens without
adding cache sub-fields. The cache_read_input_tokens and
cache_creation_input_tokens are still recorded as dedicated span
attributes for cache-hit observability.
Applied the same fix consistently to _set_token_usage() and
_aset_token_usage() in __init__.py.
Fixes traceloop#3949
|
karthik bolla seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughInput token computation for Anthropic streaming was changed: spans now use Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)
2834-2836: Add a span-shape assertion before indexingspans[0].Line 2835 assumes at least one finished span. A direct shape assertion makes this test less brittle and failures clearer.
Proposed test hardening
spans = span_exporter.get_finished_spans() + assert [span.name for span in spans] == ["anthropic.chat"] anthropic_span = spans[0]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py` around lines 2834 - 2836, Add an explicit shape assertion before indexing spans[0] to avoid an IndexError and make failures clearer: after calling span_exporter.get_finished_spans() assert that spans is non-empty (e.g. assert spans or assert len(spans) >= 1) before assigning anthropic_span = spans[0]; reference the variables span_exporter, get_finished_spans, spans, and anthropic_span when locating the insertion point.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`:
- Line 2842: The file ends without a trailing newline causing Ruff W292; fix by
adding a single newline character at the end of
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (i.e.,
after the final closing ")" shown in the diff) so the file terminates with a
newline.
---
Nitpick comments:
In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`:
- Around line 2834-2836: Add an explicit shape assertion before indexing
spans[0] to avoid an IndexError and make failures clearer: after calling
span_exporter.get_finished_spans() assert that spans is non-empty (e.g. assert
spans or assert len(spans) >= 1) before assigning anthropic_span = spans[0];
reference the variables span_exporter, get_finished_spans, spans, and
anthropic_span when locating the insertion point.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8a192af8-2666-42d1-87b2-bca451dc9b25
📒 Files selected for processing (4)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.pypackages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.pypackages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_streaming_base64_image_token_count_legacy.yamlpackages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)
2843-2843:⚠️ Potential issue | 🟡 MinorAdd trailing newline at EOF (Ruff W292).
Line 2843 still leaves the file without a final newline.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py` at line 2843, The file packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py is missing a trailing newline at EOF; fix this by editing test_messages.py and ensure the file ends with a single newline character (add a final blank line at the end of the file so the last line terminates properly), then re-run linting to confirm Ruff W292 is resolved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`:
- Around line 2825-2832: The test currently invokes
anthropic_client.messages.create(..., stream=True) but the regression affects
the messages.stream() path; update or add a test that calls
anthropic_client.messages.stream(model="claude-haiku-4-5-20251001",
max_tokens=100, messages=messages) (or equivalent named parameters) and iterate
over the returned iterator (for _ in response: pass) to exercise the direct
messages.stream() API; ensure the new test uses the same inputs as the existing
base64-image test so the exact API path is covered.
---
Duplicate comments:
In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`:
- Line 2843: The file
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py is
missing a trailing newline at EOF; fix this by editing test_messages.py and
ensure the file ends with a single newline character (add a final blank line at
the end of the file so the last line terminates properly), then re-run linting
to confirm Ruff W292 is resolved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e9c85d10-6308-47da-b4f1-0f26cc1d2d5c
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
|
Can't reproduce the describe bug using Anthropic API. I tested this end-to-end on Results:
The main branch computes
|
Summary
Fixes a bug where using
client.messages.stream()with base64 images causesthe OTel span's
gen_ai.usage.input_tokensattribute to be significantlyinflated compared to what the Anthropic API actually reports.
Root Cause
In
streaming.py,_set_token_usage()computed input tokens as:where
prompt_tokenswas alreadyusage["input_tokens"]from the API — thetotal billed input count that already includes image tokens and all cached
token sub-components. Adding the cache sub-fields again caused double-counting.
The same arithmetic flaw also exists in the sync and async
_set_token_usage()functions in
__init__.py, which this PR also fixes for consistency.Fix Approach
Use
usage["input_tokens"]directly asinput_tokens. Thecache_read_input_tokensand
cache_creation_input_tokensfields are still recorded as their own dedicatedspan attributes (for cache observability), but are no longer added into the
GEN_AI_USAGE_INPUT_TOKENStotal.Before / After Behaviour
usage.input_tokenscreate()stream()Testing Done
test_anthropic_streaming_base64_image_token_count_legacythat sends a 500×500 base64 PNG image via
stream=Trueand asserts that the OTelspan's
input_tokensmatches the API-reported value (343), not the inflated value (1,633).Files Changed
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.pypackages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.pypackages/opentelemetry-instrumentation-anthropic/tests/test_messages.py(new test)packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_messages/test_anthropic_streaming_base64_image_token_count_legacy.yaml(new cassette)Fixes #3949
Summary by CodeRabbit
Bug Fixes
Tests