
fix(chromadb): emit one result event per document across all queries #4105

Merged
dvirski merged 2 commits into main from dr/fix(chromadb)-emit-one-result-event-per-document-across-all-queries- on May 17, 2026

Conversation

Contributor

@dvirski dvirski commented May 11, 2026

Related to #1870

problem:
ChromaDB's query() returns results as a list-of-lists (one inner list per query embedding). The instrumentation only unzipped the outer list, producing one db.query.result event per query instead of one per result document. A second bug also indexed into each attribute value with [0], so string IDs were silently truncated to their first character.

fix:
Added an inner loop over each query's result list so every document gets its own span event. Removed the erroneous [0] indexing so attribute values are used as-is.
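
A minimal sketch of the counting difference, assuming a hypothetical result dict for N=2 query embeddings with K=2 results each. The sample IDs and helper names are illustrative and are not taken from wrapper.py:

```python
# Hypothetical query() result: one inner list per query embedding.
results = {
    "ids": [["doc-a", "doc-b"], ["doc-c", "doc-d"]],
    "distances": [[0.1, 0.4], [0.2, 0.3]],
}


def count_outer_only(results):
    """Visit only the outer list: one item per query."""
    return sum(1 for _ in zip(results["ids"], results["distances"]))


def count_per_document(results):
    """Visit the inner lists as well: one item per returned document."""
    return sum(
        1
        for ids, distances in zip(results["ids"], results["distances"])
        for _ in zip(ids, distances)
    )


outer_count = count_outer_only(results)      # N items
per_doc_count = count_per_document(results)  # N×K items
```

With this input the outer-only walk sees 2 items while the nested walk sees 4, which is the N versus N×K gap described above.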

Summary by CodeRabbit

  • Bug Fixes

    • Query telemetry now emits one event per returned document (N×K for multi-query), with event attributes included only when present and metadata serialized when it's a dictionary.
  • Tests

    • Added and updated tests to assert per-result event counts and validate each event's attributes for single and multi-embedding queries.
  • Documentation

    • Clarified the nested query-result structure and the resulting per-result event behavior.



coderabbitai Bot commented May 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 895b5342-0f53-405c-b98d-c038494b5aca

📥 Commits

Reviewing files that changed from the base of the PR and between 6ac4eda and 943e427.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-chromadb/opentelemetry/instrumentation/chromadb/wrapper.py
  • packages/opentelemetry-instrumentation-chromadb/tests/test_query.py
  • packages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-chromadb/tests/test_query.py
  • packages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py
  • packages/opentelemetry-instrumentation-chromadb/opentelemetry/instrumentation/chromadb/wrapper.py

📝 Walkthrough

This PR refactors ChromaDB query result event emission to respect the library's nested structure: multiple queries each returning multiple results. The _add_query_result_events function now emits one db.query.result event per result document using nested iteration, and tests are updated/added to validate the N×K event counts and payloads.

Changes

Query Result Events Refactoring

Layer / File(s) | Summary
Query Result Events Implementation, packages/opentelemetry-instrumentation-chromadb/opentelemetry/instrumentation/chromadb/wrapper.py | _add_query_result_events is refactored to iterate queries then result items (nested zip_longest), emitting one db.query.result event per document. Event attributes are conditionally populated for non-None values; metadata is JSON-serialized only when it is a dict. A docstring documents the nested structure and resulting N×K event count.
Existing Test Assertion Update, packages/opentelemetry-instrumentation-chromadb/tests/test_query.py | test_chroma_query now asserts two DB_QUERY_RESULT events when n_results=2, matching the new one-event-per-result behavior.
Comprehensive Query Result Tests, packages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py | New test module with a collection fixture and three test functions validating the new event emission: single query with 2 results produces 2 events; two queries with 2 results each produces 4 events; each event includes required attributes and correct IDs.
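
The behavior summarized above can be sketched roughly as follows. This is not the code from wrapper.py: it is a standalone approximation that returns plain (name, attributes) pairs instead of calling a span, and the function name build_result_events is hypothetical. The attribute keys follow the ones asserted in the tests in this thread, and passing non-dict metadata through unchanged is an assumption.

```python
import json
from itertools import zip_longest

EVENT_NAME = "db.query.result"  # event name as used in this PR


def build_result_events(results):
    """Return one (event_name, attributes) pair per returned document."""
    keys = ("ids", "distances", "metadatas", "documents")
    # Outer level: one tuple of inner lists per query embedding.
    per_query = zip_longest(*(results.get(key) or [] for key in keys))
    events = []
    for ids, distances, metadatas, documents in per_query:
        # Inner level: one tuple per result document of this query.
        per_doc = zip_longest(
            ids or [], distances or [], metadatas or [], documents or []
        )
        for id_, distance, metadata, document in per_doc:
            attributes = {}
            # Only attach attributes that are actually present.
            if id_ is not None:
                attributes[f"{EVENT_NAME}.id"] = id_
            if distance is not None:
                attributes[f"{EVENT_NAME}.distance"] = distance
            if document is not None:
                attributes[f"{EVENT_NAME}.document"] = document
            if metadata is not None:
                # Serialize dict metadata; pass anything else through (assumption).
                attributes[f"{EVENT_NAME}.metadata"] = (
                    json.dumps(metadata) if isinstance(metadata, dict) else metadata
                )
            events.append((EVENT_NAME, attributes))
    return events


# Hypothetical input: 2 queries, 2 results each, sparse metadata, no documents.
sample = {
    "ids": [["doc-a", "doc-b"], ["doc-c", "doc-d"]],
    "distances": [[0.1, 0.4], [0.2, 0.3]],
    "metadatas": [[{"source": "fileA"}, None], [None, None]],
    "documents": None,
}
events = build_result_events(sample)
```

Using zip_longest at both levels keeps the walk going when one of the fields is missing or shorter than the others, which is why the per-attribute None checks are needed.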

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • galzilber
  • nina-kollman
  • max-deygin-traceloop
  • netanel-tl
  • doronkopit5

Poem

🐰 I hop through nested lists so neat,
One tiny event for each result I meet.
Metadata serialized when it's a dict,
Each chunk recorded, no piece is skipped.
Telemetry sings as the spans all click.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately describes the main change: fixing ChromaDB instrumentation to emit one result event per document across all queries, which directly matches the core functionality change in wrapper.py and test updates.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.



CLAassistant commented May 11, 2026

CLA assistant check
All committers have signed the CLA.


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py (1)

54-76: ⚡ Quick win

Use multi-character IDs so this test actually exercises the [0] truncation fix.

This PR fixes two bugs: (1) event count and (2) the previous [0] indexing that truncated string IDs to a single character. With single-character IDs ("1", "2"), "1"[0] == "1", so the id-equality assertion below would still pass against the old buggy code — i.e., this test only protects against the count regression, not the truncation regression. Switching to multi-character IDs makes the assertion meaningfully cover the truncation fix.

♻️ Proposed change
 def test_chromadb_query_result_events_contain_correct_data(exporter, collection):
     """Each result event should contain id, distance, document and metadata."""
     collection.add(
-        ids=["1", "2"],
+        ids=["doc-id-1", "doc-id-2"],
         documents=["doc one", "doc two"],
         metadatas=[{"source": "fileA"}, {"source": "fileB"}],
         embeddings=[[1.0, 0.0], [0.0, 1.0]],
     )
     collection.query(query_embeddings=[[1.0, 0.0]], n_results=2)

     spans = exporter.get_finished_spans()
     query_span = next(s for s in spans if s.name == "chroma.query")
     result_events = [e for e in query_span.events if e.name == "db.query.result"]

     assert len(result_events) == 2
     for event in result_events:
         assert "db.query.result.id" in event.attributes
         assert "db.query.result.distance" in event.attributes
         assert "db.query.result.document" in event.attributes
         assert "db.query.result.metadata" in event.attributes

     ids_recorded = {e.attributes["db.query.result.id"] for e in result_events}
-    assert ids_recorded == {"1", "2"}
+    assert ids_recorded == {"doc-id-1", "doc-id-2"}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py`
around lines 54 - 76, The test
test_chromadb_query_result_events_contain_correct_data uses single-character IDs
("1","2") which doesn't exercise the previous string-truncation bug; update the
ids passed to collection.add (in the collection.add call within that test) to
multi-character IDs (e.g., "id1", "id2" or "10", "20") so the assertion that
ids_recorded == {"id1", "id2"} actually verifies the fix for the [0] truncation
as well as the event count.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3a5327b2-f1f8-4739-bfe9-d3cbe25a4a7c

📥 Commits

Reviewing files that changed from the base of the PR and between 6d3e696 and a08a54c.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-chromadb/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-chromadb/opentelemetry/instrumentation/chromadb/wrapper.py
  • packages/opentelemetry-instrumentation-chromadb/tests/test_query.py
  • packages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py

@doronkopit5
Member

Please take this small comment and add it:

packages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py (1)
54-76: ⚡ Quick win

Use multi-character IDs so this test actually exercises the [0] truncation fix.
This PR fixes two bugs: (1) event count and (2) the previous [0] indexing that truncated string IDs to a single character. With single-character IDs ("1", "2"), "1"[0] == "1", so the id-equality assertion below would still pass against the old buggy code — i.e., this test only protects against the count regression, not the truncation regression. Switching to multi-character IDs makes the assertion meaningfully cover the truncation fix.

@dvirski dvirski force-pushed the dr/fix(chromadb)-emit-one-result-event-per-document-across-all-queries- branch from a08a54c to 6ac4eda Compare May 17, 2026 12:28
Contributor Author

dvirski commented May 17, 2026

@doronkopit5

Updated the third test in test_query_results.py to use multi-character IDs ("doc-id-aaa", "doc-id-bbb") instead of single-character ones ("1", "2").

Why: the PR fixed an ids[0] truncation bug, but single-character IDs hid it: "1"[0] == "1", so the assertion passed even against the broken code. Multi-character IDs make ids[0] collapse to just "d", so the test now actually fails under the old bug and genuinely covers the truncation fix (not just the event-count fix).
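
The masking effect is plain Python string indexing and can be checked in isolation. A small sketch, where first_char is a hypothetical helper standing in for the old [0] indexing:

```python
def first_char(value):
    """Mimic the old mistake of indexing a string ID with [0]."""
    return value[0]


short_ids = ["1", "2"]
long_ids = ["doc-id-aaa", "doc-id-bbb"]

# Single-character IDs are unchanged by [0], so an equality assertion cannot catch the bug.
short_survive = [first_char(i) for i in short_ids] == short_ids

# Multi-character IDs are truncated, so the same assertion fails under the bug.
long_survive = [first_char(i) for i in long_ids] == long_ids

# Both long IDs collapse to the same single character.
collapsed = {first_char(i) for i in long_ids}
```

Here short_survive is True and long_survive is False, and both long IDs collapse to "d".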

@dvirski dvirski force-pushed the dr/fix(chromadb)-emit-one-result-event-per-document-across-all-queries- branch from 6ac4eda to 943e427 Compare May 17, 2026 18:08
@dvirski dvirski merged commit 12bdd62 into main May 17, 2026
12 checks passed

3 participants