fix(chromadb): emit one result event per document across all queries#4105
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughThis PR refactors ChromaDB query result event emission to respect the library's nested structure: multiple queries each returning multiple results. The ChangesQuery Result Events Refactoring
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py (1)
54-76: ⚡ Quick winUse multi-character IDs so this test actually exercises the
[0]truncation fix.This PR fixes two bugs: (1) event count and (2) the previous
[0]indexing that truncated string IDs to a single character. With single-character IDs ("1","2"),"1"[0] == "1", so the id-equality assertion below would still pass against the old buggy code — i.e., this test only protects against the count regression, not the truncation regression. Switching to multi-character IDs makes the assertion meaningfully cover the truncation fix.♻️ Proposed change
def test_chromadb_query_result_events_contain_correct_data(exporter, collection): """Each result event should contain id, distance, document and metadata.""" collection.add( - ids=["1", "2"], + ids=["doc-id-1", "doc-id-2"], documents=["doc one", "doc two"], metadatas=[{"source": "fileA"}, {"source": "fileB"}], embeddings=[[1.0, 0.0], [0.0, 1.0]], ) collection.query(query_embeddings=[[1.0, 0.0]], n_results=2) spans = exporter.get_finished_spans() query_span = next(s for s in spans if s.name == "chroma.query") result_events = [e for e in query_span.events if e.name == "db.query.result"] assert len(result_events) == 2 for event in result_events: assert "db.query.result.id" in event.attributes assert "db.query.result.distance" in event.attributes assert "db.query.result.document" in event.attributes assert "db.query.result.metadata" in event.attributes ids_recorded = {e.attributes["db.query.result.id"] for e in result_events} - assert ids_recorded == {"1", "2"} + assert ids_recorded == {"doc-id-1", "doc-id-2"}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py` around lines 54 - 76, The test test_chromadb_query_result_events_contain_correct_data uses single-character IDs ("1","2") which doesn't exercise the previous string-truncation bug; update the ids passed to collection.add (in the collection.add call within that test) to multi-character IDs (e.g., "id1", "id2" or "10", "20") so the assertion that ids_recorded == {"id1", "id2"} actually verifies the fix for the [0] truncation as well as the event count.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py`:
- Around line 54-76: The test
test_chromadb_query_result_events_contain_correct_data uses single-character IDs
("1","2") which doesn't exercise the previous string-truncation bug; update the
ids passed to collection.add (in the collection.add call within that test) to
multi-character IDs (e.g., "id1", "id2" or "10", "20") so the assertion that
ids_recorded == {"id1", "id2"} actually verifies the fix for the [0] truncation
as well as the event count.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3a5327b2-f1f8-4739-bfe9-d3cbe25a4a7c
⛔ Files ignored due to path filters (1)
packages/opentelemetry-instrumentation-chromadb/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
packages/opentelemetry-instrumentation-chromadb/opentelemetry/instrumentation/chromadb/wrapper.pypackages/opentelemetry-instrumentation-chromadb/tests/test_query.pypackages/opentelemetry-instrumentation-chromadb/tests/test_query_results.py
|
PLease take this small comment and add
|
a08a54c to
6ac4eda
Compare
|
Updated the third test in test_query_results.py to use multi-character IDs ("doc-id-aaa", "doc-id-bbb") instead of single-character ones ("1", "2"). Why: the PR fixed an ids[0] truncation bug, but single-character IDs hid it — "1"[0] == "1" so the assertion passed even against the broken code. Multi-character IDs make ids[0] collapse to just "d", so the test now actually fails under the old bug and genuinely covers the truncation fix (not |
6ac4eda to
943e427
Compare
Related to #1870
problem:
ChromaDB's query() returns results as a list-of-lists (one inner list per query embedding). The instrumentation only unzipped the outer list, producing one db.query.result
event per query instead of one per result document. A second bug also indexed into each attribute value with [0], so string IDs were silently truncated to their first
character.
fix:
Added an inner loop over each query's result list so every document gets its own span event. Removed the erroneous [0] indexing so attribute values are used as-is.
Summary by CodeRabbit
Bug Fixes
Tests
Documentation