
perf(retrieval): parallelize hierarchical child search #2155

Merged
zhoujh01 merged 1 commit into main from perf/parallel-hierarchical-child-search
May 21, 2026

@qin-ctx (Collaborator) commented May 21, 2026

Description

Parallelizes recursive hierarchical child directory searches with bounded per-request fan-out, and makes vector collection count parsing accept multiple aggregate total key names returned by different vector backends.
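For reviewers who want a concrete picture of bounded fan-out, here is a minimal sketch of the general pattern, not the code in `HierarchicalRetriever`. The function name `search_children_in_batches`, the `search_one` callback, and the default batch size of 4 are hypothetical; this description does not state the actual limit.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# Hypothetical limit; the real per-request bound is not given in this PR text.
MAX_CHILD_SEARCH_CONCURRENCY = 4


async def search_children_in_batches(
    child_uris: Sequence[str],
    search_one: Callable[[str], Awaitable[List[dict]]],
    batch_size: int = MAX_CHILD_SEARCH_CONCURRENCY,
) -> List[List[dict]]:
    """Search child directories in fixed-size batches, preserving input order."""
    results: List[List[dict]] = []
    for start in range(0, len(child_uris), batch_size):
        batch = child_uris[start:start + batch_size]
        # At most `batch_size` searches are in flight at once; the next batch
        # starts only after every search in the current batch has finished.
        results.extend(await asyncio.gather(*(search_one(uri) for uri in batch)))
    return results
```

Because `asyncio.gather` returns results in the order its awaitables were passed, each result lines up with its child URI, so per-search bookkeeping such as telemetry counts can still be applied one result at a time.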

Related Issue

N/A

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Add bounded batching for recursive child searches in HierarchicalRetriever using asyncio.gather.
  • Preserve per-search telemetry counting while processing each parallel expansion result.
  • Accept _total, __TOTAL__, and __total_count__ aggregate keys when parsing vector collection counts.
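The count-parsing change can be illustrated with a short sketch. The three key names come from the list above; the helper name `parse_total_count`, the flat mapping shape of the aggregate result, the lookup order, and the fallback to `0` are assumptions for illustration and may differ from `openviking/storage/vectordb_adapters/base.py`.

```python
from typing import Any, Mapping

# Key names as listed in this PR; the order they are tried in is an assumption.
TOTAL_KEYS = ("_total", "__TOTAL__", "__total_count__")


def parse_total_count(aggregate: Mapping[str, Any]) -> int:
    """Return the first recognized total in an aggregate result, else 0."""
    for key in TOTAL_KEYS:
        value = aggregate.get(key)
        if value is not None:
            return int(value)
    # Assumed fallback when no known key is present.
    return 0
```

With this shape, supporting another backend's key name would only require adding it to `TOTAL_KEYS`.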

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Local checks:

  • git diff --check
  • .venv/bin/ruff check openviking/retrieve/hierarchical_retriever.py openviking/storage/vectordb_adapters/base.py
  • .venv/bin/ruff format --check openviking/retrieve/hierarchical_retriever.py openviking/storage/vectordb_adapters/base.py
  • .venv/bin/python -m pytest tests/retrieve/test_hierarchical_retriever_rerank.py tests/retrieve/test_hierarchical_retriever_target_dirs.py (9 passed, 1 failed: tests/retrieve/test_hierarchical_retriever_rerank.py::test_retrieve_reranks_level_two_initial_candidates_in_thinking_mode; the failure comes from existing initial-candidate threshold behavior that this PR does not change)

Note: plain python -m pytest ... does not run in this local environment because the system Python has incompatible pydantic / pydantic-core versions.

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

N/A

Additional Notes

The local pre-commit hook could not run because it invokes a system Python without the pre_commit module installed. I ran the equivalent configured Ruff checks manually from the repository virtual environment before committing.

Batch child directory vector lookups during recursive retrieval to reduce remote fan-out latency, and accept alternate count aggregate total keys from vector stores.
@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 92
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Parallelize hierarchical child search

Relevant files:

  • openviking/retrieve/hierarchical_retriever.py

Sub-PR theme: Support multiple aggregate total keys for vector collection count

Relevant files:

  • openviking/storage/vectordb_adapters/base.py

⚡ No major issues detected

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

@zhoujh01 merged commit ba9df59 into main on May 21, 2026
5 checks passed
@zhoujh01 deleted the perf/parallel-hierarchical-child-search branch on May 21, 2026 05:56
@github-project-automation (bot) moved this from Backlog to Done in the OpenViking project on May 21, 2026