fix(retrieval): allow find without rerank and preserve level-2 rerank scores#754

Merged
MaojiaSheng merged 3 commits into volcengine:main from mildred522:fix/find-without-rerank on Mar 19, 2026

@mildred522
Contributor

Description

This PR fixes two retrieval issues in the find() / HierarchicalRetriever path:

  1. VikingFS.find() incorrectly required rerank_config, even though the documentation defines find() as basic
    semantic/vector retrieval in which rerank is optional.
  2. HierarchicalRetriever did not preserve rerank scores for level-2 global hits used as initial candidates, so
    direct file hits from global retrieval could bypass rerank ordering in THINKING mode.

These changes keep find() usable without rerank configuration and make rerank behavior more consistent for level-2
retrieval results.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Remove the hard failure in openviking/storage/viking_fs.py so find() works when rerank_config is not
    configured.
  • Update openviking/retrieve/hierarchical_retriever.py to rerank level-2 global hits before they are used as initial
    candidates, and avoid sending those same level-2 hits through starting-point rerank logic twice.
  • Add regression coverage for both cases:
    • tests/misc/test_vikingfs_find_without_rerank.py
    • tests/retrieve/test_hierarchical_retriever_rerank.py
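
As an illustration of the first change, here is a minimal sketch of a find-style function where rerank is optional. It is not the code in openviking/storage/viking_fs.py: the `Hit` type, the `find` signature, and the `reranker` callable are hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Hit:
    # Hypothetical result record: a resource URI and its relevance score.
    uri: str
    score: float


# Hypothetical reranker shape: maps (query, hits) to one new score per hit.
Reranker = Callable[[str, Sequence[Hit]], list[float]]


def find(
    query: str,
    vector_hits: Sequence[Hit],
    reranker: Optional[Reranker] = None,
) -> list[Hit]:
    """Return hits ordered by score, reranking only when a reranker is given."""
    # Copy the hits so the caller's objects are not mutated.
    hits = [Hit(h.uri, h.score) for h in vector_hits]
    if reranker is not None:
        # Rerank is optional: overwrite vector scores only when configured.
        for hit, new_score in zip(hits, reranker(query, hits)):
            hit.score = new_score
    # Without a reranker, plain vector scores decide the order (no hard failure).
    return sorted(hits, key=lambda h: h.score, reverse=True)
```

Calling `find("q", hits)` with no reranker falls back to the vector scores, which is the behavior the regression test for this case is meant to protect.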

Testing

I added targeted regression tests for the two fixed behaviors.

Relevant local verification previously completed on this branch:

  • py -3.11 -m pytest tests\misc\test_vikingfs_find_without_rerank.py tests\retrieve\test_hierarchical_retriever_rerank.py tests\client\test_search.py tests\server\test_api_search.py -q
  • py -3.11 -m pytest tests\integration\test_http_integration.py -q
  • py -3.11 -m ruff check openviking\storage\viking_fs.py openviking\retrieve\hierarchical_retriever.py tests\misc\test_vikingfs_find_without_rerank.py tests\retrieve\test_hierarchical_retriever_rerank.py

Platform verified:

  • Windows
  • Linux
  • macOS

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

This PR is intentionally scoped to retrieval fixes only.

Behavioral intent:

  • find() remains a basic semantic retrieval API and should not require rerank configuration.
  • Rerank remains optional, but when rerank is enabled in THINKING mode, level-2 global hits should keep their reranked
    scores consistently.
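
The second point can be sketched roughly as follows. This is a simplified model, not the HierarchicalRetriever implementation: the `Hit` record, the `level` field, the `initial_candidates` helper, and the reranker shape are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Hit:
    # Hypothetical record; level 2 is assumed to mark a direct file hit.
    uri: str
    level: int
    score: float


# Hypothetical reranker shape: maps (query, documents) to one score per document.
Reranker = Callable[[str, Sequence[str]], list[float]]


def initial_candidates(
    query: str,
    global_hits: Sequence[Hit],
    reranker: Optional[Reranker] = None,
) -> tuple[list[Hit], list[Hit]]:
    """Split global hits into (level-2 initial candidates, starting points)."""
    level2 = [h for h in global_hits if h.level == 2]
    # Level-2 hits are kept out of the starting points so that the
    # starting-point rerank step does not score them a second time.
    starting_points = [h for h in global_hits if h.level != 2]
    if reranker is not None and level2:
        # Rerank level-2 hits up front and keep the reranked scores on them.
        scores = reranker(query, [h.uri for h in level2])
        level2 = [Hit(h.uri, h.level, s) for h, s in zip(level2, scores)]
    return level2, starting_points
```

With a reranker supplied, the level-2 candidates carry reranked scores into later ordering; without one, they keep their original vector scores.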

@MaojiaSheng MaojiaSheng merged commit 82307c3 into volcengine:main Mar 19, 2026
6 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 19, 2026
chethanuk added a commit to chethanuk/OpenViking that referenced this pull request Mar 19, 2026
- Add .pr_agent.toml with 15 repo-specific review rules derived from real
  bug history (PRs volcengine#505, volcengine#728, volcengine#749, volcengine#740/volcengine#745, volcengine#754, volcengine#735, volcengine#767)
- Rules structured as WHEN/THEN/BECAUSE for deterministic enforcement
- Add 8 custom labels (memory-pipeline, async-change, api-breaking, etc.)
- Add ignore patterns for lock files, third_party, build artifacts
- Enable score review, TODO scan, split-PR detection, security audit
- Configure improve tool with quality threshold and extended mode
- Configure describe tool with PR diagrams and semantic file types
- Update workflow: ark-code-latest model, checkout step for .pr_agent.toml,
  move all config from inline YAML to .pr_agent.toml (single source of truth)
qin-ctx pushed a commit that referenced this pull request Mar 19, 2026
…#780)

- Add .pr_agent.toml with 15 repo-specific review rules derived from real
  bug history (PRs #505, #728, #749, #740/#745, #754, #735, #767)
- Rules structured as WHEN/THEN/BECAUSE for deterministic enforcement
- Add 8 custom labels (memory-pipeline, async-change, api-breaking, etc.)
- Add ignore patterns for lock files, third_party, build artifacts
- Enable score review, TODO scan, split-PR detection, security audit
- Configure improve tool with quality threshold and extended mode
- Configure describe tool with PR diagrams and semantic file types
- Update workflow: ark-code-latest model, checkout step for .pr_agent.toml,
  move all config from inline YAML to .pr_agent.toml (single source of truth)