Merged
zhoujh01 approved these changes on Apr 28, 2026
r266-tech added a commit to r266-tech/OpenViking that referenced this pull request on Apr 29, 2026
volcengine#1770 refactored the hardcoded `SCORE_PROPAGATION_ALPHA = 0.5` constant in `openviking/retrieve/hierarchical_retriever.py` into a `RetrievalConfig` field but set the new field's default to `0.0`.

With the existing formula `final_score = alpha * embedding_score + (1 - alpha) * parent_score`, a default of `0.0` silently drops every child's own embedding score and keeps only the propagated parent score, changing default ranking behavior for any caller that does not provide an explicit `retrieval` block.

The same PR's README, `docs/{en,zh}/concepts/07-retrieval.md`, `docs/{en,zh}/guides/01-configuration.md` and `examples/ov.conf.example` all state that the default is `0.5` and explicitly note that `0.5 keeps the existing equal blend`. Restoring the field default to `0.5` aligns the shipped code with the documented and pre-refactor behavior.
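To make the effect of the default concrete, here is a minimal sketch of the blend formula quoted in the commit message. The function name `propagate_score` and the sample scores are illustrative assumptions, not code from `hierarchical_retriever.py`.

```python
def propagate_score(embedding_score: float, parent_score: float, alpha: float) -> float:
    """Blend a child's own embedding score with the score propagated from its parent.

    Mirrors the formula in the commit message:
    final_score = alpha * embedding_score + (1 - alpha) * parent_score
    """
    return alpha * embedding_score + (1 - alpha) * parent_score


# Illustrative scores (made up for this example).
child_score, parent_score = 0.9, 0.4

# alpha = 0.5: the pre-refactor equal blend of child and parent.
equal_blend = propagate_score(child_score, parent_score, alpha=0.5)

# alpha = 0.0: the child's own score has zero weight, only the parent score survives.
parent_only = propagate_score(child_score, parent_score, alpha=0.0)

print(f"alpha=0.5 -> {equal_blend:.2f}")
print(f"alpha=0.0 -> {parent_only:.2f}")
```

With `alpha=0.0` the result equals the parent score regardless of how well the child itself matches, which is the ranking regression the commit describes.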
A0nameless0man pushed a commit to A0nameless0man/OpenViking that referenced this pull request on Apr 30, 2026
* feat(retrieval): configure hotness score blending
* feat(retrieval): configure score propagation alpha
* test(retrieval): trim redundant propagation coverage
* feat(embedding): centralize token estimation
* fix(embedding): use shared token estimator
* fix(embedding): narrow token truncation scope
sponge225 added a commit to sponge225/OpenViking that referenced this pull request on May 6, 2026
PR volcengine#1770 changes:
- RetrievalConfig: configurable hotness_alpha (0.0) and score_propagation_alpha (0.5)
- hierarchical_retriever: skip hotness calc when alpha=0
- viking_fs/core: pass retrieval_config through pipeline
- embedding_utils: token-based truncation (max_input_tokens=4096)
- embedding_config: text_source default to content_only
- open_viking_config: add retrieval field

Other changes:
- benchmark/RAG: nanobot runner, L0/L1 analysis script, pipeline updates
- bot/vikingbot: context and tool factory updates
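The "skip hotness calc when alpha=0" item can be illustrated with a small sketch. Everything here is an assumption made for illustration: `RetrievalSettings` is a hypothetical stand-in for the real `RetrievalConfig`, the `[0.0, 1.0]` range is a guess at what the range validation enforces, and the linear blend `(1 - alpha) * semantic + alpha * hotness` is assumed rather than taken from the PR.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RetrievalSettings:
    """Hypothetical stand-in for RetrievalConfig, using the defaults named in the PR."""

    hotness_alpha: float = 0.0
    score_propagation_alpha: float = 0.5

    def __post_init__(self) -> None:
        # Assumed validation rule: both weights must lie in [0.0, 1.0].
        for name in ("hotness_alpha", "score_propagation_alpha"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")


def blend_hotness(
    semantic_score: float,
    hotness_fn: Callable[[], float],
    settings: RetrievalSettings,
) -> float:
    """Blend a semantic score with a hotness score, skipping the work when disabled."""
    if settings.hotness_alpha == 0.0:
        # Hotness is off: never call hotness_fn, return the pure semantic score.
        return semantic_score
    alpha = settings.hotness_alpha
    # Assumed linear blend; the real formula may differ.
    return (1 - alpha) * semantic_score + alpha * hotness_fn()


if __name__ == "__main__":
    print(blend_hotness(0.7, lambda: 1.0, RetrievalSettings()))
    print(blend_hotness(0.5, lambda: 1.0, RetrievalSettings(hotness_alpha=0.2)))
```

Passing the hotness computation as a callable keeps the disabled path free of any hotness work, which is the point of the skip.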
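Background
This PR mainly consolidates two kinds of configuration issues.
Changes
- `retrieval.hotness_alpha`, default `0.0`: the hotness boost is off by default, so the final score falls back to pure semantic similarity.
- `retrieval.score_propagation_alpha`, default `0.5`: configures the blend ratio between a child node's own score and the score propagated from its parent in hierarchical retrieval.
- `embedding.text_source` now defaults to `content_only`; supported values are `content_only`, `summary_first`, and `summary_only`.
- `embedding.max_input_tokens`, default `4096`: caps the estimated token count of original content before it is sent for embedding.
- Estimation is done locally inside `openviking/utils/embedding_utils.py`; no public token utility is added, and the existing token estimation logic for Message/Session/Rerank/Bot/Markdown is unchanged.
- No `tiktoken` or provider-specific tokenizer is used; the token count here is a heuristic estimate for the current embedding input truncation scenario.
- `examples/ov.conf.example`.
Compatibility notes
- `hotness_alpha=0.0` disables the default hotness blending, so query scores are closer to vector similarity.
- `score_propagation_alpha=0.5` keeps the previous 50/50 blend of parent and child scores.
- The `embedding.text_source` default changes from summary-first to content-first, so that the vectorized content of an ordinary text resource no longer diverges from the user's original text after it is written.
- `max_input_tokens` is an estimated token count and is not guaranteed to equal the exact token count of any specific model's tokenizer.
Testing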
- `.venv/bin/python -m pytest tests/unit/test_vectorize_file_strategy.py tests/unit/test_embedding_vectorize_strategy.py tests/retrieve/test_hierarchical_retriever_rerank.py::test_score_propagation_alpha_uses_configured_weight tests/retrieve/test_hierarchical_retriever_rerank.py::test_retrieval_hotness_alpha_blends_when_configured tests/retrieve/test_hierarchical_retriever_rerank.py::test_default_retrieval_config_uses_semantic_score_without_hotness tests/test_config_loader.py::test_openviking_config_retrieval_hotness_alpha_defaults_to_zero tests/test_config_loader.py::test_openviking_config_retrieval_alpha_validates_range`: 16 passed, 4 warnings
- `.venv/bin/python -m ruff check openviking/utils/embedding_utils.py openviking_cli/utils/config/embedding_config.py openviking_cli/utils/config/retrieval_config.py openviking/retrieve/hierarchical_retriever.py tests/unit/test_vectorize_file_strategy.py tests/unit/test_embedding_vectorize_strategy.py tests/retrieve/test_hierarchical_retriever_rerank.py tests/test_config_loader.py`: passed
- `git diff --check origin/main..HEAD`: passed
- `jq empty examples/ov.conf.example`: passed
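For readers unfamiliar with the `embedding.text_source` and `embedding.max_input_tokens` options described in this PR, the following is a rough sketch of how they might interact. It is not the code in `openviking/utils/embedding_utils.py`: the function names, the 4-characters-per-token heuristic, the fallback to content when `summary_first` finds no summary, and the choice to truncate only original content are all assumptions made for illustration.

```python
import math

# Assumed heuristic ratio; the PR only says the estimate is heuristic, not how it is computed.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Heuristically estimate the token count from the character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def truncate_to_tokens(text: str, max_input_tokens: int = 4096) -> str:
    """Cut text so that its estimated token count does not exceed the limit."""
    if estimate_tokens(text) <= max_input_tokens:
        return text
    return text[: max_input_tokens * CHARS_PER_TOKEN]


def select_embedding_text(
    content: str,
    summary: str,
    text_source: str = "content_only",
    max_input_tokens: int = 4096,
) -> str:
    """Pick the text to embed according to text_source (hypothetical semantics)."""
    if text_source not in ("content_only", "summary_first", "summary_only"):
        raise ValueError(f"unsupported text_source: {text_source!r}")
    if text_source == "summary_only":
        return summary
    if text_source == "summary_first" and summary:
        return summary
    # content_only, or summary_first with no summary: embed the truncated original content.
    return truncate_to_tokens(content, max_input_tokens)


if __name__ == "__main__":
    long_content = "x" * 100
    print(len(select_embedding_text(long_content, "short summary", max_input_tokens=10)))
    print(select_embedding_text(long_content, "short summary", text_source="summary_first"))
```

Because the count is an estimate, a limit enforced this way bounds the input only approximately relative to any particular model's tokenizer, which matches the compatibility note above.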