feat(ovpack): support vector snapshots and consistency checks#1965
Merged
MaojiaSheng
approved these changes
May 11, 2026
Collaborator
`ov consistency` is not recommended as a top-level command.
ZaynJarvis
pushed a commit
that referenced
this pull request
May 13, 2026
* feat(ovpack): add vector snapshot backup support
* feat(ovpack): add vector snapshots and consistency checks
* fix(ovpack): validate reserved paths and index expectations
* fix(ovpack): limit consistency report output
* fix(ovpack): isolate archive content namespace
* chore(ovpack): move consistency cli under system
Description
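This PR is the full follow-up enhancement to OVPack v2. The goal is to converge OVPack into OpenViking's standard content migration/backup format: the package structure is verifiable, import and restore are predictable, path semantics are unambiguous, and vector snapshots can optionally be migrated in pure-dense scenarios.

The changes fall into three main threads:

`export`/`import` preserve the pack root and require same-scope import. Full migration uses `backup`/`restore`; a backup pack can only go through restore and cannot be imported into an arbitrary parent directory by an ordinary import.

With `include_vectors=true`, a pure-dense snapshot can be exported. On import, `vector_mode=auto|recompute|require` decides whether to restore the snapshot or re-vectorize. Only pure dense indexes are currently supported; exporting a vector snapshot is refused when the underlying `VectorIndex.IndexType` is hybrid.

In addition, this PR changes the physical layout inside an OVPack to use namespace isolation: user content is placed as-is under `<root>/files/`, while OVPack's internal manifest, index records, and optional dense vector snapshot live under `<root>/_ovpack/`. This removes the need for `_._` dotfile escaping, and user file names such as `.notes.txt`, `_._notes.txt`, or `.ovpack/foo.txt` can no longer collide with internal file paths.

The diff looks large mainly because the OVPack logic that used to be concentrated in `local_fs.py` has been split into separate modules; changes to the existing main flow are fairly contained. When reviewing, the impact can be read in these areas:

* `openviking/storage/local_fs.py` and the new `openviking/storage/ovpack/`. This is the largest part: OVPack format handling, manifest, import policy, validation, index records, dense snapshot, and import/export orchestration are split apart to reduce single-file complexity.
* `include_vectors` / `vector_mode`, with the call entry points switched to `storage.ovpack.operations`; core storage behavior is not modified in other modules.
* `openviking/storage/index_consistency.py` is a standalone data consistency check. It does not depend on the OVPack format and only reuses the system indexing rules to check whether every index record that should exist does exist.
* The `ov system consistency <uri>` command, mainly parameter pass-through, request structure, and output handling.

Suggested review order: start with `storage/ovpack/operations.py` to understand the overall flow, then read `policy.py` / `validation.py` / `vectors.py`, and finally `index_consistency.py`, the SDK/CLI wiring, and the docs and tests.

Related Issue
A follow-up increment on top of the already merged PR #1927.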
Type of Change
Changes Made
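* `openviking/storage/local_fs.py`: format, manifest, index, policy, validation, vector snapshot, and operations code is split out into `openviking/storage/ovpack/`, so that OVPack logic no longer piles up in a single `local_fs` file.
* `size`/`sha256`, the overall `content_sha256`, and the hash/count/dimension of the internal index file and the optional dense file. Before import, the ZIP members, manifest, directory/file sets, hashes, scope, and conflict policy are fully validated; only after validation passes is anything written to the target environment.
* `<root>/files/`, with internal metadata kept under `<root>/_ovpack/`: the manifest at `<root>/_ovpack/manifest.json`, index records at `<root>/_ovpack/index_records.jsonl`, and the optional dense snapshot at `<root>/_ovpack/dense.f32`.
* The `_._` dotfile escaping rule and the reserved-name restrictions on user paths: user content such as `.abstract.md`, `.overview.md`, `.meta.json`, `.notes.txt`, `_._notes.txt`, and `.ovpack/foo.txt` is stored under `files/` at its original relative path and restored unchanged on import.
* The `resources`/`user`/`agent`/`session` public scopes; these can only be recovered through restore and cannot be imported into an arbitrary parent directory by an ordinary import.
* `index_records.jsonl` stores migratable scalars and record metadata, and `dense.f32` stores contiguous little-endian float32 dense vectors. Vectors are not exported by default.
* With `include_vectors=true`, only the underlying `VectorIndex.IndexType` is read, and the export is refused outright if it contains `hybrid`. The check no longer infers this from `embedding.hybrid`, `embedding.sparse`, `sparse_weight`, or `EnableSparse`.
* `vector_mode=auto|recompute|require`. `auto` restores the snapshot when the pack contains a dense snapshot, the embedding metadata is compatible, and the target index is not hybrid; otherwise it re-vectorizes. `require` raises an error on a missing snapshot, incompatible metadata, or a hybrid index. `recompute` always recomputes.
* `POST /api/v1/system/consistency`, the Python SDK `check_consistency(uri)`, and the Rust CLI `ov system consistency <uri>`. The consistency check only verifies that content which should have an index record under the system rules actually has one; it no longer checks for missing vectors separately.
* An L0 index record is required only when `.abstract.md` exists, and an L1 record only when `.overview.md` exists, so overview-only directories are not falsely reported as missing L0.
* `id`, `uri`, `account_id`, the owner fields, `created_at`, `updated_at`, and `active_count` are not restored from the pack. Even when a dense snapshot is restored, they are regenerated from the target URI, the account, and the current time.
* `export_ovpack`, `backup_ovpack`, `import_ovpack`, `restore_ovpack`, and the consistency API.

Expected behavior examples: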
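As a rough illustration of the L0/L1 consistency rule described in the list above, the sketch below reports a missing index record only for a level whose source file is present. `DirState`, `expected_levels`, and `missing_index_records` are hypothetical names invented for this example; they are not taken from `index_consistency.py`.

```python
from dataclasses import dataclass, field

ABSTRACT = ".abstract.md"  # presence requires an L0 index record
OVERVIEW = ".overview.md"  # presence requires an L1 index record


@dataclass
class DirState:
    """Hypothetical view of one directory: its file names and indexed levels."""
    uri: str
    files: set
    indexed_levels: set = field(default_factory=set)


def expected_levels(files):
    """Levels that must have an index record, derived only from which files exist."""
    levels = set()
    if ABSTRACT in files:
        levels.add(0)
    if OVERVIEW in files:
        levels.add(1)
    return levels


def missing_index_records(dirs):
    """Return (uri, level) pairs for every expected but absent index record."""
    missing = []
    for d in dirs:
        for level in sorted(expected_levels(d.files) - d.indexed_levels):
            missing.append((d.uri, level))
    return missing
```

Under this rule an overview-only directory that has an L1 record yields no findings, because L0 is never expected for it.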
Example of the new ZIP layout:
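A minimal sketch of how a pack with this namespace-isolated layout could be assembled, using only the Python standard library. The paths follow the description above (`<root>/files/`, `<root>/_ovpack/manifest.json`, `<root>/_ovpack/index_records.jsonl`, `<root>/_ovpack/dense.f32`). The `build_pack` helper and the manifest field layout are assumptions made for illustration, not the PR's actual implementation or schema.

```python
import hashlib
import io
import json
import struct
import zipfile


def build_pack(root, files, vectors=None):
    """Return ZIP bytes with user files under files/ and metadata under _ovpack/."""
    manifest = {"files": {}}  # hypothetical manifest shape, for illustration only
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for rel_path, data in sorted(files.items()):
            # User content keeps its original relative path, dotfiles included.
            zf.writestr(f"{root}/files/{rel_path}", data)
            manifest["files"][rel_path] = {
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
        # Real index records are omitted here; an empty JSONL file marks the slot.
        zf.writestr(f"{root}/_ovpack/index_records.jsonl", b"")
        if vectors:
            flat = [x for vec in vectors for x in vec]
            # Contiguous little-endian float32 values, one vector after another.
            dense = struct.pack(f"<{len(flat)}f", *flat)
            zf.writestr(f"{root}/_ovpack/dense.f32", dense)
            manifest["dense"] = {
                "count": len(vectors),
                "dim": len(vectors[0]),
                "sha256": hashlib.sha256(dense).hexdigest(),
            }
        zf.writestr(
            f"{root}/_ovpack/manifest.json",
            json.dumps(manifest, sort_keys=True),
        )
    return buf.getvalue()
```

Because user files and internal metadata live in separate subtrees, a user file named `.ovpack/foo.txt` or `_._notes.txt` ends up under `files/` and cannot shadow anything in `_ovpack/`.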
Testing
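Executed:
Also run earlier on this branch:
Results: OVPack core pytest `16 passed`; client malicious-ZIP cases `5 passed`; ruff passed; `git diff --check` passed; `cargo check -p ov_cli` passed, although the repository still has pre-existing Rust warnings, which this PR does not address.
Checklist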
Screenshots (if applicable)
N/A.
Additional Notes
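The current version does not support sparse/hybrid vector snapshots. The only criterion is the underlying `VectorIndex.IndexType`: if it contains `hybrid`, exporting a vector snapshot is refused; on import, `auto` re-vectorizes and `require` raises an error. This avoids the half-migrated state that would result from restoring a dense-only snapshot into a hybrid index.

OVPack has not been officially released, so this PR adds no fallback for draft packs produced during development; import currently accepts only the current OVPack v2 manifest and the new ZIP layout. Legacy packs without a manifest are still rejected, in line with the documented policy.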
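The export guard and the `vector_mode` rules can be summarized in a small decision function. This is a sketch under stated assumptions: the function names, the `"restore"`/`"recompute"` return values, the use of `ValueError`, and the case-insensitive substring match are all hypothetical, and the index type strings in the usage note are made-up values rather than real `VectorIndex.IndexType` constants.

```python
def can_snapshot(index_type):
    """Dense-only rule: any index type containing 'hybrid' is refused.

    Case-insensitive matching is an assumption of this sketch.
    """
    return "hybrid" not in index_type.lower()


def resolve_vector_mode(mode, has_snapshot, embedding_compatible, target_index_type):
    """Return 'restore' or 'recompute', or raise if 'require' cannot be met."""
    if mode not in ("auto", "recompute", "require"):
        raise ValueError(f"unknown vector_mode: {mode!r}")
    if mode == "recompute":
        # Always re-vectorize, even if a usable snapshot is present.
        return "recompute"
    restorable = (
        has_snapshot
        and embedding_compatible
        and can_snapshot(target_index_type)
    )
    if restorable:
        return "restore"
    if mode == "require":
        # Missing snapshot, incompatible embedding metadata, or hybrid target.
        raise ValueError("vector snapshot required but cannot be restored")
    # mode == "auto": fall back to re-vectorizing.
    return "recompute"
```

For example, `resolve_vector_mode("auto", True, True, "flat_hybrid")` falls back to recomputing, while the same call with `"require"` raises, so a dense-only snapshot is never written into a hybrid index.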