feat(storage): optimize tree func #2372
Performance test (tree)
Prerequisite: import 1000 resources to prepare the tree test data:
python3 ./OpenViking/load_test_add_resource_1000.py
Test command:
python3 perf/s3/ls/load_test_tree_100.py \
--account-id tenant_d9f3c5b17f \
--user-id user_b27e46a1cb \
--api-key 'dGVuY...ZA' \
--uri viking://resources \
--iterations 100
Result summary (100 runs)
Conclusion: at the same result scale (...)
Raw output: before optimization, after optimization
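Thanks for pushing the tree traversal capability down into ragfs. The direction is sound: tree traversal is fundamentally a filesystem capability, and S3FS should indeed have a backend-specific flat-listing optimization.
I am requesting changes for now, mainly because of two issues that need to be resolved before merge:
Blocking:
- The S3 tree still materializes the entire prefix before Python applies the visible-node node_limit; for large S3 directories, even with the default limit=1000, it may fetch, build, and sort the full listing, so the timeout and memory risk remains.
- VikingFS.tree() now hard-depends on the wrapped AGFS client exposing tree_directory(); this breaks existing clients/wrappers that implement only the old ls() surface.
Non-blocking structural suggestions:
- Before merge, please squash process commits such as fixup/format/remove-test.
- The S3 tree builder and the Python tree tests are both fairly large; consider splitting the S3 tree builder into its own module and extracting the repeated fixture/setup in the Python tests, which would make later maintenance and review lighter.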
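The second blocking point (clients that only implement ls()) could be handled with a capability check. The following is a minimal sketch, not the PR's code: LegacyClient, tree_with_fallback, the entry dictionary shape, and the tree_directory(path, node_limit=...) call signature are all assumptions made for illustration.

```python
from typing import Any, Dict, List


class LegacyClient:
    """Hypothetical wrapper that only implements the old ls() surface."""

    def __init__(self, dirs: Dict[str, List[Dict[str, Any]]]):
        self._dirs = dirs

    def ls(self, path: str) -> List[Dict[str, Any]]:
        return self._dirs.get(path, [])


def tree_with_fallback(client: Any, path: str, node_limit: int = 1000) -> List[Dict[str, Any]]:
    """Use tree_directory() when the client has it, else recurse over ls()."""
    native = getattr(client, "tree_directory", None)
    if callable(native):
        # Assumed signature; the real binding may differ.
        return native(path, node_limit=node_limit)

    entries: List[Dict[str, Any]] = []

    def walk(current: str, rel: str) -> None:
        for item in client.ls(current):
            if len(entries) >= node_limit:
                return
            rel_path = f"{rel}/{item['name']}" if rel else item["name"]
            entries.append({"rel_path": rel_path, "is_dir": item["is_dir"]})
            if item["is_dir"]:
                walk(f"{current}/{item['name']}", rel_path)

    walk(path.rstrip("/"), "")
    return entries


legacy = LegacyClient({
    "viking://resources": [
        {"name": "a", "is_dir": True},
        {"name": "b.txt", "is_dir": False},
    ],
    "viking://resources/a": [{"name": "c.txt", "is_dir": False}],
})
fallback_entries = tree_with_fallback(legacy, "viking://resources")
```

With this shape, wrappers that predate tree_directory() keep working through the slower recursive path, while newer clients take the optimized one.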
qin-ctx left a comment
I re-reviewed the latest commit. Most earlier points are addressed: the HTTP/native/async tree_directory surface is filled in, the S3 tree builder moved into tree.rs, and the Python tests now use shared helpers.
I still need to request changes on the S3 optimized path. The current over-fetch limit is applied after list_tree_objects() has already fetched every S3 page, so the large-prefix performance/memory risk remains.
Non-blocking: please also squash the process commits (format code, fix check problem, repeated fix code review issue) before merge.
/// List all objects under a prefix (flat listing, no delimiter).
/// Preserves directory marker objects (keys ending with '/').
/// Used by tree_directory for efficient flat traversal.
pub async fn list_tree_objects(&self, prefix: &str) -> Result<Vec<ObjectMeta>> {
[Design] (blocking) The current bounded-overfetch fix does not actually bound S3 listing. Python now passes node_limit * _TREE_OVERFETCH_FACTOR down to Rust, and S3FS receives that value, but tree_directory() calls self.client.list_tree_objects(&prefix).await? before applying node_limit. list_tree_objects() then keeps following next_continuation_token until is_truncated is false, so a request such as tree(..., node_limit=1000) can still materialize every object under a multi-million-key prefix before Rust truncates the rebuilt tree.
This keeps the main timeout/memory risk in the optimized S3 path. Please push the bound into the S3 listing layer, or make this a paged/streaming iterator, so ACL filtering can request more pages only when it has fewer than the requested visible nodes instead of loading the whole prefix upfront.
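To illustrate what "pushing the bound into the listing layer" might look like, here is a small sketch in Python rather than the PR's Rust. The names list_objects_bounded and fake_fetch_page, and the (keys, next_token) page shape, are assumptions for illustration and not the S3 client's API. The loop stops following continuation tokens once max_keys keys are collected instead of draining the prefix.

```python
from typing import Callable, List, Optional, Tuple

# One page of results: (keys, next_continuation_token or None).
Page = Tuple[List[str], Optional[str]]


def list_objects_bounded(
    fetch_page: Callable[[str, Optional[str]], Page],
    prefix: str,
    max_keys: int,
) -> Tuple[List[str], bool]:
    """Follow continuation tokens only until max_keys keys are collected.

    Returns (keys, truncated); truncated is True when the prefix holds more.
    """
    keys: List[str] = []
    token: Optional[str] = None
    while True:
        page_keys, token = fetch_page(prefix, token)
        keys.extend(page_keys)
        if len(keys) >= max_keys:
            more = token is not None or len(keys) > max_keys
            return keys[:max_keys], more
        if token is None:
            return keys, False


# Hypothetical in-memory stand-in for a paginated listing (3 keys per page).
ALL_KEYS = [f"docs/file_{i:02d}.md" for i in range(10)]
calls: List[Optional[str]] = []


def fake_fetch_page(prefix: str, token: Optional[str]) -> Page:
    calls.append(token)
    start = int(token) if token else 0
    matching = [k for k in ALL_KEYS if k.startswith(prefix)]
    chunk = matching[start:start + 3]
    next_token = str(start + 3) if start + 3 < len(matching) else None
    return chunk, next_token


bounded_keys, truncated = list_objects_bounded(fake_fetch_page, "docs/", max_keys=4)
```

In this toy run only two of the four available pages are requested. A fixed bound alone does not guarantee enough visible nodes after ACL filtering, which is why the comment also mentions a paged or streaming iterator.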
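Exposing a paged/streaming interface on the trait, with Python fetching page by page and filtering as it goes, would be precise and bounded. However, PyO3 cannot turn a Rust async stream directly into a Python async generator, so it would have to go through continuation-token pagination, which means every layer has to change. That change would affect other operations. This PR therefore aims to keep its scope as narrow as possible; streaming support across the whole RAGFS layer will follow in a separate PR.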
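For reference, the continuation-token approach mentioned here could look roughly like the following on the Python side. This is a sketch under assumptions: fake_fetch stands in for a hypothetical binding call that returns one page per invocation, and is_even is a toy replacement for the ACL check. None of these names come from the PR.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]


async def iter_pages(
    fetch_page: Callable[[Optional[str]], Awaitable[Page]],
) -> AsyncIterator[str]:
    """Yield keys one page at a time, requesting the next page lazily."""
    token: Optional[str] = None
    while True:
        keys, token = await fetch_page(token)
        for key in keys:
            yield key
        if token is None:
            return


async def first_visible(
    fetch_page: Callable[[Optional[str]], Awaitable[Page]],
    is_visible: Callable[[str], bool],
    node_limit: int,
) -> List[str]:
    """Collect up to node_limit visible keys, stopping page fetches early."""
    visible: List[str] = []
    async for key in iter_pages(fetch_page):
        if is_visible(key):
            visible.append(key)
            if len(visible) >= node_limit:
                break
    return visible


KEYS = [f"k{i}" for i in range(20)]
page_calls: List[Optional[str]] = []


async def fake_fetch(token: Optional[str]) -> Page:
    # Hypothetical stand-in for a binding call that returns one page per call.
    page_calls.append(token)
    start = int(token) if token else 0
    nxt = str(start + 5) if start + 5 < len(KEYS) else None
    return KEYS[start:start + 5], nxt


def is_even(key: str) -> bool:
    # Toy ACL check: only even-numbered keys are visible.
    return int(key[1:]) % 2 == 0


visible_keys = asyncio.run(first_visible(fake_fetch, is_even, node_limit=4))
```

Here the consumer asks for a further page only while it holds fewer than node_limit visible keys, so two of the four pages are fetched in this example.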
Description
Push the tree recursive traversal capability down from the Python layer into ragfs (Rust). A default implementation equivalent to Python's _tree_original is provided in core/filesystem.rs, and s3fs gets a high-performance override based on flat listing. The Python VikingFS layer now only handles view shaping, ACL, and agent enrichment, eliminating the duplicated DFS between _tree_original / _tree_agent. Zero semantic regression, with a major performance gain for wide S3 directories.
Related Issue
Type of Change
Changes Made
- Add TreeEntry (core/types.rs); add tree_directory() / tree_directory_internal() default recursive implementation to the FileSystem trait (core/filesystem.rs), preserving node_limit / level_limit / show_hidden / rel_path / DFS-order semantics identical to the original Python logic.
- Add tree_directory() routing and rewrite plugin-internal paths back to global AGFS absolute paths (core/mountable.rs).
- Add tree_directory() proxy and FsOperation::TreeDir (core/stats_wrapper.rs, core/stats.rs), fixing the wrapper not delegating, which previously made the S3FS optimization ineffective.
- Override tree_directory() and add build_tree_entries_from_flat_listing(), replacing per-directory recursive read_dir with a single/few paginated flat listings; client.rs adds list_tree_objects() that retains directory markers (plugins/s3fs/mod.rs, plugins/s3fs/client.rs).
- Expose tree_directory() and TreeEntry → PyDict serialization (ragfs-python/src/lib.rs).
- async_client.py adds an async tree_directory(); viking_fs.py refactors _tree_original / _tree_agent into adapters/enrichment built on a shared _iter_visible_tree_entries(), adds _is_tree_entry_visible() / _ancestor_is_filtered(), and keeps the "apply node_limit after ACL filtering" semantics.
Testing
Added/passing test coverage: Rust Core tree_directory default implementation (RUST-TREE-001~018), S3FS override (RUST-S3-001~012), MountableFS routing (RUST-MNT-001~004); Python _is_tree_entry_visible / _iter_visible_tree_entries / _tree_original / _tree_agent (PY-FLT/ITER/ORIG/AGENT, 31 cases, tests/storage/test_viking_fs_tree.py). 65 unit tests pass in total. Performance regression test: at the same result scale (1000 nodes), tree dropped from ~31615ms/call to ~10726ms/call (~2.95x), p99 from 56835ms to 13035ms (~4.36x), and stability improved from 99/100 to 100/100.
Checklist
Screenshots (if applicable)
Additional Notes
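As an illustration of the flat-listing idea described under Changes Made, the sketch below rebuilds tree entries from a flat list of keys. It is written in Python for brevity and is not the Rust build_tree_entries_from_flat_listing() from this PR: the function name build_tree_entries, the entry fields, and the sibling ordering (plain name order) are assumptions.

```python
from typing import Dict, List


def build_tree_entries(keys: List[str], prefix: str, node_limit: int,
                       show_hidden: bool = False) -> List[Dict[str, object]]:
    """Rebuild DFS-ordered tree entries from a flat key listing."""
    dirs = set()
    files = set()
    for key in keys:
        rel = key[len(prefix):]
        if not rel:
            continue  # the prefix's own directory marker
        parts = rel.rstrip("/").split("/")
        # Every ancestor of a key is an implied directory.
        for depth in range(1, len(parts)):
            dirs.add("/".join(parts[:depth]))
        if rel.endswith("/"):
            dirs.add("/".join(parts))  # explicit directory marker
        else:
            files.add("/".join(parts))

    def hidden(rel_path: str) -> bool:
        return any(p.startswith(".") for p in rel_path.split("/"))

    entries: List[Dict[str, object]] = []
    # Sorting by path components puts each parent before its children.
    for rel_path in sorted(dirs | files, key=lambda p: p.split("/")):
        if not show_hidden and hidden(rel_path):
            continue
        entries.append({"rel_path": rel_path, "is_dir": rel_path in dirs})
        if len(entries) >= node_limit:
            break
    return entries


demo_keys = [
    "res/a/",
    "res/a/x.txt",
    "res/b/c/y.txt",
    "res/.hidden/z.txt",
    "res/top.txt",
]
demo_entries = build_tree_entries(demo_keys, "res/", node_limit=5)
```

Directories are derived both from explicit marker keys (ending with '/') and from the ancestors of file keys, so no per-directory listing call is needed. The limit is applied only after filtering, mirroring the "apply node_limit after ACL filtering" rule with a hidden-file filter standing in for ACL.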