Skip to content

Feat/fs count api#1989

Merged
fengluodb merged 3 commits into
volcengine:mainfrom
fengluodb:feat/fs-count-api
May 12, 2026
Merged

Feat/fs count api#1989
fengluodb merged 3 commits into
volcengine:mainfrom
fengluodb:feat/fs-count-api

Conversation

@fengluodb
Copy link
Copy Markdown
Collaborator

增加 count 接口,可以用于统计文件数量

dingben.db@bytedance.com added 2 commits May 12, 2026 15:38
Adds a dedicated `count` endpoint that returns the exact number of files
and sub-directories under a directory by traversing the filesystem,
distinct from `stat`'s vector-index-based estimate. Wired through
VikingFS, FSService, HTTP router and sync/async/local SDK clients.
Wires the new fs.count HTTP endpoint into the Rust CLI. Adds
`-r/--recursive` and `-a/--all` flags. Documentation updated with
CLI usage examples.
@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 80
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Inconsistent Hidden Directory Handling

The show_all_hidden parameter only skips hidden files, not hidden directories. Hidden directories are counted and recursed into even when show_all_hidden=False, which is inconsistent with the parameter's intended behavior.

if not is_dir and name.startswith(".") and not show_all_hidden:
    continue
if is_dir:
    dirs += 1
    if recursive:
        _count_entries(f"{current_path}/{name}")
else:
    files += 1
Silent Exception Swallowing

The _count_entries helper has a bare except Exception: that swallows all errors when listing directory entries without logging, hiding potential issues.

except Exception:
    return
Possible Blocking Calls in Async Context

The async count method calls self.agfs.stat() and self._ls_entries() without await. If these are blocking operations, they will starve the event loop. Verify that these calls use async variants or are properly offloaded.

    info = self.agfs.stat(path)
except Exception as exc:
    mapped = map_exception(exc, resource=uri)
    if mapped is not None:
        raise mapped from exc
    raise NotFoundError(uri, "directory")

if not info.get("isDir", info.get("is_dir")):
    raise FailedPreconditionError(
        f"{uri} is not a directory",
        details={"resource": uri, "expected": "directory"},
    )

files = 0
dirs = 0

def _count_entries(current_path: str) -> None:
    nonlocal files, dirs
    try:
        entries = self._ls_entries(current_path)

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Skip hidden directories when show_all_hidden is False

The current code only skips hidden files when show_all_hidden is False, but still
counts hidden directories. Update the condition to skip both hidden files and
directories consistently when show_all_hidden is False.

openviking/storage/viking_fs.py [1271-1272]

-if not is_dir and name.startswith(".") and not show_all_hidden:
+if name.startswith(".") and not show_all_hidden:
     continue
Suggestion importance[1-10]: 8

__

Why: The current code only skips hidden files when show_all_hidden is False, but still counts hidden directories. This is a functional bug that causes inconsistent behavior, and the fix addresses it by skipping all hidden entries (files and directories) when show_all_hidden is False.

Medium
Fix incorrect fallback to NotFoundError

The current exception handling incorrectly converts any unmapped exception from
self.agfs.stat into NotFoundError. Remove the fallback NotFoundError raise, since
map_exception should already handle not-found errors, and re-raise the original
exception for other cases.

openviking/storage/viking_fs.py [1240-1246]

 try:
     info = self.agfs.stat(path)
 except Exception as exc:
     mapped = map_exception(exc, resource=uri)
     if mapped is not None:
         raise mapped from exc
-    raise NotFoundError(uri, "directory")
+    raise
Suggestion importance[1-10]: 7

__

Why: The current code incorrectly converts any unmapped exception from self.agfs.stat into NotFoundError, which could mask other errors like permission issues. The fix removes this improper fallback and re-raises the original exception for unmapped cases.

Medium

@fengluodb fengluodb merged commit 3222b14 into volcengine:main May 12, 2026
12 of 13 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in OpenViking project May 12, 2026
Copy link
Copy Markdown
Collaborator

@qin-ctx qin-ctx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

本次 review 结论:REQUEST_CHANGES。

[Bug] (blocking) CI 的 lint / lint 失败,ruff checkopenviking/storage/viking_fs.py:15 import block 未排序。这个位置不在 PR diff 的新增行里,无法作为 inline comment 锚定,但合并前应修复:运行 ruff check --fix openviking/storage/viking_fs.py 或手动整理 import。

其余问题已作为 inline comments 标在相关新增代码上。

try:
entries = self._ls_entries(current_path)
except Exception:
return
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Bug] (blocking) count() 在递归遍历时吞掉 _ls_entries(current_path) 的所有异常并直接 return,会把失败的子树当作 0 个节点返回成功结果。

这个接口文档承诺的是基于真实文件系统遍历的 exact count;如果某个子目录读取失败、后端临时错误、挂载异常或权限异常,调用方会收到一个看似成功但实际偏小的 files/dirs/total。建议不要吞异常,至少把 current_path 转成 URI 后通过 map_exception() 映射并抛出,让调用方知道计数失败。

entry_uri = self._path_to_uri(f"{current_path}/{name}", ctx=ctx)
if not self._is_accessible(entry_uri, real_ctx):
continue
if not is_dir and name.startswith(".") and not show_all_hidden:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Design] (non-blocking) show_all_hidden 的实现和接口说明不完全一致。路由/文档说 include hidden files/directories,但这里仅在 not is_dir 时过滤隐藏名称,因此 .git.cache 这类隐藏目录默认会被计入,并且 recursive=True 时还会继续统计它们下面的内容。

建议明确语义:如果参数应该像 ls -a 一样控制所有隐藏目录项,就在 name.startswith(".") and not show_all_hidden 时同时跳过文件和目录;如果只想控制隐藏文件,则需要更新接口描述和文档,避免调用方误解。


return all_entries

async def count(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] (non-blocking) 这是一个新的公共 API,但 PR 没有新增回归测试。建议至少覆盖非目录报错、缺失 URI、递归/非递归、隐藏文件/隐藏目录,以及 HTTP 和 embedded SDK 返回结构一致性。

这些用例能锁住 count() 的精确计数语义,避免后续和 stat.count 的 vector-index 估算语义混淆。

qin-ctx pushed a commit that referenced this pull request May 12, 2026
ZaynJarvis pushed a commit that referenced this pull request May 13, 2026
* feat(fs): add count API for directory entry counting

Adds a dedicated `count` endpoint that returns the exact number of files
and sub-directories under a directory by traversing the filesystem,
distinct from `stat`'s vector-index-based estimate. Wired through
VikingFS, FSService, HTTP router and sync/async/local SDK clients.

* feat(cli): add `ov count` command for directory entry counting

Wires the new fs.count HTTP endpoint into the Rust CLI. Adds
`-r/--recursive` and `-a/--all` flags. Documentation updated with
CLI usage examples.

* fix

---------

Co-authored-by: dingben.db@bytedance.com <dingben.db@bytedance.com@bytedance.com>
ZaynJarvis pushed a commit that referenced this pull request May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants