Skip to content

perf(storage): optimize s3 backend grep performance#2078

Merged
zhoujh01 merged 3 commits into
volcengine:mainfrom
baojun-zhang:tos-perf
May 20, 2026
Merged

perf(storage): optimize s3 backend grep performance#2078
zhoujh01 merged 3 commits into
volcengine:mainfrom
baojun-zhang:tos-perf

Conversation

@baojun-zhang
Copy link
Copy Markdown
Collaborator

@baojun-zhang baojun-zhang commented May 15, 2026

Description

S3FS grep performance optimization: Addresses grep timeout issue with S3 backend in multi-subdirectory scenarios. Through three key optimizations - flat listing instead of recursive read_dir, concurrent chunked read instead of full download, and bypassing directory checks in read() - reduces grep latency from 60s+ to 1-3s for 1000-subdirectory scenarios.

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Only modify crates/ragfs/src/plugins/s3fs/mod.rs, no impact on other plugins' default implementations
  • Entry handling - stat root path + regex compilation + single-file fast path
  • Flat file collection - use list_objects(delimiter=None) to get all files in one go (replaces O(n) recursive read_dir calls)
  • Concurrent chunked grep - 64KB chunked read + cross-chunk line boundary handling + 16MB long line protection + adaptive concurrency (16-100)
  • New helper functions: s3_is_excluded_path, s3_relative_match_file, s3_relative_depth (reproducing filesystem.rs's pub(crate) semantics)
  • Fix: Semantics of exclude_path == "/" to exclude all content (previously root directory exclusion was broken)
  • new unit tests: Covers complex scenarios like cross-chunk line stitching, EOF boundary, binary content, concurrent race conditions

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Performance Test Results (100 iterations):

Scenario Before Optimization After (native grep) After (ripgrep)
1000-subdirectory grep Timeout 60s+ ~3s ~2.4s
Single-file grep ~0.5s ~0.3s -

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

Design Principles:

  • Zero semantic regression: node_limit, exclude_path, level_limit, recursive, case_insensitive behavior exactly matches default implementation
  • Only modify S3FS: Keep default implementation simple for local plugins (memfs/kvfs, etc.)
  • Resource constraints: Concurrency window clamp(available_parallelism()*2, 16, 100), 16MB partial buffer limit to prevent OOM

Key Optimizations:

  • Flat listing: API calls reduced from 1001 to ~3 for 1000-subdirectory scenarios
  • Bypassing read(): Saves 2 S3 API calls per chunk read (directory_exists checks)
  • File size from listing: No need for additional stat calls

@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 85
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Behavior change in path encoding

The encode_path function now encodes spaces even when they are not in normalize_encoding_chars (as long as normalize_encoding_chars is non-empty). This is a behavioral change from the previous implementation which only encoded characters explicitly listed in normalize_encoding_chars. This may break existing code relying on the old encoding behavior.

fn encode_path(path: &str, normalize_encoding_chars: &str) -> String {
    if normalize_encoding_chars.is_empty() {
        return path.to_string();
    }

    let target_chars: Vec<char> = normalize_encoding_chars.chars().collect();
    let is_target = |ch: char| ch != '/' && ch.is_ascii() && (ch == ' ' || target_chars.contains(&ch));

    if !path.chars().any(&is_target) {
        return path.to_string();
    }

    let extra = path.chars().filter(|&ch| is_target(ch)).count() * 2;
    let mut encoded = String::with_capacity(path.len() + extra);

    for ch in path.chars() {
        if ch == '/' || !ch.is_ascii() || (ch != ' ' && !target_chars.contains(&ch)) {
            encoded.push(ch);
        } else {
            push_encoded_byte(&mut encoded, ch as u8);
        }
    }

@baojun-zhang
Copy link
Copy Markdown
Collaborator Author

baojun-zhang commented May 15, 2026

AGFS-GREP-TOS性能测试

前置

后端使用 TOS,导入 1000 个资源,用于准备 grep 测试数据:

python3 ./OpenViking/load_test_add_resource_1000.py
image

测试命令

python3 ./load_test_grep_100.py \
  --account-id tenant_d9f3c5b17f \
  --user-id admin \
  --api-key dGVuYW50X2Q5ZjNjNWIxN2Y.dXNlcl9iMjdlNDZhMWNi.NTNiNDUyZGMxYTU3MzgzZWFhMTY1NWMwYWIxNzZmYjZiYWE3YzhmZTYxZDhmZWIxNmI0Y2FiODZmYmY4YjU5ZA

结果汇总(100 次)

场景 total_sec avg_ms p50_ms p95_ms p99_ms last_result_count
测试1 1536.0106 15360.08 14133.22 27961.53 41612.96 256
测试2 157.9589 1579.56 1240.96 1618.94 11278.29 256

一句话结论

在相同结果规模(last_result_count=256)下,测试2相比测试1平均耗时从 ~15360ms/次 降到 ~1580ms/次,提升约 9.7 倍

原始输出

测试1(优化前)

OpenViking % python load_test_grep_100.py --account-id tenant_d9f3c5b17f  --user-id admin --api-key dG...5ZA  --iterations 100
using existing tenant: account_id=tenant_d9f3c5b17f user_id=admin
ERR i=18 status=500 data=None text=
progress: 20/100, ok=19, fail=1
progress: 40/100, ok=39, fail=1
progress: 60/100, ok=59, fail=1
progress: 80/100, ok=79, fail=1
progress: 100/100, ok=99, fail=1

Summary:
  iterations: 100
  ok: 99
  fail: 1
  total_sec: 1536.0106
  avg_ms: 15360.08
  p50_ms: 14133.22
  p95_ms: 27961.53
  p99_ms: 41612.96
  last_result_count: 256

测试2(优化后)

OpenViking % python load_test_grep_100.py --account-id tenant_d9f3c5b17f  --user-id admin --api-key dG...5ZA
         
using existing tenant: account_id=tenant_d9f3c5b17f user_id=admin
progress: 20/100, ok=20, fail=0
progress: 40/100, ok=40, fail=0
progress: 60/100, ok=60, fail=0
progress: 80/100, ok=80, fail=0
progress: 100/100, ok=100, fail=0

Summary:
  iterations: 100
  ok: 100
  fail: 0
  total_sec: 157.9589
  avg_ms: 1579.56
  p50_ms: 1240.96
  p95_ms: 1618.94
  p99_ms: 11278.29
  last_result_count: 256

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix unconditional space encoding regression

Fix regression where spaces are now always encoded even when not included in
normalize_encoding_chars. The new logic incorrectly adds space to the target
character set unconditionally. Revert to only encoding characters explicitly present
in target_chars.

crates/ragfs/src/plugins/s3fs/client.rs [22-37]

-let is_target = |ch: char| ch != '/' && ch.is_ascii() && (ch == ' ' || target_chars.contains(&ch));
+let is_target = |ch: char| ch != '/' && ch.is_ascii() && target_chars.contains(&ch);
 
 if !path.chars().any(&is_target) {
     return path.to_string();
 }
 
 let extra = path.chars().filter(|&ch| is_target(ch)).count() * 2;
 let mut encoded = String::with_capacity(path.len() + extra);
 
 for ch in path.chars() {
-    if ch == '/' || !ch.is_ascii() || (ch != ' ' && !target_chars.contains(&ch)) {
+    if ch == '/' || !ch.is_ascii() || !target_chars.contains(&ch) {
         encoded.push(ch);
     } else {
         push_encoded_byte(&mut encoded, ch as u8);
     }
 }
Suggestion importance[1-10]: 9

__

Why: The PR introduced a regression where spaces are now encoded unconditionally, even when not in normalize_encoding_chars. This suggestion correctly reverts that logic, fixing a functional bug.

High
Fix atomic match count synchronization

Use Ordering::Acquire for loading matched_count and Ordering::Release for storing it
to ensure proper cross-thread synchronization. This prevents stale reads of the
match count and ensures updates are visible to other threads in a timely manner.

crates/ragfs/src/plugins/s3fs/mod.rs [345-363]

-let done = matched_count.load(Ordering::Relaxed);
+let done = matched_count.load(Ordering::Acquire);
 if done >= limit {
     return Ok(());
 }
 
 let remaining = limit.saturating_sub(done);
 let matches = self
     .grep_one_file(base_path, &path, file_size, re, remaining)
     .await?;
 
 let mut r = result.lock().unwrap();
 for m in matches {
     if r.count >= limit {
         break;
     }
     r.matches.push(m);
     r.count += 1;
 }
-matched_count.store(r.count, Ordering::Relaxed);
+matched_count.store(r.count, Ordering::Release);
Suggestion importance[1-10]: 6

__

Why: Using Ordering::Relaxed for the atomic matched_count can lead to stale reads across threads. Switching to Acquire/Release improves synchronization, though the existing Mutex already protects the main result state.

Low

@zhoujh01 zhoujh01 merged commit 86f61ee into volcengine:main May 20, 2026
11 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in OpenViking project May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants