Skip to content

fix(core): 🐛 Prevent glob matcher stack overflow#203

Merged
jorben merged 1 commit into
masterfrom
refactor/glob-matcher
May 24, 2026
Merged

fix(core): 🐛 Prevent glob matcher stack overflow#203
jorben merged 1 commit into
masterfrom
refactor/glob-matcher

Conversation

@jorben
Copy link
Copy Markdown
Contributor

@jorben jorben commented May 24, 2026

Summary

  • Replace recursive policy glob matching with a stack-safe iterative matcher.
  • Preserve existing * wildcard and backslash-escape behavior for shell and non-shell policy rules.
  • Add a long shell command regression test covering the stack overflow crash scenario.

Test Plan

  • cargo fmt --check --manifest-path src-tauri/Cargo.toml
  • cargo test --locked --manifest-path src-tauri/Cargo.toml --test tool_gateway policy
  • cargo test --locked --manifest-path src-tauri/Cargo.toml --test tool_gateway

🤖 Generated with TiyCode

Rewrite the glob matching algorithm from recursive to iterative to avoid stack overflow on very long shell command arguments. The previous implementation used recursion with memoization, which could exhaust the stack for inputs exceeding 30,000 characters. The new algorithm uses a tokenizer and an iterative matching loop, improving robustness and performance.

Add a test verifying that long shell commands (e.g., 30,000 characters) do not cause stack overflow and are correctly evaluated by the policy engine.
@github-actions
Copy link
Copy Markdown

AI Code Review Summary

PR: #203 (fix(core): 🐛 Prevent glob matcher stack overflow)
Preferred language: English

Overall Assessment

Detected 2 actionable findings, prioritize CRITICAL/HIGH before merge.

Major Findings by Severity

  • MEDIUM (1)
    • src-tauri/src/core/policy_engine.rs:541 - Repetitive Tokenization and Allocations in Glob Matching
  • LOW (1)
    • src-tauri/src/core/policy_engine.rs:542 - Avoid collecting text characters to a Vec to prevent allocation on extremely long command payloads

Actionable Suggestions

  • As a future refactoring, explore using a zero-allocation iterator-based approach or byte-level UTF-8 traversal to avoid collecting the entire text string into a Vec<char>.
  • Pre-compile policy patterns during policy load instead of parsing them dynamically on every evaluation.
  • Collapse multiple consecutive asterisks (e.g., ***) into a single SimpleGlobToken::AnySequence during tokenization.

Potential Risks

  • Worst-case O(N * M) execution time for maliciously designed wildcard-dense policy patterns, although in practice administrative policy configurations are controlled and trusted.
  • Under extremely high command execution frequencies, repeated character collections and vector allocations could introduce minor latency spikes in the tool execution lifecycle.

Test Suggestions

  • Add micro-benchmarks or local unit tests in policy_engine.rs to assert correct matching on complex glob boundaries (e.g. trailing backslashes, multiple asterisks, empty values).
  • Implement benchmark tests to measure the throughput of simple_glob_match compared to standard regex-based or string-based matchers.

File-Level Coverage Notes

  • src-tauri/src/core/policy_engine.rs: The changes successfully resolve a potential Stack Overflow / Memory Exhaustion vulnerability by rewriting the glob matching algorithm to be iterative and stack-safe. The implementation is highly robust and correct. (Great work, Buddy!)
  • src-tauri/tests/tool_gateway.rs: Added integration tests are comprehensive, verifying that extremely long shell command arguments do not trigger stack overflow and that the policy evaluation still functions correctly for both denied and allowed commands. (The inclusion of test_policy_long_shell_commands_do_not_overflow_glob_matcher with a 30,000-character input is highly effective for verification.)

Inline Downgraded Items (processed but not inline)

  • None

Coverage Status

  • Target files: 2
  • Covered files: 2
  • Uncovered files: 0
  • No-patch/binary covered as file-level: 0
  • Findings with unknown confidence (N/A): 0

Uncovered list:

  • None

No-patch covered list:

  • None

Runtime/Budget

  • Rounds used: 1/4
  • Planned batches: 1
  • Executed batches: 1
  • Sub-agent runs: 3
  • Planner calls: 1
  • Reviewer calls: 3
  • Model calls: 4/64
  • Structured-output summary-only degradation: NO

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 2
  • Findings with unknown confidence: 0
  • Inline comments attempted: 2
  • Target files: 2
  • Covered files: 2
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.


fn simple_glob_match(pattern: &str, text: &str) -> bool {
let pattern_chars: Vec<char> = pattern.chars().collect();
let pattern_tokens = tokenize_simple_glob(pattern);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Repetitive Tokenization and Allocations in Glob Matching

The glob matcher tokenizes the pattern and collects characters of the text into dynamic arrays (Vec) on every evaluation. While this is a massive improvement over the previous quadratic DP memoization table, doing this dynamically on every policy evaluation causes unnecessary allocation churn, especially when evaluating multiple policy rules against a command.

Suggestion: 1. Pre-tokenize/compile the glob patterns into Vec<SimpleGlobToken> when the policy is loaded or parsed (e.g., inside EffectivePolicyRule), rather than during every evaluation.
2. Optimize tokenize_simple_glob to collapse consecutive wildcards (multiple consecutive * characters) into a single AnySequence token to reduce matching iterations.
3. Explore using standard string matching or byte-based iteration if patterns and text are ASCII-compatible, avoiding Vec<char> allocations.

Risk: Increased CPU overhead and garbage/allocation churn under heavy load or dense policy configurations, which could slow down tool execution pipeline throughput.

Confidence: 0.90

[From SubAgent: performance]

fn simple_glob_match(pattern: &str, text: &str) -> bool {
let pattern_chars: Vec<char> = pattern.chars().collect();
let pattern_tokens = tokenize_simple_glob(pattern);
let text_chars: Vec<char> = text.chars().collect();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Avoid collecting text characters to a Vec to prevent allocation on extremely long command payloads

In simple_glob_match, the text is collected into a Vec<char> before matching. For extremely large inputs, this triggers a heap allocation of size proportional to the text length.

Suggestion: Buddy, for a future optimization you can perform matching directly using character iterators or UTF-8 byte indexes to avoid allocating a Vec<char> for the entire text, though for typical command lines the current approach is perfectly fine and safe.

Risk: Negligible memory overhead for standard shell commands, but could become a minor bottleneck if extremely large payloads are routinely matched.

Confidence: 0.85

[From SubAgent: general]

@jorben jorben merged commit 91ed98d into master May 24, 2026
4 checks passed
@jorben jorben deleted the refactor/glob-matcher branch May 24, 2026 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant