fix(core): 🐛 Prevent glob matcher stack overflow #203
Rewrite the glob matching algorithm from recursive to iterative to avoid stack overflow on very long shell command arguments. The previous implementation used recursion with memoization, which could exhaust the stack for inputs exceeding 30,000 characters. The new algorithm uses a tokenizer and an iterative matching loop, improving robustness and performance. Add a test verifying that long shell commands (e.g., 30,000 characters) do not cause stack overflow and are correctly evaluated by the policy engine.
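A minimal sketch of the tokenizer-plus-loop approach described above, not the PR's actual code. The names `simple_glob_match`, `tokenize_simple_glob`, `SimpleGlobToken`, and `AnySequence` appear in the review; the `Literal` variant, the escape rule (a backslash makes the next character literal, a trailing backslash stays literal), and the single backtrack point are assumptions made for illustration.

```rust
/// Token kinds for a simplified glob. `Literal` is a hypothetical variant name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimpleGlobToken {
    Literal(char),
    AnySequence,
}

/// Assumed tokenization: `*` is a wildcard, `\x` is the literal character `x`.
fn tokenize_simple_glob(pattern: &str) -> Vec<SimpleGlobToken> {
    let mut tokens = Vec::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            '*' => tokens.push(SimpleGlobToken::AnySequence),
            // A trailing backslash is kept as a literal backslash.
            '\\' => tokens.push(SimpleGlobToken::Literal(chars.next().unwrap_or('\\'))),
            other => tokens.push(SimpleGlobToken::Literal(other)),
        }
    }
    tokens
}

/// Iterative matcher: no recursion, so stack depth does not grow with input length.
fn simple_glob_match(pattern: &str, text: &str) -> bool {
    let tokens = tokenize_simple_glob(pattern);
    let text: Vec<char> = text.chars().collect();
    let (mut p, mut t) = (0usize, 0usize);
    // Index of the most recent `*` token and the text index it currently ends at.
    let mut backtrack: Option<(usize, usize)> = None;

    while t < text.len() {
        match tokens.get(p) {
            Some(SimpleGlobToken::Literal(c)) if *c == text[t] => {
                p += 1;
                t += 1;
            }
            Some(SimpleGlobToken::AnySequence) => {
                // First try matching the empty sequence; remember where to retry.
                backtrack = Some((p, t));
                p += 1;
            }
            _ => match backtrack {
                // Mismatch: let the last `*` absorb one more character and retry.
                Some((star_p, star_t)) => {
                    p = star_p + 1;
                    t = star_t + 1;
                    backtrack = Some((star_p, star_t + 1));
                }
                None => return false,
            },
        }
    }
    // Text is exhausted; only trailing `*` tokens may remain in the pattern.
    tokens[p..].iter().all(|tok| *tok == SimpleGlobToken::AnySequence)
}
```

Only the most recent `*` needs to be remembered, because an earlier `*` absorbing more text can never succeed where the later one has already failed at every position.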
AI Code Review Summary
PR: #203 (fix(core): 🐛 Prevent glob matcher stack overflow)
Overall Assessment
Detected 2 actionable findings, prioritize CRITICAL/HIGH before merge.
Major Findings by Severity
Actionable Suggestions
Potential Risks
Test Suggestions
File-Level Coverage Notes
Inline Downgraded Items (processed but not inline)
Coverage Status
Uncovered list:
No-patch covered list:
Runtime/Budget
fn simple_glob_match(pattern: &str, text: &str) -> bool {
    let pattern_chars: Vec<char> = pattern.chars().collect();
    let pattern_tokens = tokenize_simple_glob(pattern);
[MEDIUM] Repetitive Tokenization and Allocations in Glob Matching
The glob matcher tokenizes the pattern and collects characters of the text into dynamic arrays (Vec) on every evaluation. While this is a massive improvement over the previous quadratic DP memoization table, doing this dynamically on every policy evaluation causes unnecessary allocation churn, especially when evaluating multiple policy rules against a command.
Suggestion:
1. Pre-tokenize/compile the glob patterns into Vec<SimpleGlobToken> when the policy is loaded or parsed (e.g., inside EffectivePolicyRule), rather than during every evaluation.
2. Optimize tokenize_simple_glob to collapse consecutive wildcards (multiple consecutive * characters) into a single AnySequence token to reduce matching iterations.
3. Explore using standard string matching or byte-based iteration if patterns and text are ASCII-compatible, avoiding Vec<char> allocations.
Risk: Increased CPU overhead and allocation churn under heavy load or dense policy configurations, which could reduce tool-execution pipeline throughput.
Confidence: 0.90
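Suggestions 1 and 2 could be combined roughly as follows. This is a sketch under stated assumptions: `CompiledGlob` and its `compile` constructor are hypothetical names, the `Literal` variant and the backslash-escape rule are assumed rather than taken from the PR, and how the compiled value would be stored on EffectivePolicyRule is left open.

```rust
/// Token kinds for a simplified glob. `Literal` is a hypothetical variant name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimpleGlobToken {
    Literal(char),
    AnySequence,
}

/// Hypothetical holder for a pattern that is tokenized once, at policy load time,
/// and then reused for every evaluation.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CompiledGlob {
    tokens: Vec<SimpleGlobToken>,
}

impl CompiledGlob {
    /// Tokenizes `pattern`, collapsing runs of unescaped `*` into one `AnySequence`.
    fn compile(pattern: &str) -> Self {
        let mut tokens: Vec<SimpleGlobToken> = Vec::new();
        let mut chars = pattern.chars();
        while let Some(c) = chars.next() {
            let token = match c {
                '*' => SimpleGlobToken::AnySequence,
                // Assumed escape rule: `\x` is the literal `x`; a trailing `\` stays literal.
                '\\' => SimpleGlobToken::Literal(chars.next().unwrap_or('\\')),
                other => SimpleGlobToken::Literal(other),
            };
            // Skip a wildcard that directly follows another wildcard.
            if token == SimpleGlobToken::AnySequence
                && tokens.last() == Some(&SimpleGlobToken::AnySequence)
            {
                continue;
            }
            tokens.push(token);
        }
        CompiledGlob { tokens }
    }
}
```

Because an escaped `\*` becomes a `Literal('*')` token, it is never merged with a neighbouring wildcard, so collapsing should not change which inputs match.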
fn simple_glob_match(pattern: &str, text: &str) -> bool {
    let pattern_chars: Vec<char> = pattern.chars().collect();
    let pattern_tokens = tokenize_simple_glob(pattern);
    let text_chars: Vec<char> = text.chars().collect();
[LOW] Avoid collecting text characters to a Vec to prevent allocation on extremely long command payloads
In simple_glob_match, the text is collected into a Vec<char> before matching. For extremely large inputs, this triggers a heap allocation proportional to the text length (4 bytes per char).
Suggestion: As a future optimization, perform matching directly with character iterators or UTF-8 byte indexes to avoid allocating a Vec<char> for the entire text. For typical command lines the current approach is perfectly fine and safe.
Risk: Negligible memory overhead for standard shell commands, but could become a minor bottleneck if extremely large payloads are routinely matched.
Confidence: 0.85
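One way the byte-index idea could look, as a sketch rather than a drop-in replacement: `glob_match_bytes` is a hypothetical name, and the escape rule (backslash makes the next byte literal) is an assumption. It relies on `*` and `\` being the only special characters, both ASCII, so comparing raw UTF-8 bytes should give the same result as comparing chars; that would no longer hold if a single-character wildcard were added.

```rust
/// Allocation-free glob match over the UTF-8 bytes of `pattern` and `text`.
/// Supports only `*` (any sequence) and `\` (escape next byte); hypothetical helper.
fn glob_match_bytes(pattern: &str, text: &str) -> bool {
    let (pat, txt) = (pattern.as_bytes(), text.as_bytes());
    let (mut p, mut t) = (0usize, 0usize);
    // (pattern index just after the last `*`, text index that `*` currently ends at)
    let mut backtrack: Option<(usize, usize)> = None;

    while t < txt.len() {
        // Decode the current pattern element as (literal byte, bytes consumed in pattern).
        let step = match pat.get(p) {
            Some(b'*') => {
                // Try the empty match first and remember where to retry from.
                backtrack = Some((p + 1, t));
                p += 1;
                continue;
            }
            Some(b'\\') if p + 1 < pat.len() => Some((pat[p + 1], 2)),
            Some(&b) => Some((b, 1)), // includes a trailing backslash
            None => None,
        };
        match step {
            Some((b, width)) if b == txt[t] => {
                p += width;
                t += 1;
            }
            _ => match backtrack {
                // Mismatch: let the last `*` absorb one more byte and retry.
                Some((after_star, star_t)) => {
                    p = after_star;
                    t = star_t + 1;
                    backtrack = Some((after_star, star_t + 1));
                }
                None => return false,
            },
        }
    }
    // Text is exhausted; the rest of the pattern must consist only of `*`.
    pat[p..].iter().all(|&b| b == b'*')
}
```

Compared with collecting into a Vec<char>, this keeps memory use constant regardless of command length, at the cost of decoding escapes during matching instead of once up front.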
Summary
*wildcard and backslash-escape behavior for shell and non-shell policy rules.
Test Plan
cargo fmt --check --manifest-path src-tauri/Cargo.toml
cargo test --locked --manifest-path src-tauri/Cargo.toml --test tool_gateway policy
cargo test --locked --manifest-path src-tauri/Cargo.toml --test tool_gateway
🤖 Generated with TiyCode