fix(core): 🐛 Prevent glob matcher stack overflow by jorben · Pull Request #203 · tiylabs/tiycode

jorben · 2026-05-24T08:50:57Z

Summary

Replace recursive policy glob matching with a stack-safe iterative matcher.
Preserve existing * wildcard and backslash-escape behavior for shell and non-shell policy rules.
Add a long shell command regression test covering the stack overflow crash scenario.

Test Plan

cargo fmt --check --manifest-path src-tauri/Cargo.toml
cargo test --locked --manifest-path src-tauri/Cargo.toml --test tool_gateway policy
cargo test --locked --manifest-path src-tauri/Cargo.toml --test tool_gateway

🤖 Generated with TiyCode

Rewrite the glob matching algorithm from recursive to iterative to avoid stack overflow on very long shell command arguments. The previous implementation used recursion with memoization, which could exhaust the stack for inputs exceeding 30,000 characters. The new algorithm uses a tokenizer and an iterative matching loop, improving robustness and performance. Add a test verifying that long shell commands (e.g., 30,000 characters) do not cause stack overflow and are correctly evaluated by the policy engine.

github-actions · 2026-05-24T08:53:14Z

AI Code Review Summary

PR: #203 (fix(core): 🐛 Prevent glob matcher stack overflow)
Preferred language: English

Overall Assessment

Detected 2 actionable findings, prioritize CRITICAL/HIGH before merge.

Major Findings by Severity

MEDIUM (1)
- src-tauri/src/core/policy_engine.rs:541 - Repetitive Tokenization and Allocations in Glob Matching
LOW (1)
- src-tauri/src/core/policy_engine.rs:542 - Avoid collecting text characters to a Vec to prevent allocation on extremely long command payloads

Actionable Suggestions

As a future refactoring, explore using a zero-allocation iterator-based approach or byte-level UTF-8 traversal to avoid collecting the entire text string into a Vec<char>.
Pre-compile policy patterns during policy load instead of parsing them dynamically on every evaluation.
Collapse multiple consecutive asterisks (e.g., ***) into a single SimpleGlobToken::AnySequence during tokenization.

Potential Risks

Worst-case O(N * M) execution time for maliciously designed wildcard-dense policy patterns, although in practice administrative policy configurations are controlled and trusted.
Under extremely high command execution frequencies, repeated character collections and vector allocations could introduce minor latency spikes in the tool execution lifecycle.

Test Suggestions

Add micro-benchmarks or local unit tests in policy_engine.rs to assert correct matching on complex glob boundaries (e.g. trailing backslashes, multiple asterisks, empty values).
Implement benchmark tests to measure the throughput of simple_glob_match compared to standard regex-based or string-based matchers.

File-Level Coverage Notes

src-tauri/src/core/policy_engine.rs: The changes successfully resolve a potential Stack Overflow / Memory Exhaustion vulnerability by rewriting the glob matching algorithm to be iterative and stack-safe. The implementation is highly robust and correct. (Great work, Buddy!)
src-tauri/tests/tool_gateway.rs: Added integration tests are comprehensive, verifying that extremely long shell command arguments do not trigger stack overflow and that the policy evaluation still functions correctly for both denied and allowed commands. (The inclusion of test_policy_long_shell_commands_do_not_overflow_glob_matcher with a 30,000-character input is highly effective for verification.)

Inline Downgraded Items (processed but not inline)

None

Coverage Status

Target files: 2
Covered files: 2
Uncovered files: 0
No-patch/binary covered as file-level: 0
Findings with unknown confidence (N/A): 0

Uncovered list:

None

No-patch covered list:

None

Runtime/Budget

Rounds used: 1/4
Planned batches: 1
Executed batches: 1
Sub-agent runs: 3
Planner calls: 1
Reviewer calls: 3
Model calls: 4/64
Structured-output summary-only degradation: NO

github-actions

Automated PR review completed.

Findings kept: 2
Findings with unknown confidence: 0
Inline comments attempted: 2
Target files: 2
Covered files: 2
Uncovered files: 0
See the summary comment for detailed analysis and coverage details.

github-actions · 2026-05-24T08:53:15Z

+
 fn simple_glob_match(pattern: &str, text: &str) -> bool {
-    let pattern_chars: Vec<char> = pattern.chars().collect();
+    let pattern_tokens = tokenize_simple_glob(pattern);


[MEDIUM] Repetitive Tokenization and Allocations in Glob Matching

The glob matcher tokenizes the pattern and collects characters of the text into dynamic arrays (Vec) on every evaluation. While this is a massive improvement over the previous quadratic DP memoization table, doing this dynamically on every policy evaluation causes unnecessary allocation churn, especially when evaluating multiple policy rules against a command.

Suggestion: 1. Pre-tokenize/compile the glob patterns into Vec<SimpleGlobToken> when the policy is loaded or parsed (e.g., inside EffectivePolicyRule), rather than during every evaluation.
2. Optimize tokenize_simple_glob to collapse consecutive wildcards (multiple consecutive * characters) into a single AnySequence token to reduce matching iterations.
3. Explore using standard string matching or byte-based iteration if patterns and text are ASCII-compatible, avoiding Vec<char> allocations.

Risk: Increased CPU overhead and garbage/allocation churn under heavy load or dense policy configurations, which could slow down tool execution pipeline throughput.

Confidence: 0.90

[From SubAgent: performance]

github-actions · 2026-05-24T08:53:15Z

 fn simple_glob_match(pattern: &str, text: &str) -> bool {
-    let pattern_chars: Vec<char> = pattern.chars().collect();
+    let pattern_tokens = tokenize_simple_glob(pattern);
    let text_chars: Vec<char> = text.chars().collect();


[LOW] Avoid collecting text characters to a Vec to prevent allocation on extremely long command payloads

In simple_glob_match, the text is collected into a Vec<char> before matching. For extremely large inputs, this triggers a heap allocation of size proportional to the text length.

Suggestion: Buddy, for a future optimization you can perform matching directly using character iterators or UTF-8 byte indexes to avoid allocating a Vec<char> for the entire text, though for typical command lines the current approach is perfectly fine and safe.

Risk: Negligible memory overhead for standard shell commands, but could become a minor bottleneck if extremely large payloads are routinely matched.

Confidence: 0.85

[From SubAgent: general]

github-actions Bot reviewed May 24, 2026

View reviewed changes

jorben merged commit 91ed98d into master May 24, 2026
4 checks passed

jorben deleted the refactor/glob-matcher branch May 24, 2026 14:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): 🐛 Prevent glob matcher stack overflow#203

fix(core): 🐛 Prevent glob matcher stack overflow#203
jorben merged 1 commit into
masterfrom
refactor/glob-matcher

jorben commented May 24, 2026

Uh oh!

github-actions Bot commented May 24, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot May 24, 2026

Uh oh!

github-actions Bot May 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jorben commented May 24, 2026

Summary

Test Plan

Uh oh!

github-actions Bot commented May 24, 2026

AI Code Review Summary

Overall Assessment

Major Findings by Severity

Actionable Suggestions

Potential Risks

Test Suggestions

File-Level Coverage Notes

Inline Downgraded Items (processed but not inline)

Coverage Status

Runtime/Budget

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant