feat: improve ^~ prefix regex shadowing detection in unreachable-location#37
feat: improve ^~ prefix regex shadowing detection in unreachable-location#37
Conversation
…tion Improve prefix_might_shadow_regex heuristic to detect more patterns where ^~ modifier blocks regex evaluation: - ^~ / shadows all regex locations - Catch-all regex (.*) shadowed by any ^~ - Bidirectional prefix overlap (regex literal under prefix, prefix under regex literal) - Global file extension patterns without trailing slash requirement Add extract_regex_literal_prefix and is_global_extension_pattern helpers. Add 5 unit tests and 3 container tests for the new detection patterns. Update bad/good examples with ^~ shadowing scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR improves the unreachable-location best-practices plugin’s heuristic for detecting when ^~ prefix locations can block evaluation of regex (~/~*) locations, and expands test coverage and examples accordingly.
Changes:
- Enhance
prefix_might_shadow_regexto catch additional^~shadowing scenarios (root prefix, catch-all regex, prefix/regex literal overlap, global extension regex patterns). - Add helpers for extracting a regex’s literal prefix and detecting global extension regexes.
- Add new unit tests, container tests, and update good/bad config examples for the new scenarios.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
plugins/builtin/best_practices/unreachable_location/src/lib.rs |
Expands the ^~ vs regex shadowing heuristic and adds corresponding unit tests. |
plugins/builtin/best_practices/unreachable_location/tests/container_test.rs |
Adds ignored integration tests validating nginx runtime behavior for the new scenarios. |
plugins/builtin/best_practices/unreachable_location/examples/good.conf |
Adds a non-conflicting ^~ + regex example. |
plugins/builtin/best_practices/unreachable_location/examples/bad.conf |
Adds a conflicting ^~ + global extension regex example to demonstrate the warning. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| chars.next(); | ||
| if let Some(&next) = chars.peek() { | ||
| // \/ is a literal slash in paths | ||
| if next == '/' { | ||
| result.push(next); | ||
| chars.next(); | ||
| } else { | ||
| break; // other escapes are regex metacharacters | ||
| } | ||
| } | ||
| } else if c.is_alphanumeric() || c == '/' || c == '_' || c == '-' || c == '.' { | ||
| result.push(c); | ||
| chars.next(); | ||
| } else { | ||
| break; // regex metacharacter |
There was a problem hiding this comment.
extract_regex_literal_prefix treats unescaped . as part of the literal prefix (even though . is a regex wildcard) and treats escapes like \. as a metacharacter and stops parsing. This contradicts the doc example (^/static/.*\.css$ → /static/) and can produce incorrect prefixes (e.g. /static/.), leading to missed or incorrect shadowing detection for patterns with ./escaped literals. Consider stopping on unescaped . and treating common escapes (like \.) as literal characters when building the prefix (only stopping on escapes that introduce character classes like \d, \w, etc.).
| chars.next(); | |
| if let Some(&next) = chars.peek() { | |
| // \/ is a literal slash in paths | |
| if next == '/' { | |
| result.push(next); | |
| chars.next(); | |
| } else { | |
| break; // other escapes are regex metacharacters | |
| } | |
| } | |
| } else if c.is_alphanumeric() || c == '/' || c == '_' || c == '-' || c == '.' { | |
| result.push(c); | |
| chars.next(); | |
| } else { | |
| break; // regex metacharacter | |
| // Handle escaped characters. Treat simple escapes that are commonly | |
| // used in path-like prefixes as literals, and stop on others. | |
| chars.next(); // consume '\' | |
| if let Some(&next) = chars.peek() { | |
| // Escaped literals we allow in the prefix (e.g. `\.`, `\/`, `\-`, `\_`, `\a`) | |
| if next == '/' | |
| || next == '.' | |
| || next == '_' | |
| || next == '-' | |
| || next.is_alphanumeric() | |
| { | |
| result.push(next); | |
| chars.next(); | |
| } else { | |
| // Other escapes (e.g. \d, \w, etc.) are treated as regex metacharacters; | |
| // stop building the literal prefix here. | |
| break; | |
| } | |
| } else { | |
| // Trailing backslash; nothing more to consume. | |
| break; | |
| } | |
| } else if c.is_alphanumeric() || c == '/' || c == '_' || c == '-' { | |
| // Plain literal characters allowed in a path-like prefix. | |
| result.push(c); | |
| chars.next(); | |
| } else if c == '.' { | |
| // Unescaped '.' is a regex wildcard; stop the literal prefix. | |
| break; | |
| } else { | |
| // Any other character is treated as a regex metacharacter boundary. | |
| break; |
…prefix Unescaped '.' is a regex wildcard (matches any character) and should stop literal prefix extraction. Conversely, escaped sequences like \. \/ \- \_ are literal characters and should be included in the prefix. Previously `^/static/.*\.css$` incorrectly produced `/static/.` instead of `/static/`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let s = regex_pattern.trim_start_matches('^'); | ||
| // Global pattern: doesn't start with a path, contains file extension matching | ||
| !s.starts_with('/') && regex_pattern.contains(r"\.") |
There was a problem hiding this comment.
is_global_extension_pattern currently returns true for any regex that doesn’t start with '/' and contains "\.", which is broader than “file extension pattern” (e.g., patterns like "\.well-known/" would be classified as an extension regex). This can introduce noisy ^~ shadowing warnings. Consider tightening the heuristic (e.g., require the pattern to start with "\." or ".*\." and typically end with "$" / include an extension character class/group).
| let s = regex_pattern.trim_start_matches('^'); | |
| // Global pattern: doesn't start with a path, contains file extension matching | |
| !s.starts_with('/') && regex_pattern.contains(r"\.") | |
| // Strip a leading anchor for easier shape checking. | |
| let s = regex_pattern.trim_start_matches('^'); | |
| // Must not start with a path prefix. | |
| if s.starts_with('/') { | |
| return false; | |
| } | |
| // Must start like a pure extension pattern: `\.` or `.*\.` | |
| if !(s.starts_with(r"\.") || s.starts_with(r".*\.")) { | |
| return false; | |
| } | |
| // Find the first `\.` and ensure there is no `/` after it (to avoid | |
| // treating patterns like `\.well-known/` as global extension patterns). | |
| if let Some(dot_idx) = s.find(r"\.") { | |
| let after_dot = &s[dot_idx + 2..]; | |
| if after_dot.contains('/') { | |
| return false; | |
| } | |
| } else { | |
| return false; | |
| } | |
| // Typical extension patterns either end with `$` or include an | |
| // extension group/character class (e.g., `\.(jpg|png)$`, `\.[a-z]+$`). | |
| let ends_with_dollar = s.ends_with('$'); | |
| let has_ext_group_or_class = s.contains('(') || s.contains('['); | |
| if !ends_with_dollar && !has_ext_group_or_class { | |
| return false; | |
| } | |
| true |
| // 2. Catch-all regex patterns are always shadowed by any ^~ | ||
| if regex_pattern == ".*" || regex_pattern == "^.*" || regex_pattern == "." { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Catch-all regex detection only matches exactly ".", "^.", or ".". Common catch-alls like "^.$" (and variations with trailing "$"/flags) won’t be treated as catch-all here, so ^~ shadowing warnings will be missed for those patterns. Consider normalizing by trimming leading '^' and trailing '$' (and possibly surrounding parentheses) before comparing to "."/".".
Extract utility functions for clearer separation of concerns: - is_catchall_regex: detect catch-all patterns (also reused by regex_shadows) - prefix_and_regex_paths_overlap: bidirectional literal prefix overlap check - is_plain_path_char / is_escaped_path_literal: character classification helpers Also tighten is_global_extension_pattern to require extension-like shape (\. or .*\. prefix, no / after dot, $ or group/class ending) and normalize catch-all detection to handle ^.*$ variants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace verbose doc comments with focused unit tests that document the intended behavior of each utility function: - is_catchall_regex: catch-all patterns and non-matches - extract_regex_literal_prefix: anchor stripping, escaped dots, metachar stops - prefix_and_regex_paths_overlap: bidirectional overlap and non-overlap cases - is_global_extension_pattern: valid extensions vs path patterns - is_plain_path_char / is_escaped_path_literal: character classification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
e7fca06 to
453be58
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fn is_escaped_path_literal(c: char) -> bool { | ||
| c == '/' || c == '.' || c == '_' || c == '-' || c.is_alphanumeric() |
There was a problem hiding this comment.
is_escaped_path_literal currently treats any alphanumeric escape (e.g. \d, \w, \s) as a literal character. In PCRE (used by nginx), those escapes are character classes/anchors, not literals, so extract_regex_literal_prefix can incorrectly include them and produce misleading overlap/shadowing warnings. Consider limiting this to escapes that are truly literal in paths (e.g. ., /, -, _) and treating known regex escapes (dDsSwWbBAZ, etc.) as non-literal (stop extraction).
| fn is_escaped_path_literal(c: char) -> bool { | |
| c == '/' || c == '.' || c == '_' || c == '-' || c.is_alphanumeric() | |
| /// | |
| /// In PCRE (used by nginx), many alphanumeric escapes like `\d`, `\w`, `\s`, | |
| /// `\b`, `\A`, `\Z`, etc. are character classes or anchors, not literals. | |
| /// To avoid incorrectly extending the extracted literal prefix past such | |
| /// constructs, we treat only a small set of clearly literal path characters | |
| /// as valid escaped literals. | |
| fn is_escaped_path_literal(c: char) -> bool { | |
| matches!(c, '/' | '.' | '_' | '-') |
Narrow is_escaped_path_literal to only accept \. \/ \- \_ as literals. Previously \d \w \s etc. were treated as literals due to the is_alphanumeric() check, which could produce incorrect prefixes (e.g., ^/api/v\d+ would extract "/api/vd" instead of "/api/v"). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// 1. `^~ /` matches all URIs | ||
| /// 2. Catch-all regex (e.g., `.*`) is always shadowed | ||
| /// 3. Regex's literal prefix overlaps with `^~` path | ||
| /// 4. Global extension patterns (e.g., `\.(css|js)$`) are shadowed by any `^~` |
There was a problem hiding this comment.
Doc comment bullet 4 is a bit misleading: a global extension regex (e.g., \.(css|js)$) is not universally shadowed by any ^~ prefix—it's only skipped for requests whose URI matches that ^~ prefix. Consider rewording to clarify this is a “may shadow for URIs under the prefix” scenario to avoid implying the regex becomes entirely unreachable.
| /// 4. Global extension patterns (e.g., `\.(css|js)$`) are shadowed by any `^~` | |
| /// 4. Global extension patterns (e.g., `\.(css|js)$`) may be shadowed for URIs under a `^~` prefix |
The ^~ prefix only prevents regex evaluation for URIs matching the prefix, not for all URIs. Update doc comment to say "may be shadowed for URIs under a ^~ prefix" instead of "are shadowed by any ^~". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Improve prefix_might_shadow_regex heuristic to detect more patterns where ^~ modifier blocks regex evaluation:
Add extract_regex_literal_prefix and is_global_extension_pattern helpers. Add 5 unit tests and 3 container tests for the new detection patterns. Update bad/good examples with ^~ shadowing scenarios.