Skip to content

feat: improve ^~ prefix regex shadowing detection in unreachable-location#37

Merged
walf443 merged 6 commits intomainfrom
unreachable_location_more
Feb 11, 2026
Merged

feat: improve ^~ prefix regex shadowing detection in unreachable-location#37
walf443 merged 6 commits intomainfrom
unreachable_location_more

Conversation

@walf443
Copy link
Owner

@walf443 walf443 commented Feb 11, 2026

Improve prefix_might_shadow_regex heuristic to detect more patterns where ^~ modifier blocks regex evaluation:

  • ^~ / shadows all regex locations
  • Catch-all regex (.*) shadowed by any ^~
  • Bidirectional prefix overlap (regex literal under prefix, prefix under regex literal)
  • Global file extension patterns without trailing slash requirement

Add extract_regex_literal_prefix and is_global_extension_pattern helpers. Add 5 unit tests and 3 container tests for the new detection patterns. Update bad/good examples with ^~ shadowing scenarios.

…tion

Improve prefix_might_shadow_regex heuristic to detect more patterns where
^~ modifier blocks regex evaluation:
- ^~ / shadows all regex locations
- Catch-all regex (.*) shadowed by any ^~
- Bidirectional prefix overlap (regex literal under prefix, prefix under regex literal)
- Global file extension patterns without trailing slash requirement

Add extract_regex_literal_prefix and is_global_extension_pattern helpers.
Add 5 unit tests and 3 container tests for the new detection patterns.
Update bad/good examples with ^~ shadowing scenarios.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the unreachable-location best-practices plugin’s heuristic for detecting when ^~ prefix locations can block evaluation of regex (~/~*) locations, and expands test coverage and examples accordingly.

Changes:

  • Enhance prefix_might_shadow_regex to catch additional ^~ shadowing scenarios (root prefix, catch-all regex, prefix/regex literal overlap, global extension regex patterns).
  • Add helpers for extracting a regex’s literal prefix and detecting global extension regexes.
  • Add new unit tests, container tests, and update good/bad config examples for the new scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
plugins/builtin/best_practices/unreachable_location/src/lib.rs Expands the ^~ vs regex shadowing heuristic and adds corresponding unit tests.
plugins/builtin/best_practices/unreachable_location/tests/container_test.rs Adds ignored integration tests validating nginx runtime behavior for the new scenarios.
plugins/builtin/best_practices/unreachable_location/examples/good.conf Adds a non-conflicting ^~ + regex example.
plugins/builtin/best_practices/unreachable_location/examples/bad.conf Adds a conflicting ^~ + global extension regex example to demonstrate the warning.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +270 to +284
chars.next();
if let Some(&next) = chars.peek() {
// \/ is a literal slash in paths
if next == '/' {
result.push(next);
chars.next();
} else {
break; // other escapes are regex metacharacters
}
}
} else if c.is_alphanumeric() || c == '/' || c == '_' || c == '-' || c == '.' {
result.push(c);
chars.next();
} else {
break; // regex metacharacter
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_regex_literal_prefix treats unescaped . as part of the literal prefix (even though . is a regex wildcard) and treats escapes like \. as a metacharacter and stops parsing. This contradicts the doc example (^/static/.*\.css$/static/) and can produce incorrect prefixes (e.g. /static/.), leading to missed or incorrect shadowing detection for patterns with ./escaped literals. Consider stopping on unescaped . and treating common escapes (like \.) as literal characters when building the prefix (only stopping on escapes that introduce character classes like \d, \w, etc.).

Suggested change
chars.next();
if let Some(&next) = chars.peek() {
// \/ is a literal slash in paths
if next == '/' {
result.push(next);
chars.next();
} else {
break; // other escapes are regex metacharacters
}
}
} else if c.is_alphanumeric() || c == '/' || c == '_' || c == '-' || c == '.' {
result.push(c);
chars.next();
} else {
break; // regex metacharacter
// Handle escaped characters. Treat simple escapes that are commonly
// used in path-like prefixes as literals, and stop on others.
chars.next(); // consume '\'
if let Some(&next) = chars.peek() {
// Escaped literals we allow in the prefix (e.g. `\.`, `\/`, `\-`, `\_`, `\a`)
if next == '/'
|| next == '.'
|| next == '_'
|| next == '-'
|| next.is_alphanumeric()
{
result.push(next);
chars.next();
} else {
// Other escapes (e.g. \d, \w, etc.) are treated as regex metacharacters;
// stop building the literal prefix here.
break;
}
} else {
// Trailing backslash; nothing more to consume.
break;
}
} else if c.is_alphanumeric() || c == '/' || c == '_' || c == '-' {
// Plain literal characters allowed in a path-like prefix.
result.push(c);
chars.next();
} else if c == '.' {
// Unescaped '.' is a regex wildcard; stop the literal prefix.
break;
} else {
// Any other character is treated as a regex metacharacter boundary.
break;

Copilot uses AI. Check for mistakes.
…prefix

Unescaped '.' is a regex wildcard (matches any character) and should stop
literal prefix extraction. Conversely, escaped sequences like \. \/ \- \_
are literal characters and should be included in the prefix.

Previously `^/static/.*\.css$` incorrectly produced `/static/.` instead
of `/static/`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +300 to +302
let s = regex_pattern.trim_start_matches('^');
// Global pattern: doesn't start with a path, contains file extension matching
!s.starts_with('/') && regex_pattern.contains(r"\.")
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_global_extension_pattern currently returns true for any regex that doesn’t start with '/' and contains "\.", which is broader than “file extension pattern” (e.g., patterns like "\.well-known/" would be classified as an extension regex). This can introduce noisy ^~ shadowing warnings. Consider tightening the heuristic (e.g., require the pattern to start with "\." or ".*\." and typically end with "$" / include an extension character class/group).

Suggested change
let s = regex_pattern.trim_start_matches('^');
// Global pattern: doesn't start with a path, contains file extension matching
!s.starts_with('/') && regex_pattern.contains(r"\.")
// Strip a leading anchor for easier shape checking.
let s = regex_pattern.trim_start_matches('^');
// Must not start with a path prefix.
if s.starts_with('/') {
return false;
}
// Must start like a pure extension pattern: `\.` or `.*\.`
if !(s.starts_with(r"\.") || s.starts_with(r".*\.")) {
return false;
}
// Find the first `\.` and ensure there is no `/` after it (to avoid
// treating patterns like `\.well-known/` as global extension patterns).
if let Some(dot_idx) = s.find(r"\.") {
let after_dot = &s[dot_idx + 2..];
if after_dot.contains('/') {
return false;
}
} else {
return false;
}
// Typical extension patterns either end with `$` or include an
// extension group/character class (e.g., `\.(jpg|png)$`, `\.[a-z]+$`).
let ends_with_dollar = s.ends_with('$');
let has_ext_group_or_class = s.contains('(') || s.contains('[');
if !ends_with_dollar && !has_ext_group_or_class {
return false;
}
true

Copilot uses AI. Check for mistakes.
Comment on lines +232 to +235
// 2. Catch-all regex patterns are always shadowed by any ^~
if regex_pattern == ".*" || regex_pattern == "^.*" || regex_pattern == "." {
return true;
}
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catch-all regex detection only matches exactly ".", "^.", or ".". Common catch-alls like "^.$" (and variations with trailing "$"/flags) won’t be treated as catch-all here, so ^~ shadowing warnings will be missed for those patterns. Consider normalizing by trimming leading '^' and trailing '$' (and possibly surrounding parentheses) before comparing to "."/".".

Copilot uses AI. Check for mistakes.
Extract utility functions for clearer separation of concerns:
- is_catchall_regex: detect catch-all patterns (also reused by regex_shadows)
- prefix_and_regex_paths_overlap: bidirectional literal prefix overlap check
- is_plain_path_char / is_escaped_path_literal: character classification helpers

Also tighten is_global_extension_pattern to require extension-like shape
(\. or .*\. prefix, no / after dot, $ or group/class ending) and normalize
catch-all detection to handle ^.*$ variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace verbose doc comments with focused unit tests that document the
intended behavior of each utility function:
- is_catchall_regex: catch-all patterns and non-matches
- extract_regex_literal_prefix: anchor stripping, escaped dots, metachar stops
- prefix_and_regex_paths_overlap: bidirectional overlap and non-overlap cases
- is_global_extension_pattern: valid extensions vs path patterns
- is_plain_path_char / is_escaped_path_literal: character classification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@walf443 walf443 force-pushed the unreachable_location_more branch from e7fca06 to 453be58 Compare February 11, 2026 09:47
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +338 to +339
fn is_escaped_path_literal(c: char) -> bool {
c == '/' || c == '.' || c == '_' || c == '-' || c.is_alphanumeric()
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_escaped_path_literal currently treats any alphanumeric escape (e.g. \d, \w, \s) as a literal character. In PCRE (used by nginx), those escapes are character classes/anchors, not literals, so extract_regex_literal_prefix can incorrectly include them and produce misleading overlap/shadowing warnings. Consider limiting this to escapes that are truly literal in paths (e.g. ., /, -, _) and treating known regex escapes (dDsSwWbBAZ, etc.) as non-literal (stop extraction).

Suggested change
fn is_escaped_path_literal(c: char) -> bool {
c == '/' || c == '.' || c == '_' || c == '-' || c.is_alphanumeric()
///
/// In PCRE (used by nginx), many alphanumeric escapes like `\d`, `\w`, `\s`,
/// `\b`, `\A`, `\Z`, etc. are character classes or anchors, not literals.
/// To avoid incorrectly extending the extracted literal prefix past such
/// constructs, we treat only a small set of clearly literal path characters
/// as valid escaped literals.
fn is_escaped_path_literal(c: char) -> bool {
matches!(c, '/' | '.' | '_' | '-')

Copilot uses AI. Check for mistakes.
Narrow is_escaped_path_literal to only accept \. \/ \- \_ as literals.
Previously \d \w \s etc. were treated as literals due to the
is_alphanumeric() check, which could produce incorrect prefixes
(e.g., ^/api/v\d+ would extract "/api/vd" instead of "/api/v").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

/// 1. `^~ /` matches all URIs
/// 2. Catch-all regex (e.g., `.*`) is always shadowed
/// 3. Regex's literal prefix overlaps with `^~` path
/// 4. Global extension patterns (e.g., `\.(css|js)$`) are shadowed by any `^~`
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comment bullet 4 is a bit misleading: a global extension regex (e.g., \.(css|js)$) is not universally shadowed by any ^~ prefix—it's only skipped for requests whose URI matches that ^~ prefix. Consider rewording to clarify this is a “may shadow for URIs under the prefix” scenario to avoid implying the regex becomes entirely unreachable.

Suggested change
/// 4. Global extension patterns (e.g., `\.(css|js)$`) are shadowed by any `^~`
/// 4. Global extension patterns (e.g., `\.(css|js)$`) may be shadowed for URIs under a `^~` prefix

Copilot uses AI. Check for mistakes.
The ^~ prefix only prevents regex evaluation for URIs matching the
prefix, not for all URIs. Update doc comment to say "may be shadowed
for URIs under a ^~ prefix" instead of "are shadowed by any ^~".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@walf443 walf443 merged commit b59df45 into main Feb 11, 2026
10 checks passed
@walf443 walf443 deleted the unreachable_location_more branch February 11, 2026 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants