feat: improve ^~ prefix regex shadowing detection in unreachable-location by walf443 · Pull Request #37 · walf443/nginx-lint

walf443 · 2026-02-11T09:10:45Z

Improve prefix_might_shadow_regex heuristic to detect more patterns where ^~ modifier blocks regex evaluation:

^~ / shadows all regex locations
Catch-all regex (.*) shadowed by any ^~
Bidirectional prefix overlap (regex literal under prefix, prefix under regex literal)
Global file extension patterns without trailing slash requirement

Add extract_regex_literal_prefix and is_global_extension_pattern helpers. Add 5 unit tests and 3 container tests for the new detection patterns. Update bad/good examples with ^~ shadowing scenarios.

…tion Improve prefix_might_shadow_regex heuristic to detect more patterns where ^~ modifier blocks regex evaluation: - ^~ / shadows all regex locations - Catch-all regex (.*) shadowed by any ^~ - Bidirectional prefix overlap (regex literal under prefix, prefix under regex literal) - Global file extension patterns without trailing slash requirement Add extract_regex_literal_prefix and is_global_extension_pattern helpers. Add 5 unit tests and 3 container tests for the new detection patterns. Update bad/good examples with ^~ shadowing scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This PR improves the unreachable-location best-practices plugin’s heuristic for detecting when ^~ prefix locations can block evaluation of regex (~/~*) locations, and expands test coverage and examples accordingly.

Changes:

Enhance prefix_might_shadow_regex to catch additional ^~ shadowing scenarios (root prefix, catch-all regex, prefix/regex literal overlap, global extension regex patterns).
Add helpers for extracting a regex’s literal prefix and detecting global extension regexes.
Add new unit tests, container tests, and update good/bad config examples for the new scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
`plugins/builtin/best_practices/unreachable_location/src/lib.rs`	Expands the `^~` vs regex shadowing heuristic and adds corresponding unit tests.
`plugins/builtin/best_practices/unreachable_location/tests/container_test.rs`	Adds ignored integration tests validating nginx runtime behavior for the new scenarios.
`plugins/builtin/best_practices/unreachable_location/examples/good.conf`	Adds a non-conflicting `^~` + regex example.
`plugins/builtin/best_practices/unreachable_location/examples/bad.conf`	Adds a conflicting `^~` + global extension regex example to demonstrate the warning.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-11T09:15:17Z

plugins/builtin/best_practices/unreachable_location/src/lib.rs

+                chars.next();
+                if let Some(&next) = chars.peek() {
+                    // \/ is a literal slash in paths
+                    if next == '/' {
+                        result.push(next);
+                        chars.next();
+                    } else {
+                        break; // other escapes are regex metacharacters
+                    }
+                }
+            } else if c.is_alphanumeric() || c == '/' || c == '_' || c == '-' || c == '.' {
+                result.push(c);
+                chars.next();
+            } else {
+                break; // regex metacharacter


extract_regex_literal_prefix treats unescaped . as part of the literal prefix (even though . is a regex wildcard) and treats escapes like \. as a metacharacter and stops parsing. This contradicts the doc example (^/static/.*\.css$ → /static/) and can produce incorrect prefixes (e.g. /static/.), leading to missed or incorrect shadowing detection for patterns with ./escaped literals. Consider stopping on unescaped . and treating common escapes (like \.) as literal characters when building the prefix (only stopping on escapes that introduce character classes like \d, \w, etc.).

Suggested change

chars.next();

if let Some(&next) = chars.peek() {

// \/ is a literal slash in paths

if next == '/' {

result.push(next);

chars.next();

} else {

break; // other escapes are regex metacharacters

}

}

} else if c.is_alphanumeric() || c == '/' || c == '_' || c == '-' || c == '.' {

result.push(c);

chars.next();

} else {

break; // regex metacharacter

// Handle escaped characters. Treat simple escapes that are commonly

// used in path-like prefixes as literals, and stop on others.

chars.next(); // consume '\'

if let Some(&next) = chars.peek() {

// Escaped literals we allow in the prefix (e.g. `\.`, `\/`, `\-`, `\_`, `\a`)

if next == '/'

|| next == '.'

|| next == '_'

|| next == '-'

|| next.is_alphanumeric()

{

result.push(next);

chars.next();

} else {

// Other escapes (e.g. \d, \w, etc.) are treated as regex metacharacters;

// stop building the literal prefix here.

break;

}

} else {

// Trailing backslash; nothing more to consume.

break;

}

} else if c.is_alphanumeric() || c == '/' || c == '_' || c == '-' {

// Plain literal characters allowed in a path-like prefix.

result.push(c);

chars.next();

} else if c == '.' {

// Unescaped '.' is a regex wildcard; stop the literal prefix.

break;

} else {

// Any other character is treated as a regex metacharacter boundary.

break;

…prefix Unescaped '.' is a regex wildcard (matches any character) and should stop literal prefix extraction. Conversely, escaped sequences like \. \/ \- \_ are literal characters and should be included in the prefix. Previously `^/static/.*\.css$` incorrectly produced `/static/.` instead of `/static/`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-11T09:25:09Z

plugins/builtin/best_practices/unreachable_location/src/lib.rs

+        let s = regex_pattern.trim_start_matches('^');
+        // Global pattern: doesn't start with a path, contains file extension matching
+        !s.starts_with('/') && regex_pattern.contains(r"\.")


is_global_extension_pattern currently returns true for any regex that doesn’t start with '/' and contains "\.", which is broader than “file extension pattern” (e.g., patterns like "\.well-known/" would be classified as an extension regex). This can introduce noisy ^~ shadowing warnings. Consider tightening the heuristic (e.g., require the pattern to start with "\." or ".*\." and typically end with "$" / include an extension character class/group).

Suggested change

let s = regex_pattern.trim_start_matches('^');

// Global pattern: doesn't start with a path, contains file extension matching

!s.starts_with('/') && regex_pattern.contains(r"\.")

// Strip a leading anchor for easier shape checking.

let s = regex_pattern.trim_start_matches('^');

// Must not start with a path prefix.

if s.starts_with('/') {

return false;

}

// Must start like a pure extension pattern: `\.` or `.*\.`

if !(s.starts_with(r"\.") || s.starts_with(r".*\.")) {

return false;

}

// Find the first `\.` and ensure there is no `/` after it (to avoid

// treating patterns like `\.well-known/` as global extension patterns).

if let Some(dot_idx) = s.find(r"\.") {

let after_dot = &s[dot_idx + 2..];

if after_dot.contains('/') {

return false;

}

} else {

return false;

}

// Typical extension patterns either end with `$` or include an

// extension group/character class (e.g., `\.(jpg|png)$`, `\.[a-z]+$`).

let ends_with_dollar = s.ends_with('$');

let has_ext_group_or_class = s.contains('(') || s.contains('[');

if !ends_with_dollar && !has_ext_group_or_class {

return false;

}

true

Copilot · 2026-02-11T09:25:10Z

plugins/builtin/best_practices/unreachable_location/src/lib.rs

+        // 2. Catch-all regex patterns are always shadowed by any ^~
+        if regex_pattern == ".*" || regex_pattern == "^.*" || regex_pattern == "." {
+            return true;
+        }


Catch-all regex detection only matches exactly ".", "^.", or ".". Common catch-alls like "^.$" (and variations with trailing "$"/flags) won’t be treated as catch-all here, so ^~ shadowing warnings will be missed for those patterns. Consider normalizing by trimming leading '^' and trailing '$' (and possibly surrounding parentheses) before comparing to "."/".".

Extract utility functions for clearer separation of concerns: - is_catchall_regex: detect catch-all patterns (also reused by regex_shadows) - prefix_and_regex_paths_overlap: bidirectional literal prefix overlap check - is_plain_path_char / is_escaped_path_literal: character classification helpers Also tighten is_global_extension_pattern to require extension-like shape (\. or .*\. prefix, no / after dot, $ or group/class ending) and normalize catch-all detection to handle ^.*$ variants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace verbose doc comments with focused unit tests that document the intended behavior of each utility function: - is_catchall_regex: catch-all patterns and non-matches - extract_regex_literal_prefix: anchor stripping, escaped dots, metachar stops - prefix_and_regex_paths_overlap: bidirectional overlap and non-overlap cases - is_global_extension_pattern: valid extensions vs path patterns - is_plain_path_char / is_escaped_path_literal: character classification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-11T09:51:34Z

plugins/builtin/best_practices/unreachable_location/src/lib.rs

+fn is_escaped_path_literal(c: char) -> bool {
+    c == '/' || c == '.' || c == '_' || c == '-' || c.is_alphanumeric()


is_escaped_path_literal currently treats any alphanumeric escape (e.g. \d, \w, \s) as a literal character. In PCRE (used by nginx), those escapes are character classes/anchors, not literals, so extract_regex_literal_prefix can incorrectly include them and produce misleading overlap/shadowing warnings. Consider limiting this to escapes that are truly literal in paths (e.g. ., /, -, _) and treating known regex escapes (dDsSwWbBAZ, etc.) as non-literal (stop extraction).

Suggested change

fn is_escaped_path_literal(c: char) -> bool {

c == '/' || c == '.' || c == '_' || c == '-' || c.is_alphanumeric()

///

/// In PCRE (used by nginx), many alphanumeric escapes like `\d`, `\w`, `\s`,

/// `\b`, `\A`, `\Z`, etc. are character classes or anchors, not literals.

/// To avoid incorrectly extending the extracted literal prefix past such

/// constructs, we treat only a small set of clearly literal path characters

/// as valid escaped literals.

fn is_escaped_path_literal(c: char) -> bool {

matches!(c, '/' | '.' | '_' | '-')

Narrow is_escaped_path_literal to only accept \. \/ \- \_ as literals. Previously \d \w \s etc. were treated as literals due to the is_alphanumeric() check, which could produce incorrect prefixes (e.g., ^/api/v\d+ would extract "/api/vd" instead of "/api/v"). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-11T10:02:20Z

plugins/builtin/best_practices/unreachable_location/src/lib.rs

+    /// 1. `^~ /` matches all URIs
+    /// 2. Catch-all regex (e.g., `.*`) is always shadowed
+    /// 3. Regex's literal prefix overlaps with `^~` path
+    /// 4. Global extension patterns (e.g., `\.(css|js)$`) are shadowed by any `^~`


Doc comment bullet 4 is a bit misleading: a global extension regex (e.g., \.(css|js)$) is not universally shadowed by any ^~ prefix—it's only skipped for requests whose URI matches that ^~ prefix. Consider rewording to clarify this is a “may shadow for URIs under the prefix” scenario to avoid implying the regex becomes entirely unreachable.

Suggested change

/// 4. Global extension patterns (e.g., `\.(css|js)$`) are shadowed by any `^~`

/// 4. Global extension patterns (e.g., `\.(css|js)$`) may be shadowed for URIs under a `^~` prefix

The ^~ prefix only prevents regex evaluation for URIs matching the prefix, not for all URIs. Update doc comment to say "may be shadowed for URIs under a ^~ prefix" instead of "are shadowed by any ^~". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

walf443 self-assigned this Feb 11, 2026

walf443 requested a review from Copilot February 11, 2026 09:10

Copilot started reviewing on behalf of walf443 February 11, 2026 09:11 View session

Copilot AI reviewed Feb 11, 2026

View reviewed changes

walf443 requested a review from Copilot February 11, 2026 09:19

Copilot started reviewing on behalf of walf443 February 11, 2026 09:20 View session

Copilot AI reviewed Feb 11, 2026

View reviewed changes

walf443 requested a review from Copilot February 11, 2026 09:46

Copilot started reviewing on behalf of walf443 February 11, 2026 09:46 View session

walf443 force-pushed the unreachable_location_more branch from e7fca06 to 453be58 Compare February 11, 2026 09:47

Copilot AI reviewed Feb 11, 2026

View reviewed changes

walf443 requested a review from Copilot February 11, 2026 09:58

Copilot started reviewing on behalf of walf443 February 11, 2026 09:58 View session

Copilot AI reviewed Feb 11, 2026

View reviewed changes

walf443 merged commit b59df45 into main Feb 11, 2026
10 checks passed

walf443 deleted the unreachable_location_more branch February 11, 2026 10:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improve ^~ prefix regex shadowing detection in unreachable-location#37

feat: improve ^~ prefix regex shadowing detection in unreachable-location#37
walf443 merged 6 commits intomainfrom
unreachable_location_more

walf443 commented Feb 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 11, 2026

Uh oh!

Copilot AI Feb 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-        let s = regex_pattern.trim_start_matches('^');
-        // Global pattern: doesn't start with a path, contains file extension matching
-        !s.starts_with('/') && regex_pattern.contains(r"\.")
+        // Strip a leading anchor for easier shape checking.
+        let s = regex_pattern.trim_start_matches('^');
+        // Must not start with a path prefix.
+        if s.starts_with('/') {
+            return false;
+        }
+        // Must start like a pure extension pattern: `\.` or `.*\.`
+        if !(s.starts_with(r"\.") || s.starts_with(r".*\.")) {
+            return false;
+        }
+        // Find the first `\.` and ensure there is no `/` after it (to avoid
+        // treating patterns like `\.well-known/` as global extension patterns).
+        if let Some(dot_idx) = s.find(r"\.") {
+            let after_dot = &s[dot_idx + 2..];
+            if after_dot.contains('/') {
+                return false;
+            }
+        } else {
+            return false;
+        }
+        // Typical extension patterns either end with `$` or include an
+        // extension group/character class (e.g., `\.(jpg|png)$`, `\.[a-z]+$`).
+        let ends_with_dollar = s.ends_with('$');
+        let has_ext_group_or_class = s.contains('(') || s.contains('[');
+        if !ends_with_dollar && !has_ext_group_or_class {
+            return false;
+        }
+        true

		fn is_escaped_path_literal(c: char) -> bool {
		c == '/' \|\| c == '.' \|\| c == '_' \|\| c == '-' \|\| c.is_alphanumeric()

-fn is_escaped_path_literal(c: char) -> bool {
-    c == '/' || c == '.' || c == '_' || c == '-' || c.is_alphanumeric()
+///
+/// In PCRE (used by nginx), many alphanumeric escapes like `\d`, `\w`, `\s`,
+/// `\b`, `\A`, `\Z`, etc. are character classes or anchors, not literals.
+/// To avoid incorrectly extending the extracted literal prefix past such
+/// constructs, we treat only a small set of clearly literal path characters
+/// as valid escaped literals.
+fn is_escaped_path_literal(c: char) -> bool {
+    matches!(c, '/' | '.' | '_' | '-')

	/// 4. Global extension patterns (e.g., `\.(css\|js)$`) are shadowed by any `^~`
	/// 4. Global extension patterns (e.g., `\.(css\|js)$`) may be shadowed for URIs under a `^~` prefix

Conversation

walf443 commented Feb 11, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants