Check for existing issues
Environment
- Reproduced on Vale v3.13.0 and v3.14.0
- Affects .html files only (walker-based linting path)
Describe the bug / provide steps to reproduce it
Description
When linting HTML files, Vale produces false-positive alerts for words that appear as substrings inside HTML attribute values — not in prose. The root cause is in initialPosition (internal/core/location.go).
Root cause
The HTML walker (walk.go) progressively replaces processed text-node content with @ characters in its context field (subInplace). This substitution can corrupt adjacent HTML attribute values when a text-node string happens to be a substring of an attribute value.
Example: a text node containing "at" gets substituted globally in context, turning:
class="status technology-preview"
into:
class="st@@us technology-preview"
The word-boundary regex then matches "us" in "st@@us" — because @ is not a \w character — and reports a false positive at the attribute's column position.
There is also a secondary issue: the strings.Index fallback path (used when the regex finds no match) has no word-boundary guard, so it can match a short pattern like "us" inside "status" in the raw HTML.
Minimal reproduction
The Microsoft We rule matches "us" using the pattern (?<![a-zA-Z])us(?![a-zA-Z]). The negative lookbehind and lookahead do not suppress the false positive, because these guards apply when Vale matches the rule pattern against extracted prose text, not during position mapping.
HTML containing:
<span class="status technology-preview"></span>
Vale reports a warning for "us" at the column of the "status" value in the class attribute, even though the <span> is empty and IgnoredClasses = status is set.
IgnoredClasses does not prevent this because it suppresses linting of text content inside matching elements. An empty element has no text content to suppress; the false positive comes from the position-mapping step, not the rule-matching step. The same false positive occurs regardless of whether the rule uses word boundaries, negative lookahead/lookbehind, or any other guard — all such guards are evaluated against the extracted prose, not the raw HTML context used by initialPosition.
Fix
Two guards in initialPosition (internal/core/location.go):
- fsi path: skip any regex match where the character at start-1 or end+1 in the context is @. Such boundaries are artifacts of walker substitution, not real word boundaries in prose. When all candidates are skipped, fall back to guessLocation.
- strings.Index fallback path: check whether the character before or after the match is a word character (\w). If so, the match is a substring of a longer token and the position is unreliable, so fall back to guessLocation.
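The two guards could look roughly like the sketch below. It is not the actual patch: `locate`, `touchesMask`, and `insideWord` are hypothetical names, the regex is a plain `\b` pattern rather than Vale's own, and returning `false` stands in for the fall-back to guessLocation. Go match indices are half-open, so in this sketch the character after a match sits at index `end`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// isWordByte reports whether b is an ASCII \w character.
func isWordByte(b byte) bool {
	return b == '_' || (b >= '0' && b <= '9') ||
		(b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
}

// touchesMask reports whether the match ctx[start:end] is adjacent to a
// '@' mask character on either side.
func touchesMask(ctx string, start, end int) bool {
	return (start > 0 && ctx[start-1] == '@') ||
		(end < len(ctx) && ctx[end] == '@')
}

// insideWord reports whether the match ctx[start:end] is a substring of a
// longer token, that is, a word character sits directly before or after it.
func insideWord(ctx string, start, end int) bool {
	return (start > 0 && isWordByte(ctx[start-1])) ||
		(end < len(ctx) && isWordByte(ctx[end]))
}

// locate is a hypothetical position lookup with both guards applied.
// ok == false means "position unreliable, let the caller guess instead".
func locate(ctx, word string) (pos int, ok bool) {
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(word) + `\b`)
	matches := re.FindAllStringIndex(ctx, -1)
	if len(matches) > 0 {
		// Guard 1: ignore regex matches whose boundary is a mask artifact.
		for _, m := range matches {
			if !touchesMask(ctx, m[0], m[1]) {
				return m[0], true
			}
		}
		return 0, false // every candidate was skipped
	}

	// Guard 2: the plain substring fallback must not land inside a token.
	idx := strings.Index(ctx, word)
	if idx < 0 || insideWord(ctx, idx, idx+len(word)) {
		return 0, false
	}
	return idx, true
}

func main() {
	fmt.Println(locate(`<span class="st@@us technology-preview"></span>`, "us"))
	fmt.Println(locate(`<span class="status"></span>`, "us"))
	fmt.Println(locate(`<p>tell us</p>`, "us"))
}
```

Only the last call reports a usable position; the first is rejected by the mask guard and the second by the word-character guard.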