
HTML linting: false positives from @-substituted attribute values in initialPosition #1086

@veshivas


Check for existing issues

  • Completed

Environment

  • Reproduced on Vale v3.13.0 and v3.14.0
  • Affects .html files only (walker-based linting path)

Describe the bug / provide steps to reproduce it

Description

When linting HTML files, Vale produces false-positive alerts for words that appear as substrings inside HTML attribute values rather than in prose. The root cause is in initialPosition (internal/core/location.go).

Root cause

The HTML walker (walk.go) progressively replaces processed text-node content with @ characters in its context field (subInplace). Because the substitution is applied to every occurrence in context, it can also rewrite HTML attribute values whenever a text-node string happens to be a substring of one.

Example: a text node containing "at" gets substituted globally in context, turning:

class="status technology-preview"

into:

class="st@@us technology-preview"
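A minimal Go sketch of this corruption, assuming the substitution is a global, length-preserving replace of the text-node string with @ characters. maskAll is a hypothetical stand-in; the real subInplace in walk.go may be implemented differently.

```go
package main

import (
	"fmt"
	"strings"
)

// maskAll is a hypothetical stand-in for the walker's substitution step.
// It replaces every occurrence of text in ctx with the same number of '@'
// characters, so byte offsets in ctx stay aligned with the original HTML.
func maskAll(ctx, text string) string {
	return strings.ReplaceAll(ctx, text, strings.Repeat("@", len(text)))
}

func main() {
	ctx := `<span class="status technology-preview"></span>`
	// Masking a text node "at" also hits the "at" inside "status".
	fmt.Println(maskAll(ctx, "at"))
	// Output: <span class="st@@us technology-preview"></span>
}
```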

The word-boundary regex then matches "us" in "st@@us", because @ is not a \w character, and reports a false positive at the attribute's column position.
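The boundary match can be reproduced with Go's standard regexp package. This sketch assumes the position-mapping regex wraps the token in \b, which the report implies but does not spell out; boundedIndex is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// boundedIndex returns the [start, end) byte range of the first
// \b-delimited occurrence of token in ctx, or nil if there is none.
func boundedIndex(ctx, token string) []int {
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(token) + `\b`)
	return re.FindStringIndex(ctx)
}

func main() {
	masked := `class="st@@us technology-preview"`
	loc := boundedIndex(masked, "us")
	// '@' is not a word character, so \b holds before "us".
	fmt.Println(loc, masked[loc[0]:loc[1]]) // [11 13] us

	// In the unmasked attribute, 't' precedes "us", so there is no boundary.
	fmt.Println(boundedIndex(`class="status technology-preview"`, "us")) // []
}
```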

There is also a secondary issue: the strings.Index fallback path (used when the regex finds no match) has no word-boundary guard, so it can match a short pattern like "us" inside "status" in the raw HTML.

Minimal reproduction

The Microsoft.We rule matches "us" using the pattern (?<![a-zA-Z])us(?![a-zA-Z]). Despite the negative lookahead/lookbehind, the false positive is not suppressed, because these guards apply when Vale matches the rule pattern against extracted prose text, not during position mapping.

HTML containing:

<span class="status technology-preview"></span>

Vale reports a warning for "us" at the column of the status attribute value, even though the <span> is empty and IgnoredClasses = status is set.

IgnoredClasses does not prevent this because it only suppresses linting of text content inside matching elements. An empty element has no text content to suppress; the false positive comes from the position-mapping step, not the rule-matching step.

The same false positive occurs regardless of whether the rule uses word boundaries, negative lookahead/lookbehind, or any other guard: all such guards are evaluated against the extracted prose, not against the raw HTML context used by initialPosition.
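To see why a letter-based guard offers no protection once the context has been rewritten, the sketch below emulates the rule's (?<![a-zA-Z])us(?![a-zA-Z]) check by hand, since Go's RE2-based regexp package has no lookbehind. letterGuardedIndex is hypothetical. Even if the guard were applied to the walker's context, it would reject "us" inside "status" but still accept the masked position, because @ is not a letter.

```go
package main

import "fmt"

// isASCIILetter reports whether b is in [a-zA-Z].
func isASCIILetter(b byte) bool {
	return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
}

// letterGuardedIndex emulates (?<![a-zA-Z])token(?![a-zA-Z]) and returns
// the byte offset of the first guarded occurrence of token in s, or -1.
func letterGuardedIndex(s, token string) int {
	for i := 0; i+len(token) <= len(s); i++ {
		if s[i:i+len(token)] != token {
			continue
		}
		end := i + len(token)
		if i > 0 && isASCIILetter(s[i-1]) {
			continue // lookbehind fails: preceded by a letter
		}
		if end < len(s) && isASCIILetter(s[end]) {
			continue // lookahead fails: followed by a letter
		}
		return i
	}
	return -1
}

func main() {
	fmt.Println(letterGuardedIndex(`class="status technology-preview"`, "us")) // -1
	fmt.Println(letterGuardedIndex(`class="st@@us technology-preview"`, "us")) // 11
}
```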

Fix

Two guards in initialPosition (internal/core/location.go):

  1. fsi path: skip any regex match where the character immediately before or immediately after the match in the context is @. Such boundaries are artifacts of walker substitution, not real word boundaries in prose. When all candidates are skipped, fall back to guessLocation.

  2. strings.Index fallback path: check whether the character before or after the match is a word character (\w). If so, the match is a substring of a longer token and the position is unreliable, so fall back to guessLocation.
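A simplified Go sketch of both guards. locate is hypothetical and is not the actual initialPosition code: it assumes the fsi path uses a \b-wrapped regex over the walker's context, and it returns ok=false wherever the real function would fall back to guessLocation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// isWordByte mirrors the ASCII \w class: [0-9A-Za-z_].
func isWordByte(b byte) bool {
	return b == '_' || (b >= '0' && b <= '9') ||
		(b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
}

// locate returns the byte offset of token in ctx, or ok=false when the
// caller should fall back to guessLocation (hypothetical simplification).
func locate(ctx, token string) (offset int, ok bool) {
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(token) + `\b`)
	matches := re.FindAllStringIndex(ctx, -1)
	if len(matches) > 0 {
		// Guard 1: skip candidates bordered by '@' left by the walker.
		for _, m := range matches {
			start, end := m[0], m[1] // end is exclusive
			if start > 0 && ctx[start-1] == '@' {
				continue
			}
			if end < len(ctx) && ctx[end] == '@' {
				continue
			}
			return start, true
		}
		return -1, false // every candidate was an artifact
	}
	// Guard 2: the substring fallback must not land inside a longer token.
	i := strings.Index(ctx, token)
	if i < 0 {
		return -1, false
	}
	if i > 0 && isWordByte(ctx[i-1]) {
		return -1, false
	}
	if end := i + len(token); end < len(ctx) && isWordByte(ctx[end]) {
		return -1, false
	}
	return i, true
}

func main() {
	fmt.Println(locate(`class="st@@us" Tell us more`, "us")) // 20 true
}
```

With the mixed input in main, the artifact match after "@@" is skipped and the real "us" in "Tell us more" is reported. For the bare <span> from the reproduction, both paths decline, which corresponds to handing off to guessLocation.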
