
feat: add intra-line word-diff highlighting #87

Merged
umputun merged 7 commits into master from intraline-word-diff
Apr 10, 2026

@umputun umputun commented Apr 10, 2026

Adds always-on intra-line highlighting. Within each diff hunk, paired remove/add lines run through a token-level diff and only the actually changed spans get a brighter background overlay, preserving the existing whole-line add/remove styling.

Approach

Combines ideas from two contributor PRs, #73 and #74.

No external dependencies added. Feature is always on (no toggle, no flag, no keybinding, no status icon).

How it works

  1. recomputeIntraRanges runs once per file load after syntax highlighting
  2. Walks diffLines finding contiguous add/remove blocks
  3. pairHunkLines pairs lines via greedy best-match (prefix/suffix scoring, shared with collapsed mode's buildModifiedSet)
  4. For each pair, a regex tokenizer splits both lines into word/whitespace/punctuation tokens, and LCS separates kept tokens from changed ones
  5. 30% similarity gate (non-whitespace tokens) skips pairs that are too different to highlight meaningfully
  6. Results stored parallel to diffLines, injected via the existing insertHighlightMarkers during render
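
A minimal sketch of steps 4 and 5, not the code from this PR. The names `token`, `tokenize`, `lcsKept`, `changedRanges`, `countWords`, and `wordDiff` are hypothetical, the regex is one plausible word/whitespace/punctuation split, and the gate formula (twice the kept non-whitespace tokens divided by the combined non-whitespace token count, compared against 30%) is an assumption about how the similarity threshold might be computed.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// token is one lexical piece of a line plus its byte offsets.
type token struct {
	text       string
	start, end int
}

// tokenRe matches a run of word characters, a run of whitespace, or one other character.
var tokenRe = regexp.MustCompile(`\w+|\s+|[^\w\s]`)

func tokenize(line string) []token {
	var toks []token
	for _, loc := range tokenRe.FindAllStringIndex(line, -1) {
		toks = append(toks, token{line[loc[0]:loc[1]], loc[0], loc[1]})
	}
	return toks
}

// lcsKept marks, on each side, the tokens that belong to a longest common subsequence.
func lcsKept(a, b []token) (keptA, keptB []bool) {
	m, n := len(a), len(b)
	dp := make([][]int, m+1)
	for i := range dp {
		dp[i] = make([]int, n+1)
	}
	for i := 1; i <= m; i++ {
		for j := 1; j <= n; j++ {
			switch {
			case a[i-1].text == b[j-1].text:
				dp[i][j] = dp[i-1][j-1] + 1
			case dp[i-1][j] >= dp[i][j-1]:
				dp[i][j] = dp[i-1][j]
			default:
				dp[i][j] = dp[i][j-1]
			}
		}
	}
	keptA, keptB = make([]bool, m), make([]bool, n)
	for i, j := m, n; i > 0 && j > 0; {
		switch {
		case a[i-1].text == b[j-1].text:
			keptA[i-1], keptB[j-1] = true, true
			i, j = i-1, j-1
		case dp[i-1][j] >= dp[i][j-1]:
			i--
		default:
			j--
		}
	}
	return keptA, keptB
}

// changedRanges merges adjacent non-kept tokens into [start, end) byte ranges.
func changedRanges(toks []token, kept []bool) [][2]int {
	var out [][2]int
	for i, t := range toks {
		if n := len(out); !kept[i] && n > 0 && out[n-1][1] == t.start {
			out[n-1][1] = t.end
		} else if !kept[i] {
			out = append(out, [2]int{t.start, t.end})
		}
	}
	return out
}

// countWords counts non-whitespace tokens; with kept != nil it counts only kept ones.
func countWords(toks []token, kept []bool) int {
	n := 0
	for i, t := range toks {
		if strings.TrimSpace(t.text) != "" && (kept == nil || kept[i]) {
			n++
		}
	}
	return n
}

// wordDiff returns changed byte ranges, or ok=false when the assumed 30% gate rejects the pair.
func wordDiff(minus, plus string) (minusR, plusR [][2]int, ok bool) {
	a, b := tokenize(minus), tokenize(plus)
	keptA, keptB := lcsKept(a, b)
	total := countWords(a, nil) + countWords(b, nil)
	if total == 0 || 2*float64(countWords(a, keptA)) < 0.3*float64(total) {
		return nil, nil, false
	}
	return changedRanges(a, keptA), changedRanges(b, keptB), true
}

func main() {
	fmt.Println(wordDiff("return foo(x)", "return bar(x)"))
}
```

In this sketch only `foo` and `bar` are reported as changed for the pair in `main`, while a pair such as `alpha beta` / `gamma delta` shares no non-whitespace tokens and is rejected by the gate, so it would keep plain whole-line styling.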

Theme integration

Two new optional color keys: color-word-add-bg / color-word-remove-bg. When empty, auto-derived from AddBg / RemoveBg via HSL lightness shift (0.15 toward mid). All bundled themes work out of the box with auto-derivation; explicit values can override.
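
To illustrate the auto-derivation, here is a rough sketch of an HSL lightness shift toward mid. It is not the `shiftLightness` helper from this PR: `shiftTowardMid`, `rgbToHSL`, and `hslToRGB` are hypothetical names, and clamping at 0.5 lightness is an assumption about what "toward mid" means. Unparseable input is returned unchanged, in line with the contract mentioned in the review.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math"
)

// rgbToHSL converts 0..1 RGB components to hue (0..1), saturation, and lightness.
func rgbToHSL(r, g, b float64) (h, s, l float64) {
	maxV, minV := math.Max(r, math.Max(g, b)), math.Min(r, math.Min(g, b))
	l = (maxV + minV) / 2
	if maxV == minV {
		return 0, 0, l // achromatic: hue and saturation are zero
	}
	d := maxV - minV
	s = d / (1 - math.Abs(2*l-1))
	switch maxV {
	case r:
		h = math.Mod((g-b)/d+6, 6)
	case g:
		h = (b-r)/d + 2
	default:
		h = (r-g)/d + 4
	}
	return h / 6, s, l
}

// hslToRGB is the inverse of rgbToHSL.
func hslToRGB(h, s, l float64) (r, g, b float64) {
	c := (1 - math.Abs(2*l-1)) * s
	hp := h * 6
	x := c * (1 - math.Abs(math.Mod(hp, 2)-1))
	switch {
	case hp < 1:
		r, g, b = c, x, 0
	case hp < 2:
		r, g, b = x, c, 0
	case hp < 3:
		r, g, b = 0, c, x
	case hp < 4:
		r, g, b = 0, x, c
	case hp < 5:
		r, g, b = x, 0, c
	default:
		r, g, b = c, 0, x
	}
	m := l - c/2
	return r + m, g + m, b + m
}

// shiftTowardMid moves the lightness of "#RRGGBB" by amount toward 0.5 (assumed clamp at 0.5).
// Input that does not parse is returned unchanged.
func shiftTowardMid(color string, amount float64) string {
	if len(color) != 7 || color[0] != '#' {
		return color
	}
	raw, err := hex.DecodeString(color[1:])
	if err != nil {
		return color
	}
	h, s, l := rgbToHSL(float64(raw[0])/255, float64(raw[1])/255, float64(raw[2])/255)
	if l < 0.5 {
		l = math.Min(l+amount, 0.5)
	} else {
		l = math.Max(l-amount, 0.5)
	}
	r, g, b := hslToRGB(h, s, l)
	to8 := func(v float64) int { return int(math.Round(math.Max(0, math.Min(1, v)) * 255)) }
	return fmt.Sprintf("#%02x%02x%02x", to8(r), to8(g), to8(b))
}

func main() {
	fmt.Println(shiftTowardMid("#000000", 0.15), shiftTowardMid("#ffffff", 0.15))
}
```

With this approach a dark add/remove background gets lighter and a light one gets darker, so the word-level overlay stays distinguishable in both dark and light themes.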

Wrap mode and search interaction

  • reemitANSIState now tracks background color and reverse video state alongside fg/bold/italic, so word-diff bg survives across wrapped continuation lines
  • Search highlighting uses context-aware bg restoration (updateRestoreBg, inMatch flag) so search + word-diff combinations handle partial overlaps and cross-boundary matches correctly
  • --no-colors mode falls back to reverse-video markers
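
The wrap-mode point can be sketched as follows. This is a simplified illustration, not `reemitANSIState` itself: `sgrState`, `apply`, `prefix`, and `carryState` are hypothetical names, only extended (256-color and truecolor) foreground/background codes and reverse video are tracked, and bold/italic and the basic 30-37/40-47 colors are left out for brevity.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sgrRe matches SGR escape sequences such as "\x1b[48;2;0;80;0m".
var sgrRe = regexp.MustCompile("\x1b\\[([0-9;]*)m")

// sgrState is the subset of terminal state that must survive a line wrap.
type sgrState struct {
	fg, bg  string // full parameter lists, e.g. "38;5;196"; empty means default
	reverse bool
}

// apply updates the state from the parameters of one SGR sequence.
func (s *sgrState) apply(params string) {
	p := strings.Split(params, ";")
	for i := 0; i < len(p); i++ {
		switch code := p[i]; {
		case code == "" || code == "0":
			*s = sgrState{} // full reset
		case code == "7":
			s.reverse = true
		case code == "27":
			s.reverse = false
		case code == "39":
			s.fg = ""
		case code == "49":
			s.bg = ""
		case code == "38" || code == "48":
			end := i + 2 // 256-color form: 38;5;N
			if i+1 < len(p) && p[i+1] == "2" {
				end = i + 4 // truecolor form: 38;2;R;G;B
			}
			if end >= len(p) {
				return // truncated sequence: ignore the rest
			}
			if full := strings.Join(p[i:end+1], ";"); code == "38" {
				s.fg = full
			} else {
				s.bg = full
			}
			i = end
		}
	}
}

// prefix returns the escape sequence that restores the tracked state, or "" if it is default.
func (s sgrState) prefix() string {
	var parts []string
	if s.fg != "" {
		parts = append(parts, s.fg)
	}
	if s.bg != "" {
		parts = append(parts, s.bg)
	}
	if s.reverse {
		parts = append(parts, "7")
	}
	if len(parts) == 0 {
		return ""
	}
	return "\x1b[" + strings.Join(parts, ";") + "m"
}

// carryState prepends to each wrapped line the state left active by the lines before it.
func carryState(lines []string) []string {
	var st sgrState
	out := make([]string, len(lines))
	for i, ln := range lines {
		out[i] = st.prefix() + ln
		for _, m := range sgrRe.FindAllStringSubmatch(ln, -1) {
			st.apply(m[1])
		}
	}
	return out
}

func main() {
	for _, ln := range carryState([]string{"\x1b[48;2;0;80;0mfoo", "bar\x1b[49m", "baz"}) {
		fmt.Printf("%q\n", ln)
	}
}
```

Here the second line receives the background opened on the first one, and the third line gets no prefix because the background was reset before the wrap.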

Collapsed mode

Refactored buildModifiedSet to share the new pairHunkLines method. Intra-line word-diff is not shown in collapsed view by design — collapsed mode already indicates which lines are modified via amber styling.
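
For readers unfamiliar with greedy best-match pairing, a small sketch of the idea follows. It is not `pairHunkLines` from this PR: `score` and `pairLines` are hypothetical, and the scoring (common prefix bytes plus common suffix bytes of the remainder, earliest candidate wins ties, zero-score lines stay unpaired) is an assumption.

```go
package main

import "fmt"

// commonPrefixLen returns the number of leading bytes shared by a and b.
func commonPrefixLen(a, b string) int {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return i
}

// commonSuffixLen returns the number of trailing bytes shared by a and b.
func commonSuffixLen(a, b string) int {
	i := 0
	for i < len(a) && i < len(b) && a[len(a)-1-i] == b[len(b)-1-i] {
		i++
	}
	return i
}

// score rates how alike two lines are; the suffix is measured on the part
// after the shared prefix so overlapping bytes are not counted twice.
func score(a, b string) int {
	p := commonPrefixLen(a, b)
	return p + commonSuffixLen(a[p:], b[p:])
}

// pairLines maps each removed-line index to the index of its best unused added line.
// Removed lines with no positive-score candidate stay unpaired.
func pairLines(removed, added []string) map[int]int {
	pairs := make(map[int]int)
	used := make([]bool, len(added))
	for i, r := range removed {
		best, bestScore := -1, 0
		for j, a := range added {
			if used[j] {
				continue
			}
			if sc := score(r, a); sc > bestScore {
				best, bestScore = j, sc
			}
		}
		if best >= 0 {
			used[best] = true
			pairs[i] = best
		}
	}
	return pairs
}

func main() {
	removed := []string{"foo := 1", "return bar(x)"}
	added := []string{"return baz(x)", "foo := 2"}
	fmt.Println(pairLines(removed, added))
}
```

Pairing by similarity rather than by position matters when a hunk reorders lines, as in the example in `main`, where each removed line is matched with its edited counterpart even though the order differs.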

Credits

Thanks to @rashpile (#73) and @daulet (#74) for the original implementations and the detailed comparison analysis that shaped this approach.

Related to #73 and #74.

umputun added 6 commits April 10, 2026 01:42
Wire recomputeIntraRanges into handleFileLoaded, add
applyIntraLineHighlight method that inserts ANSI bg markers
for changed words in add/remove lines. Supports color mode
(WordAddBg/WordRemoveBg with line bg restoration) and no-color
fallback (reverse-video). Add background state tracking to
reemitANSIState so word-diff bg survives across wrapped
continuation lines. Extract buildSGRPrefix to reduce nesting.
Copilot AI review requested due to automatic review settings April 10, 2026 06:56

Copilot AI left a comment

Pull request overview

Adds always-on intra-line “word diff” highlighting to the TUI diff view: removed/added line pairs within a hunk are token-diffed and only the changed spans get a brighter background overlay, while preserving existing whole-line add/remove styling. This integrates with theming (new optional colors), wrapping (ANSI state re-emission), and search highlighting (background restoration).

Changes:

  • Add token-level word-diff (tokenizer + LCS), line pairing, similarity gate, and per-line intra-range computation with tests.
  • Integrate intra-line background markers into rendering; extend wrap-mode ANSI state tracking (bg + reverse video) and make search highlighting restore the correct underlying background.
  • Add theme wiring + CLI/config keys for color-word-add-bg / color-word-remove-bg with auto-derivation, plus docs/site updates.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| app/ui/worddiff.go | Core tokenizer/LCS + line pairing + intra-range computation for intra-line highlighting. |
| app/ui/worddiff_test.go | Unit tests for tokenization, LCS, pairing, similarity gate, and recompute behavior. |
| app/ui/diffview.go | Render pipeline integration; wrap ANSI state now tracks bg/reverse; search bg restoration improvements. |
| app/ui/diffview_test.go | Integration tests for intra-line markers + wrap bg/reverse state + bg restoration helper. |
| app/ui/model.go | Adds intraRanges storage parallel to diff lines. |
| app/ui/loaders.go | Triggers intra-range computation on file load. |
| app/ui/styles.go | Adds word-diff bg colors to Colors + auto-derivation in normalization. |
| app/ui/colorutil.go | HSL-based shiftLightness helper for auto-deriving word-diff backgrounds. |
| app/ui/colorutil_test.go | Tests for color shifting, HSL roundtrip, and auto-derivation behavior. |
| app/ui/collapsed.go | Collapsed-mode “modified set” refactor to reuse pairHunkLines. |
| app/ui/collapsed_test.go | Test coverage for modified-set behavior with trailing context lines. |
| app/ui/search_test.go | Updates tests for search highlighting now restoring correct backgrounds. |
| app/ui/doc.go | Package docs mention new word-diff + color utility components. |
| app/theme/theme.go | Registers new optional theme keys and updates key counts. |
| app/theme/theme_test.go | Updates theme parsing/dumping tests for the expanded key set. |
| app/main.go | Adds CLI/config/env wiring for the new color keys and updates color mapping. |
| app/main_test.go | Updates theme application and color collection tests for the new fields. |
| themes/README.md | Documents new optional theme keys and validation checklist updates. |
| README.md | Adds the feature description and documents new color flags/keys. |
| site/docs.html | Updates docs for theme key counts and new word-diff color options. |
| site/index.html | Updates marketing copy for color key count. |
| site/llms.txt | Updates published feature stats (themes + key counts). |
| CLAUDE.md | Updates architecture notes to include intra-line diff flow + ANSI bg tracking. |
| docs/plans/completed/20260409-intraline-word-diff.md | Adds implementation plan/record for the completed work. |
| plugins/codex/skills/revdiff/references/config.md | Updates reference docs for theme key count + new flags. |
| .claude-plugin/skills/revdiff/references/config.md | Same reference doc update for Claude plugin path. |

Comment on lines +211 to +219
// commonPrefixLen returns the number of common prefix bytes between two strings.
func commonPrefixLen(a, b string) int {
	n := min(len(a), len(b))
	for i := range n {
		if a[i] != b[i] {
			return i
		}
	}
	return n

Copilot AI Apr 10, 2026

commonPrefixLen uses for i := range n, but n is an int and Go cannot range over an integer. This won’t compile; use an index loop (for i := 0; i < n; i++ { ... }) or range over a slice/string instead.

Comment on lines +222 to +231
// commonSuffixLen returns the number of common suffix bytes between two strings.
func commonSuffixLen(a, b string) int {
	la, lb := len(a), len(b)
	n := min(la, lb)
	for i := range n {
		if a[la-1-i] != b[lb-1-i] {
			return i
		}
	}
	return n

Copilot AI Apr 10, 2026

commonSuffixLen uses for i := range n, but n is an int and Go cannot range over an integer. This is a compile error; switch to a standard index loop (for i := 0; i < n; i++ { ... }).

Comment on lines +30 to +37
// parseHexRGB parses "#RRGGBB" into 0-255 components.
func parseHexRGB(hex string) (r, g, b uint8, ok bool) {
	if len(hex) != 7 || hex[0] != '#' {
		return 0, 0, 0, false
	}
	rv := hexVal(hex[1])<<4 | hexVal(hex[2])
	gv := hexVal(hex[3])<<4 | hexVal(hex[4])
	bv := hexVal(hex[5])<<4 | hexVal(hex[6])

Copilot AI Apr 10, 2026

parseHexRGB reports ok=true for any #...... input, even if it contains non-hex characters (because hexVal returns 0 on invalid chars). That contradicts the shiftLightness contract (“unparseable → unchanged”) and can silently turn invalid colors into black-ish values. Consider validating each nibble (e.g., return ok=false if any char is outside 0-9/a-f/A-F).

Suggested change
-// parseHexRGB parses "#RRGGBB" into 0-255 components.
-func parseHexRGB(hex string) (r, g, b uint8, ok bool) {
-	if len(hex) != 7 || hex[0] != '#' {
-		return 0, 0, 0, false
-	}
-	rv := hexVal(hex[1])<<4 | hexVal(hex[2])
-	gv := hexVal(hex[3])<<4 | hexVal(hex[4])
-	bv := hexVal(hex[5])<<4 | hexVal(hex[6])
+func parseHexNibble(c byte) (uint8, bool) {
+	switch {
+	case c >= '0' && c <= '9':
+		return c - '0', true
+	case c >= 'a' && c <= 'f':
+		return c - 'a' + 10, true
+	case c >= 'A' && c <= 'F':
+		return c - 'A' + 10, true
+	default:
+		return 0, false
+	}
+}
+// parseHexRGB parses "#RRGGBB" into 0-255 components.
+func parseHexRGB(hex string) (r, g, b uint8, ok bool) {
+	if len(hex) != 7 || hex[0] != '#' {
+		return 0, 0, 0, false
+	}
+	rh, ok := parseHexNibble(hex[1])
+	if !ok {
+		return 0, 0, 0, false
+	}
+	rl, ok := parseHexNibble(hex[2])
+	if !ok {
+		return 0, 0, 0, false
+	}
+	gh, ok := parseHexNibble(hex[3])
+	if !ok {
+		return 0, 0, 0, false
+	}
+	gl, ok := parseHexNibble(hex[4])
+	if !ok {
+		return 0, 0, 0, false
+	}
+	bh, ok := parseHexNibble(hex[5])
+	if !ok {
+		return 0, 0, 0, false
+	}
+	bl, ok := parseHexNibble(hex[6])
+	if !ok {
+		return 0, 0, 0, false
+	}
+	rv := rh<<4 | rl
+	gv := gh<<4 | gl
+	bv := bh<<4 | bl

Comment on lines +39 to +55
	// build LCS DP table
	dp := make([][]int, m+1)
	for i := range dp {
		dp[i] = make([]int, n+1)
	}
	for i := 1; i <= m; i++ {
		for j := 1; j <= n; j++ {
			switch {
			case minusToks[i-1].text == plusToks[j-1].text:
				dp[i][j] = dp[i-1][j-1] + 1
			case dp[i-1][j] >= dp[i][j-1]:
				dp[i][j] = dp[i-1][j]
			default:
				dp[i][j] = dp[i][j-1]
			}
		}
	}

Copilot AI Apr 10, 2026

lcsKeptTokens builds a full (m+1)×(n+1) DP table. With long lines (many tokens), this can allocate very large amounts of memory and slow down file loads, especially since word-diff is always on. Consider adding a token-count/line-length cutoff to skip intra-line diffing for very large pairs, or switching to a lower-memory approach (e.g., Hirschberg / 2-row DP with reconstruction).
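
As an illustration of the lower-memory option mentioned in this comment, here is a sketch of a two-row LCS. It is not code from the PR: `lcsLen` is a hypothetical helper, and it returns only the LCS length (enough for a similarity check), so recovering which tokens were kept would still need the full table or a Hirschberg-style reconstruction.

```go
package main

import "fmt"

// lcsLen returns the length of the longest common subsequence of a and b
// using two rolling rows, so memory is O(min(len(a), len(b))) instead of O(m*n).
func lcsLen(a, b []string) int {
	if len(b) > len(a) {
		a, b = b, a // keep the rows as short as possible
	}
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				cur[j] = prev[j-1] + 1
			case prev[j] >= cur[j-1]:
				cur[j] = prev[j]
			default:
				cur[j] = cur[j-1]
			}
		}
		prev, cur = cur, prev // the row just filled becomes the previous row
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(lcsLen([]string{"a", "b", "c", "d"}, []string{"b", "d", "e"}))
}
```

The recurrence is the same as in the full-table version quoted above; only the storage changes, since each row depends on nothing older than the row before it.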

- parseHexRGB now validates each nibble explicitly; invalid hex chars
  return ok=false instead of silently parsing as black. matches the
  shiftLightness contract ("unparseable -> unchanged").
- changedTokenRanges caps line length at 500 bytes before running LCS
  to prevent O(m*n) memory blowup on pathological input (minified
  files, very long configs). lines above the cap still render with
  whole-line add/remove highlighting, just no word-level detail.

Copilot also flagged `for i := range n` as a compile error but that's
a false positive — Go 1.22+ supports integer ranges.

cloudflare-workers-and-pages bot commented Apr 10, 2026

Deploying revdiff with Cloudflare Pages

Latest commit: f027ac4
Status: ✅  Deploy successful!
Preview URL: https://312f3564.revdiff.pages.dev
Branch Preview URL: https://intraline-word-diff.revdiff.pages.dev

@umputun umputun merged commit 8bd3973 into master Apr 10, 2026
3 checks passed
@umputun umputun deleted the intraline-word-diff branch April 10, 2026 07:07