feat: intra-line word-diff highlighting (experimental) #73

Closed
rashpile wants to merge 2 commits into umputun:master from rashpile:feat/word-diff-highlighting

Contributor

@rashpile rashpile commented Apr 8, 2026

Summary

Adds optional GitHub-style intra-line highlighting for paired remove/add lines. Within a hunk, paired -/+ lines run through a token-level diff and only the actually changed spans get a brighter background overlay — the existing whole-line add/remove styling is preserved.

Off by default. Toggle with W or start with --word-diff.

Why

When a line is edited rather than wholly added/removed, the user has to eyeball which tokens changed. This is especially painful in dense refactors and indentation-only changes. Word-level highlights make the actual edit pop.

What's in

  • New app/ui/worddiff.go — pairing, range computation, ANSI insertion helper, and highlightIntraLineChanges overlay
  • Pairing logic extracted from buildModifiedSet into pairHunkLines (app/ui/collapsed.go); equal runs zip 1:1, uneven runs match by 2*commonPrefix + 2*commonSuffix similarity
  • Token-level diff via github.com/sergi/go-diff/diffmatchpatch with DiffCleanupSemantic
  • 30% similarity threshold: pairs whose common spans cover < 30% of the shorter line are treated as full replacements and get no intra-line overlay (avoids noisy spurious matches like "ind", "diff" inside otherwise unrelated lines)
  • Two new theme color keys: color-word-add-bg, color-word-remove-bg (defaults #4a7a1a / #a03838); wired into all 5 bundled themes
  • New --word-diff CLI flag and W keybinding (ActionToggleWordDiff)
  • Status bar mode icon (⇄) when active
  • --no-colors fallback uses reverse-video markers, matching the search-highlight fallback
  • Works alongside wrap, search, blame, line-numbers, collapsed mode (intra-line ANSI is inserted before extendLineBg and survives ansi.Wrap automatically)
  • Unit tests for pairing, range computation, ANSI insertion, and rendering integration; coverage for the new package is ~94%
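
The pairing rule in the list above (equal runs zip 1:1, uneven runs match by 2*commonPrefix + 2*commonSuffix) can be sketched roughly as follows. This is a minimal illustration, not the PR's pairHunkLines: the names `score`, `greedy`, `pairRuns` and the `pair` type are hypothetical, and the tie-breaking (first best candidate wins) is an assumption.

```go
package main

import "fmt"

// commonPrefix returns the number of leading bytes a and b share.
func commonPrefix(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// commonSuffix returns the number of trailing bytes a and b share,
// never reaching into the first skip bytes already counted as prefix.
func commonSuffix(a, b string, skip int) int {
	n := 0
	for n < len(a)-skip && n < len(b)-skip && a[len(a)-1-n] == b[len(b)-1-n] {
		n++
	}
	return n
}

// score is the 2*commonPrefix + 2*commonSuffix similarity from the summary.
func score(a, b string) int {
	p := commonPrefix(a, b)
	return 2*p + 2*commonSuffix(a, b, p)
}

// pair holds indexes into the remove run and the add run (hypothetical type).
type pair struct{ Remove, Add int }

// greedy picks, for each line of the shorter run, the best-scoring unused
// line of the longer run. Assumes len(short) < len(long).
func greedy(short, long []string) []int {
	used := make([]bool, len(long))
	out := make([]int, len(short))
	for i, s := range short {
		best, bestScore := -1, -1
		for j, l := range long {
			if sc := score(s, l); !used[j] && sc > bestScore {
				best, bestScore = j, sc
			}
		}
		used[best] = true
		out[i] = best
	}
	return out
}

// pairRuns zips equal-length runs 1:1 and falls back to greedy matching otherwise.
func pairRuns(removes, adds []string) []pair {
	pairs := []pair{}
	switch {
	case len(removes) == len(adds):
		for i := range removes {
			pairs = append(pairs, pair{i, i})
		}
	case len(removes) < len(adds):
		for i, j := range greedy(removes, adds) {
			pairs = append(pairs, pair{i, j})
		}
	default:
		for j, i := range greedy(adds, removes) {
			pairs = append(pairs, pair{i, j})
		}
	}
	return pairs
}

func main() {
	removes := []string{"foo := bar(1)"}
	adds := []string{"x := 1", "foo := bar(2)"}
	for _, p := range pairRuns(removes, adds) {
		fmt.Printf("-%q pairs with +%q\n", removes[p.Remove], adds[p.Add])
	}
}
```

In the uneven case, lines of the longer run that nobody picks stay unpaired and would keep only their whole-line styling.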

Open questions for review

I can see this helping reviewers quickly spot what changed inside a line, but I'd land it as experimental because:

  • Needs validation on a large corpus of real diffs before trusting the output
  • The pairing heuristic and similarity threshold need real-world tuning
  • Color themes might need tuning
  • Computational overhead per file load (small but non-zero — runs diffmatchpatch.DiffMain on every paired line)
  • Is this worth shipping at all? Should it be on by default, opt-in via flag, or dropped? I don't want to add a feature plus an ongoing tuning loop and a similarity threshold to maintain if in practice nobody uses it.

@rashpile rashpile requested a review from umputun as a code owner April 8, 2026 21:48
Contributor Author

rashpile commented Apr 8, 2026

Screenshots

CleanShot 2026-04-09 at 00 49 51 CleanShot 2026-04-09 at 00 49 06

Contributor Author

rashpile commented Apr 8, 2026

CleanShot 2026-04-09 at 01 01 58

Contributor Author

rashpile commented Apr 8, 2026

CleanShot 2026-04-09 at 01 04 17

@rashpile rashpile force-pushed the feat/word-diff-highlighting branch 3 times, most recently from f46dd33 to fbddf6d Compare April 8, 2026 22:17
Owner

umputun commented Apr 8, 2026

this looks promising. answering your open questions:

worth shipping? the key question is whether it inflates the codebase significantly or adds a lot of ongoing maintenance surface. from what I see the footprint is contained - one new file, reuse of insertBgMarkers for search highlights, and the pairing logic was already implicit in buildModifiedSet. so yes, worth pursuing as opt-in via W/--word-diff.
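
For readers unfamiliar with the marker approach, here is a rough sketch of what an overlay helper of this kind might do: wrap byte ranges of a line in "on" and "off" escape sequences. It is not the project's insertBgMarkers. The name `wrapSpans`, the `span` type, and the assumption that the input line carries no existing ANSI sequences are all hypothetical simplifications. The reverse-video codes (SGR 7 and 27) correspond to the --no-colors fallback described in the PR summary.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a half-open byte range [Start, End) within a line (hypothetical type).
type span struct{ Start, End int }

const (
	reverseOn  = "\x1b[7m"  // SGR 7: reverse video on
	reverseOff = "\x1b[27m" // SGR 27: reverse video off
)

// wrapSpans surrounds each span with the on/off markers.
// Assumes spans are sorted, non-overlapping, and within bounds, and that
// line contains no escape sequences of its own (a real helper must skip them).
func wrapSpans(line string, spans []span, on, off string) string {
	var sb strings.Builder
	pos := 0
	for _, s := range spans {
		sb.WriteString(line[pos:s.Start])
		sb.WriteString(on)
		sb.WriteString(line[s.Start:s.End])
		sb.WriteString(off)
		pos = s.End
	}
	sb.WriteString(line[pos:])
	return sb.String()
}

func main() {
	// Quote the result so the escape sequences are visible in the output.
	fmt.Printf("%q\n", wrapSpans("foo(a, b)", []span{{7, 8}}, reverseOn, reverseOff))
}
```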

pairing heuristic and threshold: the 30% similarity gate is reasonable. I looked at how others solve this:

  • git's diff-highlight (the contrib Perl script used by tig and others) is the simplest proven approach - it only pairs equal-length remove/add runs and skips highlighting entirely when counts differ. no diff algorithm at all, just common prefix + suffix matching, then an is_pair_interesting guard that skips when the entire line would be highlighted (i.e. prefix and suffix are just whitespace/color codes). your approach with diffmatchpatch is more precise but heavier.
  • @pierre/diffs (used by plannotator) offers three granularity modes: char, word, and word-alt. the word-alt mode has a nice refinement - it joins adjacent highlighted spans separated by a single space into one, reducing visual noise for multi-word edits. they also have a maxLineDiffLength guard that skips intra-line diff for very long lines regardless of similarity. that could be a useful addition on top of your percentage threshold.
  • lazygit and tig both delegate to external tools (pagers or git's --word-diff), nothing custom.
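
The prefix/suffix technique described for diff-highlight above can be sketched in a few lines of Go. This is an illustration of the idea only, not a port of the Perl script: `changedSpans` and `span` are hypothetical names, the comparison is byte-wise (so it can split a multi-byte UTF-8 character), and the guard here only checks for whitespace, not color codes.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a half-open byte range [Start, End) to highlight (hypothetical type).
type span struct{ Start, End int }

// changedSpans strips the common prefix and suffix of the two lines and
// returns the remaining middle range of each. ok is false when the shared
// prefix and suffix are whitespace only, i.e. when essentially the whole
// line would be highlighted and the overlay adds nothing.
func changedSpans(oldLine, newLine string) (o, n span, ok bool) {
	p := 0
	for p < len(oldLine) && p < len(newLine) && oldLine[p] == newLine[p] {
		p++
	}
	s := 0
	for s < len(oldLine)-p && s < len(newLine)-p &&
		oldLine[len(oldLine)-1-s] == newLine[len(newLine)-1-s] {
		s++
	}
	shared := oldLine[:p] + oldLine[len(oldLine)-s:]
	if strings.TrimSpace(shared) == "" {
		return span{}, span{}, false
	}
	return span{p, len(oldLine) - s}, span{p, len(newLine) - s}, true
}

func main() {
	oldLine, newLine := "return foo(a, b)", "return foo(a, c)"
	if o, n, ok := changedSpans(oldLine, newLine); ok {
		fmt.Printf("old: %q new: %q\n", oldLine[o.Start:o.End], newLine[n.Start:n.End])
	}
}
```

The trade-off against a real diff is visible here: two separate edits on one line produce a single span covering everything between them.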

your greedy matching for unequal runs goes beyond what diff-highlight does, which is fine - just something to watch for false positives. the DiffCleanupSemantic pass should handle most of it.
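
Two of the refinements mentioned above, joining highlighted spans separated by a single space and skipping very long lines, are easy to sketch. The code below is an illustration under assumptions, not the @pierre/diffs implementation: `mergeAcrossSingleSpace`, `shouldDiff`, and the `span` type are hypothetical names, and the length limit used in `main` is an arbitrary example value.

```go
package main

import "fmt"

// span is a half-open byte range [Start, End) within a line (hypothetical type).
type span struct{ Start, End int }

// mergeAcrossSingleSpace joins neighboring spans when exactly one space
// separates them. Assumes spans are sorted and non-overlapping.
func mergeAcrossSingleSpace(line string, spans []span) []span {
	var out []span
	for _, s := range spans {
		if n := len(out); n > 0 && s.Start-out[n-1].End == 1 && line[out[n-1].End] == ' ' {
			out[n-1].End = s.End
			continue
		}
		out = append(out, s)
	}
	return out
}

// shouldDiff is a length guard: skip intra-line diffing when either line
// exceeds maxLen bytes, regardless of how similar the lines are.
func shouldDiff(oldLine, newLine string, maxLen int) bool {
	return len(oldLine) <= maxLen && len(newLine) <= maxLen
}

func main() {
	line := "foo bar  baz"
	spans := []span{{0, 3}, {4, 7}, {9, 12}}
	fmt.Println(mergeAcrossSingleSpace(line, spans)) // first two spans join, third stays
	fmt.Println(shouldDiff(line, line, 500))
}
```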

color themes: the hardcoded hex defaults work for bundled themes, but we need a fallback for people who already have custom colors/themes set and won't have these two new keys defined. the word-diff bg should be derived automatically from the existing add/remove bg colors (e.g. lighten the bg by a fixed amount) so it works out of the box. the explicit color-word-add-bg/color-word-remove-bg keys can still override when set.

computational overhead: negligible. DiffMain runs per paired line on file load and toggle, not per render. no concern here.

let's keep iterating on this. the color derivation for custom themes is the main thing I'd want addressed before merging. thx

Contributor Author

rashpile commented Apr 9, 2026

CleanShot 2026-04-09 at 10 25 28

Contributor Author

rashpile commented Apr 9, 2026

@umputun @daulet Implementation in #74 looks promising — significantly less code for the same feature. We should test both on real-world diffs before deciding. If both hold up equally, the simpler implementation wins.

rashpile added 2 commits April 9, 2026 10:35
Add optional GitHub-style word-level highlighting for paired remove/add
lines. Pairs lines within a hunk, runs a token-level diff via
sergi/go-diff, and overlays a brighter background on the changed ranges
while preserving the existing whole-line add/remove styling.

- new W keybind and --word-diff CLI flag (off by default)
- two new theme color keys: color-word-add-bg, color-word-remove-bg
- 30% similarity threshold suppresses spurious highlights on lines that
  share only incidental characters
- works alongside wrap, search, blame, line-numbers, collapsed modes
- --no-colors fallback uses reverse-video markers
- status bar icon when active

Remove hardcoded default word-diff colors. When color-word-add-bg or
color-word-remove-bg is unset, derive it at runtime by shifting the
lightness of the corresponding add-bg/remove-bg by 15% (dark gets
lighter, light gets darker). This ensures word-diff works out of the
box for custom themes that don't define the two new color keys.

Bundled themes ship pre-computed values (via ShiftLightness) so
--dump-theme and --dump-config include them. Explicit overrides in
config/CLI/theme still take precedence.

- add app/ui/colorutil.go with ShiftLightness and HSL utilities
- add app/ui/colorutil_test.go with roundtrip and known-value tests
- update bundled theme gallery with derived word-diff colors
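
The derivation described in the second commit message (shift lightness by 15%, dark backgrounds get lighter, light ones get darker) might look roughly like the sketch below. It is not the code in app/ui/colorutil.go: the signature, the `#rrggbb`-only input, the 0.5 lightness cutoff between "dark" and "light", and the clamping are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// rgbToHSL converts r, g, b in [0,1] to hue in [0,360) and s, l in [0,1].
func rgbToHSL(r, g, b float64) (h, s, l float64) {
	hi := math.Max(r, math.Max(g, b))
	lo := math.Min(r, math.Min(g, b))
	l = (hi + lo) / 2
	if hi == lo {
		return 0, 0, l // gray: hue and saturation are zero
	}
	d := hi - lo
	s = d / (1 - math.Abs(2*l-1))
	switch hi {
	case r:
		h = math.Mod((g-b)/d, 6)
	case g:
		h = (b-r)/d + 2
	default:
		h = (r-g)/d + 4
	}
	if h *= 60; h < 0 {
		h += 360
	}
	return h, s, l
}

// hslToRGB is the inverse of rgbToHSL.
func hslToRGB(h, s, l float64) (r, g, b float64) {
	c := (1 - math.Abs(2*l-1)) * s
	x := c * (1 - math.Abs(math.Mod(h/60, 2)-1))
	m := l - c/2
	switch {
	case h < 60:
		r, g, b = c, x, 0
	case h < 120:
		r, g, b = x, c, 0
	case h < 180:
		r, g, b = 0, c, x
	case h < 240:
		r, g, b = 0, x, c
	case h < 300:
		r, g, b = x, 0, c
	default:
		r, g, b = c, 0, x
	}
	return r + m, g + m, b + m
}

// shiftLightness moves a "#rrggbb" color toward the opposite end of the
// lightness scale by amount (0.15 for the 15% in the commit message).
func shiftLightness(hex string, amount float64) (string, error) {
	if len(hex) != 7 || hex[0] != '#' {
		return "", fmt.Errorf("want #rrggbb, got %q", hex)
	}
	v, err := strconv.ParseUint(hex[1:], 16, 32)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", hex, err)
	}
	h, s, l := rgbToHSL(float64(v>>16&0xff)/255, float64(v>>8&0xff)/255, float64(v&0xff)/255)
	if l < 0.5 { // assumed cutoff between dark and light
		l = math.Min(1, l+amount)
	} else {
		l = math.Max(0, l-amount)
	}
	r, g, b := hslToRGB(h, s, l)
	to8 := func(f float64) int { return int(math.Round(f * 255)) }
	return fmt.Sprintf("#%02x%02x%02x", to8(r), to8(g), to8(b)), nil
}

func main() {
	for _, c := range []string{"#000000", "#ffffff"} {
		out, err := shiftLightness(c, 0.15)
		fmt.Println(c, "->", out, err)
	}
}
```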
@rashpile rashpile force-pushed the feat/word-diff-highlighting branch from dcd937f to f5a2e4f Compare April 9, 2026 07:36
Contributor Author

rashpile commented Apr 9, 2026

Word-Diff Implementation AI Comparison: PR #73 vs PR #74


Comparison Table

| Aspect | PR #73 (rashpile) | PR #74 (daulet) |
|---|---|---|
| LOC added | ~3,400 (580 core + vendor + themes + docs) | ~370 (328 core + 48 integration) |
| Core LOC (algorithm only) | 204 (worddiff.go) + 96 (colorutil.go) = 300 | 270 (intraline.go) |
| Test LOC | 280 (18 + 7 + 6 test functions) | 58 (3 test functions) |
| New files | 4 (worddiff.go, worddiff_test.go, colorutil.go, colorutil_test.go) | 2 (intraline.go, intraline_test.go) |
| Files modified | ~40 (themes, vendor, docs, config) | 4 |
| External deps | sergi/go-diff v1.4.0 (+1,360 LOC vendor) | None (hand-rolled) |
| Diff algorithm | diffmatchpatch (Myers) + DiffCleanupSemantic | Levenshtein (pairing) + LCS (token diff) |
| Line pairing | Greedy + prefix/suffix similarity scoring | Greedy + normalized edit distance |
| Similarity gate | 30% shared chars threshold | 0.60 strict / 1.00 naive (problem: always pairs when equal count) |
| Tokenization | Character-level (diffmatchpatch) | Regex token-level (`[\pL\pN_]+\|\s+\|...`) |
| Toggle | W key + --word-diff flag (opt-in) | Always-on (no toggle) |
| Dedicated colors | WordAddBg / WordRemoveBg + auto-derivation via HSL shift | Reuses SearchBg (problem: clashes with search) |
| Theme support | All 7 bundled themes updated, custom theme fallback | None |
| Status bar | icon indicator | None |
| Config/CLI | --word-diff, --color-word-add-bg, --color-word-remove-bg, env vars, INI | None |
| Collapsed mode | Integrated (refactored pairHunkLines) | Not integrated |
| Docs updated | README, site, help overlay | README (1 line) |
| CI status | Passing | Failing (missing switch cases) |
| No-color fallback | Reverse video | Reverse video + bold |

Pros & Cons

PR #73 (rashpile)

Pros:

  • Production-complete: toggle, config, themes, docs, status bar
  • Auto-derives word-diff colors from existing theme — works with custom themes
  • Extensive tests (25+ test functions, edge cases, UTF-8, ANSI)
  • DiffCleanupSemantic produces cleaner grouping of changes
  • Collapsed mode integration
  • Proper similarity gate (30%)
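
To make the 30% gate concrete, here is a minimal sketch of the check: measure how much of the shorter line is shared and skip the overlay below the threshold. It is an approximation under assumptions, not the PR's code. The byte-level LCS length stands in for the summed equal spans that PR #73 gets from diffmatchpatch, and `lcsLen` and `passesGate` are hypothetical names.

```go
package main

import "fmt"

// lcsLen returns the length of the longest common subsequence of a and b,
// compared byte by byte, using two rolling rows of the DP table.
func lcsLen(a, b string) int {
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				cur[j] = prev[j-1] + 1
			case prev[j] >= cur[j-1]:
				cur[j] = prev[j]
			default:
				cur[j] = cur[j-1]
			}
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

// passesGate reports whether the shared content covers at least threshold
// (0.30 for the 30% gate) of the shorter line. Pairs that fail are treated
// as full replacements and get no intra-line overlay.
func passesGate(oldLine, newLine string, threshold float64) bool {
	shorter := len(oldLine)
	if len(newLine) < shorter {
		shorter = len(newLine)
	}
	if shorter == 0 {
		return false
	}
	return float64(lcsLen(oldLine, newLine)) >= threshold*float64(shorter)
}

func main() {
	fmt.Println(passesGate("foo := bar(1)", "foo := bar(2)", 0.30)) // near-identical lines
	fmt.Println(passesGate("abcdefghij", "klmnopqrst", 0.30))       // nothing in common
}
```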

Cons:

  • Heavy: 1,360 LOC vendored dependency for one function call
  • 40+ files touched — large review surface
  • diffmatchpatch is character-level, not token-level (can highlight partial words)

PR #74 (daulet)

Pros:

  • Minimal footprint: 370 LOC total, 6 files, zero deps
  • Token-level diffing (regex tokenizer) — semantically cleaner highlights
  • Hand-rolled LCS is simple and transparent
  • Easy to review and maintain

Cons:

  • Reuses SearchBg — indistinguishable from search highlights
  • Naive threshold 1.0 pairs unrelated lines when counts match — entire lines get highlighted including spaces, defeating the purpose of intra-line diff
  • No toggle, no config, no theme support
  • CI failing
  • Minimal tests (3 functions)
  • No collapsed mode integration

Verdict: Best of Both

The ideal implementation combines PR #74's algorithm with PR #73's integration work:

| Take from | What |
|---|---|
| #74 | Zero-dependency approach: hand-rolled Levenshtein + LCS. Drop sergi/go-diff vendor. |
| #74 | Token-level regex tokenizer (semantically better than char-level diffmatchpatch) |
| #73 | W toggle, --word-diff flag, config/INI/env support |
| #73 | Dedicated WordAddBg/WordRemoveBg colors with HSL auto-derivation |
| #73 | Theme integration (all 7 bundled + custom theme fallback) |
| #73 | Status bar icon |
| #73 | insertBgMarkers shared between word-diff and search |
| #73 | Test coverage (expand #74's 3 tests to cover edge cases) |
| #73 | Collapsed mode integration via pairHunkLines |
| #73 | 30% similarity gate (fix #74's naive 1.0 threshold) |
| #73 | Docs: README, site, help overlay |

This would give #74's clean algorithm in a single ~270 LOC file with no deps, wrapped in #73's production-ready UX (toggle, colors, themes, config, docs). Net result: drop the 1,360 LOC vendor dependency while keeping all the user-facing polish.

Owner

umputun commented Apr 10, 2026

thx for this, and especially for the detailed comparison analysis you posted — it made the path forward obvious.

merged a combined version in #87 that takes the key ideas from this PR:

  • pairHunkLines with prefix/suffix scoring, shared between word-diff and collapsed mode's buildModifiedSet
  • 30% similarity gate (non-whitespace tokens) to suppress noise on force-paired dissimilar lines
  • dedicated color-word-add-bg / color-word-remove-bg theme keys with HSL auto-derivation via shiftLightness
  • reuse of insertHighlightMarkers for both search and word-diff overlays

paired with @daulet's zero-dependency token-level algorithm from #74 (regex tokenizer + LCS) so we avoided the go-diff vendored dep.
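
A regex tokenizer plus LCS of the kind mentioned above can be sketched as follows. This is not the code merged in #87 or the code from #74. The comparison table in this thread shows the token regex only in truncated form, so the third alternative used here (`.`, any single remaining character) is an assumption, and `tokenize` and `diffTokens` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenRe splits a line into word runs, whitespace runs, or single other
// characters. The final `.` alternative is an assumed completion.
var tokenRe = regexp.MustCompile(`[\pL\pN_]+|\s+|.`)

func tokenize(line string) []string { return tokenRe.FindAllString(line, -1) }

// diffTokens marks, for each token of a and b, whether it lies outside the
// longest common subsequence of the two token lists (i.e. it changed).
func diffTokens(a, b []string) (changedA, changedB []bool) {
	// dp[i][j] is the LCS length of a[i:] and b[j:].
	dp := make([][]int, len(a)+1)
	for i := range dp {
		dp[i] = make([]int, len(b)+1)
	}
	for i := len(a) - 1; i >= 0; i-- {
		for j := len(b) - 1; j >= 0; j-- {
			switch {
			case a[i] == b[j]:
				dp[i][j] = dp[i+1][j+1] + 1
			case dp[i+1][j] >= dp[i][j+1]:
				dp[i][j] = dp[i+1][j]
			default:
				dp[i][j] = dp[i][j+1]
			}
		}
	}
	changedA, changedB = make([]bool, len(a)), make([]bool, len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]: // token is part of the LCS, unchanged
			i, j = i+1, j+1
		case dp[i+1][j] >= dp[i][j+1]:
			changedA[i] = true
			i++
		default:
			changedB[j] = true
			j++
		}
	}
	for ; i < len(a); i++ {
		changedA[i] = true
	}
	for ; j < len(b); j++ {
		changedB[j] = true
	}
	return changedA, changedB
}

func main() {
	oldTok, newTok := tokenize("foo(a, b)"), tokenize("foo(a, c)")
	chOld, chNew := diffTokens(oldTok, newTok)
	for i, t := range oldTok {
		fmt.Printf("%q changed=%v\n", t, chOld[i])
	}
	for i, t := range newTok {
		fmt.Printf("%q changed=%v\n", t, chNew[i])
	}
}
```

Because whole tokens are compared, a rename such as `count` to `counter` highlights the full identifier on both sides, where a character-level diff would mark only the trailing `er`.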

the final version is always-on (no toggle flag, no W keybinding, no status icon) to keep the surface minimal. closing this one as its ideas shipped in #87.

@umputun umputun closed this Apr 10, 2026