Skip to content

feat: parse multi-file git diffs piped to stdin#216

Merged
umputun merged 3 commits into
umputun:masterfrom
asek-ll:master
Jun 1, 2026
Merged

feat: parse multi-file git diffs piped to stdin#216
umputun merged 3 commits into
umputun:masterfrom
asek-ll:master

Conversation

@asek-ll
Copy link
Copy Markdown
Contributor

@asek-ll asek-ll commented May 27, 2026

Several version control systems (and assorted tooling) can emit
git-compatible unified diffs. Detecting and parsing a git-style diff on
stdin lets revdiff review the output of any such tool without a dedicated
renderer per VCS — supporting the git diff format once is cheaper to
maintain than wiring up a new VCS integration every time one appears.

Previously stdin was always treated as raw, context-only text. It is now
sniffed for a unified-diff signature, split into per-file sections, and
parsed into real diff lines, falling back to the raw-text reader when
detection or parsing fails.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for detecting git-style unified diffs on stdin, splitting them into per-file sections, and rendering them as real diffs (with a fallback to the existing raw “scratch buffer” context view when parsing isn’t possible).

Changes:

  • Read full stdin content, sniff for unified-diff signatures, and prefer a multi-file diff renderer when possible.
  • Introduce MultiFileStdinReader + multi-file diff splitting/header parsing (splitMultiFileDiff, parseFileHeader, IsUnifiedDiff).
  • Add unit tests covering multi-file stdin parsing, file status handling (added/deleted/renamed), and detection helpers.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
app/stdin.go Reads stdin into memory and selects unified-diff vs raw-text renderer.
app/diff/stdin.go Adds NewStdinReaderFromString and a MultiFileStdinReader implementation for multi-file stdin diffs.
app/diff/multiparse.go Implements unified-diff sniffing, multi-file section splitting, and per-section header parsing.
app/diff/stdin_test.go Tests multi-file stdin rendering behavior (order, statuses, binary, rename).
app/diff/multiparse_test.go Tests unified-diff detection and multi-file splitting/header parsing helpers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread app/stdin.go Outdated
Comment on lines +84 to +103
// read all stdin into memory for detection
content, err := io.ReadAll(stdin)
if err != nil {
return nil, nil, fmt.Errorf("read stdin: %w", err)
}

var renderer ui.Renderer
if diff.IsUnifiedDiff(string(content)) {
// try multi-file diff parsing; nil renderer falls through to context mode below
if multi, mErr := diff.NewMultiFileStdinReader(string(content)); mErr == nil {
renderer = multi
}
}
if renderer == nil {
// raw text, or fallback after a multi-file parse failure
renderer, err = diff.NewStdinReaderFromString(stdinName(opts.StdinName), string(content))
if err != nil {
return nil, nil, fmt.Errorf("create stdin reader: %w", err)
}
}
Comment thread app/diff/multiparse.go Outdated
Comment on lines +19 to +38
// IsUnifiedDiff reports whether the content looks like a git unified diff.
// Detection criteria:
// - starts with "diff --git a/"
// - has "@@ -" hunk header pattern within the sniff limit
func IsUnifiedDiff(content string) bool {
if content == "" {
return false
}

// inspect only the leading bytes for efficiency
sample := content[:min(unifiedDiffSniffLimit, len(content))]

// primary: git diff header
if strings.Contains(sample, "diff --git a/") {
return true
}

// secondary: hunk header pattern
return strings.Contains(sample, "@@ -")
}
Copy link
Copy Markdown
Owner

@umputun umputun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting direction and the parse-then-fallback shape is right. Couple of things worth changing before this merges.

blocking

  1. silent skip of failed sections at app/diff/stdin.go:81-97. If one section out of N fails parseUnifiedDiff, it's continued with an fmt.Fprintf(os.Stderr, "warning: ...") and the renderer is still returned as success. The stderr warning is invisible under bubbletea's alt screen. A user piping a 20-file patch where one file's hunks malform sees 19 files in the tree with no in-TUI hint the 20th is gone. That's a review-integrity bug. Pls make any per-section parse error fail the whole NewMultiFileStdinReader call so the caller falls back to raw-text mode for the entire input.

  2. no size cap on stdin at app/stdin.go:85. io.ReadAll(stdin) is uncapped, then string(content) is taken 2-3 times (lines 91, 93, 99) so the buffer is held ~3x briefly. Pre-PR was uncapped too, but only the final []DiffLine survived. cat /dev/zero | revdiff --stdin is worse now. Project precedent is --description-file using io.LimitReader with a clear error on overflow. Pls cap (64-256 MB is fine) and also do s := string(content) once and reuse it.

  3. sniffer is too loose at app/diff/multiparse.go:19-38. Godoc says "starts with diff --git a/" but the code is strings.Contains over the first 4 KB, and the @@ - secondary fires on any text containing a quoted hunk header. A markdown file or chat log mentioning diff --git a/x b/x literally in a code block gets reclassified and surrounding prose is dropped from the rendering. Pls anchor to line-start (^diff --git a/ with newline-boundary, or strings split + prefix check on the first non-blank line) and either drop the @@ - fallback or anchor it the same way. That also makes the godoc accurate.

  4. docs missing. Per the project rule, behavior change to an existing flag is the same documentation obligation as a new flag. Pls add:

    • README.md --stdin section (currently says "All lines are shown as context" which is no longer accurate)
    • site/docs.html matching paragraphs
    • .claude-plugin/skills/revdiff/references/usage.md
    • .claude-plugin/skills/revdiff/SKILL.md (when AI agents should pipe gh pr diff / git format-patch -1 --stdout for multi-file review)
    • plugins/codex/skills/revdiff/SKILL.md and plugins/pi/skills/revdiff/SKILL.md (kept byte-identical with the Claude copy)
    • docs/ARCHITECTURE.md (Renderer table near line 386, add MultiFileStdinReader row)

    No plugin version bump on this branch though, the feature is in the binary and the version-defer rule applies.

non-blocking

  1. fmt.Fprintf(os.Stderr, ...) from inside app/diff/ under a TUI doesn't fit the project's log.Printf convention. Once #1 above is in, the whole warning probably goes away.

  2. app/diff/stdin.go now mixes StdinReader and MultiFileStdinReader in one file but tests are already split (stdin_test.go vs multiparse_test.go). Worth moving MultiFileStdinReader to a multistdin.go to keep the 1:1 source-to-test convention.

  3. NewStdinReaderFromString (stdin.go:31) has one caller in app/stdin.go:99 that hands it a string. It can be inlined as NewStdinReaderFromReader(name, strings.NewReader(content)) and the constructor dropped. Same shape for IsUnifiedDiff (one caller, can be unexported isUnifiedDiff).

  4. the error from NewMultiFileStdinReader is dropped at app/stdin.go:93 (if multi, mErr := ...; mErr == nil). With #1 above in place this becomes load-bearing, pls don't silence it, at minimum log.Printf it.

  5. nit: splitMultiFileDiff resolves path/status via parseFileHeader, then parseUnifiedDiff walks the same header text per section. Not a bug, but the package now has two recognizers for diff --git (regex in multiparse.go:45 vs inline skip in diff.go:668). Folding the section split into parseUnifiedDiff so it emits (path, status, []DiffLine) tuples directly would dedupe, but that's a wider refactor than this PR needs, mentioning it for awareness.

  6. nit: sec.path is attacker-controlled and gets fmt.Fprintf'd to stderr unsanitized. Once #1 is in the warning's gone, but if any other code prints diff-header paths later, route through diff.SanitizeCommitText like the rest of the project.

tests worth adding

  • markdown / chat-log containing @@ -1,5 +1,5 @@ or diff --git a/x b/x in a fenced block (locks in the tightened sniffer from #3)
  • fallback end-to-end: sniff returns true, parse fails on all sections, caller falls back to raw text (locks in #1's behavior)
  • malformed diff --git headers (truncated, missing b/ side) that hit the empty-path-discard branch in splitMultiFileDiff
  • renames where the rename to path contains spaces
  • large input (would have caught #2)

tests/lint/race all green locally, no scope creep, code is idiomatic Go. Main concerns are the four blocking items above, addressing the non-blocking ones too would be appreciated.

Address PR umputun#216 review:
- fail NewMultiFileStdinReader on any per-section parse error so the
  caller falls back to raw-text mode for the whole input instead of
  silently dropping files behind an invisible stderr warning
- cap stdin reads at 64 MiB via io.LimitReader and convert the buffer
  to a string once instead of three times
- anchor the unified-diff sniffer to line-start (matching "diff --git
  a/" at column 0) and drop the loose "@@ -" substring fallback so
  markdown prose mentioning the marker is not reclassified
- expose ErrNotUnifiedDiff sentinel for silent fallback; log non-sentinel
  parse failures via log.Printf so they survive the alt-screen
- split MultiFileStdinReader into its own file, drop the redundant
  NewStdinReaderFromString constructor, unexport the sniffer
- docs updated: README, site/docs.html, usage reference, all three
  SKILL.md copies (with gh pr diff / git format-patch guidance),
  ARCHITECTURE renderer table
Copy link
Copy Markdown
Owner

@umputun umputun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all four blocking items are addressed:

  1. parse-failure now fails the whole NewMultiFileStdinReader, so the caller falls back to raw text (multistdin.go:48-53)
  2. 64 MiB cap via io.LimitReader(r, cap+1), enforced before the single string(data) (readStdinCapped in app/stdin.go)
  3. isUnifiedDiff is line-anchored now (needs a line starting with diff --git a/), @@ - fallback dropped, prose-mention test added
  4. docs synced across all six surfaces

non-blocking 5/6/7/8/10 are in too.

one blocking thing, #1 isn't fully closed. The parse-failure path is fixed, but two sibling cases still produce a degraded tree instead of raw-text fallback, which is the same review-integrity hazard:

  1. split-layer drop (multiparse.go:673-683): splitMultiFileDiff's flush does if path == "" { return }, so a diff --git boundary whose header yields no path is dropped silently. Valid section + a malformed second section means the reader succeeds with the second file missing from the tree, no fallback.
  2. empty-content section (multistdin.go): a section that sniffs and splits but where parseUnifiedDiff returns zero lines (e.g. a doc whose first line is diff --git a/foo b/foo with no hunks) is accepted as an empty file, so the real stdin text is hidden behind an empty tree entry instead of being shown as raw.

both reduce to one fix: a section that produces nothing renderable should be treated as a parse failure so the whole input falls back to raw text.

pls add two regression tests as well - valid + malformed section falling back to raw, and a line-start diff --git with no hunk falling back to raw.

non-blocking:

  • ChangedFiles (multistdin.go) can emit duplicate tree rows - order is a []string recording every path, so two sections resolving to the same path show up twice. Same crafted-diff family as above, low severity. Dedupe when appending to order.
  • rename to (multiparse.go:141-145) bypasses cleanPath, while every other branch routes through it, so a git-quoted rename to "..." keeps its quotes. Route it through cleanPath too.
  • cleanPath (multiparse.go:709) double-strips a top-level dir literally named a/b, so b/weird.go resolves to weird.go. Rare, but the triple TrimPrefix is the cause.
  • stdin_test.go is new but only tests MultiFileStdinReader (which lives in multistdin.go), while stdin.go's type is tested elsewhere. Rename to multistdin_test.go per the one-test-file-per-source rule.
  • selectStdinRenderer godoc (app/stdin.go) says the WARN logs "despite the alt-screen swallowing stderr", but it runs before p.Run() so stderr is visible. The logging is fine, the rationale in the comment is wrong, drop that clause.
  • app/diff/stdin.go wraps import "io" in parens for no functional reason, looks like unrelated churn, revert it.
  • the 4096-byte sniff window misclassifies git format-patch output when the commit body pushes the first diff --git past 4 KiB. Fine tradeoff for the documented git diff | revdiff --stdin case, but it's neither in the godoc nor covered by a test.

the integrity fallback is the only real gate here, the non-blocking items can ride along in the same push or a follow-up.

splitMultiFileDiff now errors when parseFileHeader yields no path, and
NewMultiFileStdinReader rejects zero-line sections that carry no
structural marker (rename, mode, binary, hunk). Both used to silently
drop a section, hiding real stdin behind an empty tree entry.

Also: dedupe duplicate paths in MultiFileStdinReader.order, route
`rename to` through cleanPath, fix cleanPath double-stripping a
top-level dir named a/ or b/, rename stdin_test.go to multistdin_test.go,
correct selectStdinRenderer godoc rationale, revert paren-wrap on the
single-import in app/diff/stdin.go, and document the 4 KiB sniff-window
tradeoff for format-patch input.
Copy link
Copy Markdown
Owner

@umputun umputun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all addressed, thx.

the blocking fallback-integrity hole is fully closed now. Empty-path sections fail at the split layer (multiparse.go flush returns an error), and hunkless-but-non-structural sections fail in NewMultiFileStdinReader via hasStructuralChange - both route the whole input back to raw text. The structural-marker carve-out keeps rename-only / mode-only / new-empty sections valid, which is the right call. Regression tests cover both the valid+malformed section and the line-start diff --git with no hunk.

the 7 non-blocking nits are all in too: order dedupe, rename to through cleanPath, single-strip cleanPath, test-file rename, corrected selectStdinRenderer godoc, import revert, and the 4 KiB sniff-window doc + test.

tests/lint/race green locally, lgtm.

@umputun umputun merged commit a331097 into umputun:master Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants