feat: parse multi-file git diffs piped to stdin#216
Conversation
There was a problem hiding this comment.
Pull request overview
Adds support for detecting git-style unified diffs on stdin, splitting them into per-file sections, and rendering them as real diffs (with a fallback to the existing raw “scratch buffer” context view when parsing isn’t possible).
Changes:
- Read full stdin content, sniff for unified-diff signatures, and prefer a multi-file diff renderer when possible.
- Introduce
MultiFileStdinReader+ multi-file diff splitting/header parsing (splitMultiFileDiff,parseFileHeader,IsUnifiedDiff). - Add unit tests covering multi-file stdin parsing, file status handling (added/deleted/renamed), and detection helpers.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| app/stdin.go | Reads stdin into memory and selects unified-diff vs raw-text renderer. |
| app/diff/stdin.go | Adds NewStdinReaderFromString and a MultiFileStdinReader implementation for multi-file stdin diffs. |
| app/diff/multiparse.go | Implements unified-diff sniffing, multi-file section splitting, and per-section header parsing. |
| app/diff/stdin_test.go | Tests multi-file stdin rendering behavior (order, statuses, binary, rename). |
| app/diff/multiparse_test.go | Tests unified-diff detection and multi-file splitting/header parsing helpers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // read all stdin into memory for detection | ||
| content, err := io.ReadAll(stdin) | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("read stdin: %w", err) | ||
| } | ||
|
|
||
| var renderer ui.Renderer | ||
| if diff.IsUnifiedDiff(string(content)) { | ||
| // try multi-file diff parsing; nil renderer falls through to context mode below | ||
| if multi, mErr := diff.NewMultiFileStdinReader(string(content)); mErr == nil { | ||
| renderer = multi | ||
| } | ||
| } | ||
| if renderer == nil { | ||
| // raw text, or fallback after a multi-file parse failure | ||
| renderer, err = diff.NewStdinReaderFromString(stdinName(opts.StdinName), string(content)) | ||
| if err != nil { | ||
| return nil, nil, fmt.Errorf("create stdin reader: %w", err) | ||
| } | ||
| } |
| // IsUnifiedDiff reports whether the content looks like a git unified diff. | ||
| // Detection criteria: | ||
| // - starts with "diff --git a/" | ||
| // - has "@@ -" hunk header pattern within the sniff limit | ||
| func IsUnifiedDiff(content string) bool { | ||
| if content == "" { | ||
| return false | ||
| } | ||
|
|
||
| // inspect only the leading bytes for efficiency | ||
| sample := content[:min(unifiedDiffSniffLimit, len(content))] | ||
|
|
||
| // primary: git diff header | ||
| if strings.Contains(sample, "diff --git a/") { | ||
| return true | ||
| } | ||
|
|
||
| // secondary: hunk header pattern | ||
| return strings.Contains(sample, "@@ -") | ||
| } |
umputun
left a comment
There was a problem hiding this comment.
interesting direction and the parse-then-fallback shape is right. Couple of things worth changing before this merges.
blocking
-
silent skip of failed sections at
app/diff/stdin.go:81-97. If one section out of N failsparseUnifiedDiff, it'scontinued with anfmt.Fprintf(os.Stderr, "warning: ...")and the renderer is still returned as success. The stderr warning is invisible under bubbletea's alt screen. A user piping a 20-file patch where one file's hunks malform sees 19 files in the tree with no in-TUI hint the 20th is gone. That's a review-integrity bug. Pls make any per-section parse error fail the wholeNewMultiFileStdinReadercall so the caller falls back to raw-text mode for the entire input. -
no size cap on stdin at
app/stdin.go:85.io.ReadAll(stdin)is uncapped, thenstring(content)is taken 2-3 times (lines 91, 93, 99) so the buffer is held ~3x briefly. Pre-PR was uncapped too, but only the final[]DiffLinesurvived.cat /dev/zero | revdiff --stdinis worse now. Project precedent is--description-fileusingio.LimitReaderwith a clear error on overflow. Pls cap (64-256 MB is fine) and also dos := string(content)once and reuse it. -
sniffer is too loose at
app/diff/multiparse.go:19-38. Godoc says "starts withdiff --git a/" but the code isstrings.Containsover the first 4 KB, and the@@ -secondary fires on any text containing a quoted hunk header. A markdown file or chat log mentioningdiff --git a/x b/xliterally in a code block gets reclassified and surrounding prose is dropped from the rendering. Pls anchor to line-start (^diff --git a/with newline-boundary, or strings split + prefix check on the first non-blank line) and either drop the@@ -fallback or anchor it the same way. That also makes the godoc accurate. -
docs missing. Per the project rule, behavior change to an existing flag is the same documentation obligation as a new flag. Pls add:
README.md--stdinsection (currently says "All lines are shown as context" which is no longer accurate)site/docs.htmlmatching paragraphs.claude-plugin/skills/revdiff/references/usage.md.claude-plugin/skills/revdiff/SKILL.md(when AI agents should pipegh pr diff/git format-patch -1 --stdoutfor multi-file review)plugins/codex/skills/revdiff/SKILL.mdandplugins/pi/skills/revdiff/SKILL.md(kept byte-identical with the Claude copy)docs/ARCHITECTURE.md(Renderer table near line 386, addMultiFileStdinReaderrow)
No plugin version bump on this branch though, the feature is in the binary and the version-defer rule applies.
non-blocking
-
fmt.Fprintf(os.Stderr, ...)from insideapp/diff/under a TUI doesn't fit the project'slog.Printfconvention. Once #1 above is in, the whole warning probably goes away. -
app/diff/stdin.gonow mixesStdinReaderandMultiFileStdinReaderin one file but tests are already split (stdin_test.govsmultiparse_test.go). Worth movingMultiFileStdinReaderto amultistdin.goto keep the 1:1 source-to-test convention. -
NewStdinReaderFromString(stdin.go:31) has one caller inapp/stdin.go:99that hands it a string. It can be inlined asNewStdinReaderFromReader(name, strings.NewReader(content))and the constructor dropped. Same shape forIsUnifiedDiff(one caller, can be unexportedisUnifiedDiff). -
the error from
NewMultiFileStdinReaderis dropped atapp/stdin.go:93(if multi, mErr := ...; mErr == nil). With #1 above in place this becomes load-bearing, pls don't silence it, at minimumlog.Printfit. -
nit:
splitMultiFileDiffresolves path/status viaparseFileHeader, thenparseUnifiedDiffwalks the same header text per section. Not a bug, but the package now has two recognizers fordiff --git(regex inmultiparse.go:45vs inline skip indiff.go:668). Folding the section split intoparseUnifiedDiffso it emits(path, status, []DiffLine)tuples directly would dedupe, but that's a wider refactor than this PR needs, mentioning it for awareness. -
nit:
sec.pathis attacker-controlled and getsfmt.Fprintf'd to stderr unsanitized. Once #1 is in the warning's gone, but if any other code prints diff-header paths later, route throughdiff.SanitizeCommitTextlike the rest of the project.
tests worth adding
- markdown / chat-log containing
@@ -1,5 +1,5 @@ordiff --git a/x b/xin a fenced block (locks in the tightened sniffer from #3) - fallback end-to-end: sniff returns true, parse fails on all sections, caller falls back to raw text (locks in #1's behavior)
- malformed
diff --githeaders (truncated, missingb/side) that hit the empty-path-discard branch insplitMultiFileDiff - renames where the
rename topath contains spaces - large input (would have caught #2)
tests/lint/race all green locally, no scope creep, code is idiomatic Go. Main concerns are the four blocking items above, addressing the non-blocking ones too would be appreciated.
Address PR umputun#216 review: - fail NewMultiFileStdinReader on any per-section parse error so the caller falls back to raw-text mode for the whole input instead of silently dropping files behind an invisible stderr warning - cap stdin reads at 64 MiB via io.LimitReader and convert the buffer to a string once instead of three times - anchor the unified-diff sniffer to line-start (matching "diff --git a/" at column 0) and drop the loose "@@ -" substring fallback so markdown prose mentioning the marker is not reclassified - expose ErrNotUnifiedDiff sentinel for silent fallback; log non-sentinel parse failures via log.Printf so they survive the alt-screen - split MultiFileStdinReader into its own file, drop the redundant NewStdinReaderFromString constructor, unexport the sniffer - docs updated: README, site/docs.html, usage reference, all three SKILL.md copies (with gh pr diff / git format-patch guidance), ARCHITECTURE renderer table
umputun
left a comment
There was a problem hiding this comment.
all four blocking items are addressed:
- parse-failure now fails the whole
NewMultiFileStdinReader, so the caller falls back to raw text (multistdin.go:48-53) - 64 MiB cap via
io.LimitReader(r, cap+1), enforced before the singlestring(data)(readStdinCappedinapp/stdin.go) isUnifiedDiffis line-anchored now (needs a line starting withdiff --git a/),@@ -fallback dropped, prose-mention test added- docs synced across all six surfaces
non-blocking 5/6/7/8/10 are in too.
one blocking thing, #1 isn't fully closed. The parse-failure path is fixed, but two sibling cases still produce a degraded tree instead of raw-text fallback, which is the same review-integrity hazard:
- split-layer drop (
multiparse.go:673-683):splitMultiFileDiff's flush doesif path == "" { return }, so adiff --gitboundary whose header yields no path is dropped silently. Valid section + a malformed second section means the reader succeeds with the second file missing from the tree, no fallback. - empty-content section (
multistdin.go): a section that sniffs and splits but whereparseUnifiedDiffreturns zero lines (e.g. a doc whose first line isdiff --git a/foo b/foowith no hunks) is accepted as an empty file, so the real stdin text is hidden behind an empty tree entry instead of being shown as raw.
both reduce to one fix: a section that produces nothing renderable should be treated as a parse failure so the whole input falls back to raw text.
pls add two regression tests as well - valid + malformed section falling back to raw, and a line-start diff --git with no hunk falling back to raw.
non-blocking:
ChangedFiles(multistdin.go) can emit duplicate tree rows -orderis a[]stringrecording every path, so two sections resolving to the same path show up twice. Same crafted-diff family as above, low severity. Dedupe when appending toorder.rename to(multiparse.go:141-145) bypassescleanPath, while every other branch routes through it, so a git-quotedrename to "..."keeps its quotes. Route it throughcleanPathtoo.cleanPath(multiparse.go:709) double-strips a top-level dir literally nameda/b, sob/weird.goresolves toweird.go. Rare, but the tripleTrimPrefixis the cause.stdin_test.gois new but only testsMultiFileStdinReader(which lives inmultistdin.go), whilestdin.go's type is tested elsewhere. Rename tomultistdin_test.goper the one-test-file-per-source rule.selectStdinRenderergodoc (app/stdin.go) says the WARN logs "despite the alt-screen swallowing stderr", but it runs beforep.Run()so stderr is visible. The logging is fine, the rationale in the comment is wrong, drop that clause.app/diff/stdin.gowrapsimport "io"in parens for no functional reason, looks like unrelated churn, revert it.- the 4096-byte sniff window misclassifies
git format-patchoutput when the commit body pushes the firstdiff --gitpast 4 KiB. Fine tradeoff for the documentedgit diff | revdiff --stdincase, but it's neither in the godoc nor covered by a test.
the integrity fallback is the only real gate here, the non-blocking items can ride along in the same push or a follow-up.
splitMultiFileDiff now errors when parseFileHeader yields no path, and NewMultiFileStdinReader rejects zero-line sections that carry no structural marker (rename, mode, binary, hunk). Both used to silently drop a section, hiding real stdin behind an empty tree entry. Also: dedupe duplicate paths in MultiFileStdinReader.order, route `rename to` through cleanPath, fix cleanPath double-stripping a top-level dir named a/ or b/, rename stdin_test.go to multistdin_test.go, correct selectStdinRenderer godoc rationale, revert paren-wrap on the single-import in app/diff/stdin.go, and document the 4 KiB sniff-window tradeoff for format-patch input.
umputun
left a comment
There was a problem hiding this comment.
all addressed, thx.
the blocking fallback-integrity hole is fully closed now. Empty-path sections fail at the split layer (multiparse.go flush returns an error), and hunkless-but-non-structural sections fail in NewMultiFileStdinReader via hasStructuralChange - both route the whole input back to raw text. The structural-marker carve-out keeps rename-only / mode-only / new-empty sections valid, which is the right call. Regression tests cover both the valid+malformed section and the line-start diff --git with no hunk.
the 7 non-blocking nits are all in too: order dedupe, rename to through cleanPath, single-strip cleanPath, test-file rename, corrected selectStdinRenderer godoc, import revert, and the 4 KiB sniff-window doc + test.
tests/lint/race green locally, lgtm.
Several version control systems (and assorted tooling) can emit
git-compatible unified diffs. Detecting and parsing a git-style diff on
stdin lets revdiff review the output of any such tool without a dedicated
renderer per VCS — supporting the git diff format once is cheaper to
maintain than wiring up a new VCS integration every time one appears.
Previously stdin was always treated as raw, context-only text. It is now
sniffed for a unified-diff signature, split into per-file sections, and
parsed into real diff lines, falling back to the raw-text reader when
detection or parsing fails.