A high-performance difference algorithm module for Go.
Difference algorithms compare two inputs and find the edits that transform one to the other. This is very useful to understand changes, for example when comparing a test result with the expected result or to understand which changes have been made to a file.
This module provides diffing for arbitrary Go slices and text.
To use this module in your Go project, run:
go get znkr.io/diff
Full documentation available at pkg.go.dev/znkr.io/diff.
Diffing two slices produces either the full list of edits
x := strings.Fields("calm seas reflect the sky")
y := strings.Fields("restless seas reflect the sky defiantly")
edits := diff.Edits(x, y)
for i, edit := range edits {
if i > 0 {
fmt.Print(" ")
}
switch edit.Op {
case diff.Match:
fmt.Printf("%s", edit.X)
case diff.Delete:
fmt.Printf("[-%s-]", edit.X)
case diff.Insert:
fmt.Printf("{+%s+}", edit.Y)
default:
panic("never reached")
}
}
// Output:
// [-calm-] {+restless+} seas reflect the sky {+defiantly+}
or a list of hunks representing consecutive edits
x := strings.Fields("calm seas reflect the sky")
y := strings.Fields("restless seas reflect the sky defiantly")
hunks := diff.Hunks(x, y, diff.Context(1))
for i, h := range hunks {
if i > 0 {
fmt.Print(" … ")
}
for i, edit := range h.Edits {
if i > 0 {
fmt.Print(" ")
}
switch edit.Op {
case diff.Match:
fmt.Printf("%s", edit.X)
case diff.Delete:
fmt.Printf("[-%s-]", edit.X)
case diff.Insert:
fmt.Printf("{+%s+}", edit.Y)
default:
panic("never reached")
}
}
}
// Output:
// [-calm-] {+restless+} seas … sky {+defiantly+}
For both functions, a ...Func
variant exists that works with arbitrary slices by taking an
equality function.
Because of its importance, comparing text line by line has special support and produces output in the unified diff format:
x := `this paragraph
is not
changed and
barely long
enough to
create a
new hunk
this paragraph
is going to be
removed
`
y := `this is a new paragraph
that is inserted at the top
this paragraph
is not
changed and
barely long
enough to
create a
new hunk
`
fmt.Print(textdiff.Unified(x, y))
// Output:
// @@ -1,3 +1,6 @@
// +this is a new paragraph
// +that is inserted at the top
// +
// this paragraph
// is not
// changed and
// @@ -5,7 +8,3 @@
// enough to
// create a
// new hunk
// -
// -this paragraph
// -is going to be
// -removed
Status: Beta - This project is in beta, pending API reviews and general feedback, both are very welcome.
As a general rule, the exact diff output will never be guaranteed to be stable: I expect that performance and quality improvements will always be possible and they will likely change the output of a diff. Therefore, committing to a stable diff result would be too limiting.
Diffs produced by this module are intended to be readable by humans.
Readable diffs have been the subject of a lot of discussions and have even resulted in some new diffing algorithms like the patience or histogram algorithms in git. However, the best work about diff readability by far is diff-slider-tools by Michael Haggerty. He implemented a heuristic that's applied in a post-processing step to improve the readability. This module implements this heuristic in the textdiff package.
For example:
x := `// ...
["foo", "bar", "baz"].map do |i|
i.upcase
end
`
y := `// ...
["foo", "bar", "baz"].map do |i|
i
end
["foo", "bar", "baz"].map do |i|
i.upcase
end
`
fmt.Println("With textdiff.IndentHeuristic:")
fmt.Print(textdiff.Unified(x, y, textdiff.IndentHeuristic()))
fmt.Println()
fmt.Println("Without textdiff.IndentHeuristic:")
fmt.Print(textdiff.Unified(x, y))
// Output:
// With textdiff.IndentHeuristic:
// @@ -1,4 +1,8 @@
// // ...
// +["foo", "bar", "baz"].map do |i|
// + i
// +end
// +
// ["foo", "bar", "baz"].map do |i|
// i.upcase
// end
//
// Without textdiff.IndentHeuristic:
// @@ -1,4 +1,8 @@
// // ...
// ["foo", "bar", "baz"].map do |i|
// + i
// +end
// +
// +["foo", "bar", "baz"].map do |i|
// i.upcase
// end
By default, the underlying diff algorithm used is Myers' algorithm augmented by a number of
heuristics to speed up the algorithm in exchange for non-minimal diffs. The diff.Optimal
option is
provided to skip these heuristics to get a minimal diff independent of the costs and diff.Fast
to
use a fast heuristic to get a non-minimal diff as fast as possible.
On an M1 Mac, the default settings almost always result in runtimes < 1 ms, but truly large diffs
(e.g. caused by changing generators for generated files) can result in runtimes of almost 100 ms.
Below is the distribution of runtimes applying textdiff.Unified
to every commit in the Go
repository (y-axis is in log scale):
Comparing the performance with other Go modules that implement the same features is always
interesting, because it can surface missed optimization opportunities. This is especially
interesting for larger inputs where superlinear growth can become a problem. Below are benchmarks of
znkr.io/diff
against other popular Go diff modules:
- znkr: Default configuration with performance optimizations enabled
- znkr-optimal: With
diff.Optimal()
option for minimal diffs - znkr-fast: With
diff.Fast()
option for fastest possible diffing - go-internal: Patience diff algorithm from
github.com/rogpeppe/go-internal
- diffmatchpatch: Implementation from
github.com/sergi/go-diff
Note: It's possible that the benchmark is using diffmatchpatch
incorrectly, the benchmark
numbers certainly look suspiciously high. However, the way it's used in the benchmark is used in
at least one large open source project.
On the benchmarks used for this comparison znkr.io/diff almost always outperforms the other implementations. However, there's one case where go-internal is significantly faster, but the resulting diff is 10% larger (see numbers below).
Test Case | znkr (baseline) | znkr-optimal | znkr-fast | go-internal | diffmatchpatch |
---|---|---|---|---|---|
large_01 | 2.605ms | 10.813ms (+315.06%) |
2.606ms (±0%) |
4.888ms (+87.60%) |
42.546ms (+1533.09%) |
large_02 | 20.611ms | 49.379ms (+139.58%) |
1.805ms (-91.24%) |
4.190ms (-79.67%) |
628.690ms (+2950.32%) |
large_03 | 3.054ms | 14.910ms (+388.18%) |
3.129ms (±0%) |
4.789ms (+56.80%) |
30.803ms (+908.56%) |
large_04 | 6.707ms | 251.801ms (+3654.45%) |
5.749ms (-14.28%) |
8.735ms (+30.24%) |
1011.684ms (+14984.61%) |
medium | 25.00µs | 24.72µs (±0%) |
27.46µs (+9.86%) |
67.55µs (+170.24%) |
247.63µs (+890.72%) |
small | 17.15µs | 16.99µs (±0%) |
16.71µs (-2.56%) |
39.48µs (+130.15%) |
72.75µs (+324.13%) |
Test Case | znkr (baseline) | znkr-optimal | znkr-fast | go-internal | diffmatchpatch |
---|---|---|---|---|---|
large_01 | 5.615k edits | 5.615k edits (±0%) |
5.615k edits (±0%) |
5.617k edits (+0.04%) |
5.616k edits (+0.02%) |
large_02 | 28.87k edits | 28.83k edits (-0.15%) |
31.80k edits (+10.15%) |
31.81k edits (+10.17%) |
28.83k edits (-0.14%) |
large_03 | 5.504k edits | 5.504k edits (±0%) |
5.504k edits (±0%) |
5.506k edits (+0.04%) |
5.505k edits (+0.02%) |
large_04 | 26.99k edits | 26.99k edits (-0.01%) |
27.80k edits (+2.99%) |
27.80k edits (+2.99%) |
60.36k edits (+123.65%) |
medium | 277 edits | 277 edits (±0%) |
277 edits (±0%) |
283 edits (+2.17%) |
278 edits (+0.36%) |
small | 108 edits | 108 edits (±0%) |
114 edits (+5.56%) |
120 edits (+11.11%) |
109 edits (+0.93%) |
I tested this diff implementation against every commit in the Go
repository using the standard unix patch
tool to ensure that all
diff results are correct.
This test is part of the test suite for this module and can be run with
go run ./internal/cmd/eval -repo <repo>
This module is distributed under the Apache License, Version 2.0, see LICENSE for more information.