dry4go finds candidate duplicate Go code across files and directories. It reports fuzzy structural matches by filename and line range so another mechanism can evaluate and reduce duplication.
dry4go compares Go functions and methods by converting each function body and signature shape into normalized syntax nodes. The normalized tree is walked to collect a set of structural fingerprints, one for the whole function and one for each nested syntax node.
Similarity is Jaccard similarity over those fingerprint sets:
score = shared fingerprints / all fingerprints seen in either function
A score of 1.0 means the normalized structures have the same fingerprint set.
Lower scores mean the functions still share structure, but each function also
has structure the other does not. The default --threshold 0.82 reports
candidates whose normalized structures are close enough to be worth review.
Go differs from Clojure in important ways, so dry4go treats functions and methods as the comparison units and uses Go's parser/AST instead of textual forms. Identifiers, local names, selector names, and literal values normalize away. Structural Go syntax is preserved, including:
- function and method shape
- parameter and result type structure
- blocks and statement order
if,for,range,switch,select- assignments, returns, calls, selectors, indexing, slicing
- composite literals, map/array/struct/function types
- operators such as
+,==,&&, and||
For example, these functions can match strongly even though their names, local variables, predicates, and field names differ:
func Alpha(xs []int) []int {
var ys []int
for _, x := range xs {
if x%2 == 1 {
ys = append(ys, x+1)
}
}
return ys
}
func Beta(items []int) []int {
var kept []int
for _, item := range items {
if item%2 == 0 {
kept = append(kept, item+1)
}
}
return kept
}dry4go [options] [file-or-directory ...]Options:
--threshold N Minimum structural similarity score, default 0.82
--min-lines N Minimum source lines in a candidate function, default 4
--min-nodes N Minimum normalized syntax nodes, default 20
--format F text or json, default text
--json Same as --format json
--text Same as --format text
Examples:
dry4go .
dry4go internal/foo/foo.go internal/bar/bar.go
dry4go --json --threshold 0.9 ./internal ./cmdEvery file named on the command line participates in the same duplication
search. When an argument is a directory, dry4go recursively includes every
.go file under that directory in the same search set, skipping .git,
vendor, and target directories.
Default text output is intended for quick reading:
DUPLICATE score=0.89
internal/billing/invoice.go:12-25
internal/billing/receipt.go:30-44
JSON output is intended for tools:
{
"candidates": [
{
"score": 0.8909090909090909,
"left": {"file": "internal/billing/invoice.go", "start_line": 12, "end_line": 25},
"right": {"file": "internal/billing/receipt.go", "start_line": 30, "end_line": 44},
"left_nodes": 88,
"right_nodes": 91
}
]
}go test ./...
go run ./cmd/dry4go --help
go run ./cmd/dry4go --threshold 0.75 .Copyright (c) Robert C. Martin. All rights reserved.