file-dedup

A simple program that finds duplicate files, written in Rust.

How it works

Finds groups of files with the exact same size.
Within each group it compares pairs of files¹ byte-by-byte until the first differing byte.
Prints resulting groups of identical files.

¹ No, if A == B and B == C, it doesn't compare A with C. :)
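
A minimal sketch of how the footnote's shortcut can work, not taken from this project's source: `partition_into_classes` and the `same` predicate are hypothetical names, and comparing each file only against the first member of every class found so far is an assumed way to get that behavior. It relies on equality being transitive, which holds for byte-for-byte file comparison.

```rust
/// Splits `items` into classes of mutually equal elements.
/// `same` must be an equivalence relation (hypothetical helper, for illustration).
fn partition_into_classes<T, F>(items: Vec<T>, mut same: F) -> Vec<Vec<T>>
where
    F: FnMut(&T, &T) -> bool,
{
    let mut classes: Vec<Vec<T>> = Vec::new();
    for item in items {
        // Compare only against the first member (the representative) of each
        // class. If A == B put B into A's class, a later C is checked against
        // A alone and never against B.
        match classes.iter_mut().find(|class| same(&class[0], &item)) {
            Some(class) => class.push(item),
            None => classes.push(vec![item]),
        }
    }
    classes
}
```

With file paths as `T` and a byte-by-byte comparison as `same`, every class holding more than one path would be a group of identical files.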

Why it works this way

For some reason, when you ask people about file deduplication, they respond without thinking: "hash every file and compare hashes". But this approach has some downsides.

First, why would we even compare files that have different sizes? Files can only be identical if they have the exact same size. So the size acts as a kind of weak hash that we don't have to compute. Why not use it then?
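
Using the size as a free "weak hash" could look roughly like the sketch below. It is only an illustration under assumptions: `group_by_size` is a hypothetical name, the list of paths is assumed to have been collected already, and the real program may structure this differently.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Groups paths by file size and keeps only the groups that could contain
/// duplicates (hypothetical helper, for illustration).
fn group_by_size(paths: Vec<PathBuf>) -> io::Result<Vec<Vec<PathBuf>>> {
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for path in paths {
        // Reading metadata does not read the file contents.
        let size = fs::metadata(&path)?.len();
        by_size.entry(size).or_default().push(path);
    }
    // A file with a unique size cannot have a duplicate, so drop those groups.
    Ok(by_size
        .into_values()
        .filter(|group| group.len() > 1)
        .collect())
}
```

Only the groups returned here would need their contents compared at all.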

Second, if there are big files with the same size, hashing them would mean we have to read (and process) whole files - that's a lot of work. But if the difference is in the first few bytes then we could just load a tiny part and bail early.
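
The early bail-out can be sketched as follows. This is an assumed illustration rather than the project's actual code: `same_bytes` and `same_contents` are hypothetical names, and a real implementation would more likely compare larger chunks than single bytes.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// Returns Ok(true) if both readers yield exactly the same bytes.
/// Stops at the first difference (hypothetical helper, for illustration).
fn same_bytes<A: Read, B: Read>(a: A, b: B) -> io::Result<bool> {
    // BufReader keeps the per-byte iteration from issuing one read call per byte.
    let mut a = BufReader::new(a).bytes();
    let mut b = BufReader::new(b).bytes();
    loop {
        match (a.next().transpose()?, b.next().transpose()?) {
            // Both inputs ended at the same position: identical.
            (None, None) => return Ok(true),
            // Different bytes, or one input ended early: bail out now.
            (x, y) if x != y => return Ok(false),
            // Same byte on both sides: keep going.
            _ => {}
        }
    }
}

/// Compares two files on disk byte-by-byte.
fn same_contents(p: &Path, q: &Path) -> io::Result<bool> {
    same_bytes(File::open(p)?, File::open(q)?)
}
```

If two large files differ near the start, only the first buffered chunk of each is read before the function returns.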

Unfortunately, my approach has a small downside of its own: if the user has many files that have the same size and are identical (or they differ, but only very late in the file), then we do more reads than naive hashing would do on these files. But I don't think this happens very often.

Usage

USAGE:
    file-dedup [FLAGS] <DIR>...

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
    -v, --verbose    Sets the level of verbosity

ARGS:
    <DIR>...    Paths to directories that will be searched for duplicate files
