Dedupe is a simple tool to deduplicate lines in text files. It processes files and directories, removing duplicate lines and optionally sorting the content.
When working with large text files, I often found myself needing to remove duplicate lines. I tried several existing tools, but they either lacked features I needed or were too slow for my use case, so I decided to write my own tool that would be fast and have exactly the features I wanted.
- Deduplication of lines in text files
- Sorting of output file content
- Processing of directories (even recursively)
- Configurable memory usage
- Concurrent processing with multiple workers
- Progress indication and verbose logging
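The core of line deduplication can be sketched in a few lines of Go. Note this is an illustrative sketch, not the tool's actual implementation: `dedupLines` is a hypothetical helper, and the real tool additionally streams input, limits memory, and distributes work across workers.

```go
package main

import (
	"fmt"
	"sort"
)

// dedupLines removes duplicate lines while preserving first-seen order,
// and optionally sorts the result alphabetically (like the -sort flag).
func dedupLines(lines []string, sortOutput bool) []string {
	seen := make(map[string]struct{}, len(lines))
	out := make([]string, 0, len(lines))
	for _, line := range lines {
		if _, ok := seen[line]; ok {
			continue // skip lines we have already emitted
		}
		seen[line] = struct{}{}
		out = append(out, line)
	}
	if sortOutput {
		sort.Strings(out)
	}
	return out
}

func main() {
	lines := []string{"banana", "apple", "banana", "cherry", "apple"}
	fmt.Println(dedupLines(lines, true)) // [apple banana cherry]
}
```

A `map[string]struct{}` is the idiomatic Go set here: the empty struct occupies zero bytes, so memory is spent only on the line keys themselves.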
```
Usage of dedupe:
  -input string
        Input file or directory to process
  -max-memory uint
        Maximum total memory usage in Megabytes (default 2048)
  -nologo
        Disable printing the logo
  -output string
        Output directory for deduplicated files (if not overwriting originals)
  -overwrite
        Overwrite original files with deduplicated versions
  -recursive
        Recursively process directories
  -sort
        Sort output file content alphabetically (default true)
  -verbose
        Enable verbose logging
  -workers int
        Number of concurrent workers for processing (default <number of CPU cores>)
```
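A couple of typical invocations (the paths here are illustrative placeholders, not examples from the tool's own documentation):

```shell
# Deduplicate and sort every file under ./logs, recursively,
# writing the cleaned copies to ./clean
dedupe -input ./logs -output ./clean -recursive -verbose

# Overwrite a single file in place, capping memory at 1 GB
dedupe -input big.txt -overwrite -max-memory 1024
```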
I don't plan to add many more features to this tool, as it already serves my needs. The only feature I still want to add is the ability to detect duplicate lines across multiple files in a directory, but this will take some time to implement properly, since in my use case I can't load all files into memory at once. I also want to handle files larger than available memory more gracefully, but that is a low priority for the moment.