# histzip

Hey! Thanks for your interest, but consider using something like brotli or zstd instead. Those modern compressors also compress quickly with long history windows (the problem histzip was written to solve) but are much better supported; you should be able to achieve better speed and compression ratios with them as well.


Compress wiki change histories and similar inputs: anything with long duplicated sections (100+ bytes) repeated up to a few MB apart.

Download binaries for Linux amd64, Linux x86, Windows 64-bit and 32-bit, and Mac 64-bit. For faster compression try the gccgo Linux amd64 build.

Compress by piping text through histzip and bzip2 or similar:

```sh
./histzip < revisions.xml | bzip2 > revisions.xml.hbz
```

Turn that around to decompress:

```sh
bunzip2 < revisions.xml.hbz | ./histzip > revisions.xml
```

Running on dumps of English Wikipedia's history, that pipeline ran at 51 MB/s for the newest chunk and 151 MB/s for the oldest. Compression ratios were comparable to 7zip's: 8% worse for the new chunk and 10% better for the old chunk.

While compressing, histzip decompresses its output and compares checksums as a self-check. There are write-ups of the framing format and the format for compressed data. You can use the same compression engine in other programs via the histzip/lrcompress library.
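That round-trip self-check pattern — decompress what you just compressed and compare checksums against the original input — can be sketched with Go's standard library. This is a minimal illustration using gzip as a stand-in compressor; histzip's actual engine and framing format differ:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"hash/crc32"
	"io"
)

// compressWithCheck compresses data and, as a self-check, immediately
// decompresses the result and compares CRC-32 checksums of the input
// and the round-tripped output before returning the compressed bytes.
func compressWithCheck(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	// Self-check: read back the stream we just wrote.
	zr, err := gzip.NewReader(bytes.NewReader(buf.Bytes()))
	if err != nil {
		return nil, err
	}
	roundTrip, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	if crc32.ChecksumIEEE(data) != crc32.ChecksumIEEE(roundTrip) {
		return nil, fmt.Errorf("self-check failed: checksum mismatch")
	}
	return buf.Bytes(), nil
}

func main() {
	in := bytes.Repeat([]byte("a long repeated section of revision history\n"), 100)
	out, err := compressWithCheck(in)
	if err != nil {
		panic(err)
	}
	fmt.Printf("compressed %d bytes to %d bytes, self-check passed\n", len(in), len(out))
}
```

The check costs an extra decompression pass, but it catches engine bugs before corrupt output ever reaches disk.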

If you're interested in long-range compression, some related projects are worth a look. rzip is awesome; histzip lifts some implementation tricks directly from it. bm is a Bentley-McIlroy library from CloudFlare, also written in Go, which compresses matches against a fixed dictionary (in essence, performing a binary diff). Git, xdelta, and open-vcdiff each have open-source binary-diff implementations as well. Google's Brotli compressor is a tuned flate variant defaulting to a 4MB window, first used to compress WOFF 2.0 Web fonts.

If you have any trouble or questions, get in touch!

Public domain, Randall Farmer, 2013-4.