Hey! Thanks for your interest, but consider using something like brotli or zstd instead. Those modern compressors also compress quickly with long history windows (the problem histzip was written to solve) but are much better supported; you should be able to achieve better speed and compression ratios with them as well.
histzip compresses wiki change histories and similar inputs: anything with long duplicated sections (100+ bytes each) that may sit up to a few MB apart.
Compress by piping text through histzip and bzip2 or similar:
./histzip < revisions.xml | bzip2 > revisions.xml.hbz
Turn that around to decompress:
bunzip2 < revisions.xml.hbz | ./histzip > revisions.xml
Running on dumps of English Wikipedia's history, that pipeline ran at 51 MB/s for the newest chunk and 151 MB/s for the oldest. Compression ratios were comparable to 7zip's: 8% worse for the new chunk and 10% better for the old chunk.
While compressing, histzip decompresses its output and compares checksums as a self-check. There are write-ups of the framing format and the format for compressed data. You can use the same compression engine in other programs via the histzip/lrcompress library.
If you're interested in long-range compression, some other projects might interest you. rzip is awesome; histzip lifts some implementation tricks directly from it. bm is a Bentley-McIlroy library from CloudFlare, also written in Go, which compresses by matching against a fixed dictionary (in essence, computing a binary diff). Git, xdelta, and open-vcdiff each include open-source binary-diff implementations as well. Google's Brotli compressor is a tuned flate variant with a 4MB window by default, first used to compress WOFF 2.0 Web fonts.
If you have any trouble or questions, get in touch!
Public domain, Randall Farmer, 2013-2014.