
vorner/pgz


The parallel gzip

This is an implementation of a parallel gzip. It splits the input into chunks (32 MB each by default; the size is configurable), compresses each chunk independently and concatenates the results. The output can be read and decompressed by any standard gzip implementation.
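The chunk-and-concatenate scheme can be sketched in Python using only the standard library. This is an illustration, not pgz's own code. The names `split_chunks`, `parallel_gzip` and `DEFAULT_CHUNK_SIZE` are hypothetical, and unlike a streaming tool the sketch holds the whole input and output in memory. It uses a thread pool because CPython's `zlib` (which the `gzip` module builds on) releases the GIL while compressing, so several chunks can be compressed on separate cores.

```python
from __future__ import annotations

import gzip
from concurrent.futures import ThreadPoolExecutor

# Hypothetical constant mirroring the README's 32 MB default (32 MiB here).
DEFAULT_CHUNK_SIZE = 32 * 1024 * 1024


def split_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def parallel_gzip(data: bytes, chunk_size: int = DEFAULT_CHUNK_SIZE,
                  workers: int | None = None) -> bytes:
    """Compress every chunk as an independent gzip member and concatenate them."""
    # Empty input still yields one (empty) member, so the result is valid gzip.
    chunks = split_chunks(data, chunk_size) or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so members keep the chunk order.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)


if __name__ == "__main__":
    data = b"example " * 100_000
    packed = parallel_gzip(data, chunk_size=64 * 1024)
    # The standard decompressor reads the concatenated members as one stream.
    assert gzip.decompress(packed) == data
    print(len(data), "->", len(packed), "bytes")
```

Because `pool.map` returns results in input order, the members are joined in the same order as the chunks, which is what lets a sequential decompressor reproduce the original bytes.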

The motivation is to speed up transfers of large amounts of data through ssh across a fast network. There, ssh throughput is limited by its compression or encryption routines, which are single-threaded. This tool makes it possible to turn off compression in ssh and compress the data on multiple cores instead. Because decompression is much faster than compression, parallel decompression is not necessary.

Limitations

There are certain limitations:

  • The compressed representation differs slightly from that of the usual sequential gzip. Technically, the output is multiple concatenated gzip streams (members), but common decompression tools accept that. Furthermore, because each chunk is compressed independently and cannot refer back to data in earlier chunks, the compression ratio is likely to be a bit worse.
  • It uses more memory, because the chunks have to be buffered.
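Both effects of the multi-member format can be observed with Python's standard library. In the sketch below (the helpers `compress_in_chunks` and `count_members` are hypothetical, and compression is sequential for clarity), the chunked output has one gzip member per chunk, yet `gzip.decompress` still reads it as a single stream. The test input, one random 1000-byte block repeated 64 times, shows the ratio cost: a single stream can encode the repeats as back-references, while independent 1000-byte chunks cannot see each other and stay essentially incompressible.

```python
import gzip
import os
import zlib


def compress_in_chunks(data: bytes, chunk_size: int) -> bytes:
    """Compress each chunk as its own gzip member and concatenate the members."""
    return b"".join(
        gzip.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    )


def count_members(blob: bytes) -> int:
    """Count the gzip members in blob by decoding them one after another."""
    count = 0
    while blob:
        # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=31)
        d.decompress(blob)
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Bytes after the end of this member belong to the next one.
        blob = d.unused_data
        count += 1
    return count


if __name__ == "__main__":
    data = os.urandom(1000) * 64  # one random 1000-byte block, repeated 64 times
    single = gzip.compress(data)
    multi = compress_in_chunks(data, 1000)

    # gzip.decompress accepts multi-member input and returns the joined data.
    assert gzip.decompress(multi) == data
    print("members:", count_members(single), "vs", count_members(multi))
    print("sizes:", len(single), "vs", len(multi), "bytes")
```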

License

Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

License files

  • Apache-2.0: LICENSE-APACHE
  • MIT: LICENSE-MIT

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages