velopack/ripzip-rs
ripzip

A multi-threaded zip/unzip library and CLI for Rust.

Features

  • Parallel compression -- files are compressed concurrently with rayon + flate2 (zlib-rs), then assembled into a valid ZIP archive
  • Parallel extraction -- files are decompressed concurrently from mmap'd archives with zero-copy reads
  • CRC32 on every file -- SIMD-accelerated (crc32fast), validated on every extraction
  • Atomic archive writes -- compression writes to a tempfile, fsyncs, then renames; a crash mid-write never produces a corrupt archive
  • Path traversal prevention -- rejects ../ attacks, absolute paths, and Windows drive letters before any extraction begins
  • ZIP64 support -- automatic for >65,535 entries, >4 GB files, or >4 GB offsets
  • Zstd compression -- Zstandard (method 93) as an alternative to DEFLATE, with full interop
  • Incompressible data detection -- falls back to Stored when compression would inflate the data
  • Windows long path support -- \\?\ extended-length paths for paths exceeding MAX_PATH (260 chars)
  • Adaptive memory management -- dynamically sizes the in-memory compression threshold based on available system RAM (up to 400 MB budget), so small files stay in memory while large files stream through temp files
  • Deterministic output -- archives are byte-identical across runs (entries sorted by path)
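As an illustration of the incompressible-data detection above, the decision reduces to comparing the compressed and original sizes and falling back to Stored when compression would inflate the data. A minimal sketch (the `Method` enum and `choose_method` function are illustrative names, not ripzip's API):

```rust
// Illustrative sketch of incompressible-data detection: if compression
// output would be no smaller than the input, store the bytes raw instead.
// `Method` and `choose_method` are hypothetical names, not ripzip's API.

#[derive(Debug, PartialEq)]
enum Method {
    Stored,
    Deflate,
}

fn choose_method(original_len: u64, compressed_len: u64) -> Method {
    if compressed_len >= original_len {
        Method::Stored // writing the raw bytes is smaller (or equal)
    } else {
        Method::Deflate
    }
}

fn main() {
    // Already-compressed data (e.g. a PNG) typically inflates under DEFLATE.
    assert_eq!(choose_method(1_000, 1_024), Method::Stored);
    // Text usually shrinks, so DEFLATE is kept.
    assert_eq!(choose_method(1_000, 300), Method::Deflate);
}
```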

Benchmarks

ripzip (parallel, rayon + flate2/zlib-rs) vs the zip crate (single-threaded, miniz_oxide). Both at DEFLATE compression level 1. Best of 5 runs, filesystem caches warm.

CPU: Intel Core i7-14700K (20 cores / 28 threads) -- Windows 11 -- NVMe SSD

Compression

| Scenario | Files | Data | ripzip | zip crate | Speedup |
|---|---|---|---|---|---|
| 50k small source files | 50,000 | 14 MB | 378ms (38 MB/s) | 2.40s (6 MB/s) | 6.3x |
| 500 x 10 MB log files | 500 | 5 GB | 488ms (10.2 GB/s) | 2.20s (2.3 GB/s) | 4.5x |
| 100 x 50 MB binary blobs | 100 | 5 GB | 214ms (23.4 GB/s) | 2.29s (2.2 GB/s) | 10.7x |
| Mixed (10k src + 1 GB assets) | 10,050 | 1 GB | 531ms (1.9 GB/s) | 1.04s (967 MB/s) | 2.0x |

Extraction

| Scenario | Files | Data | ripzip | zip crate | Speedup |
|---|---|---|---|---|---|
| 50k small source files | 50,000 | 14 MB | 27.47s (1 MB/s) | 33.73s (0 MB/s) | 1.2x |
| 500 x 10 MB log files | 500 | 5 GB | 1.13s (4.4 GB/s) | 3.68s (1.4 GB/s) | 3.3x |
| 100 x 50 MB binary blobs | 100 | 5 GB | 1.18s (4.2 GB/s) | 4.45s (1.1 GB/s) | 3.8x |
| Mixed (10k src + 1 GB assets) | 10,050 | 1 GB | 4.24s (237 MB/s) | 6.20s (162 MB/s) | 1.5x |

Takeaway: ripzip compresses 2.0--10.7x faster and extracts 1.2--3.8x faster across all workloads. Speedup scales with individual file size -- the 5 GB binary blob corpus sees the biggest compression wins (10.7x) because all 28 threads are saturated with real DEFLATE work on large chunks. The 50k small files scenario is filesystem-metadata-bound, where parallelism still helps but the per-file overhead floor is higher.

Archive sizes are identical between the two -- same DEFLATE algorithm, same compression level.

Zstd vs Deflate (ripzip, both parallel, level 1)

| Scenario | Files | Data | Deflate | Zstd | Zstd speedup | Deflate archive | Zstd archive |
|---|---|---|---|---|---|---|---|
| 50k small source files | 50,000 | 14 MB | 378ms (38 MB/s) | 1.10s (13 MB/s) | 0.3x | 10 MB | 10 MB |
| 500 x 10 MB log files | 500 | 5 GB | 488ms (10.2 GB/s) | 213ms (23.5 GB/s) | 2.3x | 62 MB | 592 KB |
| 100 x 50 MB binary blobs | 100 | 5 GB | 214ms (23.4 GB/s) | 163ms (30.7 GB/s) | 1.3x | 64 MB | 495 KB |
| Mixed (10k src + 1 GB assets) | 10,050 | 1 GB | 531ms (1.9 GB/s) | 645ms (1.6 GB/s) | 0.8x | 36 MB | 24 MB |

Takeaway: Zstd achieves dramatically better compression ratios on large files (100x smaller archives for logs/blobs) while compressing as fast or faster on those workloads. On many small files, Deflate wins because Zstd's per-file initialization cost is higher. Extraction speeds are nearly identical -- both are I/O-bound at this level of parallelism.

Run benchmarks yourself

cargo bench -p ripzip

Library Usage

Add to your Cargo.toml:

[dependencies]
ripzip = { path = "ripzip" }

Then, in your code:

use std::path::Path;
use ripzip::{CompressionMethod, NoProgress, compress_directory, extract_to_directory};

// Compress a directory

compress_directory(
    Path::new("my_project/"),
    Path::new("my_project.zip"),
    1,                            // compression level (1=fastest, 9=smallest)
    CompressionMethod::Deflate,   // or CompressionMethod::Zstd
    &NoProgress,                  // or implement ProgressReporter for progress bars
)?;

// Extract an archive
extract_to_directory(
    Path::new("my_project.zip"),
    Path::new("output/"),
    &NoProgress,
)?;

Progress Reporting

Implement the ProgressReporter trait for real-time progress updates:

use ripzip::ProgressReporter;

struct MyReporter;

impl ProgressReporter for MyReporter {
    fn start(&self, total_files: u64, total_bytes: u64) {
        println!("Processing {total_files} files ({total_bytes} bytes)");
    }

    fn progress(&self, bytes_delta: u64) {
        // Called from worker threads -- use atomics for aggregation.
        // bytes_delta is uncompressed bytes just processed.
    }

    fn finish(&self) {
        println!("Done!");
    }
}

Progress callbacks fire at chunk granularity (256 KB), so even single large files show smooth progress.
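Since progress is called concurrently from worker threads, a reporter that aggregates with an AtomicU64 is a natural fit. The sketch below re-declares a minimal trait of the same shape so it compiles stand-alone; in real code you would implement ripzip's ProgressReporter instead:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Minimal re-declaration of the trait shape shown above, so this sketch
// is self-contained; real code implements ripzip's ProgressReporter.
trait ProgressReporter: Sync {
    fn progress(&self, bytes_delta: u64);
}

struct AtomicReporter {
    done: AtomicU64, // uncompressed bytes processed so far
}

impl ProgressReporter for AtomicReporter {
    fn progress(&self, bytes_delta: u64) {
        // Relaxed ordering is enough: we only need an eventually-consistent
        // running total for display, not synchronization between threads.
        self.done.fetch_add(bytes_delta, Ordering::Relaxed);
    }
}

fn main() {
    let reporter = AtomicReporter { done: AtomicU64::new(0) };
    // Simulate four worker threads each reporting ten 256 KB chunks.
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..10 {
                    reporter.progress(256 * 1024);
                }
            });
        }
    });
    assert_eq!(reporter.done.load(Ordering::Relaxed), 4 * 10 * 256 * 1024);
}
```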

CLI

cargo install --path ripzip-cli
ripzip compress <DIR> -o <FILE> [--level 1-9] [--method deflate|zstd] [--quiet]
ripzip extract <ARCHIVE> [-o <DIR>] [--quiet]
ripzip list <ARCHIVE> [--verbose]

Aliases: c, x, l.

$ ripzip compress my_project/ -o my_project.zip --method zstd
 [00:00:00] [####################################] 142.3MB/142.3MB (1.8GB/s)
Created my_project.zip

$ ripzip extract my_project.zip -o output/
 [00:00:00] [####################################] 142.3MB/142.3MB (3.2GB/s)
Extracted to output/

$ ripzip list my_project.zip --verbose
Compressed   Original     Method   Name
------------------------------------------------------------
1234         5678         Deflate  src/main.rs
0            0            Stored   assets/

210 files, 142300000 bytes uncompressed

Safety Guarantees

  1. CRC32 on every file -- computed during compression, validated during extraction. Tampered or corrupt archives are rejected. On CRC mismatch during extraction, the corrupt output file is deleted.
  2. Atomic archive writes -- the archive is assembled into a tempfile, fsynced, then renamed. A crash or power loss mid-compression never produces a corrupt .zip file. (Extraction writes directly to destination for performance -- the archive is the source of truth and can always be re-extracted.)
  3. Path traversal prevention -- all archive paths are validated before any extraction. Paths containing .., absolute paths, and Windows drive letters are rejected.
  4. ZIP64 -- automatically used when entry counts exceed 65,535, file sizes exceed 4 GB, or offsets exceed 4 GB.
  5. fsync before rename -- data is flushed to disk before the atomic rename, ensuring durability.
  6. Incompressible data detection -- if compression produces output larger than the input, the file is stored uncompressed.
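Guarantee 3 amounts to a pure check over every archive path before any bytes hit the disk. A simplified version of such a check (an illustrative sketch, not ripzip's actual validation code) might look like:

```rust
// Simplified path-traversal check, in the spirit of guarantee 3 above.
// Illustrative sketch only -- not ripzip's actual validation code.
fn is_safe_entry_path(path: &str) -> bool {
    // Reject absolute paths (Unix and Windows forms).
    if path.starts_with('/') || path.starts_with('\\') {
        return false;
    }
    // Reject Windows drive letters like "C:...".
    let bytes = path.as_bytes();
    if bytes.len() >= 2 && bytes[1] == b':' && bytes[0].is_ascii_alphabetic() {
        return false;
    }
    // Reject any ".." component, on either separator.
    !path
        .split(|c| c == '/' || c == '\\')
        .any(|component| component == "..")
}

fn main() {
    assert!(is_safe_entry_path("src/main.rs"));
    assert!(!is_safe_entry_path("../etc/passwd"));
    assert!(!is_safe_entry_path("/etc/passwd"));
    assert!(!is_safe_entry_path("C:\\Windows\\system32"));
    assert!(!is_safe_entry_path("a/../../b"));
}
```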

Architecture

                     COMPRESSION PIPELINE

  walkdir ──> Vec<FileEntry> ──> rayon::par_iter ──> Vec<CompressedEntry>
                                      |
                              per-file: read + CRC32 + DEFLATE/Zstd
                              (adaptive threshold: in memory or via temp file)
                                      |
                                      v
                            sequential ZIP assembly
                        (local headers + data + central dir + EOCD)
                                      |
                                  fsync + rename
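The fsync + rename tail of the pipeline can be sketched with the standard library alone. This is a sketch under simplifying assumptions: the `.tmp` sibling path stands in for a proper tempfile, and `write_archive_atomically` is an illustrative name, not ripzip's API:

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Sketch of the atomic-write tail of the pipeline: write the assembled
// archive to a sibling temp path, flush it to disk, then rename into place.
// A same-directory rename is atomic on POSIX filesystems, so readers see
// either the old file or the complete new one, never a partial write.
fn write_archive_atomically(dest: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let tmp = dest.with_extension("zip.tmp"); // illustrative temp name
    let mut f = File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?; // fsync: data is durable before the rename
    fs::rename(&tmp, dest) // atomic swap into the final path
}

fn main() -> std::io::Result<()> {
    let dest = std::env::temp_dir().join("ripzip_demo.zip");
    write_archive_atomically(&dest, b"PK...")?;
    assert_eq!(fs::read(&dest)?, b"PK...");
    fs::remove_file(&dest)?;
    Ok(())
}
```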


                     EXTRACTION PIPELINE

  open archive ──> mmap (< 2GB) or per-thread file handles (>= 2GB)
       |
  parse EOCD ──> parse central directory ──> validate all paths
       |
  create directories (sequential)
       |
  rayon::par_iter (per file):
       zero-copy slice from mmap ──> DEFLATE/Zstd + CRC32 verify ──> write to destination
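The "parse EOCD" step relies on the End of Central Directory record starting with the signature `PK\x05\x06` and sitting within the last 22 + 65,535 bytes of the file (22 fixed bytes plus an optional comment). A minimal backward scan, illustrative rather than ripzip's actual parser:

```rust
// Find the End of Central Directory record by scanning backward from the
// end of the archive for its signature ("PK\x05\x06", i.e. 0x06054b50
// little-endian). The EOCD is 22 bytes plus an optional comment of up to
// 65,535 bytes, so real parsers only need to search that tail window.
const EOCD_SIG: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
const EOCD_MIN_LEN: usize = 22;

fn find_eocd(data: &[u8]) -> Option<usize> {
    if data.len() < EOCD_MIN_LEN {
        return None;
    }
    // Scan from the last possible EOCD start toward the front, so that a
    // trailing comment containing stray bytes does not hide the record.
    (0..=data.len() - EOCD_MIN_LEN)
        .rev()
        .find(|&i| data[i..i + 4] == EOCD_SIG)
}

fn main() {
    // A minimal empty-archive EOCD: signature followed by 18 zero bytes.
    let mut eocd = EOCD_SIG.to_vec();
    eocd.extend_from_slice(&[0u8; 18]);
    assert_eq!(find_eocd(&eocd), Some(0));

    // The same record preceded by 100 bytes of entry data.
    let mut archive = vec![0xAAu8; 100];
    archive.extend_from_slice(&eocd);
    assert_eq!(find_eocd(&archive), Some(100));
}
```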

Project Structure

ripzip-rs/
  ripzip/           # Library crate
    src/
      lib.rs              # Public API
      error.rs            # RipzipError enum
      progress.rs         # ProgressReporter trait
      fs_utils.rs         # Path validation, directory walking, long path support
      compress/           # Compression pipeline
        mod.rs            # Orchestrator
        parallel.rs       # Per-file compression
        zip_writer.rs     # ZIP format assembler
      extract/            # Extraction pipeline
        mod.rs            # Orchestrator
        parallel.rs       # Per-file extraction + CRC validation
        zip_reader.rs     # EOCD + central directory parser
      zip_format/         # ZIP binary format
        mod.rs            # Constants, helpers
        local_header.rs   # Local file header
        central_dir.rs    # Central directory entry
        eocd.rs           # End of Central Directory
        zip64.rs          # ZIP64 extensions
        crc.rs            # CRC32 helpers
    tests/
      integration/        # Integration test suite (see Testing below)
    benches/
      compare.rs          # ripzip vs zip crate benchmarks
  ripzip-cli/       # CLI binary (clap + indicatif)

Testing

117 tests: 35 unit tests + 82 integration tests (3 ZIP64 stress tests are #[ignore]).

cargo test

Integration test categories: round-trip, empty files, unicode filenames, large files, deep directories, progress callbacks, error handling, CRC validation, path traversal, parallel determinism, binary data, single files, interop with the zip crate (Deflate + Zstd), ZIP64, Windows long paths.

License

MIT
