A research‑grade fractal audio compression system for WAV files, progressively upgraded with GPU acceleration, memory‑mapped domain storage, ANN‑based candidate search, and pipeline parallelism.
This README documents everything implemented so far, step by step, explaining what was built, why it exists, and how the system currently works end‑to‑end.
The goal of this project is to implement a scalable fractal audio compressor capable of:
- Handling long audio files without exhausting RAM
- Leveraging GPU acceleration where available
- Maintaining reasonable reconstruction quality while experimenting with compression ratios
- Providing a clean CLI and Python API for experimentation
This is not a lossless codec. It is a fractal / self‑similarity‑based lossy compressor.
The system consists of four major phases:
- Input processing (WAV loading, mono conversion)
- Compression (domain construction, candidate search, affine matching)
- Serialization (custom `.fwav` format)
- Decompression (iterative fractal reconstruction)
Key design principles:
- Stream large data via memmap instead of RAM
- Separate fast approximate search from exact matching
- Allow CPU/GPU fallback automatically
- Keep the format self‑contained
Implemented read_wav_mono():
- Supports:
- 8‑bit PCM
- 16‑bit PCM
- 24‑bit PCM
- 32‑bit float WAV
- Automatically converts multi‑channel audio to mono
- Normalizes data into `float32`
This guarantees a consistent internal representation.
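The decoding and mono-mixing step can be sketched as below. This is an illustration only, not the project's `read_wav_mono()`: the helper name `pcm_to_float32_mono`, the scaling constants, and the choice to average channels are assumptions.

```python
import numpy as np

def pcm_to_float32_mono(raw: bytes, sample_width: int, n_channels: int) -> np.ndarray:
    """Decode interleaved little-endian WAV sample bytes to mono float32."""
    if sample_width == 1:
        # 8-bit WAV PCM is unsigned and centred on 128
        x = (np.frombuffer(raw, dtype=np.uint8).astype(np.float32) - 128.0) / 128.0
    elif sample_width == 2:
        x = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    elif sample_width == 3:
        # 24-bit: assemble three little-endian bytes, then sign-extend
        b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3).astype(np.int32)
        v = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)
        v = np.where(v >= (1 << 23), v - (1 << 24), v)
        x = v.astype(np.float32) / 8388608.0
    elif sample_width == 4:
        # Assumed to be IEEE float, matching the supported formats listed above
        x = np.frombuffer(raw, dtype="<f4").astype(np.float32)
    else:
        raise ValueError(f"unsupported sample width: {sample_width}")
    # Average the channels of each frame to obtain a mono signal
    return x.reshape(-1, n_channels).mean(axis=1).astype(np.float32)
```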
Implemented write_wav():
- Writes back using the original sample width
- Correctly handles integer and float formats
- Used during decompression output
At startup:
- Attempts to import `cupy`
- Performs a real GPU self‑test (`cp.arange(2).sum()`)
Three modes are supported automatically:
| Mode | Behavior |
|---|---|
| GPU working | Full GPU acceleration |
| CuPy present but failing | CPU fallback |
| No GPU | CPU only |
This prevents silent failures and ensures reliability.
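A minimal sketch of such a startup check is shown below. The function name `select_array_backend` and the `(xp, gpu_ok)` return convention are assumptions for illustration; only the `cp.arange(2).sum()` self-test comes from the description above.

```python
import numpy as np

def select_array_backend():
    """Return (xp, gpu_ok): CuPy if a real kernel runs, otherwise NumPy."""
    try:
        import cupy as cp
        # A tiny reduction forces an actual kernel launch, so a broken
        # driver or missing device fails here rather than mid-compression.
        if int(cp.arange(2).sum()) != 1:
            raise RuntimeError("unexpected GPU self-test result")
        return cp, True
    except Exception:
        # Covers both "CuPy not installed" and "CuPy present but failing"
        return np, False

xp, gpu_ok = select_array_backend()
```

Code written against `xp` then runs unchanged on either backend for the array operations NumPy and CuPy share.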
- The signal is split into non‑overlapping ranges
- Range size is derived dynamically:
range_size = max(4, tile_size // 128)
- Domains are extracted from the original signal
- Larger `tile_size`, overlapping by `domain_step`
- Each domain is downsampled to `range_size`
This preserves shape while reducing storage cost.
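The range and domain construction can be sketched as follows. The function names are hypothetical, and downsampling by block averaging is an assumption; the project may use a different downsampling filter.

```python
import numpy as np

def split_ranges(signal: np.ndarray, range_size: int) -> np.ndarray:
    """Cut the signal into non-overlapping ranges (trailing samples dropped)."""
    n = len(signal) // range_size
    return signal[: n * range_size].reshape(n, range_size)

def build_domains(signal, tile_size, domain_step, range_size) -> np.ndarray:
    """Extract overlapping tiles and block-average each down to range_size."""
    starts = range(0, len(signal) - tile_size + 1, domain_step)
    factor = tile_size // range_size
    out = np.empty((len(starts), range_size), dtype=np.float32)
    for i, s in enumerate(starts):
        tile = signal[s : s + tile_size]
        out[i] = tile[: factor * range_size].reshape(range_size, factor).mean(axis=1)
    return out

tile_size = 1024
range_size = max(4, tile_size // 128)   # 8 for a 1024-sample tile
signal = np.arange(4096, dtype=np.float32)
ranges = split_ranges(signal, range_size)
domains = build_domains(signal, tile_size, domain_step=512, range_size=range_size)
```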
Classic fractal compressors load all domains into RAM, which breaks for long audio.
- Domains are written into a temporary memmap file
- Stored as:
(n_domains, range_size) float32
- RAM usage stays roughly constant regardless of audio length
- Scales to long audio, bounded by disk space rather than RAM
- Can be streamed to GPU when needed
This is one of the most important architectural improvements so far.
Each range/domain is embedded using:
- DCT‑II (orthonormal)
- DC coefficient removed
- Weighted high‑frequency emphasis
- L2 normalization
Embedding dimension:
EMBED_K = 32
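The embedding steps listed above can be sketched with plain NumPy. The linear weight ramp is an assumption (the actual weighting is not specified here), as is the behavior for blocks with fewer than `EMBED_K` non-DC coefficients, where this sketch simply keeps what is available.

```python
import numpy as np

EMBED_K = 32

def dct2_orthonormal(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of shape (n, n); row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def embed(blocks: np.ndarray, k: int = EMBED_K) -> np.ndarray:
    """DCT-II, drop DC, weight towards high frequencies, L2-normalize."""
    n = blocks.shape[1]
    coeffs = blocks @ dct2_orthonormal(n).T      # (n_blocks, n)
    coeffs = coeffs[:, 1 : k + 1]                # drop DC, keep at most k
    weights = np.linspace(1.0, 2.0, coeffs.shape[1])  # assumed weighting
    e = coeffs * weights
    norm = np.linalg.norm(e, axis=1, keepdims=True)
    return (e / np.maximum(norm, 1e-12)).astype(np.float32)
```

Because the DC coefficient is removed, the embedding ignores a constant offset, which fits the later affine match where the offset `o` is solved separately.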
Instead of brute‑forcing all domains:
- Compute cosine similarity in embedding space
- Select top‑K candidates
- Perform exact affine matching only on those
This reduces complexity dramatically.
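A sketch of the linear top‑K selection, assuming the embeddings are already L2-normalized so that a dot product equals cosine similarity. The function name `top_k_candidates` is hypothetical.

```python
import numpy as np

def top_k_candidates(range_emb: np.ndarray, domain_emb: np.ndarray, k: int) -> np.ndarray:
    """Return (n_ranges, k) domain indices, most similar first."""
    sims = range_emb @ domain_emb.T                  # cosine similarity matrix
    k = min(k, domain_emb.shape[0])
    # argpartition finds the k best per row without fully sorting every row
    idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    order = np.argsort(-np.take_along_axis(sims, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)
```

Exact affine matching then only runs on `k` domains per range instead of all of them.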
If available:
- `hnswlib` is used
- Builds a persistent HNSW index over embeddings
Conditions:
- Enabled automatically for large domain counts
- Transparent fallback to linear search
This keeps candidate selection fast: HNSW query cost grows roughly logarithmically with the number of domains, rather than linearly.
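The ANN path with its fallback might look like the sketch below. The threshold `ann_threshold`, the index parameters (`ef_construction`, `M`, `ef`), and the function name are assumptions, not the project's actual settings; index persistence is left out.

```python
import numpy as np

def build_candidate_search(domain_emb: np.ndarray, ann_threshold: int = 50_000):
    """Return query(range_emb, k) -> (n_ranges, k) array of domain indices."""
    n, dim = domain_emb.shape
    try:
        import hnswlib
    except ImportError:
        hnswlib = None

    if hnswlib is not None and n >= ann_threshold:
        index = hnswlib.Index(space="cosine", dim=dim)
        index.init_index(max_elements=n, ef_construction=200, M=16)
        index.add_items(domain_emb, np.arange(n))

        def query(range_emb, k):
            index.set_ef(max(2 * k, 50))   # search breadth; must be >= k
            labels, _ = index.knn_query(range_emb, k=min(k, n))
            return labels.astype(np.int64)
        return query

    def query(range_emb, k):
        # Exact linear fallback for small domain counts or missing hnswlib
        sims = range_emb @ domain_emb.T
        return np.argsort(-sims, axis=1)[:, : min(k, n)]
    return query
```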
For each range/domain pair, the compressor solves:
R ≈ s * D + o
Implemented features:
- Optimal least‑squares scale (`s`) and offset (`o`)
- Mirrored domain check (symmetry)
- Error measured via L2 norm
The best match is stored as:
(domain_index, s, o, symmetry_flag, error)
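The closed-form least-squares fit can be sketched as follows: for a range `r` and a domain `d`, the optimal scale is `s = cov(d, r) / var(d)` and the offset is `o = mean(r) - s * mean(d)`. The function name and argument layout are hypothetical.

```python
import numpy as np

def best_affine_match(r: np.ndarray, candidates: np.ndarray, candidate_ids):
    """Fit r ~ s * d + o for each candidate (plain and mirrored); return the best.

    Returns (domain_index, s, o, symmetry_flag, error).
    """
    best = None
    rc = r - r.mean()
    for flag, doms in ((0, candidates), (1, candidates[:, ::-1])):
        d_mean = doms.mean(axis=1, keepdims=True)
        dc = doms - d_mean
        var = (dc * dc).sum(axis=1)
        # Flat domains carry no shape information: fall back to s = 0
        s = np.where(var > 1e-12, (dc @ rc) / np.maximum(var, 1e-12), 0.0)
        o = r.mean() - s * d_mean[:, 0]
        err = np.linalg.norm(r[None, :] - (s[:, None] * doms + o[:, None]), axis=1)
        j = int(np.argmin(err))
        if best is None or err[j] < best[4]:
            best = (int(candidate_ids[j]), float(s[j]), float(o[j]), flag, float(err[j]))
    return best
```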
- CPUs are well suited to candidate search
- GPUs are well suited to dense math
- Naive approaches leave one or the other idle
A producer–consumer pipeline splits the work.
The producer:
- Computes embeddings
- Performs ANN / linear candidate search
- Pushes candidate lists to a shared queue
The consumer:
- Consumes candidate batches
- Runs vectorized affine solves
- Returns the best match per range
This allows full hardware utilization.
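A minimal sketch of such a pipeline using a bounded queue and one producer thread. `run_pipeline`, `find_candidates`, and `solve_batch` are hypothetical names; the real stages would be the candidate search and the vectorized affine solve.

```python
import queue
import threading
import numpy as np

def run_pipeline(range_batches, find_candidates, solve_batch, max_queued=4):
    """Overlap candidate search (producer thread) with affine solving (consumer)."""
    q = queue.Queue(maxsize=max_queued)   # bounded, so the producer cannot run far ahead
    results = {}

    def producer():
        try:
            for batch_id, ranges in enumerate(range_batches):
                q.put((batch_id, ranges, find_candidates(ranges)))
        finally:
            q.put(None)                   # sentinel: no more work

    t = threading.Thread(target=producer, daemon=True)
    t.start()
    while True:
        item = q.get()
        if item is None:
            break
        batch_id, ranges, cands = item
        results[batch_id] = solve_batch(ranges, cands)
    t.join()
    return [results[i] for i in sorted(results)]
```

The bounded queue provides back-pressure: if the solver is slower than the search, the producer blocks instead of accumulating candidate lists in memory.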
Implemented voiced_detection():
- Frame‑based energy analysis
- Hysteresis thresholding
- Smooths detection
Used to:
- Skip silent ranges
- Avoid wasting matches on noise
- Improve compression ratio
Silent ranges are encoded using a sentinel domain index (-1).
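Frame energy with hysteresis can be sketched as below. This is not the project's `voiced_detection()`; the name `voiced_mask`, the use of mean-square energy, and the two-threshold parameters are assumptions.

```python
import numpy as np

def voiced_mask(signal: np.ndarray, frame_size: int,
                on_threshold: float, off_threshold: float) -> np.ndarray:
    """Per-frame boolean mask; switches on above on_threshold, off below off_threshold."""
    n = len(signal) // frame_size
    frames = signal[: n * frame_size].reshape(n, frame_size)
    energy = (frames ** 2).mean(axis=1)
    mask = np.zeros(n, dtype=bool)
    active = False
    for i, e in enumerate(energy):
        # Hysteresis: a higher bar to switch on than to stay on,
        # which avoids rapid toggling around a single threshold
        active = (e >= off_threshold) if active else (e >= on_threshold)
        mask[i] = active
    return mask
```

Ranges falling in frames where the mask is `False` would then be stored with the sentinel domain index instead of being matched.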
Includes:
- Version
- Range size
- Sample rate
- Sample width
- Tile size
- Domain step
- Energy threshold
- Number of ranges
- Number of domains
- Original signal length
- All domain tiles (float32)
- All matches
- SHA‑256 checksum embedded
- Verified during load
This ensures self‑contained, robust storage.
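The checksum mechanism can be illustrated with a toy container. The byte layout below (magic bytes, three header fields, digest appended at the end) is purely hypothetical and is not the actual `.fwav` layout; it only shows how an embedded SHA‑256 digest is written and verified.

```python
import hashlib
import struct

MAGIC = b"FWAV"  # hypothetical magic bytes

def pack_with_checksum(version: int, sample_rate: int, n_ranges: int, body: bytes) -> bytes:
    """Serialize a small header plus body and append a SHA-256 digest."""
    header = MAGIC + struct.pack("<III", version, sample_rate, n_ranges)
    payload = header + body
    return payload + hashlib.sha256(payload).digest()

def unpack_verified(blob: bytes):
    """Verify the trailing digest, then parse the header."""
    payload, digest = blob[:-32], blob[-32:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("checksum mismatch: file is corrupt or truncated")
    if payload[:4] != MAGIC:
        raise ValueError("not a recognized container")
    version, sample_rate, n_ranges = struct.unpack_from("<III", payload, 4)
    return version, sample_rate, n_ranges, payload[16:]
```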
- Reconstruction buffer initialized to zeros
For each iteration:
- Apply stored transforms
- Average overlapping contributions
- Optionally refine scale (`s`)
- Check convergence
Stops when:
- Convergence threshold reached, or
- Max iterations exceeded
- Scale clipping (`s_clip`)
- Optional scale damping
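The loop below sketches classic iterative fractal decoding under stated assumptions: domains are re-extracted from the current estimate on every iteration and block-averaged down to `range_size`, and `tile_size` is a multiple of `range_size`. The function name `decode_sketch` is hypothetical, and averaging of overlapping contributions, scale refinement, and damping are omitted.

```python
import numpy as np

def decode_sketch(matches, n_samples, range_size, tile_size, domain_step,
                  s_clip=0.9, max_iter=30, tol=1e-5):
    """matches[i] = (domain_index, s, o, symmetry_flag); domain_index -1 means silent."""
    x = np.zeros(n_samples, dtype=np.float32)
    factor = tile_size // range_size
    for _ in range(max_iter):
        new = np.zeros_like(x)
        for i, (d_idx, s, o, flip) in enumerate(matches):
            if d_idx < 0:
                continue                      # silent range stays zero
            start = d_idx * domain_step
            tile = x[start : start + tile_size]
            d = tile.reshape(range_size, factor).mean(axis=1)
            if flip:
                d = d[::-1]
            # Clipping |s| below 1 keeps the transform contractive
            s = float(np.clip(s, -s_clip, s_clip))
            new[i * range_size : (i + 1) * range_size] = s * d + o
        change = float(np.max(np.abs(new - x)))
        x = new
        if change < tol:
            break
    return x
```

With `|s| < 1` each pass shrinks the difference between successive estimates, which is why the result does not depend on the zero starting buffer.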
Implemented utilities:
- SNR computation
- Compression ratio logging
- Timing statistics
Batch mode writes JSON metrics automatically.
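For reference, the two headline metrics can be computed as in this sketch. The function names are hypothetical, and the SNR here is the plain full-signal ratio in dB, which may differ from the exact definition the tool logs.

```python
import numpy as np

def snr_db(original, reconstructed) -> float:
    """Signal-to-noise ratio in dB over the common length of both signals."""
    n = min(len(original), len(reconstructed))
    ref = np.asarray(original[:n], dtype=np.float64)
    noise = ref - np.asarray(reconstructed[:n], dtype=np.float64)
    noise_power = float(np.sum(noise ** 2))
    if noise_power == 0.0:
        return float("inf")               # identical signals
    return float(10.0 * np.log10(np.sum(ref ** 2) / noise_power))

def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Values above 1 mean the compressed file is smaller than the original."""
    return original_bytes / compressed_bytes
```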
python fractal_wav_compressor_gpu_cli.py compress input.wav output.fwav --gpu
python fractal_wav_compressor_gpu_cli.py decompress input.fwav --gpu
- Directory input
- Parallel workers
- Automatic skipping of existing outputs
- ✔ Fully working fractal compressor
- ✔ Handles long audio via memmap
- ✔ CPU/GPU automatic execution
- ✔ ANN‑accelerated search
- ✔ Robust file format
- ✔ Iterative reconstruction
- ✔ Batch & metrics support
This is well beyond a toy implementation and forms a solid experimental platform.
- Compression ratios are not yet competitive
- Quality tuning is ongoing (range/domain balance)
- ANN index rebuilding cost is non‑trivial
- No psychoacoustic weighting yet
These are expected for a research‑grade system.
Potential next steps:
- Multi‑resolution domains
- Per‑band fractal matching
- Adaptive range sizes
- Psychoacoustic error metrics
- Domain reuse across files
Status: Active research / experimental implementation