Release v3.7.0 · wpferrell/Bigsmall

v3.7.0 unlocks parallel tensor encoding on Windows. The historical hard-coded workers=1 default was overly conservative — Windows spawn-context multiprocessing works correctly and produces bit-identical output.

Speedup on Phi-3.5-mini partial shard (876 MB raw, 20 BF16 tensors)

Workers	Wall time	Speedup
1	115.19 s	1.00x
2	79.33 s	1.45x
4	63.30 s	1.82x
8	68.79 s	1.67x (past optimal)

Outputs are md5-identical across all worker counts.
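
A check like this can be reproduced with a few lines of standard-library Python. The sketch below is illustrative only: the helper names `md5_of_file` and `outputs_identical` are hypothetical and not part of Bigsmall. It assumes you have already written one output file per worker count.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(paths):
    """True if there is at least one file and all files share one MD5 digest."""
    digests = {md5_of_file(Path(p)) for p in paths}
    return len(digests) == 1
```

Passing the paths of the files produced at workers=1, 2, 4 and 8 to `outputs_identical` should return True if the encoder is deterministic, as the release states.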

Added

Default workers = min(cpu_count, 8) on all platforms (was 1 on Windows). Override via BIGSMALL_WORKERS env var still works.
encoder._safe_workers() — caps worker count by available RAM (psutil) and tensor count. Always returns ≥ 1.
Explicit mp_context = spawn on ProcessPoolExecutor for cross-platform consistency. Same fix applied to compress_delta().
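
The three items above can be sketched roughly as follows. This is not the actual `encoder._safe_workers()` code: the function names, signatures and the `per_worker_bytes` estimate are assumptions made for illustration, and the handling of an unparsable `BIGSMALL_WORKERS` value is a guess. The available-RAM figure is passed in as a parameter here; in practice it could come from `psutil.virtual_memory().available`.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor

MAX_DEFAULT_WORKERS = 8  # cap stated in the release notes: min(cpu_count, 8)


def default_workers(env=None):
    """Requested worker count: BIGSMALL_WORKERS if set, else min(cpu_count, 8)."""
    env = os.environ if env is None else env
    override = env.get("BIGSMALL_WORKERS")
    if override is not None:
        try:
            return max(1, int(override))
        except ValueError:
            pass  # assumption: an unparsable override falls back to the default
    return min(os.cpu_count() or 1, MAX_DEFAULT_WORKERS)


def safe_workers(requested, n_tensors, available_bytes, per_worker_bytes):
    """Cap `requested` by tensor count and by how many workers fit in RAM.

    Hypothetical stand-in for encoder._safe_workers(); always returns >= 1.
    """
    by_ram = available_bytes // max(1, per_worker_bytes)
    return max(1, min(requested, n_tensors, by_ram))


def make_pool(workers):
    """Build a process pool that uses the spawn start method on every platform."""
    ctx = multiprocessing.get_context("spawn")
    return ProcessPoolExecutor(max_workers=workers, mp_context=ctx)
```

Requesting spawn explicitly means Linux (where fork has traditionally been the default) and Windows (where spawn is the only option) start workers the same way, which is the cross-platform consistency the release refers to.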

Tests

5 new tests in tests/test_multiprocessing.py. 119 passed / 2 skipped (up from 114 passed).

Compatibility

Output is deterministic across worker counts — every existing .bs file is reproducible at any workers setting.
BIGSMALL_WORKERS=1 still selects the serial (no-pool) path.
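
One way a design like this can stay deterministic is to make each tensor's encoding a pure function of its bytes and to collect results in input order. The sketch below shows that pattern together with a serial path for workers=1. It is an illustration under assumptions, not Bigsmall's code: `encode_one` uses zlib as a stand-in codec, and `encode_all` is a hypothetical name.

```python
import multiprocessing
import zlib
from concurrent.futures import ProcessPoolExecutor


def encode_one(raw):
    """Stand-in encoder (zlib), NOT the Bigsmall codec; output depends only on `raw`."""
    return zlib.compress(raw, 6)


def encode_all(tensors, workers):
    """Encode each bytes buffer and return the results in input order."""
    if workers <= 1 or len(tensors) <= 1:
        # Serial path: no pool, no pickling, no child processes.
        return [encode_one(t) for t in tensors]
    ctx = multiprocessing.get_context("spawn")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        # map() yields results in submission order, whatever order workers finish in.
        return list(pool.map(encode_one, tensors))
```

Under the spawn start method, the parallel path must be called from code guarded by `if __name__ == "__main__":`, and `encode_one` must live at module level so child processes can import it.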

What did NOT pan out

Spec target was a >4x speedup; the best measured result is 1.82x (4 workers). Process-spawn and pickle overhead cap the gain under the Windows spawn context for this workload. Pushing further would need Numba-warm workers or a thread-pool variant.

Install: pip install bigsmall==3.7.0
