v3.1.0

@wpferrell wpferrell released this 18 May 17:43
· 58 commits to main since this release

v3.1.0 ships the V4 Session B codec infrastructure. Two new lossless candidate codecs, fp2_residual_v1 and cross_layer_delta (group API), are implemented, registered, and gated behind the auto_select_codec safety net.

Both codecs lose to bf16_se_ac on real transformer attention/MLP tensors. The V4 Session A entropy bound was based on a lossy proxy (BF16-rounded FP32 subtraction) that cannot be realised under a strict-lossless contract. The codecs stay in the registry because (a) they are correctly lossless and tested as such, and (b) they provide infrastructure hooks for future V4 quantize-plus-residual and cross-layer work.

Added

  • fp2_residual_v1 codec — FP2 + lossless BF16 residual + XOR correction stream
  • cross_layer_delta group + pair APIs — pure-byte XOR transform with delta_from extras key
  • Container v2 stamping when either new codec is selected
  • enable_fp2_residual opt-out flag on auto_select_codec
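
The "pure-byte XOR transform" behind cross_layer_delta can be illustrated with a minimal sketch. The function names below are hypothetical and are not the bigsmall API; the sketch only shows why XOR against a reference buffer is trivially lossless, and it does not model the container or the delta_from extras key.

```python
# Hypothetical sketch of a byte-level XOR delta; not the bigsmall implementation.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte buffers element by element."""
    if len(a) != len(b):
        raise ValueError("XOR delta requires equal-length buffers")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_delta(reference: bytes, target: bytes) -> bytes:
    """Store the target as its XOR difference from the reference."""
    return xor_bytes(reference, target)


def decode_delta(reference: bytes, delta: bytes) -> bytes:
    """XOR is its own inverse, so decoding reuses the same operation."""
    return xor_bytes(reference, delta)


# Two made-up buffers standing in for the raw bytes of adjacent layers.
layer0 = bytes([0x3F, 0x80, 0x3F, 0x81, 0x3F, 0x7F])
layer1 = bytes([0x3F, 0x80, 0x3F, 0x80, 0x3F, 0x80])

delta = encode_delta(layer0, layer1)
restored = decode_delta(layer0, delta)
assert restored == layer1  # bit-exact round trip
```

Where the two buffers agree, the delta bytes are zero, which is what a downstream entropy coder could exploit. Where they differ widely, the delta is no easier to compress than the original.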

Tests

  • 11 new tests; suite total is 74 passed, 2 skipped

Empirical findings

  • FP2+residual compresses to an average of 90.249% of raw size, versus 65.707% for bf16_se_ac, on Phi-3.5-mini shard 1 (it loses on all 86 tensors). The safety net keeps file size exactly at the v3.0.0 baseline.
  • Cross-layer XOR delta wins by roughly 1-1.4% on tiny norm-layer groups and loses on MLP/attention tensors. Aggregate impact is below 0.0001%.
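
A safety net of this kind can be sketched as "encode with every candidate, keep the smallest, and always include the baseline". The code below is an illustrative assumption, not the auto_select_codec implementation: select_smallest is a hypothetical name, and zlib stands in for a real baseline codec.

```python
# Hypothetical sketch of a smallest-output selector with a guaranteed baseline.
import zlib
from typing import Callable, Dict, Tuple

Encoder = Callable[[bytes], bytes]


def select_smallest(
    data: bytes, baseline_name: str, encoders: Dict[str, Encoder]
) -> Tuple[str, bytes]:
    """Return (codec name, payload) for the smallest encoding.

    The baseline is encoded first and is replaced only by a strictly
    smaller candidate, so the result is never larger than the baseline.
    """
    best_name = baseline_name
    best_payload = encoders[baseline_name](data)
    for name, encode in encoders.items():
        if name == baseline_name:
            continue
        payload = encode(data)
        if len(payload) < len(best_payload):
            best_name, best_payload = name, payload
    return best_name, best_payload


# Stand-in codecs: zlib as the baseline, and a deliberately worse candidate.
encoders = {
    "baseline": lambda d: zlib.compress(d, 9),
    "candidate": lambda d: d + b"\x00" * 16,  # pads the input, so it always loses here
}

data = b"abc" * 1000
name, payload = select_smallest(data, "baseline", encoders)
assert name == "baseline"
```

With this shape, a candidate that loses on every tensor leaves the output byte-identical to what the baseline alone would have produced.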

Backwards compatibility

  • Files written by 3.0.0 are read identically by 3.1.0
  • Files using fp2_residual_v1 require bigsmall >= 3.1.0 to decode

Install: `pip install bigsmall==3.1.0`