Skip to content

v3.14.0

Latest

Choose a tag to compare

@wpferrell wpferrell released this 20 May 13:53
· 33 commits to main since this release

Triton KV cache kernels (V9B), progressive HTTP download (V10), reshard tool (V11).

  • V9B: Fused Triton pack/unpack kernels for GPUCompressedKVCache (2.94x faster), Triton rANS encode kernel
  • V10: stream_from_hub() — decompress directly from HuggingFace CDN, zero .bs bytes written to disk
  • V11: bigsmall reshard CLI — split, join, or rebalance .bs shards by layer boundary

See CHANGELOG.md for full details.