0.0.35

github-actions released this 23 May 23:54
· 69 commits to master since this release
c0b20f6
  • Tensor parallel mode for Qwen3.5/3.6
  • New recurrent state manager avoids dynamic allocation of recurrent states (reduces fragmentation and keeps VRAM overhead constant for Qwen3.5 etc.)
  • Improved checkpointing decision to make better use of available cache space
  • Perform reconstruct-GEMM in slices for large layers, greatly reducing VRAM overhead
  • Fix race condition causing streaming token output to lag behind generation in some situations
  • Support new tensor keys in Mistral 3.5 Medium
  • KL-div and perplexity kernels for eval scripts
  • More tuning
  • Lots of bugfixes
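
The sliced reconstruct-GEMM item can be illustrated with a minimal pure-Python sketch. It is not the library's kernel: the function names, the single-scale "quantization" and the column-wise slicing are all assumptions chosen for illustration. The point is only that reconstructing the weight one slice at a time keeps the temporary buffer at the size of a slice rather than the full layer.

```python
def dequantize(codes, scale):
    # Hypothetical reconstruction step: integer codes times one shared scale.
    return [[c * scale for c in row] for row in codes]


def matmul(a, b):
    # Plain (m, k) x (k, n) matrix product on nested lists.
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sliced_reconstruct_gemm(x, codes, scale, slice_width):
    # x: (m, k) activations, codes: (k, n) quantized weight.
    # Only `slice_width` columns of the weight are reconstructed at a time,
    # so the temporary is (k, slice_width) instead of (k, n).
    n = len(codes[0])
    out = [[0.0] * n for _ in x]
    for start in range(0, n, slice_width):
        stop = min(start + slice_width, n)
        w_slice = dequantize([row[start:stop] for row in codes], scale)
        partial = matmul(x, w_slice)
        for i, prow in enumerate(partial):
            out[i][start:stop] = prow
    return out
```

The result matches the unsliced product; only the peak size of the reconstructed temporary changes with `slice_width`.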
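
For reference, the two quantities the new eval kernels are named after can be written out in a few lines of plain Python. This is a sketch of the standard definitions, not the kernels themselves: the function names are hypothetical, and the KL direction (reference model relative to test model) is an assumption, since the notes do not specify it.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized list.
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def perplexity(logits_per_position, targets):
    # exp of the mean negative log-likelihood of the target tokens.
    nll = 0.0
    for logits, target in zip(logits_per_position, targets):
        nll -= log_softmax(logits)[target]
    return math.exp(nll / len(targets))


def kl_divergence(ref_logits, test_logits):
    # KL(ref || test) for one position; direction is an assumption.
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

A model that is uniform over four tokens has perplexity 4, and the KL divergence of a distribution from itself is 0, which makes both functions easy to sanity-check.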

Full Changelog: v0.0.34...v0.0.35