0.0.35
- Tensor parallel mode for Qwen3.5/3.6
- New recurrent state manager avoids dynamic allocation of recurrent states (reduces fragmentation and keeps VRAM overhead constant for Qwen3.5 etc.)
- Improved checkpointing decisions to make better use of available cache space
- Perform reconstruct-GEMM in slices for large layers, greatly reducing VRAM overhead
- Fix race condition causing streaming token output to lag behind generation in some situations
- Support new tensor keys in Mistral 3.5 Medium
- KL-div and perplexity kernels for eval scripts
- More tuning
- Lots of bugfixes
Full Changelog: v0.0.34...v0.0.35
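
For readers wondering why slicing the reconstruct-GEMM lowers VRAM overhead, here is a minimal pure-Python sketch of the general idea. It assumes "reconstruct" means expanding quantized weights to floats before the matmul. The names `dequantize_columns`, `sliced_reconstruct_gemm` and the single-scale quantization are hypothetical stand-ins for illustration, not the library's actual kernels or API.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dequantize_columns(q_weight: List[List[int]], scale: float,
                       col_start: int, col_end: int) -> Matrix:
    """Hypothetical 'reconstruct' step: expand integer codes to floats,
    but only for the requested column slice."""
    return [[q * scale for q in row[col_start:col_end]] for row in q_weight]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def sliced_reconstruct_gemm(x: Matrix, q_weight: List[List[int]], scale: float,
                            slice_width: int) -> Tuple[Matrix, int]:
    """Compute x @ dequant(q_weight) one column slice at a time.

    Returns the output and the largest number of reconstructed weight
    elements that existed at once (a stand-in for temporary VRAM use).
    """
    out_features = len(q_weight[0])
    out = [[0.0] * out_features for _ in range(len(x))]
    peak_elements = 0
    for start in range(0, out_features, slice_width):
        end = min(start + slice_width, out_features)
        # Only this slice of the weight matrix is materialized as floats.
        w_slice = dequantize_columns(q_weight, scale, start, end)
        peak_elements = max(peak_elements, len(w_slice) * (end - start))
        partial = matmul(x, w_slice)
        for i, row in enumerate(partial):
            out[i][start:end] = row
    return out, peak_elements


x = [[1.0, 2.0], [3.0, 4.0]]
q_weight = [[1, 2, 3], [4, 5, 6]]  # toy integer codes, 2 x 3
full, full_peak = sliced_reconstruct_gemm(x, q_weight, 0.5, slice_width=3)
sliced, sliced_peak = sliced_reconstruct_gemm(x, q_weight, 0.5, slice_width=2)
```

In this toy case both calls produce the same output, while the sliced call only ever holds 4 reconstructed weight elements at once instead of 6. The temporary buffer scales with the slice width rather than with the full layer size, which is where the saving for large layers would come from under these assumptions.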