Release 0.0.35 · turboderp-org/exllamav3

Tensor parallel mode for Qwen3.5/3.6
New recurrent state manager avoids dynamic allocation of recurrent states (reduces fragmentation and keeps VRAM overhead constant for Qwen3.5 etc.)
Improved checkpointing decisions to make better use of available cache space
Perform reconstruct-GEMM in slices for large layers, greatly reducing VRAM overhead
Fix race condition causing streaming token output to lag behind generation in some situations
Support new tensor keys in Mistral 3.5 Medium
KL-div and perplexity kernels for eval scripts
More tuning
Lots of bugfixes
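
To illustrate why slicing the reconstruct-GEMM reduces memory overhead, here is a minimal pure-Python sketch. It is not the exllamav3 kernel: the toy quantization (integer weights with one scale per output column) and the names `reconstruct`, `gemm_full`, `gemm_sliced` and `slice_cols` are assumptions made up for this example. The point is only that the temporary full-precision weight buffer shrinks from the whole layer to one slice of output columns, while the result stays the same.

```python
# Toy illustration only. The quantization format here is hypothetical:
# integer weights q[k][n] with one float scale per output column n.

def reconstruct(q, scales, start, stop):
    """Dequantize output columns [start, stop) into full-precision floats."""
    return [[row[j] * scales[j] for j in range(start, stop)] for row in q]

def matmul(a, b):
    """Plain dense GEMM: a is (m x k), b is (k x n)."""
    cols = len(b[0])
    return [[sum(a_row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]
            for a_row in a]

def gemm_full(x, q, scales):
    """Reconstruct the whole weight matrix, then multiply."""
    w = reconstruct(q, scales, 0, len(scales))  # temporary is k x n
    return matmul(x, w)

def gemm_sliced(x, q, scales, slice_cols):
    """Reconstruct and multiply one slice of output columns at a time."""
    out = [[] for _ in x]
    for start in range(0, len(scales), slice_cols):
        stop = min(start + slice_cols, len(scales))
        w_slice = reconstruct(q, scales, start, stop)  # temporary is k x slice_cols
        partial = matmul(x, w_slice)
        for out_row, part_row in zip(out, partial):
            out_row.extend(part_row)  # append this slice's output columns
    return out
```

Because each output element is computed from the same operands in the same order, `gemm_sliced` returns the same values as `gemm_full` for any `slice_cols >= 1`; only the peak size of the temporary buffer changes.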

Full Changelog: v0.0.34...v0.0.35
