Releases: turboderp/exllamav2

0.0.21

11 May 13:31
  • Support for Granite architecture
  • Support for GPT2 architecture
  • Support for banned strings in the streaming generator (see the sketch after this list)
  • A bit more work on multimodal support (still unfinished)
  • A few bugfixes and other minor changes
  • Windows wheels for PyTorch 2.2.0 are included below to work around an apparent (likely temporary) issue in PyTorch. See #434 and pytorch/pytorch#125109
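As a quick illustration of the banned-strings feature, here is a minimal, hedged sketch. It assumes the streaming generator's begin_stream_ex() accepts a banned_strings keyword (as this release describes the feature) and that stream_ex() returns a dict with "chunk" and "eos" entries; the model path is a placeholder. Check the examples in the repository for the exact signatures.

```python
# Hedged sketch: banned strings with the streaming generator.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2StreamingGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-model"            # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy = True)
model.load_autosplit(cache)                         # load weights, splitting across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2StreamingGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()

input_ids = tokenizer.encode("Once upon a time,")
generator.begin_stream_ex(
    input_ids,
    settings,
    banned_strings = ["As an AI language model"],   # assumed keyword for this release's feature
)

output = ""
for _ in range(200):
    res = generator.stream_ex()                     # assumed to return a dict with "chunk"/"eos"
    output += res["chunk"]
    if res["eos"]:
        break
print(output)
```

Banned strings are filtered at the generator level rather than by post-processing, so a match can be suppressed before it is emitted to the stream.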

Full Changelog: v0.0.20...v0.0.21

0.0.20

27 Apr 00:56
  • Adds Phi3 support
  • Wheels compiled for PyTorch 2.3.0
  • ROCm 6.0 wheels

Full Changelog: v0.0.19...v0.0.20

0.0.19

19 Apr 06:44
  • More accurate Q4 cache using groupwise rotations
  • Better prompt ingestion speed when using flash-attn
  • Minor fixes related to issues quantizing Llama 3
  • New, more robust optimizer
  • Fix for a bug in long-sequence inference with GPTQ models

Full Changelog: v0.0.18...v0.0.19

0.0.18

07 Apr 18:41
  • Support for Command-R-plus
  • Fix for pre-AVX2 CPUs
  • VRAM optimizations for quantization
  • Very preliminary multimodal support
  • Various other small fixes and optimizations

Full Changelog: v0.0.17...v0.0.18

0.0.17

31 Mar 03:19

Mostly just minor fixes and support for DBRX models.

Full Changelog: v0.0.16...v0.0.17

0.0.16

20 Mar 07:23
  • Adds support for Cohere models
  • N-gram decoding (see the sketch after this list)
  • A few bugfixes
  • Lots of optimizations
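For the N-gram decoding item above, a hedged sketch of how it might be switched on: it assumes the streaming generator exposes a speculative_ngram flag, and that the model, cache, and tokenizer are built as in the earlier sketch. The attribute name is an assumption; consult the repository's examples if it differs.

```python
# Hedged sketch: enabling n-gram (speculative) decoding on the streaming generator.
from exllamav2.generator import ExLlamaV2StreamingGenerator, ExLlamaV2Sampler

def make_ngram_generator(model, cache, tokenizer):
    # Build a streaming generator and turn on n-gram speculative decoding.
    generator = ExLlamaV2StreamingGenerator(model, cache, tokenizer)
    generator.speculative_ngram = True              # assumed attribute name for n-gram decoding
    return generator, ExLlamaV2Sampler.Settings()
```

N-gram decoding drafts tokens from repeated patterns in the context instead of a separate draft model, so it needs no extra VRAM and mainly helps on repetitive or structured output.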

Full Changelog: v0.0.15...v0.0.16

0.0.15

07 Mar 02:26
  • Adds Q4 cache mode (see the sketch after this list)
  • Support for StarCoder2
  • Minor optimizations and a couple of bugfixes
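To show how the Q4 cache mode plugs in, a hedged sketch that substitutes ExLlamaV2Cache_Q4 for the default cache when loading a model. The class name follows this release's feature, but the model path is a placeholder and the constructor arguments should be verified against the repository.

```python
# Hedged sketch: loading a model with the Q4 (4-bit) KV cache.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer

config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-model"            # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy = True)       # Q4 cache in place of the default FP16 cache
model.load_autosplit(cache)                         # load weights, splitting across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)
```

The Q4 cache roughly quarters KV-cache memory relative to FP16, which is what makes much longer contexts fit on the same GPU.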

Full Changelog: v0.0.14...v0.0.15

0.0.14

24 Feb 05:54

Adds support for Qwen1.5 and Gemma architectures.

Various fixes and optimizations.

Full Changelog since 0.0.13: v0.0.13...v0.0.14

0.0.13.post2

15 Feb 00:28

0.0.13.post1

04 Feb 23:11

Fixes inference on models with vocab sizes that are not multiples of 32