Releases: wejoncy/QLLM
v0.2.0
v0.1.9.1
What's Changed
- add assert message && ci upgrade torch 2.2.2 by @wejoncy in #124
- Update README.md by @wejoncy in #125
- fix version match errors by @wejoncy in #128
- add macro GENERAL_TORCH to get rid of OptionalCUDAGuard by @wejoncy in #129
- quick fix by @wejoncy in #130
- v0.1.9.1 by @wejoncy in #131
Full Changelog: v0.1.9...v0.1.9.1
v0.1.9
What's Changed
- Bump to 0.1.8 by @wejoncy in #109
- new autogptq config format && parallel load by @wejoncy in #110
- bugfix by @wejoncy in #111
- fix issue by @wejoncy in #113
- Fix 112 by @wejoncy in #114
- Fix typos by @emphasis10 in #115
- minor fix, attn_implementation by @wejoncy in #120
- Bump to 0.1.9 by @wejoncy in #121
- -allow-unsupported-compiler by @wejoncy in #122
New Contributors
- @emphasis10 made their first contribution in #115
Full Changelog: v0.1.8...v0.1.9
v0.1.8
v0.1.7.1
v0.1.7
What's Changed
- ort ops support in main branch with act_order by @wejoncy in #92
- support export hqq to onnx by @wejoncy in #93
- Bump to 0.1.7 by @wejoncy in #94
- improve .cpu() with non_blocking by @wejoncy in #95
- disable win in release by @wejoncy in #96
- refactor args by @wejoncy in #97
Full Changelog: v0.1.6...v0.1.7
v0.1.6
What's Changed
- illegal memory access by @wejoncy in #71
- Format by @wejoncy in #72
- Hqq support by @wejoncy in #73
- [feature] fast build support (build and install qllm in a few seconds) by @wejoncy in #74
- minor fix for detecting ort_ops and torch.compile by @wejoncy in #75
- static_groups by @wejoncy in #76
- minor Fix (cudaguard) by @wejoncy in #77
- fix cudaguard by @wejoncy in #78
- ruff Format by @wejoncy in #79
- move ops into qllm by @wejoncy in #80
- fix memory layout in QuantizeLinear by @yufenglee in #82
- add continuous check for ort kernel by @wejoncy in #84
- Ort fix by @wejoncy in #85
- more general Ort ops export by @wejoncy in #86
- 0.1.6.dev by @wejoncy in #87
- 0.1.6 by @wejoncy in #88
- speed up ort node packing by @wejoncy in #89
- fix attn_implementation by @wejoncy in #90
New Contributors
- @yufenglee made their first contribution in #82
Full Changelog: v0.1.5...v0.1.6
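Several v0.1.6 entries (the QuantizeLinear memory-layout fix, ort node packing) concern how low-bit weights are packed into 32-bit words. As a rough illustration of the general technique only, not QLLM's actual code, here is a minimal pure-Python sketch that packs eight 4-bit values into each 32-bit integer:

```python
def pack_int4(values):
    """Pack eight 4-bit values (0..15) into one 32-bit word each, low nibble first."""
    assert len(values) % 8 == 0, "pad to a multiple of 8 before packing"
    packed = []
    for i in range(0, len(values), 8):
        word = 0
        for j, v in enumerate(values[i:i + 8]):
            assert 0 <= v < 16, "value does not fit in 4 bits"
            word |= v << (4 * j)  # nibble j of the word holds element j
        packed.append(word)
    return packed

def unpack_int4(packed, count):
    """Inverse of pack_int4; count trims padding from the last word."""
    out = []
    for word in packed:
        for j in range(8):
            out.append((word >> (4 * j)) & 0xF)
    return out[:count]
```

The nibble ordering (which element lands in which bit range) is exactly the kind of convention a packer and a compute kernel must agree on; a mismatch there is the sort of bug a "fix memory layout" change typically addresses.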
v0.1.5
What's Changed
- works on Windows; setting dtype is important by @wejoncy in #54
- use_heuristic=false by default for models with hard-to-predict unquantized layers, like mixtral-8x7b by @wejoncy in #55
- add mixtral in readme example by @wejoncy in #56
- bugfix when export 16bit model by @wejoncy in #57
- Fix build error: uint32_t is not defined; include <stdint.h> by @wejoncy in #58
- dp kernel support g_idx by @wejoncy in #59
- [important] packing improve, faster by @wejoncy in #60
- [improve packing]fix for awq unpack by @wejoncy in #61
- 3bit support with g_idx in dq_kernel by @wejoncy in #63
- 3bit fix by @wejoncy in #64
- 0.1.5.dev by @wejoncy in #65
- onnx support Act_order && some onnx fix by @wejoncy in #66
- Support gemv with g_idx and some fix in exporter/dataloader by @wejoncy in #67
- support mixtral in gptq/awq by @wejoncy in #68
- minor fix for act_order detect by @wejoncy in #70
- Bump version to 0.1.5 by @wejoncy in #69
Full Changelog: v0.1....v0.1.5
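Many v0.1.5 entries revolve around g_idx, the per-column group index used by GPTQ-style group quantization (with act_order, columns are reordered, so group membership is no longer contiguous and kernels need the explicit mapping). A minimal pure-Python sketch of the idea, not QLLM's implementation:

```python
def quantize_with_gidx(weights, group_size=4, bits=4):
    """Asymmetric group quantization; g_idx maps each element to its group.

    In this sketch g_idx is the trivial i // group_size mapping; with
    act_order it would be a permutation-derived, non-contiguous mapping.
    """
    qmax = (1 << bits) - 1
    g_idx = [i // group_size for i in range(len(weights))]
    scales, zeros, q = [], [], []
    for g in range(0, len(weights), group_size):
        group = weights[g:g + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax or 1.0  # avoid zero scale for constant groups
        zero = round(-lo / scale)
        scales.append(scale)
        zeros.append(zero)
        q += [max(0, min(qmax, round(w / scale) + zero)) for w in group]
    return q, scales, zeros, g_idx

def dequantize(q, scales, zeros, g_idx):
    """Recover approximate weights by looking up each element's group params."""
    return [(qv - zeros[g]) * scales[g] for qv, g in zip(q, g_idx)]
```

The per-element lookup through g_idx in dequantize is what the "dp kernel support g_idx" and "gemv with g_idx" changes above enable on the GPU side.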
v0.1.4
What's Changed
- support Phi, detect multi blocks by @wejoncy in #43
- quick fix by @wejoncy in #44
- add colab example && turing support for awq && remove dependency of xbitops by @wejoncy in #46
- quick fix for meta device by @wejoncy in #47
- add trust code by @wejoncy in #48
- fix trust_code by @wejoncy in #49
- quick fix for turing awq 75 by @wejoncy in #50
- fix low_cpu_mem_usage by @wejoncy in #51
- fix model dtype, default half by @wejoncy in #52
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Bump version to 0.1.3 by @wejoncy in #29
- pipeline by @wejoncy in #33
- minor fix win/special by @wejoncy in #34
- Fix pack_mode issue and add Proxy for transformers. by @wejoncy in #36
- work around autogptq/vLLM by @wejoncy in #37
- update readme and fix some pack_mode conversion bugs by @wejoncy in #38
- minor fix and rename quant_linear folders by @wejoncy in #39
- [fix] Weight pack && tokenizer && more awq models by @wejoncy in #40
- Readme by @wejoncy in #41
- ready for pypi package by @wejoncy in #42
Full Changelog: https://github.com/wejoncy/QLLM/commits/v0.1.3