Releases: wejoncy/QLLM
v0.2.0
v0.1.9.1
What's Changed
- add assert message && ci upgrade torch 2.2.2 by @wejoncy in #124
- Update README.md by @wejoncy in #125
- fix version match errors by @wejoncy in #128
- add macro GENERAL_TORCH to get rid of OptionalCUDAGuard by @wejoncy in #129
- quick fix by @wejoncy in #130
- v0.1.9.1 by @wejoncy in #131
Full Changelog: v0.1.9...v0.1.9.1
v0.1.9
What's Changed
- Bump to 0.1.8 by @wejoncy in #109
- new autogptq config format && parallel load by @wejoncy in #110
- bugfix by @wejoncy in #111
- fix issue by @wejoncy in #113
- Fix 112 by @wejoncy in #114
- Fix typos by @emphasis10 in #115
- minor fix, attn_implementation by @wejoncy in #120
- Bump to 0.1.9 by @wejoncy in #121
- -allow-unsupported-compiler by @wejoncy in #122
New Contributors
- @emphasis10 made their first contribution in #115
Full Changelog: v0.1.8...v0.1.9
v0.1.8
v0.1.7.1
v0.1.7
What's Changed
- ort ops support in main branch with act_order by @wejoncy in #92
- support export hqq to onnx by @wejoncy in #93
- Bump to 0.1.7 by @wejoncy in #94
- improve .cpu() with non_blocking by @wejoncy in #95
- disable win in release by @wejoncy in #96
- refactor args by @wejoncy in #97
Full Changelog: v0.1.6...v0.1.7
v0.1.6
What's Changed
- illegal memory access by @wejoncy in #71
- Format by @wejoncy in #72
- Hqq support by @wejoncy in #73
- [feature] fast build support (build and install qllm in a few seconds) by @wejoncy in #74
- minor fix for detecting ort_ops and torch.compile by @wejoncy in #75
- static_groups by @wejoncy in #76
- minor Fix (cudaguard) by @wejoncy in #77
- fix cudaguard by @wejoncy in #78
- ruff Format by @wejoncy in #79
- move ops into qllm by @wejoncy in #80
- fix memory layout in QuantizeLinear by @yufenglee in #82
- add continuous check for ort kernel by @wejoncy in #84
- Ort fix by @wejoncy in #85
- more general Ort ops export by @wejoncy in #86
- 0.1.6.dev by @wejoncy in #87
- 0.1.6 by @wejoncy in #88
- speed up ort node packing by @wejoncy in #89
- fix attn_implementation by @wejoncy in #90
New Contributors
- @yufenglee made their first contribution in #82
Full Changelog: v0.1.5...v0.1.6
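Several v0.1.6 entries (the QuantizeLinear memory-layout fix, ort node packing) concern how low-bit weights are packed into 32-bit words. As a rough illustration of the general technique only, not QLLM's actual code, here is a minimal pure-Python sketch that packs eight 4-bit values into each 32-bit integer:

```python
def pack_int4(values):
    """Pack eight 4-bit values (0..15) into one 32-bit word each, low nibble first."""
    assert len(values) % 8 == 0, "pad to a multiple of 8 before packing"
    packed = []
    for i in range(0, len(values), 8):
        word = 0
        for j, v in enumerate(values[i:i + 8]):
            assert 0 <= v < 16, "value does not fit in 4 bits"
            word |= v << (4 * j)  # nibble j of the word holds element j
        packed.append(word)
    return packed

def unpack_int4(packed, count):
    """Inverse of pack_int4; count trims padding from the last word."""
    out = []
    for word in packed:
        for j in range(8):
            out.append((word >> (4 * j)) & 0xF)
    return out[:count]
```

The nibble ordering (which element lands in which bit range) is exactly the kind of convention a packer and a compute kernel must agree on; a mismatch there is the sort of bug a "fix memory layout" change typically addresses.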
v0.1.5
What's Changed
- works on Windows; setting dtype is important by @wejoncy in #54
- use_heuristic=false by default for models with hard-to-predict unquantized layers, like mixtral-8x7b by @wejoncy in #55
- add mixtral in readme example by @wejoncy in #56
- bugfix when export 16bit model by @wejoncy in #57
- Fix build error: uint32_t is not defined; include <stdint.h> by @wejoncy in #58
- dp kernel support g_idx by @wejoncy in #59
- [important] packing improve, faster by @wejoncy in #60
- [improve packing]fix for awq unpack by @wejoncy in #61
- 3bit support with g_idx in dq_kernel by @wejoncy in #63
- 3bit fix by @wejoncy in #64
- 0.1.5.dev by @wejoncy in #65
- onnx support Act_order && some onnx fix by @wejoncy in #66
- Support gemv with g_idx and some fix in exporter/dataloader by @wejoncy in #67
- support mixtral in gptq/awq by @wejoncy in #68
- minor fix for act_order detect by @wejoncy in #70
- Bump version to 0.1.5 by @wejoncy in #69
Full Changelog: v0.1....v0.1.5
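Many v0.1.5 entries revolve around g_idx, the per-column group index used by GPTQ-style group quantization (with act_order, columns are reordered, so group membership is no longer contiguous and kernels need the explicit mapping). A minimal pure-Python sketch of the idea, not QLLM's implementation:

```python
def quantize_with_gidx(weights, group_size=4, bits=4):
    """Asymmetric group quantization; g_idx maps each element to its group.

    In this sketch g_idx is the trivial i // group_size mapping; with
    act_order it would be a permutation-derived, non-contiguous mapping.
    """
    qmax = (1 << bits) - 1
    g_idx = [i // group_size for i in range(len(weights))]
    scales, zeros, q = [], [], []
    for g in range(0, len(weights), group_size):
        group = weights[g:g + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax or 1.0  # avoid zero scale for constant groups
        zero = round(-lo / scale)
        scales.append(scale)
        zeros.append(zero)
        q += [max(0, min(qmax, round(w / scale) + zero)) for w in group]
    return q, scales, zeros, g_idx

def dequantize(q, scales, zeros, g_idx):
    """Recover approximate weights by looking up each element's group params."""
    return [(qv - zeros[g]) * scales[g] for qv, g in zip(q, g_idx)]
```

The per-element lookup through g_idx in dequantize is what the "dp kernel support g_idx" and "gemv with g_idx" changes above enable on the GPU side.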
v0.1.4
What's Changed
- support Phi, detect multi blocks by @wejoncy in #43
- quick fix by @wejoncy in #44
- add colab example && turing support for awq && remove dependency of xbitops by @wejoncy in #46
- quick fix for meta device by @wejoncy in #47
- add trust code by @wejoncy in #48
- fix trust_code by @wejoncy in #49
- quick fix for turing awq 75 by @wejoncy in #50
- fix low_cpu_mem_usage by @wejoncy in #51
- fix model dtype, default half by @wejoncy in #52
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Bump version to 0.1.3 by @wejoncy in #29
- pipeline by @wejoncy in #33
- minor fix win/special by @wejoncy in #34
- Fix pack_mode issue and add Proxy for transformers. by @wejoncy in #36
- work around autogptq/vLLM by @wejoncy in #37
- update readme and fix some pack_mode conversion bugs by @wejoncy in #38
- minor fix and rename quant_linear folders by @wejoncy in #39
- [fix] Weight pack && tokenizer && more awq models by @wejoncy in #40
- Readme by @wejoncy in #41
- ready for pypi package by @wejoncy in #42
Full Changelog: https://github.com/wejoncy/QLLM/commits/v0.1.3