Tags: vllm-project/llm-compressor

0.4.1

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Update gemma2 examples with a note about sample generation (#1176)

SUMMARY:
- Add a note advising users to either downgrade transformers from 4.49
or use vLLM for generation
- It is unclear why this happens only during generation with this
new release; this can be revisited later

0.4.0

bump; set ct version (#1076)

0.3.1

update version (#969)

* update version

* pin ct version

0.3.0

bump version (#907)

Signed-off-by: Dipika <dipikasikka1@gmail.com>

0.2.0

Update MoE examples (#192)

* Update MoE examples

* Add top-level link

* Fix deepseek_moe_w8a8_int8.py

* Add deepseek_moe_w8a8_fp8.py

* Quality

* Quality

0.1.0

Offloading Bug Fix (#58)

* fix fstring

* fix offloaded sparsity calculation