ModelCloud/GPTQModel
About
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via Hugging Face Transformers, vLLM, and SGLang.
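
For orientation, the sketch below shows the typical quantize-then-save flow such a toolkit exposes. It assumes the `gptqmodel` package provides `GPTQModel.load`, `quantize`, and `save` entry points together with a `QuantizeConfig(bits=..., group_size=...)` configuration, as in recent releases; the model id, output path, and calibration texts are illustrative placeholders, not prescribed values.

```python
# Minimal sketch: 4-bit GPTQ quantization with the gptqmodel package.
# Assumes GPTQModel.load / .quantize / .save and QuantizeConfig exist as in
# recent releases; model id and calibration data are placeholders.
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B"       # placeholder: fp16 model to quantize
quant_path = "Llama-3.2-1B-gptq-4bit"      # placeholder: output directory

# A tiny calibration set; real runs use a few hundred representative samples.
calibration = [
    "GPTQ quantizes weights layer by layer against calibration activations.",
    "Quantized checkpoints can be served via HF Transformers, vLLM, or SGLang.",
]

config = QuantizeConfig(bits=4, group_size=128)  # 4-bit weights, 128-column groups

model = GPTQModel.load(model_id, config)   # load the full-precision model
model.quantize(calibration)                # run GPTQ calibration/quantization
model.save(quant_path)                     # write the quantized checkpoint
```

A checkpoint produced this way can then be loaded for inference through the Transformers, vLLM, or SGLang backends mentioned above.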