ModelCloud/GPTQModel

Production-ready LLM model compression and quantization toolkit with hardware-accelerated inference support on both CPU and GPU via HF Transformers, vLLM, and SGLang.

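To make the description concrete, below is a minimal quantization sketch. It assumes the `gptqmodel` Python package exposes a `QuantizeConfig` plus `GPTQModel.load`, `model.quantize`, and `model.save`, and uses an example model and calibration dataset; treat the exact names and arguments as illustrative rather than authoritative and verify them against the project's documentation.

```python
# Minimal GPTQ quantization sketch (API names assumed from the gptqmodel package).
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"   # example base model
quant_path = "Llama-3.2-1B-Instruct-gptq-4bit"  # output directory for the quantized checkpoint

# Small calibration set of raw text samples used to collect activation statistics.
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

# 4-bit weights with group size 128 is a common GPTQ configuration.
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=2)  # a larger batch_size trades VRAM for speed
model.save(quant_path)
```

The saved checkpoint would then typically be served through one of the backends named above (HF Transformers, vLLM, or SGLang) for accelerated CPU or GPU inference.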