[Feature]: Triton GPTQ #5127

double-vin · 2024-05-30T05:41:26Z

It would be great if vLLM supported a Triton implementation of GPTQ, similar to TGI (https://github.com/huggingface/text-generation-inference).
The implementation in this PR (https://github.com/huggingface/text-generation-inference/pull/1370/files) differs in some respects from the GPTQ implementation in vLLM. On ROCm, the Triton implementation seems to give better GPTQ int4 performance when m is in the range [4, 50].


double-vin added the feature request label May 30, 2024

DarkLight1337 changed the title from "[Feature]:" to "[Feature]: Triton GPTQ" May 31, 2024
