[Feature]: Triton GPTQ #5127

Open
double-vin opened this issue May 30, 2024 · 0 comments


🚀 The feature, motivation and pitch

  1. Please add a Triton implementation of GPTQ, similar to the one in TGI (https://github.com/huggingface/text-generation-inference).
  2. The implementation in this PR (https://github.com/huggingface/text-generation-inference/pull/1370/files) differs in some ways from the GPTQ implementation in vLLM. On ROCm, the Triton implementation seems to give better GPTQ int4 performance when m is in the range [4, 50].

Alternatives

No response

Additional context

No response

DarkLight1337 changed the title from "[Feature]:" to "[Feature]: Triton GPTQ" on May 31, 2024