Enable CUDA graph for GPTQ & SqueezeLLM #2318

WoosukKwon · 2024-01-02T01:25:33Z

Fixes #2147

Thanks @chu-tianxiang for finding the cause of the bug!

zhuohan123

LGTM! Thanks for the fix!

Enable CUDA graph for GPTQ & SqueezeLLM

a1d4a39

WoosukKwon requested a review from zhuohan123 January 2, 2024 02:20

Merge branch 'main' into quant-cuda-graph

fc2c864

zhuohan123 approved these changes Jan 3, 2024

WoosukKwon merged commit 6ef00b0 into main Jan 3, 2024
2 checks passed

WoosukKwon deleted the quant-cuda-graph branch January 3, 2024 17:52

jedibrillo pushed a commit to jedibrillo/vllm that referenced this pull request Jan 5, 2024

Enable CUDA graph for GPTQ & SqueezeLLM (vllm-project#2318)

5df6e54

chu-tianxiang mentioned this pull request Jan 19, 2024

Integrate Marlin Kernels for Int4 GPTQ inference #2497

Merged

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Enable CUDA graph for GPTQ & SqueezeLLM (vllm-project#2318)

9f3559c
