@WoosukKwon (Collaborator) commented Dec 17, 2023

Related to #2147

This PR temporarily forbids the use of CUDA graphs for GPTQ models by enforcing eager mode for them.
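
For readers skimming the thread, here is a minimal sketch of the kind of check described above. It is not the actual diff: `ModelConfigSketch` and `_verify_cuda_graph` are hypothetical names chosen for illustration, and the only behavior assumed is what the PR states, namely that a GPTQ model is forced into eager mode instead of using a CUDA graph.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in for a model config; not vLLM's real class."""

    quantization: Optional[str] = None  # e.g. "gptq", "awq", or None
    enforce_eager: bool = False         # True disables CUDA graph capture

    def __post_init__(self) -> None:
        self._verify_cuda_graph()

    def _verify_cuda_graph(self) -> None:
        # Assumption: the override is applied at config-validation time,
        # so every later component sees enforce_eager=True for GPTQ.
        if self.quantization == "gptq" and not self.enforce_eager:
            warnings.warn(
                "CUDA graph is temporarily disabled for GPTQ models; "
                "falling back to eager mode."
            )
            self.enforce_eager = True
```

With this sketch, `ModelConfigSketch(quantization="gptq").enforce_eager` ends up `True` even though the default is `False`, while other quantization methods are left untouched. Emitting a warning rather than raising keeps existing GPTQ launches working while the restriction is in place.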

@WoosukKwon changed the title from "Enforce eager mode for GPTQ models" to "Temporarily enforce eager mode for GPTQ models" on Dec 17, 2023
@WoosukKwon merged commit 3a765bd into main on Dec 17, 2023
@WoosukKwon deleted the gptq-cuda-graph branch on December 17, 2023 09:51
xjpang pushed a commit to xjpang/vllm that referenced this pull request Dec 18, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request Sep 15, 2025
…2154)

### What this PR does / why we need it?
Updates the FusedMoE method to determine whether to use ACL Graph based
on the `torchair_graph_config`.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None needed.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>