Hello, I tested DFlash with Qwen3.5-35B-A3B using vLLM, and found that most of the time the mean acceptance length is only around 2–3, regardless of whether num_spec_tokens is set to 4 or 16.
I wonder if there are multiple versions of Qwen3.5-35B-A3B, and whether the DFlash draft model only works with a specific version?
I also filed an issue in vLLM: vllm-project/vllm#42505
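
For context, here is a rough back-of-the-envelope sketch of why the num_spec_tokens setting might make so little difference. It assumes a simplified model in which each drafted token is accepted independently with the same probability `p`. That assumption is mine and is not measured from DFlash. The function name `expected_acceptance_length` is hypothetical and not part of vLLM, and it counts the one token the target model always contributes per step, which may or may not match how vLLM reports the metric.

```python
def expected_acceptance_length(p: float, num_spec_tokens: int) -> float:
    """Expected tokens emitted per verification step under an i.i.d. model.

    Assumption: every drafted token is accepted independently with
    probability p, and drafting stops at the first rejection. The result
    includes the single token the target model always produces.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p == 1.0:
        # Every draft token is accepted, plus the target model's own token.
        return float(num_spec_tokens + 1)
    # Geometric series: 1 + p + p^2 + ... + p^num_spec_tokens
    return (1.0 - p ** (num_spec_tokens + 1)) / (1.0 - p)


if __name__ == "__main__":
    # p = 0.6 is an illustrative value, not a measurement.
    for k in (4, 16):
        print(k, round(expected_acceptance_length(0.6, k), 2))
```

Under this model, with `p = 0.6` the expected length is roughly 2.3 for 4 draft tokens and roughly 2.5 for 16, so a plateau around 2–3 would be consistent with a low per-token acceptance rate rather than with the draft length itself.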