Mixtral AWQ fails to work: asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fd214489990 #2621
When I try to use a different model, I get other errors:
Same exception for https://huggingface.co/TheBloke/Nous-Hermes-2-Mixtral-8x7B-DPO-AWQ
On the 0.2.7 release, I never get any answer back even though it seems to generate.
TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ on 0.2.7 just hangs in generation and never shows any output. https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ/discussions/3
I encountered the same problem using the Qwen/Qwen-7B-Chat model. It seems that vllm.entrypoints.openai.api_server is not compatible.
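For reference, the failure path discussed above goes through the OpenAI-compatible server. Below is a minimal sketch of that setup; the model name, port, and request payload are assumptions taken from the thread, not a verified reproduction.

```python
# Sketch: the server is assumed to have been started with something like
#   python -m vllm.entrypoints.openai.api_server \
#       --model TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ --quantization awq --port 8000
# The CancelledError reported in this issue surfaces on the server side while
# a simple completion request like this one hangs or fails on the client.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ",
        "prompt": "Hello",
        "max_tokens": 16,
    },
    timeout=120,
)
print(resp.status_code, resp.json())
```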
The error seems related to the version of
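Since the suspicion above is a version mismatch, here is a generic sketch for capturing the versions on the error path when reporting; the exact package list is an assumption, not something prescribed in the thread.

```python
# Print the versions of vLLM and the server stack (starlette/anyio is where
# the cancel scope in the traceback originates).
from importlib.metadata import PackageNotFoundError, version

for pkg in ("vllm", "starlette", "anyio", "fastapi", "uvicorn"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```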
@pseudotensor are you still experiencing this issue?
Not lately. Can close.
@pseudotensor When using a Qwen1.5 AWQ model, I still have this problem.
This doesn't seem to be model- or quantization-specific; I got the same error now, while last week it still worked, so I'm not sure what the reason is.
tgi-scripts-vllm-starcoder-2-7b-gptq-1 | INFO 05-28 09:47:25 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='TechxGenus/starcoder2-7b-GPTQ', speculative_config=None, tokenizer='TechxGenus/starcoder2-7b-GPTQ', skip_tokenizer_init=False, tokenizer_mode=auto, revision=48dc06e6a6df8a8e8567694ead23f59204fa0d26, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=gptq_marlin, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=TechxGenus/starcoder2-7b-GPTQ)
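One way to narrow this kind of report down is to run the same model through the offline LLM API: if generation succeeds there, the failure is more likely in the API server / starlette layer than in the engine. This is only a sketch; the model and revision are taken from the log above, everything else is assumed.

```python
# Offline generation sketch, bypassing the OpenAI-compatible server entirely.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TechxGenus/starcoder2-7b-GPTQ",
    revision="48dc06e6a6df8a8e8567694ead23f59204fa0d26",
    max_model_len=4096,
)
params = SamplingParams(max_tokens=32, temperature=0.0)
outputs = llm.generate(["def fibonacci(n):"], params)
print(outputs[0].outputs[0].text)
```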
FWIW, I had this
@hmellor @pseudotensor Maybe this could be re-opened for the benefit of others, since two others have recently reported similar issues after it was closed. Potentially related:
The same generic error in starlette does not mean the problem is the same, especially when the original error was reported 4 months ago using a (now) old version of vLLM. The different traces shared in the thread comments are not that similar. If someone is experiencing the error, they should open a new issue with instructions on how to reproduce it.
Any request, even a simple one, leads to: