
[Usage]: Run local models using vLLM #5011

Closed
bibutikoley opened this issue May 23, 2024 · 1 comment
Labels
usage How to use vllm

Comments

@bibutikoley

Your current environment

I tried running vLLM with TheBloke/Mistral-7B-Instruct-v0.1-GGUF using the command below:

python -m vllm.entrypoints.openai.api_server --model /aimodels/mistral-7b-instruct-v0.1.Q4_K_S.gguf --host 0.0.0.0 --port 5555 --tokenizer=hf-internal-testing/llama-tokenizer --trust-remote-code

But I got the following error:
OSError: It looks like the config file at '/aimodels/mistral-7b-instruct-v0.1.Q4_K_S.gguf' is not a valid JSON file.

How would you like to use vllm

I would like to run TheBloke/Mistral-7B-Instruct-v0.1-GGUF with vLLM.

@bibutikoley bibutikoley added the usage How to use vllm label May 23, 2024
@mgoin
Collaborator

mgoin commented May 23, 2024

Hi @bibutikoley, vLLM doesn't support models in the GGUF format. Please use original precision models or one of the many quantizations we do support, such as GPTQ, AWQ, FP8, etc. Thanks!
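For reference, a supported quantized checkpoint can be served with essentially the same command you tried. A minimal sketch, assuming you pull an AWQ export of the same model (the repo name and port here are placeholders; substitute whichever supported checkpoint you actually download):

python -m vllm.entrypoints.openai.api_server --model TheBloke/Mistral-7B-Instruct-v0.1-AWQ --quantization awq --host 0.0.0.0 --port 5555

The --quantization flag tells the loader which scheme to expect; an original-precision model such as mistralai/Mistral-7B-Instruct-v0.1 needs no extra flag.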

@mgoin mgoin closed this as completed May 23, 2024