Your current environment
When I execute the following command:
```shell
docker run --runtime nvidia --gpus all \
  -v /home/model-tran/models/DeepSeek-R1-Distill-Qwen-32B/:/deepseek-r1-32b \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_HUB_OFFLINE=1" \
  -p 8000:8000 \
  --ipc=host \
  -d \
  vllm/vllm-openai:latest \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 4 \
  --max-concurrency 20 \
  --model /deepseek-r1-32b
```
the server fails to start with this error:

```
api_server.py: error: unrecognized arguments: --max-concurrency 20
```
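The flag name appears to be the problem: the vLLM OpenAI-compatible server has no `--max-concurrency` option. The closest engine argument I'm aware of is `--max-num-seqs`, which caps how many sequences are processed concurrently in a batch (requests beyond the cap are queued rather than rejected — this is an assumption about the desired behavior). A sketch of the corrected command:

```shell
docker run --runtime nvidia --gpus all \
  -v /home/model-tran/models/DeepSeek-R1-Distill-Qwen-32B/:/deepseek-r1-32b \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_HUB_OFFLINE=1" \
  -p 8000:8000 \
  --ipc=host \
  -d \
  vllm/vllm-openai:latest \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 4 \
  --max-num-seqs 20 \
  --model /deepseek-r1-32b
```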
How would you like to use vllm
I want to set a maximum number of concurrent requests that vLLM will serve via its external API, so that the server is not overwhelmed when too many requests arrive at once.
Before submitting a new issue...