[Usage]: How do I set the maximum number of API requests processed concurrently? #15609

### Your current environment

When I run the following command:
docker run --runtime nvidia --gpus all \
    -v /home/model-tran/models/DeepSeek-R1-Distill-Qwen-32B/:/deepseek-r1-32b \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_HUB_OFFLINE=1" \
    -p 8000:8000 \
    --ipc=host \
    -d \
    vllm/vllm-openai:latest \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 4 \
    --max-concurrency 20 \
    --model /deepseek-r1-32b
the following error appears:
api_server.py: error: unrecognized arguments: --max-concurrency 20



### How would you like to use vllm

I want to cap the number of requests that the vLLM server processes concurrently when too many external API requests arrive at once.
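Since the error above shows that `api_server.py` does not accept `--max-concurrency`, one possible workaround is to cap concurrency on the client side. The sketch below is only an illustration under stated assumptions: `fake_request`, `bounded`, `run_all`, and `MAX_CONCURRENCY` are hypothetical names, and the request itself is simulated with a sleep rather than a real HTTP call. On the server side, vLLM's engine argument `--max-num-seqs` (maximum number of sequences per scheduling iteration) may be the closer knob, but its exact behavior should be checked against the documentation for the installed version.

```python
import asyncio

MAX_CONCURRENCY = 20  # assumed cap, mirroring the value used in the command above


async def fake_request(i: int, tracker: dict) -> int:
    """Hypothetical stand-in for one API call; swap in a real HTTP client call."""
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)  # simulates network and generation latency
    tracker["active"] -= 1
    return i


async def bounded(sem: asyncio.Semaphore, coro_fn, *args):
    """Run coro_fn only while holding a semaphore slot."""
    async with sem:
        return await coro_fn(*args)


async def run_all(n_requests: int, limit: int = MAX_CONCURRENCY):
    """Issue n_requests calls with at most `limit` of them in flight at once."""
    sem = asyncio.Semaphore(limit)
    tracker = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(bounded(sem, fake_request, i, tracker) for i in range(n_requests))
    )
    return results, tracker["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_all(100))
    print(f"completed={len(results)} peak_in_flight={peak}")
```

With this pattern, requests beyond the limit wait in the client instead of being sent to the server, so the server never sees more than `limit` requests from this process at a time. It does not protect against other clients.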


### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
