Labels: bug
Your current environment
The output of `python collect_env.py`
// TODO: not able to run `collect_env.py` because this is not an interactive environment
🐛 Describe the bug
We are running neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 on 2 x H100 80 GB GPUs.
vLLM OpenAI image tag: v0.7.3
Docker Args
--host 0.0.0.0 --port 8000 --disable-log-requests --download-dir /data/ --tokenizer-mode auto --model neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 --tokenizer neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 --trust-remote-code --dtype auto --tensor-parallel-size 2 --gpu-memory-utilization 0.99 --served-model-name llm --max-model-len 20000 --enforce-eager --kv-cache-dtype fp8 --max-num-seqs 16
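For reference, the arguments above would be passed to the container roughly as follows (the image name, volume mount, and `--gpus` flag are assumptions for illustration, not taken from the original report):

```shell
# Hypothetical launch command assuming the official vLLM OpenAI-compatible image
docker run --gpus all \
  -v /data:/data \
  -p 8000:8000 \
  vllm/vllm-openai:v0.7.3 \
  --host 0.0.0.0 --port 8000 --disable-log-requests --download-dir /data/ \
  --tokenizer-mode auto \
  --model neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 \
  --tokenizer neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 \
  --trust-remote-code --dtype auto --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.99 --served-model-name llm \
  --max-model-len 20000 --enforce-eager --kv-cache-dtype fp8 --max-num-seqs 16
```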
When running a load test (input = 16,000 tokens, output = 256 tokens), vLLM starts returning HTTP 415 (Unsupported Media Type) for most requests once the load passes a certain point:
INFO 03-05 22:52:30 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 174.9 tokens/s, Running: 6 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 16.9%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:35 metrics.py:455] Avg prompt throughput: 7901.7 tokens/s, Avg generation throughput: 9.2 tokens/s, Running: 9 reqs, Swapped: 0 reqs, Pending: 7 reqs, GPU KV cache usage: 25.2%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:40 metrics.py:455] Avg prompt throughput: 8464.6 tokens/s, Avg generation throughput: 0.6 tokens/s, Running: 12 reqs, Swapped: 0 reqs, Pending: 4 reqs, GPU KV cache usage: 33.6%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:46 metrics.py:455] Avg prompt throughput: 8490.1 tokens/s, Avg generation throughput: 0.6 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 1 reqs, GPU KV cache usage: 41.9%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:51 metrics.py:455] Avg prompt throughput: 2907.7 tokens/s, Avg generation throughput: 120.9 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 44.8%, CPU KV cache usage: 0.0%.
INFO: 100.64.0.26:41786 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 100.64.0.26:35572 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 100.64.0.26:58366 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 100.64.0.27:49902 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 100.64.0.26:58372 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 03-05 22:52:56 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 174.9 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 30.8%, CPU KV cache usage: 0.0%.
INFO: 100.64.0.27:54012 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 03-05 22:53:01 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 156.5 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 28.2%, CPU KV cache usage: 0.0%.
INFO: 100.64.0.25:51610 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO: 100.64.0.25:51610 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO: 100.64.0.25:51624 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO: 100.64.0.25:51636 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO: 100.64.0.26:42064 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO: 100.64.0.25:51624 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
...
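The load test can be approximated with a minimal sketch like the one below (endpoint, model name, and request shape follow the server flags above; everything else is an assumption, not the exact client used in the report). Note that HTTP 415 from a FastAPI-based server such as vLLM's OpenAI frontend usually indicates a missing or incorrect `Content-Type: application/json` header, so the sketch sets it explicitly:

```python
# Hypothetical reproduction sketch: fire concurrent /v1/chat/completions
# requests and tally status codes, which is how the 200 -> 415 flip in the
# logs above would surface on the client side.
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib import error, request

BASE_URL = "http://localhost:8000"  # assumed vLLM server address


def build_request(prompt: str, max_tokens: int = 256) -> request.Request:
    """Build a chat-completions request with an explicit JSON Content-Type."""
    body = json.dumps({
        "model": "llm",  # matches --served-model-name llm
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    return request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fire(req: request.Request) -> int:
    """Send one request and return the HTTP status code."""
    try:
        with request.urlopen(req, timeout=300) as resp:
            return resp.status
    except error.HTTPError as exc:
        return exc.code


def load_test(n_requests: int = 64, concurrency: int = 16) -> Counter:
    """Run n_requests concurrent requests and count status codes."""
    reqs = [build_request("x " * 8000) for _ in range(n_requests)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fire, reqs))
```

Running `load_test()` against the deployment and inspecting the returned counter should show the mix of 200s and 415s flipping as concurrency rises.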