
[Bug]: vLLM returning 415 status code at high load #14333

Open
chiragjn opened this issue Mar 6, 2025 · 0 comments
Labels
bug Something isn't working

chiragjn (Contributor) commented Mar 6, 2025

Your current environment

The output of `python collect_env.py`

// TODO: unable to run this because the deployment is not an interactive environment

🐛 Describe the bug

We are running neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 on 2 × H100 80 GB GPUs.

vLLM openai image tag: v0.7.3

Docker args:

```
--host 0.0.0.0 --port 8000 --disable-log-requests \
--download-dir /data/ --tokenizer-mode auto \
--model neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 \
--tokenizer neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 \
--trust-remote-code --dtype auto --tensor-parallel-size 2 \
--gpu-memory-utilization 0.99 --served-model-name llm \
--max-model-len 20000 --enforce-eager --kv-cache-dtype fp8 \
--max-num-seqs 16
```
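For reference, these args are passed straight through to the `vllm/vllm-openai:v0.7.3` image. This is a hedged sketch of the launch, not our exact deployment command; the `--gpus`, `--shm-size`, and volume-mount flags are illustrative assumptions:

```shell
# Illustrative launch; GPU/volume/shm flags in the real deployment may differ.
docker run --rm --gpus all --shm-size 16g \
  -v /data:/data -p 8000:8000 \
  vllm/vllm-openai:v0.7.3 \
  --host 0.0.0.0 --port 8000 --disable-log-requests \
  --download-dir /data/ --tokenizer-mode auto \
  --model neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 \
  --tokenizer neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 \
  --trust-remote-code --dtype auto --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.99 --served-model-name llm \
  --max-model-len 20000 --enforce-eager --kv-cache-dtype fp8 \
  --max-num-seqs 16
```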

When running a load test (input ≈ 16,000 tokens, output = 256 tokens), vLLM starts returning HTTP 415 (Unsupported Media Type) for most requests once the load increases past a certain point.
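For context, the client side of the load test looks roughly like the sketch below. This is a minimal illustration, not our actual harness; `ENDPOINT`, `build_chat_request`, and `run_load_test` are names invented for this example, and the prompt/concurrency numbers are stand-ins for the real workload:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

# Hypothetical endpoint; adjust host/port to match the deployment above.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 256) -> tuple[dict, bytes]:
    """Build the headers and JSON body for one chat-completion request.

    Setting Content-Type explicitly matters here: a FastAPI-based server
    answers 415 Unsupported Media Type when the header is missing or not JSON.
    """
    headers = {"Content-Type": "application/json"}
    body = json.dumps({
        "model": "llm",  # matches --served-model-name above
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return headers, body

def fire(prompt: str) -> int:
    """POST one request and return the HTTP status code."""
    headers, body = build_chat_request(prompt)
    req = request.Request(ENDPOINT, data=body, headers=headers, method="POST")
    with request.urlopen(req) as resp:
        return resp.status

def run_load_test(concurrency: int = 16, n_requests: int = 64) -> list[int]:
    """Fire long-prompt requests concurrently; a healthy server returns all 200s."""
    prompts = ["hello " * 8000] * n_requests  # long prompts, as in the test above
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fire, prompts))

# Example (needs a running server): statuses = run_load_test()
```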

```
INFO 03-05 22:52:30 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 174.9 tokens/s, Running: 6 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 16.9%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:35 metrics.py:455] Avg prompt throughput: 7901.7 tokens/s, Avg generation throughput: 9.2 tokens/s, Running: 9 reqs, Swapped: 0 reqs, Pending: 7 reqs, GPU KV cache usage: 25.2%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:40 metrics.py:455] Avg prompt throughput: 8464.6 tokens/s, Avg generation throughput: 0.6 tokens/s, Running: 12 reqs, Swapped: 0 reqs, Pending: 4 reqs, GPU KV cache usage: 33.6%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:46 metrics.py:455] Avg prompt throughput: 8490.1 tokens/s, Avg generation throughput: 0.6 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 1 reqs, GPU KV cache usage: 41.9%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:51 metrics.py:455] Avg prompt throughput: 2907.7 tokens/s, Avg generation throughput: 120.9 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 44.8%, CPU KV cache usage: 0.0%.
INFO:     100.64.0.26:41786 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     100.64.0.26:35572 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     100.64.0.26:58366 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     100.64.0.27:49902 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     100.64.0.26:58372 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 03-05 22:52:56 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 174.9 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 30.8%, CPU KV cache usage: 0.0%.
INFO:     100.64.0.27:54012 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 03-05 22:53:01 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 156.5 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 28.2%, CPU KV cache usage: 0.0%.
INFO:     100.64.0.25:51610 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO:     100.64.0.25:51610 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO:     100.64.0.25:51624 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO:     100.64.0.25:51636 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO:     100.64.0.26:42064 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO:     100.64.0.25:51624 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
...
```
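One way to check whether the 415s are content-type related rather than purely load-induced: FastAPI-based servers like vLLM's OpenAI frontend typically return 415 when the request's Content-Type header is missing or not JSON. A hedged diagnostic sketch, assuming the server is reachable on localhost:8000 (the model name `llm` matches `--served-model-name` above):

```shell
# Well-formed request: should return 200 under normal load.
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llm", "messages": [{"role": "user", "content": "hi"}]}'

# Same body with a deliberately wrong Content-Type: a FastAPI-based server
# typically answers 415 Unsupported Media Type, matching the log lines above.
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: text/plain" \
  -d '{"model": "llm", "messages": [{"role": "user", "content": "hi"}]}'
```

If the second request reproduces the 415 while the first succeeds, that would point at headers being dropped or mangled somewhere between the load-test client and the server, rather than at vLLM's scheduler.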

Before submitting a new issue...

  • Make sure you have already searched for relevant issues and asked the chatbot at the bottom right corner of the documentation page, which can answer many frequently asked questions.
chiragjn added the bug label Mar 6, 2025