
[Bug]: vLLM returning 415 status code at high load #14333

Open
chiragjn opened this issue Mar 6, 2025 · 0 comments
Labels
bug Something isn't working

chiragjn (Contributor) commented Mar 6, 2025

Your current environment

The output of `python collect_env.py`

// TODO: unable to run this because the deployment is not an interactive environment

🐛 Describe the bug

We are running neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 on 2 × H100 80 GB GPUs.

vLLM openai image tag: v0.7.3

Docker args:

```
--host 0.0.0.0 --port 8000 --disable-log-requests \
--download-dir /data/ --tokenizer-mode auto \
--model neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 \
--tokenizer neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 \
--trust-remote-code --dtype auto --tensor-parallel-size 2 \
--gpu-memory-utilization 0.99 --served-model-name llm \
--max-model-len 20000 --enforce-eager --kv-cache-dtype fp8 \
--max-num-seqs 16
```
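For reference, these args are passed straight through to the `vllm/vllm-openai:v0.7.3` image. This is a hedged sketch of the launch, not our exact deployment command; the `--gpus`, `--shm-size`, and volume-mount flags are illustrative assumptions:

```shell
# Illustrative launch; GPU/volume/shm flags in the real deployment may differ.
docker run --rm --gpus all --shm-size 16g \
  -v /data:/data -p 8000:8000 \
  vllm/vllm-openai:v0.7.3 \
  --host 0.0.0.0 --port 8000 --disable-log-requests \
  --download-dir /data/ --tokenizer-mode auto \
  --model neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 \
  --tokenizer neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 \
  --trust-remote-code --dtype auto --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.99 --served-model-name llm \
  --max-model-len 20000 --enforce-eager --kv-cache-dtype fp8 \
  --max-num-seqs 16
```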

When running a load test (input ≈ 16,000 tokens, output = 256 tokens), vLLM starts returning HTTP 415 (Unsupported Media Type) for most requests once the load increases past a certain point.
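For context, the client side of the load test looks roughly like the sketch below. This is a minimal illustration, not our actual harness; `ENDPOINT`, `build_chat_request`, and `run_load_test` are names invented for this example, and the prompt/concurrency numbers are stand-ins for the real workload:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

# Hypothetical endpoint; adjust host/port to match the deployment above.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 256) -> tuple[dict, bytes]:
    """Build the headers and JSON body for one chat-completion request.

    Setting Content-Type explicitly matters here: a FastAPI-based server
    answers 415 Unsupported Media Type when the header is missing or not JSON.
    """
    headers = {"Content-Type": "application/json"}
    body = json.dumps({
        "model": "llm",  # matches --served-model-name above
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return headers, body

def fire(prompt: str) -> int:
    """POST one request and return the HTTP status code."""
    headers, body = build_chat_request(prompt)
    req = request.Request(ENDPOINT, data=body, headers=headers, method="POST")
    with request.urlopen(req) as resp:
        return resp.status

def run_load_test(concurrency: int = 16, n_requests: int = 64) -> list[int]:
    """Fire long-prompt requests concurrently; a healthy server returns all 200s."""
    prompts = ["hello " * 8000] * n_requests  # long prompts, as in the test above
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fire, prompts))

# Example (needs a running server): statuses = run_load_test()
```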

```
INFO 03-05 22:52:30 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 174.9 tokens/s, Running: 6 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 16.9%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:35 metrics.py:455] Avg prompt throughput: 7901.7 tokens/s, Avg generation throughput: 9.2 tokens/s, Running: 9 reqs, Swapped: 0 reqs, Pending: 7 reqs, GPU KV cache usage: 25.2%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:40 metrics.py:455] Avg prompt throughput: 8464.6 tokens/s, Avg generation throughput: 0.6 tokens/s, Running: 12 reqs, Swapped: 0 reqs, Pending: 4 reqs, GPU KV cache usage: 33.6%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:46 metrics.py:455] Avg prompt throughput: 8490.1 tokens/s, Avg generation throughput: 0.6 tokens/s, Running: 15 reqs, Swapped: 0 reqs, Pending: 1 reqs, GPU KV cache usage: 41.9%, CPU KV cache usage: 0.0%.
INFO 03-05 22:52:51 metrics.py:455] Avg prompt throughput: 2907.7 tokens/s, Avg generation throughput: 120.9 tokens/s, Running: 16 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 44.8%, CPU KV cache usage: 0.0%.
INFO:     100.64.0.26:41786 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     100.64.0.26:35572 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     100.64.0.26:58366 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     100.64.0.27:49902 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     100.64.0.26:58372 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 03-05 22:52:56 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 174.9 tokens/s, Running: 11 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 30.8%, CPU KV cache usage: 0.0%.
INFO:     100.64.0.27:54012 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 03-05 22:53:01 metrics.py:455] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 156.5 tokens/s, Running: 10 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 28.2%, CPU KV cache usage: 0.0%.
INFO:     100.64.0.25:51610 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO:     100.64.0.25:51610 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO:     100.64.0.25:51624 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO:     100.64.0.25:51636 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO:     100.64.0.26:42064 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
INFO:     100.64.0.25:51624 - "POST /v1/chat/completions HTTP/1.1" 415 Unsupported Media Type
...
```
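One way to check whether the 415s are content-type related rather than purely load-induced: FastAPI-based servers like vLLM's OpenAI frontend typically return 415 when the request's Content-Type header is missing or not JSON. A hedged diagnostic sketch, assuming the server is reachable on localhost:8000 (the model name `llm` matches `--served-model-name` above):

```shell
# Well-formed request: should return 200 under normal load.
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llm", "messages": [{"role": "user", "content": "hi"}]}'

# Same body with a deliberately wrong Content-Type: a FastAPI-based server
# typically answers 415 Unsupported Media Type, matching the log lines above.
curl -s -o /dev/null -w "%{http_code}\n" \
  -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: text/plain" \
  -d '{"model": "llm", "messages": [{"role": "user", "content": "hi"}]}'
```

If the second request reproduces the 415 while the first succeeds, that would point at headers being dropped or mangled somewhere between the load-test client and the server, rather than at vLLM's scheduler.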

Before submitting a new issue...

  • Make sure you have already searched for relevant issues and asked the chatbot at the bottom right corner of the documentation page, which can answer many frequently asked questions.
chiragjn added the bug label Mar 6, 2025