Your current environment
The output of `python collect_env.py`
// TODO: unable to run collect_env.py because this is not an interactive environment
🐛 Describe the bug
We are running neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8 on 2 x H100 80 GB.

vLLM OpenAI image tag: v0.7.3

Docker args:
When running a load test (input = 16000 tokens, output = 256 tokens), vLLM starts returning HTTP 415 (Unsupported Media Type) for most requests once the load passes a certain point.
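For reference, a minimal sketch of the kind of load test that triggers this, assuming the server exposes the default OpenAI-compatible endpoint on port 8000; the concurrency level, request count, and prompt construction below are illustrative placeholders, not our exact test harness:

# Illustrative load-test sketch (not the exact harness used for this report).
# Sends concurrent chat completion requests with a long prompt to the vLLM
# OpenAI-compatible server and tallies the HTTP status codes it returns.
import collections
import concurrent.futures

import requests

BASE_URL = "http://localhost:8000/v1/chat/completions"  # assumed default port
MODEL = "neuralmagic/Llama-3.3-70B-Instruct-quantized.w8a8"
CONCURRENCY = 64       # illustrative; failures appear as load increases
NUM_REQUESTS = 500     # illustrative
LONG_PROMPT = "lorem ipsum " * 8000  # rough stand-in for a ~16000-token input

def send_one(_):
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": LONG_PROMPT}],
        "max_tokens": 256,
    }
    resp = requests.post(
        BASE_URL,
        json=payload,   # json= sets Content-Type: application/json
        timeout=300,
    )
    return resp.status_code

status_counts = collections.Counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    for code in pool.map(send_one, range(NUM_REQUESTS)):
        status_counts[code] += 1

print(status_counts)  # under load, most responses come back as 415 instead of 200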
Before submitting a new issue...
Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.