[Usage]: Logprobs Scaling with O(n) Complexity – Unexpected Performance Degradation #14300
Title: Logprobs Scaling with O(n) Complexity – Unexpected Performance Degradation
Description:
When increasing the `logprobs` parameter, I expected only a minor increase in runtime due to slicing the top-k values from the full vocabulary logits. However, my experiments show an almost O(n) increase in runtime, which suggests that retrieving logprobs is more computationally expensive than anticipated.
Reproduction Code
Observed Results
Expected Behavior
Since the model inherently computes full logits for the vocabulary on every forward pass, I expected retrieving `logprobs` to involve only a minor computational overhead (e.g., sorting/selecting top-k). However, the results suggest that requesting more logprobs significantly increases runtime, implying O(n) scaling in the number of requested logprobs rather than an efficient selection from precomputed logits.
Questions:
Is `logprobs` expected to scale in an O(n) fashion?
System Info:
Qwen/Qwen2.5-7B-Instruct
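To make the expectation under Expected Behavior concrete, here is a minimal pure-Python sketch of the cost model I had in mind. It is an illustration only and not how vLLM implements this: `top_k_logprobs` is a hypothetical helper, and the toy logits are made up. The log-softmax normalizer is computed once over the vocabulary, and the top-k entries are then picked with a partial selection instead of a full sort.

```python
import heapq
import math


def top_k_logprobs(logits, k):
    """Return the k highest (token_id, logprob) pairs from one logits vector.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    # Log-softmax normalizer, computed once over the whole vocabulary: O(V).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    # Partial selection of the k largest entries: O(V log k), no full sort.
    top = heapq.nlargest(k, enumerate(logits), key=lambda pair: pair[1])
    return [(token_id, logit - log_z) for token_id, logit in top]


# Toy 4-token vocabulary, used only to show the call shape.
toy_logits = [2.0, 0.5, -1.0, 3.0]
print(top_k_logprobs(toy_logits, k=2))
```

In this sketch only the selection step and the length of the returned list depend on k. Whether the extra cost observed in vLLM comes from selection or from per-entry post-processing is not something this sketch can show.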
Looking forward to insights on whether this is expected behavior or a possible optimization opportunity! Thanks!