[Usage]: Logprobs Scaling with O(n) Complexity – Unexpected Performance Degradation #14300

Rachum-thu opened this issue Mar 5, 2025 · 0 comments

Rachum-thu commented Mar 5, 2025

Title: Logprobs Scaling with O(n) Complexity – Unexpected Performance Degradation

Description:
When increasing the logprobs parameter, I expected only a minor increase in runtime, since returning the top-k values should just be a slice of the full-vocabulary logits the model already computes. However, my experiments show runtime growing roughly linearly (O(n)) with the number of requested logprobs, which suggests that retrieving logprobs is far more computationally expensive than a simple top-k selection.
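
For reference, this is the kind of operation I expected to dominate. The standalone PyTorch sketch below (not vLLM code; it assumes a CUDA device and uses random logits of the same shape as one of my batches) just selects the top-k entries from logits that are already computed for the full vocabulary, which I would expect to stay in the millisecond range even when k equals the vocabulary size:

import time
import torch

# Standalone sketch (not vLLM code): top-k selection over logits that are
# already computed for the full vocabulary, on random data of the same shape.
batch_size, vocab_size = 32, 152_064
logits = torch.randn(batch_size, vocab_size, device="cuda")

for k in (10, 1_000, 100_000, vocab_size):
    torch.cuda.synchronize()
    start = time.time()
    values, indices = torch.topk(logits, k, dim=-1)  # k largest logits per sequence
    torch.cuda.synchronize()
    print(f"top-{k:>6}: {time.time() - start:.4f} s")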

Reproduction Code

import time
from vllm import LLM
from vllm.sampling_params import SamplingParams

def test_generation_time(llm, logprobs_value, batch_size=32):
    sampling_params = SamplingParams(logprobs=logprobs_value, max_tokens=1)
    
    # Timed run
    start_time = time.time()
    output = llm.generate(["Tell me something about LLMs."] * batch_size,
                         sampling_params=sampling_params,
                         use_tqdm=False)
    end_time = time.time()
    
    return end_time - start_time

def main():
    print("Initializing model...")
    llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", max_logprobs=152_064)  # vocab size
    
    batch_size = 32
    logprobs_values = [10, 100, 1000, 10000, 100000, 152064]
    results = []
    
    print("\nStarting tests...")
    for logprobs in logprobs_values:
        time_taken = test_generation_time(llm, logprobs, batch_size)
        results.append((logprobs, time_taken))
    
    print("\nResults Summary:")
    print("╔══════════════╦═══════════════╗")
    print("║   Logprobs   ║  Time (secs)  ║")
    print("╠══════════════╬═══════════════╣")
    for logprobs, time_taken in results:
        print(f"║ {logprobs:^12} ║ {time_taken:^13.4f} ║")
    print("╚══════════════╩═══════════════╝")

if __name__ == "__main__":
    main()

Observed Results

╔══════════════╦═══════════════╗
║   Logprobs   ║  Time (secs)  ║
╠══════════════╬═══════════════╣
║      10      ║    0.0784     ║
║     100      ║    0.0410     ║
║     1000     ║    0.1909     ║
║    10000     ║    1.9388     ║
║    100000    ║    19.9256    ║
║    152064    ║    29.2862    ║
╚══════════════╩═══════════════╝

Expected Behavior

Since the model inherently computes full logits for the vocabulary on every forward pass, I expected retrieving logprobs to involve only a minor computational overhead (e.g., sorting/selecting top-k). However, the results suggest that requesting more logprobs significantly increases runtime, implying an O(n) complexity scaling instead of an efficient selection from precomputed logits.
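
To make the scaling concrete: from the table above, the marginal cost per additional requested logprob is roughly constant across the larger runs, which is exactly what linear scaling predicts. A quick check on the measured numbers:

# Marginal cost per additional requested logprob, from the results table above.
results = [(10_000, 1.9388), (100_000, 19.9256), (152_064, 29.2862)]
for (n0, t0), (n1, t1) in zip(results, results[1:]):
    print(f"{n0:>7} -> {n1:>7}: {(t1 - t0) / (n1 - n0) * 1e3:.2f} ms per extra logprob")
# prints ~0.20 ms and ~0.18 ms per extra logprob (for the whole batch of 32)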

Questions:

  1. Why does increasing logprobs scale in an O(n) fashion?
    • Is the model recomputing or performing expensive operations instead of just slicing logits?
  2. Is there a way to retrieve logprobs for the full vocabulary without incurring this high runtime penalty?
  3. Would it be possible to expose full logits instead of just logprobs? (A rough sketch of what I mean, using plain Hugging Face transformers, is below.)
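
Regarding question 3, this is roughly what I mean by exposing full logits. The sketch below sits outside vLLM entirely (plain Hugging Face transformers, so without vLLM's batching and throughput) and only illustrates that a single forward pass already yields raw logits for the whole vocabulary:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch outside vLLM: one forward pass exposes raw logits for the whole vocabulary.
model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Tell me something about LLMs.", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits[:, -1, :]           # [1, vocab_size] raw next-token logits
logprobs = torch.log_softmax(logits.float(), dim=-1)    # full-vocabulary logprobs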

System Info:

  • vLLM version: 0.7.4.dev142+g9804145c.d20250228
  • Model: Qwen/Qwen2.5-7B-Instruct
  • CUDA Version: 12.5

Looking forward to insights on whether this is expected behavior or a possible optimization opportunity! Thanks!

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.