Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

With enforce_eager=False (the default), the model loads and generates output without any error, but the generated text is gibberish.
A temporary approach?
Inspired by this comment, I changed compilation_config.use_cudagraph from True to False (diff: imkero@92116c3; the change has to be made in the source code, vllm/config.py lines 3237 to 3249 at 009439c, because it always overrides compilation_config), and then the model works as expected.
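For illustration, a minimal sketch of the two ways one might try to disable CUDA graphs from user code, assuming the `compilation_config` and `enforce_eager` arguments of `vllm.LLM`; per the report above, the first form gets overridden on V1 at this commit, which is why the linked diff patches vllm/config.py instead:

```python
from vllm import LLM

MODEL = "nm-testing/DeepSeek-R1-Distill-Qwen-14B-FP8-Dynamic"

# (a) The natural attempt: request use_cudagraph=False via compilation_config.
#     According to this report, V1 overrides this setting in vllm/config.py,
#     so it may not take effect without the source patch linked above.
llm = LLM(model=MODEL, compilation_config={"use_cudagraph": False})

# (b) A blunter alternative: enforce_eager=True skips compilation and CUDA
#     graph capture entirely, trading away some decode throughput.
# llm = LLM(model=MODEL, enforce_eager=True)
```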
Outputs

vLLM main branch code output:

Prompt: '<|begin▁of▁sentence|><|User|>Hello<|Assistant|>', Generated text: '开头\n空气质量格佯抽 bist exc� 3好00\n210的,的,100000<think>10<think>0100329003021 the0,111的1的1 2的1,1,,0的1 的,的1的的,ence,的 的0ence,\n1的4的的的1的,的,0201<think>,,1001,002的,的,的<think>的frac1,的<think><think>的的的的,'
Prompt: '<|begin▁of▁sentence|><|User|>Which one is greater: 9.11 or 9.9?<|Assistant|>', Generated text: '开头\n reogra�001-001<think> record pre��使用2=1,00011000,,,11,,,0,,,,,1\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'
Expected output:
Prompt: '<|begin▁of▁sentence|><|User|>Hello<|Assistant|>', Generated text: '<think>\n\n</think>\n\nHello! How can I assist you today? 😊'
Prompt: '<|begin▁of▁sentence|><|User|>Which one is greater: 9.11 or 9.9?<|Assistant|>', Generated text: "<think>\nFirst, I'll compare the whole number parts of both numbers. Both 9.11 and 9.9 have the same whole number, which is 9.\n\nNext, I'll look at the decimal parts. In 9.11, the decimal part is 0.11, and in 9.9, it's 0.9.\n\nTo make a clear comparison, I'll express 0.9 as 0.90. Now, comparing 0.90 and 0.11, it's evident that 0.90 is greater than 0."
Should this problem be addressed by modifying compilation_config, or is there a bug that should be fixed instead?

Code and model to reproduce

Model: nm-testing/DeepSeek-R1-Distill-Qwen-14B-FP8-Dynamic
Code: vLLM latest main (0ccd876), and this script:
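The script itself was not captured in this report. As a stand-in, here is a hypothetical minimal reproduction, assuming the offline `vllm.LLM` API, with the model name and pre-templated DeepSeek-R1 prompts taken from the outputs above; the sampling settings and the `VLLM_USE_V1` opt-in are assumptions:

```python
import os

# V1 engine was opt-in at the time of this report (an assumption here);
# set the flag before importing vllm.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

# Pre-templated DeepSeek-R1 prompts, matching the logs above.
prompts = [
    "<|begin▁of▁sentence|><|User|>Hello<|Assistant|>",
    "<|begin▁of▁sentence|><|User|>Which one is greater: 9.11 or 9.9?<|Assistant|>",
]

# Greedy decoding is an assumption; any settings should surface the garbling.
sampling_params = SamplingParams(temperature=0, max_tokens=512)

# enforce_eager=False is the default, i.e. CUDA graphs enabled -- the failing case.
llm = LLM(
    model="nm-testing/DeepSeek-R1-Distill-Qwen-14B-FP8-Dynamic",
    enforce_eager=False,
)

for output in llm.generate(prompts, sampling_params):
    print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")
```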
imkero changed the title from "[Bug]: FP8 W8A8 quantized model doesn't work properly with default compilation_config (use_cudagraph = True)" to "[Bug]: FP8 W8A8 quantized model doesn't work properly on V1 with default compilation_config (use_cudagraph = True)" on Feb 13, 2025.
Thanks for the thorough bug report. I'm taking a look now. I have not been able to reproduce the issue so far -- output looks great using V1 on latest main on an H100.
Hmm... I wasn't able to reproduce the error on my local machine for some reason. My output looks correct:
Prompt: '<|begin▁of▁sentence|><|User|>Hello<|Assistant|>', Generated text: '<think>\n\n</think>\n\nHello! How can I assist you today? 😊'
Prompt: '<|begin▁of▁sentence|><|User|>Which one is greater: 9.11 or 9.9?<|Assistant|>', Generated text: "<think>\nFirst, I'll compare the whole number parts of both numbers. Both 9.11 and 9.9 have the same whole number, which is 9.\n\nNext, I'll look at the decimal parts. In 9.11, the decimal part is 0.11, and in 9.9, it's 0.9.\n\nTo make the comparison easier, I'll express 0.9 as 0.90. Now, comparing 0.90 and 0.11, it's clear that 0.90 is greater than 0."