Disable chunked prefill and/or prefix caching when MLA is enabled #12642
Conversation
Signed-off-by: mgoin <[email protected]>
Signed-off-by: simon-mo <[email protected]>
self.scheduler_config.enable_chunked_prefill = False
self.scheduler_config.chunked_prefill_enabled = False
I never knew we had these two different flags, what's the difference?
idk
😅
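For readers outside this thread, the pattern the diff applies can be sketched as a config override: when MLA is enabled, force both chunked-prefill flags off (the diff above shows they are mirror flags that must stay in sync) and disable prefix caching. This is a minimal illustrative sketch, not vLLM's actual classes; `SchedulerConfig`, `CacheConfig`, and `apply_mla_overrides` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    # Two mirror flags, as in the quoted diff; both must be kept in sync.
    enable_chunked_prefill: bool = True
    chunked_prefill_enabled: bool = True


@dataclass
class CacheConfig:
    enable_prefix_caching: bool = True


def apply_mla_overrides(scheduler: SchedulerConfig,
                        cache: CacheConfig,
                        use_mla: bool) -> None:
    """Disable features MLA does not yet support (hypothetical helper)."""
    if use_mla:
        scheduler.enable_chunked_prefill = False
        scheduler.chunked_prefill_enabled = False
        cache.enable_prefix_caching = False
```

Overriding at config-build time (rather than erroring deep in the scheduler) is what lets the release ship with MLA models usable out of the box.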
Why was this change required? We are concerned because we are seeing instability in the vLLM server with chunked prefill disabled. We observe the server hangs when it receives a large prompt while other requests are in flight. Thank you.
@siddartha-RE MLA does not support chunked prefill yet, so it will error if you try to run with it enabled. We are working on enabling this ASAP.
Thank you for the explanation. We are testing on 16x GH200 with 96GB per GPU. Let me know if you would like me to verify / benchmark anything on this configuration.
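The "it will error" behavior described above is the usual alternative to a silent override: validate the combination up front and fail fast instead of hanging mid-serving. A hedged sketch of such a check (hypothetical function name, not vLLM's actual validation code):

```python
def validate_mla_config(use_mla: bool, enable_chunked_prefill: bool) -> None:
    """Reject the unsupported MLA + chunked-prefill combination up front."""
    if use_mla and enable_chunked_prefill:
        raise ValueError(
            "Chunked prefill is not supported with MLA attention; "
            "disable chunked prefill (or MLA) and retry."
        )
```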
From @mgoin in #12638
I cannot push to that branch, therefore a new PR to unblock release.