Fix: Re-enable Sparse Compression for 2of4 Examples #1153
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, so please only add the label once the PR is code complete and local testing has been performed.
Consider adding a comment about previous vLLM versions.
🥳
🔥
Can we update the E2E test case as well?
Should we consider adding a comment to the example about disable_sparse_compression (just that it exists/can be used)?
It is there in the README as a separate section, if anyone wants to try it.
d7756df
Signed-off-by: Rahul Tuli <[email protected]>
d7756df to 8862d8b
Done and tested!
This PR restores sparse compression for our 2of4 examples, which was previously disabled due to a bug in the vLLM Cutlass integration.

Background
A bug in the Cutlass integration caused certain sparse-only compressed models to produce gibberish results. To mitigate this issue, we temporarily turned off sparse compression for our 2of4 examples.

The bug has since been fixed by @tlrmchlsmth in vllm-project/vllm#13198. With this fix in place, we can safely re-enable sparse compression for these examples.
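
For readers unfamiliar with the term, here is a minimal sketch of what compressing a 2of4 (2:4) sparse layout means: in every group of four contiguous weights at most two are non-zero, so only two values plus their positions need to be stored. This is an illustration only. The function names `compress_2of4` and `decompress_2of4` are hypothetical, and the representation below is not the storage format used by llm-compressor or the vLLM Cutlass kernels.

```python
from typing import List, Tuple

# One packed group: (two kept values, their positions inside the group of 4).
Group = Tuple[List[float], List[int]]


def compress_2of4(row: List[float]) -> List[Group]:
    """Pack a row that already satisfies 2:4 sparsity into (values, indices) pairs."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    packed: List[Group] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError(f"group starting at {start} has more than 2 non-zeros")
        # Pad with zero-valued positions so every group stores exactly two slots.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        packed.append(([group[i] for i in kept], kept))
    return packed


def decompress_2of4(packed: List[Group]) -> List[float]:
    """Expand (values, indices) pairs back into the dense row."""
    row: List[float] = []
    for values, indices in packed:
        group = [0.0, 0.0, 0.0, 0.0]
        for value, index in zip(values, indices):
            group[index] = value
        row.extend(group)
    return row
```

A round trip through these two helpers reproduces the dense row while storing half as many values. A bug in the kernel that consumes such a layout would give wrong outputs even when the stored values are correct, which is the kind of failure the Background section describes.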

Changes
Re-enabled sparse compression for the 2of4 examples.

Testing