Issues: vllm-project/llm-compressor
Bug of quantizing part of the Qwen model [bug] (#1230, opened Mar 6, 2025 by Kha-Zix-1)
Why quantized model of 640MB took almost 3GB of VRAM? [question, vllm] (#1229, opened Mar 6, 2025 by Bakht-Ullah)
Quantization Memory Requirements [question] (#1228, opened Mar 5, 2025 by sneha5gsm)
Fail when invalid arguments are provided in the recipe for a modifier [bug, good first issue] (#1226, opened Mar 5, 2025 by dsikka)
Fail when faulty recipes are provided during oneshot [bug, good first issue] (#1225, opened Mar 5, 2025 by dsikka)
Expand tests for ModuleSparsificationInfo [enhancement, good first issue] (#1224, opened Mar 5, 2025 by dsikka)
Update to use loguru [enhancement, good first issue] (#1223, opened Mar 5, 2025 by dsikka)
Update observers to make MSE the default [enhancement, good first issue] (#1222, opened Mar 5, 2025 by dsikka)
Incomplete warning message during oneshot post process: "WARNING - Optimized model not saved. To save, please provide" [bug, good first issue] (#1219, opened Mar 3, 2025 by dsikka)
Lazy Loading of Weights for Large Model Quantization [enhancement] (#1216, opened Mar 1, 2025 by zjnyly)
W4A8 model larger than W4A16 [question, compressed-tensors] (#1215, opened Feb 28, 2025 by chmeyers)
Cannot load quantized Multimodal_audio model using whisper.load_model("quantized-model-path") (#1204, opened Feb 27, 2025 by Bakht-Ullah)
[Question] Has anyone successfully quantized Deepseek-V3 to int4-w4a16? (#1203, opened Feb 27, 2025 by halexan)
Significant Inference Performance Degradation After W8A8 Quantization on CommandR-35B Model [question] (#1196, opened Feb 26, 2025 by Iridescent-gcrace)
Does it support asymmetric quantization? [bug, compressed-tensors] (#1190, opened Feb 26, 2025 by Molly-Dooker)
OOM during save_pretrained of compressed model [bug] (#1183, opened Feb 22, 2025 by chmeyers)
Is it supported to quantize attention to fp8 with calibration? [question] (#1158, opened Feb 16, 2025 by YSF-A)
When quantizing gemma2 in W8A8 format, the input is not positive-definite and gemma2-27B cannot be quantized. [bug] (#1152, opened Feb 14, 2025 by HelloCard)
Add support for W8A8 quantization with CPU weight offloading [enhancement] (#1078, opened Jan 17, 2025 by NeoChen1024)
[Bug] SparseGPTModifier with OutputDistillationModifier [bug] (#1058, opened Jan 11, 2025 by Thunderbeee)
About LoRA fine-tuning of 2:4 sparse and sparse-quantized models [enhancement] (#952, opened Dec 4, 2024 by arunpatala)