[Question] Has anyone successfully quantized DeepSeek-V3 to int4-w4a16? #1203
Comments
`KeyError: 'model.layers.61.self_attn.q_a_proj.weight'`
```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM

from llmcompressor.entrypoints import oneshot
from llmcompressor.transformers.compression.helpers import calculate_offload_device_map  # import path may vary by llm-compressor version

# NOTE: transformers 4.49.0 has an attribute error with DeepSeek. Please consider
# either downgrading your transformers version to a previous version or upgrading
# to a version where this bug is fixed.

# Select a Mixture of Experts model for quantization.
MODEL_ID = "/home/wanglch/data/DeepSeek-R1-bf16"

# Adjust based off the number of desired GPUs.
# If not enough memory is available, some layers will automatically be offloaded to CPU.
device_map = calculate_offload_device_map(...)

model = AutoModelForCausalLM.from_pretrained(...)

# Select calibration dataset.
# It's recommended to use more calibration samples for MoE models so each expert is hit.
DATASET_ID = "HuggingFaceH4/ultrachat_200k"

# Load dataset and preprocess.
ds = load_dataset(DATASET_ID, split=DATASET_SPLIT)


def preprocess(example):
    ...


ds = ds.map(preprocess)


# Tokenize inputs.
def tokenize(sample):
    ...


ds = ds.map(tokenize, remove_columns=ds.column_names)

# Define a llmcompressor recipe for INT8 W8A8 quantization.
# Since the MoE gate layers are sensitive to quantization, we add them to the ignore
# list so they remain at full precision.
recipe = [...]

SAVE_DIR = MODEL_ID.split("/")[1] + "-W8A8"

oneshot(...)

print("========== SAMPLE GENERATION ==============")
```
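The recipe in the script above targets INT8 W8A8, while the question in the title is about int4-w4a16. As a rough point of reference, a W4A16 variant built on llm-compressor's `GPTQModifier` could look like the sketch below, reusing `model`, `ds`, and `MODEL_ID` from the script above. The MoE gate-layer regex, calibration sample count, and sequence length are assumptions rather than a verified DeepSeek-V3 configuration.

```python
from llmcompressor.entrypoints import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# Sketch of an int4-w4a16 (W4A16) recipe: 4-bit weights, 16-bit activations.
# lm_head and the MoE gate layers are kept at full precision; the gate-layer
# regex is an assumption based on DeepSeek's module naming and should be
# checked against the actual model structure.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head", "re:.*mlp.gate$"],
)

oneshot(
    model=model,                  # model loaded as in the script above
    dataset=ds,                   # tokenized calibration dataset from above
    recipe=recipe,
    max_seq_length=2048,          # assumed calibration sequence length
    num_calibration_samples=512,  # assumed; MoE models benefit from more samples
    output_dir=MODEL_ID.split("/")[1] + "-W4A16",
)
```

Whether this actually succeeds on DeepSeek-V3 (for example, whether the `q_a_proj` KeyError above still appears) is exactly the open question in this issue.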
You can use a markdown code block to show your code. It's easier to read.
Hi, is your stack trace the same as the one shown in #1204? If so, please see this comment. If not, please reply.
Hello, did you solve the problem? I encountered the same issue.
Has anyone successfully quantized DeepSeek-V3 to int4-w4a16?