Hello!
I have been trying the various quantization recipes for quantizing a 70B Llama 3 based model to FP8, INT8, and INT4 (A16) precisions, as described in the vLLM quantization docs.
Could you help me understand the memory requirements of the quantization recipes, i.e. SmoothQuant (SmoothQuantModifier), GPTQ (GPTQModifier), and RTN (QuantizationModifier)? A calculation/formula would help, for example like the one we have for the KV cache:
memory in bytes for KV cache = 80 (layers) * 8 (KV heads) * 128 (head_dim) * 8192 (seq length) * 2 (K and V) * 2 (bytes per fp16 value)
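For concreteness, the same back-of-envelope calculation in Python (these are just the numbers from the formula above, i.e. the usual Llama 3 70B shapes, batch size 1):

```python
# KV cache size for a Llama 3 70B style config (batch size 1).
layers, kv_heads, head_dim, seq_len = 80, 8, 128, 8192
bytes_per_value = 2   # fp16
k_and_v = 2           # one K and one V tensor per layer

kv_cache_bytes = layers * kv_heads * head_dim * seq_len * k_and_v * bytes_per_value
print(f"{kv_cache_bytes / 1024**3:.2f} GiB")  # 2.50 GiB for this example
```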
I understand that calculate_offload_device_map creates a custom device map by reserving memory for GPTQ (reserve_for_hessians), but I would still like to understand the memory requirements so that I can utilize the GPU memory efficiently, see where all the GPU memory is consumed, and confirm that there are no bugs.
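To be explicit about what I am hoping to pin down: my current rough estimate (my own assumption, not something I found in the docs or code) is that the extra GPTQ memory is dominated by one fp32 Hessian of shape (in_features, in_features) per Linear module being calibrated, something like:

```python
# Rough estimate of the extra GPTQ memory per quantized Linear module,
# assuming an fp32 Hessian of shape (in_features, in_features) is kept
# while the module is being calibrated. The shapes below are the standard
# Llama 3 70B sizes; the breakdown itself is my assumption, not taken
# from the llm-compressor source.
def hessian_bytes(in_features: int, dtype_bytes: int = 4) -> int:
    return in_features * in_features * dtype_bytes

hidden, intermediate = 8192, 28672
per_decoder_layer = (
    4 * hessian_bytes(hidden)        # q/k/v/o projections (in_features = hidden)
    + 2 * hessian_bytes(hidden)      # gate/up projections (in_features = hidden)
    + hessian_bytes(intermediate)    # down projection (in_features = intermediate)
)
print(f"~{per_decoder_layer / 1024**3:.2f} GiB of Hessians per decoder layer")
```

If that picture is roughly right, I can see how reserve_for_hessians would size the offload map, but I would like to confirm it and to know what else (activations, scales, extra weight copies) should be budgeted for.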
Also, I understand that for quantization of big models, the model is currently split in a pipeline parallel way across the GPUs available on the instance.
Since the GPU in use at any given time is the one holding the layer currently being quantized, would the time taken to quantize the model on multiple GPUs be similar to quantizing it on a single GPU?
Is it possible to split the model in a tensor parallel way?
I understand that 'non-sequential GPTQ' is deprecated, but how much memory does non-sequential GPTQ require? I think the memory calculation above would help here as well. Also, how much of a speed-up would the non-sequential approach give compared to the sequential one?
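To make that concrete, my (possibly wrong) reading is that the non-sequential path keeps every layer's Hessians resident at the same time, so extending the per-layer estimate above:

```python
# Assuming non-sequential GPTQ keeps all layers' fp32 Hessians alive at once
# (my understanding, please correct me if that is wrong).
hidden, intermediate, num_layers = 8192, 28672, 80
per_layer_bytes = (6 * hidden**2 + intermediate**2) * 4   # 4 q/k/v/o + 2 gate/up + 1 down
print(f"~{num_layers * per_layer_bytes / 1024**3:.0f} GiB")  # ~365 GiB for a 70B model
```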
Thank you!