How can I just quantize a matrix W? #149

Open
bxren opened this issue Feb 11, 2025 · 7 comments

bxren commented Feb 11, 2025

I am trying to apply hqq to fields other than LLMs or deep learning, such as approximate nearest neighbor search. I want to quantize an n-by-d matrix whose elements are FP32 into an n-by-d matrix whose elements are 4-bit/8-bit integers. How can I use the hqq API to achieve this? Thanks a lot!

mobicham (Collaborator) commented Feb 11, 2025

Hey, just follow the basic usage section:

import torch
from hqq.core.quantize import *

# Quantization settings
quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=0) # use axis=0 for better quality

# Wrap your matrix in a temporary linear layer
tmp_linear_layer = torch.nn.Linear(W.shape[1], W.shape[0], bias=False)
tmp_linear_layer.weight.data = W # your matrix

# Quantize the layer
hqq_layer = HQQLinear(tmp_linear_layer, # torch.nn.Linear or None
                      quant_config=quant_config, # quantization configuration
                      compute_dtype=W.dtype, # compute dtype
                      device='cuda', # cuda device
                      initialize=True, # use False to quantize later
                      del_orig=True # if True, delete the original layer
                      )

W_r = hqq_layer.dequantize() # reconstructed matrix

You can also do the following:

hqq_layer = HQQLinear.from_weights(W, bias=None, quant_config=quant_config, compute_dtype=W.dtype, device="cuda")
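
To sanity-check the reconstruction, you can compare the dequantized matrix against the original (a minimal sketch; the random matrix and sizes are just illustrative):

import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

W = torch.randn(1024, 128, dtype=torch.float16) # illustrative n-by-d matrix
quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=0)

hqq_layer = HQQLinear.from_weights(W, bias=None, quant_config=quant_config, compute_dtype=W.dtype, device="cuda")
W_r = hqq_layer.dequantize() # reconstructed n-by-d matrix on the GPU
print((W.cuda() - W_r).abs().mean()) # mean absolute quantization error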

bxren (Author) commented Feb 11, 2025

I got it, thank you! I still have a question: could you tell me how to use the parameters s and z generated from W (n-by-d) to quantize another query vector q (1-by-d)?

mobicham (Collaborator) commented:

I am not sure I understand; s and z are specific to W, so quantizing q would require re-quantizing. Can you provide a detailed example?

bxren (Author) commented Feb 11, 2025

W can be seen as n d-dimensional base vectors, and q is a d-dimensional query vector. In ANNS (approximate nearest neighbor search), we need to compute the Euclidean distance between q and some of the base vectors and find the closest k vectors. If we apply the same transform parameters s and z to q as to W, we can compute the distances between q_q and W_q directly, which would match the distances between q and W. That is why I want the s and z specific to W. I am not sure if I made it clear.

mobicham (Collaborator) commented Feb 11, 2025

You can't use the same s and z; these are quantization parameters specific to each data group, as defined by group_size.
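
To make that concrete, here is a generic sketch of group-wise affine quantization (illustrative only, not hqq's exact internals): each group of group_size elements gets its own (s, z), so the parameters are tied to W's per-group statistics and have nothing to match in an unrelated vector q.

import torch

W = torch.randn(1024, 128)
group_size = 64
Wg = W.reshape(-1, group_size) # one row per group of 64 values
w_min = Wg.min(dim=1, keepdim=True).values
w_max = Wg.max(dim=1, keepdim=True).values
s = 15.0 / (w_max - w_min) # 4-bit range: 0..15
z = -w_min * s
W_q = torch.clamp(torch.round(Wg * s + z), 0, 15) # quantized groups
print(s.shape) # torch.Size([2048, 1]): one (s, z) pair per group of W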

What you can do instead is use the dot-product directly with the quantized weights via the HQQLinear module. For example, if you are calculating the cosine distance via the dot-product between W and q, you can quantize W in chunks (whichever sizes fit in VRAM), and for each chunk W_i quantized in hqq_linear[i], compute hqq_linear[i](q) -> distance_scores_i.

Note that, by convention, hqq_linear[i](x) actually computes x @ dequantized().T, so you should transpose before quantizing if you want x @ W.

By default, HQQLinear will dequantize first then do the dot-product, but you can use other backends like torchao_int4 that will do a fused dot-product without dequantization.
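
Putting this together, here is a minimal sketch of the chunked dot-product scoring (the chunk size, shapes, and k are illustrative):

import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=0)

n, d = 100_000, 128
W = torch.randn(n, d, dtype=torch.float16) # base vectors as rows
q = torch.randn(1, d, dtype=torch.float16, device='cuda') # query vector

# Quantize W chunk by chunk (pick a chunk size that fits in VRAM).
# Since the base vectors are the rows of each chunk, hqq_chunks[i](q) computes
# q @ W_i.T, i.e. one dot-product score per base vector; no transpose needed here.
chunk_size = 25_000
hqq_chunks = [HQQLinear.from_weights(W[i:i + chunk_size], bias=None,
                                     quant_config=quant_config,
                                     compute_dtype=W.dtype, device='cuda')
              for i in range(0, n, chunk_size)]

scores = torch.cat([layer(q) for layer in hqq_chunks], dim=1) # (1, n)
topk = scores.topk(k=10, dim=1) # most similar base vectors by dot-product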

bxren (Author) commented Feb 11, 2025

Thanks a lot! I really appreciate your help.

mobicham (Collaborator) commented:

Happy to help. If you have a toy example with code, I can help you out.
