How can I just quantize a matrix W? #149
I am trying to apply HQQ to fields other than LLMs or deep learning, such as approximate nearest neighbor search. I want to quantize an n-by-d matrix with FP32 elements into an n-by-d matrix with 4-bit/8-bit integer elements. How can I use the HQQ API to achieve this? Thanks a lot!
Hey, just follow the basic usage section:

```python
import torch
from hqq.core.quantize import *

# Quantization settings
quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=0)  # use axis=0 for better quality

# Wrap your matrix in a linear layer
tmp_linear_layer = torch.nn.Linear(W.shape[1], W.shape[0], bias=False)
tmp_linear_layer.weight.data = W  # your matrix

# Replace your linear layer
hqq_layer = HQQLinear(tmp_linear_layer,           # torch.nn.Linear or None
                      quant_config=quant_config,  # quantization configuration
                      compute_dtype=W.dtype,      # compute dtype
                      device='cuda',              # cuda device
                      initialize=True,            # use False to quantize later
                      del_orig=True)              # if True, delete the original layer

W_r = hqq_layer.dequantize()  # reconstructed matrix
```

You can also do the following:

```python
hqq_layer = HQQLinear.from_weights(W, bias=None, quant_config=quant_config, compute_dtype=W.dtype, device="cuda")
```
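For a quick sanity check of the round trip, here is a minimal sketch; the toy shape, dtype, and error metric are illustrative assumptions, not part of the original answer:

```python
import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

# Toy matrix standing in for your data (assumed FP16 on CUDA)
W = torch.randn(1000, 128, dtype=torch.float16, device='cuda')

quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=0)
hqq_layer = HQQLinear.from_weights(W, bias=None, quant_config=quant_config,
                                   compute_dtype=W.dtype, device='cuda')

# Compare the 4-bit reconstruction against the original
W_r = hqq_layer.dequantize()
print((W - W_r).abs().mean().item())  # mean absolute reconstruction error
```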
I got it. Thank you! I still have a problem: could you tell me how to use the parameters `s` and `z` generated for `W` (n-by-d) to quantize another query vector `q` (1-by-d)?
I am not sure I understand.
`W` can be seen as n d-dimensional base vectors, and `q` is a d-dimensional query vector. In ANNS (approximate nearest neighbor search), we need to compute the Euclidean distance between `q` and some of the base vectors, and find the closest k vectors. If we apply the same transform parameters `s` and `z` to `q` as to `W`, we can just compute the distances between `q_q` and `W_q`, which are the same as the distances between `q` and `W`. That is why I want to get the `s` and `z` specific to `W`. I am not sure if I made it clear.
You can't use the same `scale`/`zero` to quantize `q`: they are computed per group from `W`'s values. What you can do instead is use the dot-product directly with the quantized weights via the `HQQLinear` module. For example, if you are calculating the cosine distance via the dot-product between `q` and the rows of `W`, you can simply call `hqq_layer(q)`. Note that by convention, a linear layer computes `q @ W.T`, so the output is the dot-product of `q` with each row of `W`. By default, `HQQLinear` will dequantize first and then do the dot-product, but you can use other backends that run the matmul directly on the quantized weights.
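As a concrete illustration, here is a minimal sketch of that suggestion; the toy shapes, the cosine normalization, and the Euclidean-distance identity are my additions, not from the thread:

```python
import torch
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

n, d = 1000, 128
W = torch.randn(n, d, dtype=torch.float16, device='cuda')  # base vectors (rows)
q = torch.randn(1, d, dtype=torch.float16, device='cuda')  # query vector

quant_config = BaseQuantizeConfig(nbits=4, group_size=64, axis=0)
hqq_layer = HQQLinear.from_weights(W, bias=None, quant_config=quant_config,
                                   compute_dtype=W.dtype, device='cuda')

# Linear-layer convention: forward computes q @ W.T -> one dot-product per row of W
dots = hqq_layer(q).squeeze(0)  # shape (n,)

# Cosine similarity: normalize the dot-products by the vector norms
W_r = hqq_layer.dequantize()    # reconstructed (n, d) matrix
cos = dots / (q.norm() * W_r.norm(dim=1))

# Euclidean distances via ||q - w||^2 = ||q||^2 + ||w||^2 - 2 * q.w
sq = q.pow(2).sum() + W_r.pow(2).sum(dim=1) - 2 * dots
dists = sq.clamp_min(0).sqrt()

top_k = dists.topk(10, largest=False).indices  # indices of the 10 nearest rows of W
```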
Thanks a lot! I really appreciate your help.
Happy to help! If you have a toy example with code, I can help you out.