Hi, I want to apply different sparsification rates to different Transformer layers, or even to individual Linear layers. How can I write a correct recipe?
I wrote a recipe with two pruning modifiers, one per layer, but the parser seems to skip the first modifier and only prunes the second layer.
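For context, the kind of recipe I am trying to write looks roughly like this (a sketch with placeholder model name, layer regexes, and sparsities, not my exact file; the `oneshot` import path and modifier arguments may differ between versions):

```python
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot  # older releases: from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.obcq import SparseGPTModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

# One pruning modifier per group of layers, each with its own sparsity / mask structure.
recipe = [
    SparseGPTModifier(
        sparsity=0.5,
        mask_structure="2:4",
        targets=["re:model.layers.0.*"],  # first decoder block only
    ),
    SparseGPTModifier(
        sparsity=0.75,
        mask_structure="6:8",
        targets=["re:model.layers.1.*"],  # second decoder block only
    ),
]

oneshot(model=model, dataset="open_platypus", recipe=recipe, num_calibration_samples=64)
```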
What's more, I found that a sparsity structure like 6:8 is not supported when saving; how can I bypass that? I don't need to compress the weights.
Traceback (most recent call last):
File "/home/zjnyly/llm-compressor.py", line 73, in <module>
model.save_pretrained(SAVE_DIR, save_compressed=False)
File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/llmcompressor/transformers/sparsification/compressed_tensors_utils.py", line 169, in save_pretrained_wrapper
compressor = get_model_compressor(
File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/llmcompressor/transformers/sparsification/compressed_tensors_utils.py", line 289, in get_model_compressor
sparsity_stucture = SparsityConfigMetadata.infer_sparsity_structure(model)
File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/llmcompressor/transformers/compression/sparsity_config.py", line 77, in infer_sparsity_structure
return SparsityStructure(sparsity_structure).value
File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/enum.py", line 385, in __call__
return cls.__new__(cls, value)
File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/enum.py", line 718, in __new__
raise exc
File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/enum.py", line 700, in __new__
result = cls._missing_(value)
File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/compressed_tensors/config/base.py", line 91, in _missing_
raise ValueError(f"{value} is not a valid {cls.__name__}")
ValueError: 6:8 is not a valid SparsityStructure
Is there a better way to write the recipe to prune any Linear layer with arbitrary sparsity? I also want to quantize the model, so I need to preserve the pruned weights.
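Concretely, what I would like to end up with is something along these lines (only a sketch; I have not verified that stacking the two modifiers this way is valid, and the argument names are taken from the examples I could find):

```python
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.modifiers.quantization import GPTQModifier

# Prune every Linear layer (except the LM head) with an arbitrary mask structure,
# then quantize on top, so the pruned zeros have to survive the save step.
recipe = [
    SparseGPTModifier(
        sparsity=0.75,
        mask_structure="6:8",
        targets=["Linear"],
        ignore=["lm_head"],
    ),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]
```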
I edited the SparsityStructure class in compressed_tensors/config/base.py to fall back to unstructured sparsity and avoid the error. That is enough for me.
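For anyone who would rather not edit the installed package, monkey-patching the inference before saving should have the same effect (untested sketch; the exact signature of `infer_sparsity_structure` may differ across versions):

```python
from llmcompressor.transformers.compression.sparsity_config import SparsityConfigMetadata

# Make the save path report "unstructured" so the 6:8 mask structure never
# reaches the SparsityStructure enum that rejects it.
SparsityConfigMetadata.infer_sparsity_structure = staticmethod(
    lambda *args, **kwargs: "unstructured"
)

model.save_pretrained(SAVE_DIR, save_compressed=False)
```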
Thank you!