
How to choose sparsification rates for different layers? #1231

Open

zjnyly opened this issue Mar 6, 2025 · 1 comment

zjnyly commented Mar 6, 2025

Hi, I want to apply different sparsification rates to different Transformer layers, or even to individual Linear layers. How can I write a correct recipe?

I wrote the following recipe, but it seems that the parser skips the first modifier and only prunes the layers targeted by the second one.

from llmcompressor.modifiers.obcq import SparseGPTModifier

recipe = [
    SparseGPTModifier(
        sparsity=0.5,
        ignore=["lm_head"],
        mask_structure="4:8",
        # sequential_update=True,
        targets=[r"re:model.layers.0\d*$"],
    ),
    SparseGPTModifier(
        sparsity=0.75,
        ignore=["lm_head"],
        mask_structure="6:8",
        # sequential_update=True,
        targets=[r"re:model.layers.1\d*$"],
    ),
]
(1/33): Calibrating: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 101.07it/s]
(1/33): Propagating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:07<00:00, 71.27it/s]
(2/33): Calibrating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:24<00:00, 20.78it/s]
2025-03-06T18:01:50.685722+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.self_attn.q_proj using 512 samples
2025-03-06T18:01:51.074850+0800 | compress | METRIC - time 0.39s
2025-03-06T18:01:51.074963+0800 | compress | METRIC - error 1440.30
2025-03-06T18:01:51.075180+0800 | compress | METRIC - GPU 0 | usage: 82.43% | total memory: 25 GB
2025-03-06T18:01:51.075241+0800 | compress | METRIC - Compressed module size: 33.554432 MB
2025-03-06T18:01:51.075352+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.self_attn.k_proj using 512 samples
2025-03-06T18:01:51.345968+0800 | compress | METRIC - time 0.27s
2025-03-06T18:01:51.346061+0800 | compress | METRIC - error 683.39
2025-03-06T18:01:51.346163+0800 | compress | METRIC - GPU 0 | usage: 82.43% | total memory: 25 GB
2025-03-06T18:01:51.346202+0800 | compress | METRIC - Compressed module size: 8.388608 MB
2025-03-06T18:01:51.346267+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.self_attn.v_proj using 512 samples
2025-03-06T18:01:51.614664+0800 | compress | METRIC - time 0.27s
2025-03-06T18:01:51.614734+0800 | compress | METRIC - error 72.16
2025-03-06T18:01:51.614818+0800 | compress | METRIC - GPU 0 | usage: 82.43% | total memory: 25 GB
2025-03-06T18:01:51.614857+0800 | compress | METRIC - Compressed module size: 8.388608 MB
2025-03-06T18:01:51.614913+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.self_attn.o_proj using 512 samples
2025-03-06T18:01:51.883579+0800 | compress | METRIC - time 0.27s
2025-03-06T18:01:51.883656+0800 | compress | METRIC - error 5.43
2025-03-06T18:01:51.883749+0800 | compress | METRIC - GPU 0 | usage: 82.43% | total memory: 25 GB
2025-03-06T18:01:51.883785+0800 | compress | METRIC - Compressed module size: 33.554432 MB
2025-03-06T18:01:51.883847+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.mlp.gate_proj using 512 samples
2025-03-06T18:01:52.177392+0800 | compress | METRIC - time 0.29s
2025-03-06T18:01:52.177493+0800 | compress | METRIC - error 6707.05
2025-03-06T18:01:52.177594+0800 | compress | METRIC - GPU 0 | usage: 82.44% | total memory: 25 GB
2025-03-06T18:01:52.177634+0800 | compress | METRIC - Compressed module size: 117.440512 MB
2025-03-06T18:01:52.177698+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.mlp.up_proj using 512 samples
2025-03-06T18:01:52.471159+0800 | compress | METRIC - time 0.29s
2025-03-06T18:01:52.471243+0800 | compress | METRIC - error 5519.00
2025-03-06T18:01:52.471337+0800 | compress | METRIC - GPU 0 | usage: 82.44% | total memory: 25 GB
2025-03-06T18:01:52.471374+0800 | compress | METRIC - Compressed module size: 117.440512 MB
2025-03-06T18:01:52.471434+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.mlp.down_proj using 512 samples
2025-03-06T18:01:53.857160+0800 | compress | METRIC - time 1.39s
2025-03-06T18:01:53.857282+0800 | compress | METRIC - error 74.40
2025-03-06T18:01:53.857402+0800 | compress | METRIC - GPU 0 | usage: 85.63% | total memory: 25 GB
2025-03-06T18:01:53.857445+0800 | compress | METRIC - Compressed module size: 117.440512 MB
(2/33): Propagating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 86.53it/s]
(3/33): Calibrating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 95.25it/s]
(3/33): Propagating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 88.27it/s]
(4/33): Calibrating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 94.14it/s]
(4/33): Propagating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 87.86it/s]
(5/33): Calibrating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 93.67it/s]
.................................................
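
To double-check which module names each targets regex actually matches, here is a minimal sketch (a hypothetical helper, not part of the run above), assuming the re: prefix is just an ordinary Python regex tested against the names from model.named_modules():

import re

from transformers import AutoModelForCausalLM

# Placeholder model id; substitute the model used in the run above.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Patterns from the recipe, with the "re:" prefix stripped.
patterns = [r"model.layers.0\d*$", r"model.layers.1\d*$"]

for name, _ in model.named_modules():
    for pattern in patterns:
        if re.search(pattern, name):
            print(f"{pattern!r} matches {name!r}")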

What's more, I found that a sparsity structure like 6:8 is not supported when saving. How can I bypass this? I don't need to compress the weights.

Traceback (most recent call last):
  File "/home/zjnyly/llm-compressor.py", line 73, in <module>
    model.save_pretrained(SAVE_DIR, save_compressed=False)
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/llmcompressor/transformers/sparsification/compressed_tensors_utils.py", line 169, in save_pretrained_wrapper
    compressor = get_model_compressor(
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/llmcompressor/transformers/sparsification/compressed_tensors_utils.py", line 289, in get_model_compressor
    sparsity_stucture = SparsityConfigMetadata.infer_sparsity_structure(model)
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/llmcompressor/transformers/compression/sparsity_config.py", line 77, in infer_sparsity_structure
    return SparsityStructure(sparsity_structure).value
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/enum.py", line 385, in __call__
    return cls.__new__(cls, value)
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/enum.py", line 718, in __new__
    raise exc
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/enum.py", line 700, in __new__
    result = cls._missing_(value)
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/compressed_tensors/config/base.py", line 91, in _missing_
    raise ValueError(f"{value} is not a valid {cls.__name__}")
ValueError: 6:8 is not a valid SparsityStructure

Is there a better way to write the recipe to prune any Linear layer with arbitrary sparsity? I also want to quantize the model, so I need to preserve the pruned weights.

from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.modifiers.pruning import ConstantPruningModifier
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = [
    SparseGPTModifier(
        sparsity=[0.25, 0.75],
        ignore=["lm_head"],
        mask_structure=["4:8", "6:8"],
        sequential_update=True,
        targets=[r"re:model.layers.0\d*$", r"re:model.layers.1\d*$"],
    ),
    ConstantPruningModifier(
        targets=[
            r"re:.*q_proj.weight",
            r"re:.*k_proj.weight",
            r"re:.*v_proj.weight",
            r"re:.*o_proj.weight",
            r"re:.*gate_proj.weight",
            r"re:.*up_proj.weight",
            r"re:.*down_proj.weight",
        ],
        start=0,
    ),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"], offload_hessians=True),
]
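
For reference, this is roughly how the recipe would be applied (a sketch only; the oneshot import path, arguments, and dataset name follow the public llm-compressor examples and are placeholders rather than my exact script):

from llmcompressor import oneshot  # older releases expose this as llmcompressor.transformers.oneshot

SAVE_DIR = "./pruned-quantized-model"  # placeholder output directory

# Apply the staged recipe above in one shot; dataset and sample count are placeholders.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Save without compressing the weights, as in the original script.
model.save_pretrained(SAVE_DIR, save_compressed=False)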

Thank you!

zjnyly (Author) commented Mar 6, 2025


I edited the SparsityStructure class in compressed_tensors/config/base.py to fall back to unstructured sparsity, which avoids the "6:8 is not a valid SparsityStructure" error above. That is enough for my use case.

@classmethod
def _missing_(cls, value):
    # Fall back to unstructured sparsity for any unrecognized value (e.g. "6:8")
    return cls.UNSTRUCTURED
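
The same fallback can also be applied as a monkeypatch from the calling script, without editing the installed package. A sketch with the same effect as the edit above (note that it silently maps any unrecognized structure, not just "6:8", to unstructured):

from compressed_tensors.config.base import SparsityStructure

def _fallback_to_unstructured(cls, value):
    # Map any unrecognized sparsity structure (e.g. "6:8") to UNSTRUCTURED
    # so save_pretrained(..., save_compressed=False) no longer raises.
    return cls.UNSTRUCTURED

# _missing_ is not an enum member, so it can be reassigned on the class.
SparsityStructure._missing_ = classmethod(_fallback_to_unstructured)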
