
How to choose sparsification rates for different layers? #1231

Open

zjnyly opened this issue Mar 6, 2025 · 1 comment

zjnyly commented Mar 6, 2025

Hi, I want to apply different sparsification rates to different Transformer layers, or even to individual Linear layers. How can I write a correct recipe?

I wrote the following recipe, but it seems that the parser skips the first modifier and only prunes the layers targeted by the second one.

from llmcompressor.modifiers.obcq import SparseGPTModifier

recipe = [
    SparseGPTModifier(
        sparsity=0.5,
        ignore=["lm_head"],
        mask_structure="4:8",
        # sequential_update=True,
        targets=[r"re:model.layers.0\d*$"],
    ),
    SparseGPTModifier(
        sparsity=0.75,
        ignore=["lm_head"],
        mask_structure="6:8",
        # sequential_update=True,
        targets=[r"re:model.layers.1\d*$"],
    ),
]
(1/33): Calibrating: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 101.07it/s]
(1/33): Propagating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:07<00:00, 71.27it/s]
(2/33): Calibrating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:24<00:00, 20.78it/s]
2025-03-06T18:01:50.685722+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.self_attn.q_proj using 512 samples
2025-03-06T18:01:51.074850+0800 | compress | METRIC - time 0.39s
2025-03-06T18:01:51.074963+0800 | compress | METRIC - error 1440.30
2025-03-06T18:01:51.075180+0800 | compress | METRIC - GPU 0 | usage: 82.43% | total memory: 25 GB
2025-03-06T18:01:51.075241+0800 | compress | METRIC - Compressed module size: 33.554432 MB
2025-03-06T18:01:51.075352+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.self_attn.k_proj using 512 samples
2025-03-06T18:01:51.345968+0800 | compress | METRIC - time 0.27s
2025-03-06T18:01:51.346061+0800 | compress | METRIC - error 683.39
2025-03-06T18:01:51.346163+0800 | compress | METRIC - GPU 0 | usage: 82.43% | total memory: 25 GB
2025-03-06T18:01:51.346202+0800 | compress | METRIC - Compressed module size: 8.388608 MB
2025-03-06T18:01:51.346267+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.self_attn.v_proj using 512 samples
2025-03-06T18:01:51.614664+0800 | compress | METRIC - time 0.27s
2025-03-06T18:01:51.614734+0800 | compress | METRIC - error 72.16
2025-03-06T18:01:51.614818+0800 | compress | METRIC - GPU 0 | usage: 82.43% | total memory: 25 GB
2025-03-06T18:01:51.614857+0800 | compress | METRIC - Compressed module size: 8.388608 MB
2025-03-06T18:01:51.614913+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.self_attn.o_proj using 512 samples
2025-03-06T18:01:51.883579+0800 | compress | METRIC - time 0.27s
2025-03-06T18:01:51.883656+0800 | compress | METRIC - error 5.43
2025-03-06T18:01:51.883749+0800 | compress | METRIC - GPU 0 | usage: 82.43% | total memory: 25 GB
2025-03-06T18:01:51.883785+0800 | compress | METRIC - Compressed module size: 33.554432 MB
2025-03-06T18:01:51.883847+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.mlp.gate_proj using 512 samples
2025-03-06T18:01:52.177392+0800 | compress | METRIC - time 0.29s
2025-03-06T18:01:52.177493+0800 | compress | METRIC - error 6707.05
2025-03-06T18:01:52.177594+0800 | compress | METRIC - GPU 0 | usage: 82.44% | total memory: 25 GB
2025-03-06T18:01:52.177634+0800 | compress | METRIC - Compressed module size: 117.440512 MB
2025-03-06T18:01:52.177698+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.mlp.up_proj using 512 samples
2025-03-06T18:01:52.471159+0800 | compress | METRIC - time 0.29s
2025-03-06T18:01:52.471243+0800 | compress | METRIC - error 5519.00
2025-03-06T18:01:52.471337+0800 | compress | METRIC - GPU 0 | usage: 82.44% | total memory: 25 GB
2025-03-06T18:01:52.471374+0800 | compress | METRIC - Compressed module size: 117.440512 MB
2025-03-06T18:01:52.471434+0800 | on_sequential_batch_end | INFO - Sparsifying model.layers.1.mlp.down_proj using 512 samples
2025-03-06T18:01:53.857160+0800 | compress | METRIC - time 1.39s
2025-03-06T18:01:53.857282+0800 | compress | METRIC - error 74.40
2025-03-06T18:01:53.857402+0800 | compress | METRIC - GPU 0 | usage: 85.63% | total memory: 25 GB
2025-03-06T18:01:53.857445+0800 | compress | METRIC - Compressed module size: 117.440512 MB
(2/33): Propagating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 86.53it/s]
(3/33): Calibrating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 95.25it/s]
(3/33): Propagating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 88.27it/s]
(4/33): Calibrating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 94.14it/s]
(4/33): Propagating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 87.86it/s]
(5/33): Calibrating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [00:05<00:00, 93.67it/s]
.................................................
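
To double-check which module names each targets regex actually matches, here is a minimal sketch (a hypothetical helper, not part of the run above), assuming the re: prefix is just an ordinary Python regex tested against the names from model.named_modules():

import re

from transformers import AutoModelForCausalLM

# Placeholder model id; substitute the model used in the run above.
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Patterns from the recipe, with the "re:" prefix stripped.
patterns = [r"model.layers.0\d*$", r"model.layers.1\d*$"]

for name, _ in model.named_modules():
    for pattern in patterns:
        if re.search(pattern, name):
            print(f"{pattern!r} matches {name!r}")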

What's more, I found that a sparsity structure like 6:8 is not supported when saving. How can I bypass this? I don't need to compress the weights.

Traceback (most recent call last):
  File "/home/zjnyly/llm-compressor.py", line 73, in <module>
    model.save_pretrained(SAVE_DIR, save_compressed=False)
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/llmcompressor/transformers/sparsification/compressed_tensors_utils.py", line 169, in save_pretrained_wrapper
    compressor = get_model_compressor(
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/llmcompressor/transformers/sparsification/compressed_tensors_utils.py", line 289, in get_model_compressor
    sparsity_stucture = SparsityConfigMetadata.infer_sparsity_structure(model)
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/llmcompressor/transformers/compression/sparsity_config.py", line 77, in infer_sparsity_structure
    return SparsityStructure(sparsity_structure).value
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/enum.py", line 385, in __call__
    return cls.__new__(cls, value)
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/enum.py", line 718, in __new__
    raise exc
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/enum.py", line 700, in __new__
    result = cls._missing_(value)
  File "/home/zjnyly/miniconda3/envs/py310/lib/python3.10/site-packages/compressed_tensors/config/base.py", line 91, in _missing_
    raise ValueError(f"{value} is not a valid {cls.__name__}")
ValueError: 6:8 is not a valid SparsityStructure

Is there a better way to write the recipe to prune any Linear layer with arbitrary sparsity? I also want to quantize the model, so I need to preserve the pruned weights.

from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.modifiers.pruning import ConstantPruningModifier
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = [
    SparseGPTModifier(
        sparsity=[0.25, 0.75],
        ignore=["lm_head"],
        mask_structure=["4:8", "6:8"],
        sequential_update=True,
        targets=[r"re:model.layers.0\d*$", r"re:model.layers.1\d*$"],
    ),
    ConstantPruningModifier(
        targets=[
            r"re:.*q_proj.weight",
            r"re:.*k_proj.weight",
            r"re:.*v_proj.weight",
            r"re:.*o_proj.weight",
            r"re:.*gate_proj.weight",
            r"re:.*up_proj.weight",
            r"re:.*down_proj.weight",
        ],
        start=0,
    ),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"], offload_hessians=True),
]
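
For reference, this is roughly how the recipe would be applied (a sketch only; the oneshot import path, arguments, and dataset name follow the public llm-compressor examples and are placeholders rather than my exact script):

from llmcompressor import oneshot  # older releases expose this as llmcompressor.transformers.oneshot

SAVE_DIR = "./pruned-quantized-model"  # placeholder output directory

# Apply the staged recipe above in one shot; dataset and sample count are placeholders.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Save without compressing the weights, as in the original script.
model.save_pretrained(SAVE_DIR, save_compressed=False)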

Thank you!

zjnyly (Author) commented Mar 6, 2025


I edited the SparsityStructure class in compressed_tensors/config/base.py to fall back to unstructured sparsity, which avoids the "6:8 is not a valid SparsityStructure" error above. That is enough for my use case.

@classmethod
def _missing_(cls, value):
    # Fall back to unstructured sparsity for any unrecognized value (e.g. "6:8")
    return cls.UNSTRUCTURED
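
The same fallback can also be applied as a monkeypatch from the calling script, without editing the installed package. A sketch with the same effect as the edit above (note that it silently maps any unrecognized structure, not just "6:8", to unstructured):

from compressed_tensors.config.base import SparsityStructure

def _fallback_to_unstructured(cls, value):
    # Map any unrecognized sparsity structure (e.g. "6:8") to UNSTRUCTURED
    # so save_pretrained(..., save_compressed=False) no longer raises.
    return cls.UNSTRUCTURED

# _missing_ is not an enum member, so it can be reassigned on the class.
SparsityStructure._missing_ = classmethod(_fallback_to_unstructured)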
