
Replace LayerCompressor with HooksMixin #1038

Merged: 29 commits merged into main on Feb 5, 2025
Conversation

@kylesayrs (Collaborator) commented Jan 6, 2025

Purpose

  • Remove layer compressor to decouple modifiers from data pipelines
  • Reduce abstractions
  • Support VLMs with SparseGPT and Wanda

Prerequisites

Changes

Interface / Features

  • SparseGPT and Wanda now both support VLM architectures
  • Added sequential_targets to match GPTQ and made targets an alias (see the usage sketch after this list)
  • Support hessian offloading for SparseGPT
  • Add customized _LinAlgError for SparseGPT
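
For illustration, a minimal usage sketch of the new argument (not taken from the repo; import paths and argument names may differ by version):

    from llmcompressor.modifiers.obcq import SparseGPTModifier

    # hypothetical values, for illustration only
    recipe = SparseGPTModifier(
        sparsity=0.5,
        mask_structure="2:4",
        sequential_targets=["LlamaDecoderLayer"],  # new name, matching GPTQModifier
        # targets=["LlamaDecoderLayer"],           # older name, kept as an alias
    )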

Implementations

  • Changed implementation styles of SparseGPTModifier and WandaPruningModifier to match GPTQModifier
  • Removed LayerCompressor, ModuleCompressionWrapper, SparseGptWrapper, and WandaWrapper
  • Logic shared between SparseGPT and Wanda now lives in the SparsityModifierMixin (see the pattern sketch after this list)
  • Removed lines blocking allow_tf32
    • Maybe @rahul-tuli knows why this was originally implemented, potentially to avoid hardware issues?
    • This change was only present for Wanda. Since no other modifier does this, I see no reason for it to stay
  • Updated sparsegpt tests to reflect new implementation
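
As a rough sketch of the hook-based pattern used here (not the actual SparsityModifierMixin code): a shared mixin registers calibration hooks, in the spirit of HooksMixin, on the targeted modules, and each modifier supplies its own statistic update and compression step:

    class SparsityCalibrationMixinSketch:
        """Illustrative only: shared calibration hooks for SparseGPT/Wanda-style modifiers."""

        def register_calibration_hooks(self, modules):
            handles = []
            for module in modules:
                # collect inputs to each targeted module during calibration
                handles.append(module.register_forward_pre_hook(self._on_input))
            return handles

        def _on_input(self, module, args):
            # args is the tuple of positional inputs to the module's forward()
            self.update_statistics(module, args[0])

        def update_statistics(self, module, inp):
            # SparseGPT would accumulate a Hessian here; Wanda would accumulate
            # per-channel activation norms
            raise NotImplementedError

        def compress_module(self, module):
            # each modifier applies its own pruning rule to module.weight
            raise NotImplementedError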

Tests

  • Updated obcq tests to reflect new implementations
  • Removed test_sgpt_defaults.py since this test doesn't test anything new or novel about this modifier

Testing

  • grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/
  • The modified test_invalid_layerwise_recipes_raise_exceptions and test_successful_layerwise_recipe tests pass (see the example invocation after this list)
  • llama3_8b_2of4.py passes and was evaluated with both SparseGPT and Wanda
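
For reference, a hedged sketch of how the modified tests could be invoked (exact test file locations are omitted, since they may have moved in this PR):

    pytest -k "test_invalid_layerwise_recipes_raise_exceptions or test_successful_layerwise_recipe" tests/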

Potential Follow ups

  • Add module targets and ignore to SparseGPT and Wanda

Regression Testing

The hessian, row scalar, and compressed weight values were confirmed to be unchanged in the case of a single calibration sample. The final evaluations differ slightly, most likely due to numerical imprecision (e.g. dividing by a Python int vs a torch.int) and the different pipelines (different subgraph partitions lead to different imprecision from CPU offloading, and potentially different module arguments).
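
For illustration, a hedged sketch of the kind of one-sample comparison described above (file names are hypothetical; tensors would be dumped from a run on each branch):

    import torch

    # hypothetical dump paths from a one-calibration-sample run on each branch
    old_hessian = torch.load("main_hessian.pt")
    new_hessian = torch.load("branch_hessian.pt")

    # for a single calibration sample the intermediates should match exactly
    assert torch.equal(old_hessian, new_hessian)

    # over a full calibration run, small numerical drift is expected, so a
    # tolerance check is the more appropriate comparison
    old_w, new_w = torch.load("main_weight.pt"), torch.load("branch_weight.pt")
    print(torch.allclose(old_w, new_w, atol=1e-6), (old_w - new_w).abs().max())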

Evaluation

Models were compressed using examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py
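
The run headers below suggest an lm-evaluation-harness invocation along these lines (a hedged reconstruction; the exact command is not shown in the PR):

    lm_eval --model hf --model_args pretrained=<path-to-compressed-model>,dtype=bfloat16,add_bos_token=True --tasks winogrande --num_fewshot 5 --batch_size 1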

sparsegpt

Main

hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1                                                           
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|                                                        
|----------|------:|------|-----:|------|---|-----:|---|-----:|                                                        
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|

Branch

hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|

To test Wanda, the SparseGPTModifier was replaced with the WandaPruningModifier.

wanda

Main

hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|

Branch

hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|

github-actions bot commented Jan 6, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs changed the base branch from main to kylesayrs/gptq-partition on January 6, 2025 22:24
Base automatically changed from kylesayrs/gptq-partition to main on January 8, 2025 22:15
@kylesayrs force-pushed the kylesayrs/remove-layer-compressor branch from d8c3261 to 08d700c on January 13, 2025 21:41
@kylesayrs marked this pull request as ready for review on January 13, 2025 23:10
@kylesayrs marked this pull request as draft on January 13, 2025 23:15
@kylesayrs marked this pull request as ready for review on January 14, 2025 04:31
@kylesayrs self-assigned this on Jan 14, 2025
@kylesayrs force-pushed the kylesayrs/remove-layer-compressor branch from 71067ad to 59bdb66 on January 23, 2025 17:56
@kylesayrs added the "ready" label on Jan 23, 2025
@kylesayrs marked this pull request as draft on January 26, 2025 16:54
@kylesayrs removed the "ready" label on Jan 26, 2025
@kylesayrs added the "ready" label on Jan 31, 2025
@kylesayrs (Collaborator, Author) commented:

Looks like adding this change fixed the test

    weights = torch.rand(10, 4)
    if is_24:
        weights = _make_24_sparse(weights)
    else:
        weights[0, :] = torch.ones(4)  # guarantee the weights are not 2:4 sparse

The most likely explanation is that this test is flaky, and this PR happened to be unlucky

@dsikka (Collaborator) commented Feb 4, 2025

(quoting the PR description above)

In terms of regression testing being different, do you mind posting the values you're now seeing?

@dsikka (Collaborator) left a review:

This looks good overall. Happy to see the wrapper go.

My only concern is the discrepancy you're seeing in regression testing.
It would be good to get a sense of how much variation these changes introduce.

@kylesayrs (Collaborator, Author) replied:

In terms of regression testing being different, do you mind posting the values you're now seeing?

You can see the values by expanding the collapsible sections in the PR description

@rahul-tuli previously approved these changes Feb 5, 2025
@kylesayrs requested a review from dsikka on February 5, 2025 18:31
@dsikka enabled auto-merge (squash) on February 5, 2025 20:43
@kylesayrs (Collaborator, Author) commented:

The test failure on 5fb18d9 seems to be transient: it could not be reproduced locally and was fixed by the subsequent merge commit

@dsikka (Collaborator) commented Feb 5, 2025

The test failure on 5fb18d9 seems to be transient: it could not be reproduced locally and was fixed by the subsequent merge commit

Auto-merge is enabled; this should merge once testing is finished.

@dsikka merged commit f807a2a into main on Feb 5, 2025 (7 checks passed)
@dsikka deleted the kylesayrs/remove-layer-compressor branch on February 5, 2025 22:17