
Replace LayerCompressor with HooksMixin #1038

Merged: 29 commits merged into main on Feb 5, 2025
Conversation

@kylesayrs (Collaborator) commented Jan 6, 2025

Purpose

  • Remove layer compressor to decouple modifiers from data pipelines
  • Reduce abstractions
  • Support VLMs with SparseGPT and Wanda

Prerequisites

Changes

Interface / Features

  • SparseGPT and Wanda now both support VLM architectures
  • Added sequential_targets to match GPTQ and made targets an alias (see the usage sketch after this list)
  • Support hessian offloading for SparseGPT
  • Add customized _LinAlgError for SparseGPT
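
For illustration, a minimal usage sketch of the new argument (not taken from the repo; import paths and argument names may differ by version):

    from llmcompressor.modifiers.obcq import SparseGPTModifier

    # hypothetical values, for illustration only
    recipe = SparseGPTModifier(
        sparsity=0.5,
        mask_structure="2:4",
        sequential_targets=["LlamaDecoderLayer"],  # new name, matching GPTQModifier
        # targets=["LlamaDecoderLayer"],           # older name, kept as an alias
    )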

Implementations

  • Changed implementation styles of SparseGPTModifier and WandaPruningModifier to match GPTQModifier
  • Removed LayerCompressor, ModuleCompressionWrapper, SparseGptWrapper, and WandaWrapper
  • Logic shared between SparseGPT and Wanda now lives in the SparsityModifierMixin (see the pattern sketch after this list)
  • Removed lines blocking allow_tf32
    • Maybe @rahul-tuli knows why this was originally implemented, potentially to avoid hardware issues?
    • This change was only present for Wanda. Since no other modifier does this, I see no reason for it to stay
  • Updated sparsegpt tests to reflect new implementation
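
As a rough sketch of the hook-based pattern used here (not the actual SparsityModifierMixin code): a shared mixin registers calibration hooks, in the spirit of HooksMixin, on the targeted modules, and each modifier supplies its own statistic update and compression step:

    class SparsityCalibrationMixinSketch:
        """Illustrative only: shared calibration hooks for SparseGPT/Wanda-style modifiers."""

        def register_calibration_hooks(self, modules):
            handles = []
            for module in modules:
                # collect inputs to each targeted module during calibration
                handles.append(module.register_forward_pre_hook(self._on_input))
            return handles

        def _on_input(self, module, args):
            # args is the tuple of positional inputs to the module's forward()
            self.update_statistics(module, args[0])

        def update_statistics(self, module, inp):
            # SparseGPT would accumulate a Hessian here; Wanda would accumulate
            # per-channel activation norms
            raise NotImplementedError

        def compress_module(self, module):
            # each modifier applies its own pruning rule to module.weight
            raise NotImplementedError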

Tests

  • Updated obcq tests to reflect new implementations
  • Removed test_sgpt_defaults.py since this test doesn't test anything new or novel about this modifier

Testing

  • grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/
  • The modified test_invalid_layerwise_recipes_raise_exceptions and test_successful_layerwise_recipe tests pass (see the example invocation after this list)
  • llama3_8b_2of4.py passes and was evaluated with both SparseGPT and Wanda
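
For reference, a hedged sketch of how the modified tests could be invoked (exact test file locations are omitted, since they may have moved in this PR):

    pytest -k "test_invalid_layerwise_recipes_raise_exceptions or test_successful_layerwise_recipe" tests/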

Potential Follow ups

  • Add module targets and ignore to SparseGPT and Wanda

Regression Testing

The hessian, row scalar, and compressed weight values were confirmed to be unchanged in the case of a single calibration sample. The final evaluations differ slightly, most likely due to numerical imprecision (e.g. dividing by a Python int vs a torch.int) and the different pipelines (different subgraph partitions lead to different imprecision from CPU offloading, and potentially different module arguments).
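
For illustration, a hedged sketch of the kind of one-sample comparison described above (file names are hypothetical; tensors would be dumped from a run on each branch):

    import torch

    # hypothetical dump paths from a one-calibration-sample run on each branch
    old_hessian = torch.load("main_hessian.pt")
    new_hessian = torch.load("branch_hessian.pt")

    # for a single calibration sample the intermediates should match exactly
    assert torch.equal(old_hessian, new_hessian)

    # over a full calibration run, small numerical drift is expected, so a
    # tolerance check is the more appropriate comparison
    old_w, new_w = torch.load("main_weight.pt"), torch.load("branch_weight.pt")
    print(torch.allclose(old_w, new_w, atol=1e-6), (old_w - new_w).abs().max())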

Evaluation

Models were compressed using examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py
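
The run headers below suggest an lm-evaluation-harness invocation along these lines (a hedged reconstruction; the exact command is not shown in the PR):

    lm_eval --model hf --model_args pretrained=<path-to-compressed-model>,dtype=bfloat16,add_bos_token=True --tasks winogrande --num_fewshot 5 --batch_size 1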

sparsegpt

Main

hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1                                                           
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|                                                        
|----------|------:|------|-----:|------|---|-----:|---|-----:|                                                        
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|

Branch

hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|

To test Wanda, the SparseGPTModifier was replaced with the WandaPruningModifier.

wanda

Main

hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|

Branch

hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|

github-actions bot commented Jan 6, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs changed the base branch from main to kylesayrs/gptq-partition on January 6, 2025 22:24
Base automatically changed from kylesayrs/gptq-partition to main on January 8, 2025 22:15
@kylesayrs force-pushed the kylesayrs/remove-layer-compressor branch from d8c3261 to 08d700c on January 13, 2025 21:41
@kylesayrs marked this pull request as ready for review on January 13, 2025 23:10
@kylesayrs marked this pull request as draft on January 13, 2025 23:15
@kylesayrs marked this pull request as ready for review on January 14, 2025 04:31
@kylesayrs self-assigned this on Jan 14, 2025
@kylesayrs force-pushed the kylesayrs/remove-layer-compressor branch from 71067ad to 59bdb66 on January 23, 2025 17:56
@kylesayrs added the "ready" label on Jan 23, 2025
@kylesayrs marked this pull request as draft on January 26, 2025 16:54
@kylesayrs removed the "ready" label on Jan 26, 2025
@kylesayrs added the "ready" label on Jan 31, 2025
@kylesayrs (Collaborator, Author) commented:

Looks like adding this change fixed the test

    weights = torch.rand(10, 4)
    if is_24:
        weights = _make_24_sparse(weights)
    else:
        weights[0, :] = torch.ones(4)  # guarantee the weights are not 2:4 sparse

The most likely explanation is that this test is flaky, and this PR happened to be unlucky

@dsikka (Collaborator) commented Feb 4, 2025

(quoting the PR description above)

In terms of regression testing being different, do you mind posting the values you're now seeing?

@dsikka (Collaborator) left a review:

This looks good overall. Happy to see the wrapper go.

My only concern is the discrepancy you're seeing in regression testing.
It would be good to get a sense of how much variation these changes introduce.

@kylesayrs (Collaborator, Author) replied:

In terms of regression testing being different, do you mind posting the values you're now seeing?

You can see the values by expanding the collapsible sections in the PR description

@rahul-tuli previously approved these changes Feb 5, 2025
@kylesayrs requested a review from dsikka on February 5, 2025 18:31
@dsikka enabled auto-merge (squash) on February 5, 2025 20:43
@kylesayrs (Collaborator, Author) commented:

The test failure on 5fb18d9 seems to be transient: it could not be reproduced locally and was fixed by the subsequent merge commit

@dsikka (Collaborator) commented Feb 5, 2025

The test failure on 5fb18d9 seems to be transient: it could not be reproduced locally and was fixed by the subsequent merge commit

Auto-merge is enabled; this should merge once testing is finished.

@dsikka merged commit f807a2a into main on Feb 5, 2025 (7 checks passed)
@dsikka deleted the kylesayrs/remove-layer-compressor branch on February 5, 2025 22:17