
Extend disable_hooks to keep subsets #1023

Merged

merged 15 commits into main from kylesayrs/hooks-mixin-keep on Feb 5, 2025

Conversation

@kylesayrs (Collaborator) commented Jan 1, 2025

Purpose

  • Allow a subset of hook handles to remain active while the rest are disabled, as is needed in the case of wanda, for example:
```python
def _get_activations(self, model, dataloader, nsamples=128):
    def save_acts(module, input, name):
        ...

    hooks = set(
        self.register_hook(mod, partial(save_acts, name=name), "forward_pre")
        for name, mod in model.named_modules()
        if isinstance(mod, torch.nn.Linear) and "lm_head" not in name
    )
    # in the future, if the user puts wanda after another modifier,
    # initialize will run after other modifiers have added hooks

    # want to disable hooks from other modifiers, but keep the ones we just added
    with HooksMixin.disable_hooks(keep=hooks):
        run_basic(model, dataloader)
    self.remove_hooks(hooks)
```

Prerequisites

Postrequisites

  • Layer compressor deprecation

Changes

  • Add a `_HOOKS_KEEP_ENABLED` class variable (a rough sketch of the idea follows below)
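
To make the keep mechanism concrete, below is a minimal sketch of how a keep set could thread through hook registration and the `disable_hooks` context manager. Names other than `_HOOKS_KEEP_ENABLED`, `disable_hooks(keep=...)`, `register_hook`, and `_hooks` (e.g. `_HOOKS_DISABLED`, `handle_box`) are assumptions and may not match the actual `HooksMixin`:

```python
# Illustrative sketch only; the real HooksMixin in llm-compressor may differ.
import contextlib
from typing import Any, Callable, ClassVar, Optional, Set

import torch


class HooksMixin:
    _HOOKS_DISABLED: ClassVar[bool] = False          # assumed flag name, not from the PR
    _HOOKS_KEEP_ENABLED: ClassVar[Set[Any]] = set()  # handles exempt from disabling

    def __init__(self) -> None:
        self._hooks: Set[torch.utils.hooks.RemovableHandle] = set()

    @classmethod
    @contextlib.contextmanager
    def disable_hooks(cls, keep: Optional[Set] = None):
        """Disable all hooks registered through the mixin, except those in `keep`."""
        try:
            cls._HOOKS_DISABLED = True
            cls._HOOKS_KEEP_ENABLED = keep if keep is not None else set()
            yield
        finally:
            cls._HOOKS_DISABLED = False
            cls._HOOKS_KEEP_ENABLED = set()

    def register_hook(self, module: torch.nn.Module, func: Callable, hook_type: str):
        handle_box = []  # lets the wrapper check whether its own handle was kept

        def wrapped(*args, **kwargs):
            kept = handle_box and handle_box[0] in HooksMixin._HOOKS_KEEP_ENABLED
            if HooksMixin._HOOKS_DISABLED and not kept:
                return None  # globally disabled and not in the keep set
            return func(*args, **kwargs)

        # e.g. hook_type="forward_pre" -> module.register_forward_pre_hook
        handle = getattr(module, f"register_{hook_type}_hook")(wrapped)
        handle_box.append(handle)
        self._hooks.add(handle)
        return handle
```

With this shape, the `with HooksMixin.disable_hooks(keep=hooks):` block in the purpose snippet skips every registered hook except the ones just added.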

Tests

  • Added `test_disable_hooks_keep` in tests (sketched below)
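
Roughly, such a test can register hooks from two owners, disable hooks while keeping one handle, and assert that only the kept hook fires. The snippet below reuses the illustrative `HooksMixin` sketch from the Changes section rather than the repository's actual class, so it is a sketch of the test's shape, not its contents:

```python
import torch


def test_disable_hooks_keep():
    # assumes the illustrative HooksMixin sketch from the Changes section is in scope
    model = torch.nn.Linear(4, 4)
    calls = []

    owner_a, owner_b = HooksMixin(), HooksMixin()
    kept = {owner_a.register_hook(model, lambda m, i: calls.append("a"), "forward_pre")}
    owner_b.register_hook(model, lambda m, i: calls.append("b"), "forward_pre")

    with HooksMixin.disable_hooks(keep=kept):
        model(torch.zeros(1, 4))

    # only the kept hook fired while the rest were disabled
    assert calls == ["a"]
```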

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

github-actions bot commented Jan 1, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs kylesayrs self-assigned this Jan 1, 2025
@kylesayrs kylesayrs added the ready When a PR is ready for review label Jan 19, 2025
rahul-tuli previously approved these changes Jan 29, 2025

@rahul-tuli (Collaborator) left a comment

The new changes LGTM! Great work @kylesayrs

@kylesayrs kylesayrs changed the base branch from main to kylesayrs/hooks-mixin-remove-subsets January 29, 2025 18:32
@kylesayrs kylesayrs changed the base branch from kylesayrs/hooks-mixin-remove-subsets to main January 29, 2025 18:34
@kylesayrs kylesayrs dismissed rahul-tuli’s stale review January 29, 2025 18:34

The base branch was changed.

mgoin pushed a commit that referenced this pull request Jan 29, 2025
## Purpose ##
* Allow subsets of hooks to be removed
* Not strictly needed, but helps promote code clarity in the case of wanda, which adds and removes subsets of hooks at different times.

## Postrequisites ##
* #1023
* Layer compressor deprecation

## Changes ##
* Change the datatype of `_hooks` from `List` to `Set`
* Add `handles` argument to `HooksMixin.remove_hooks`

## Testing ##
* Added `test_remove_hooks_parameterized` test

---------

Signed-off-by: Kyle Sayers <[email protected]>
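
The `handles` argument and the `List` → `Set` change described in that commit can be pictured with a small self-contained sketch; only the interface follows the commit message, while the class body and usage below are assumptions:

```python
from typing import Callable, Optional, Set

import torch


class HooksMixin:
    """Toy stand-in showing only the parts relevant to subset removal."""

    def __init__(self) -> None:
        self._hooks: Set[torch.utils.hooks.RemovableHandle] = set()  # Set, not List

    def register_hook(self, module: torch.nn.Module, func: Callable, hook_type: str):
        handle = getattr(module, f"register_{hook_type}_hook")(func)
        self._hooks.add(handle)
        return handle

    def remove_hooks(self, handles: Optional[Set] = None) -> None:
        """Remove only `handles` if given, otherwise all registered hooks."""
        targets = self._hooks if handles is None else handles
        for handle in list(targets):
            handle.remove()
        self._hooks -= set(targets)


model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Linear(4, 4))
mixin = HooksMixin()
subset = {mixin.register_hook(model[0], lambda m, i: None, "forward_pre")}
mixin.register_hook(model[1], lambda m, i: None, "forward_pre")

mixin.remove_hooks(subset)  # removes just the first batch of hooks
mixin.remove_hooks()        # removes whatever remains
```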
@kylesayrs kylesayrs requested a review from dsikka February 3, 2025 20:59
dsikka
dsikka previously approved these changes Feb 5, 2025
@dsikka dsikka requested a review from rahul-tuli February 5, 2025 17:20
rahul-tuli previously approved these changes Feb 5, 2025

@rahul-tuli (Collaborator) left a comment

LGTM pending @brian-dellabetta 's comment on defaulting to None

Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs dismissed stale reviews from rahul-tuli and dsikka via a2934b3 February 5, 2025 18:39
@brian-dellabetta (Collaborator) left a comment

🚀

@dsikka dsikka merged commit c4e213a into main Feb 5, 2025
6 of 7 checks passed
@dsikka dsikka deleted the kylesayrs/hooks-mixin-keep branch February 5, 2025 19:57
dsikka added a commit that referenced this pull request Feb 5, 2025
## Purpose ##
* Remove layer compressor to decouple modifiers from data pipelines
* Reduce abstractions
* Support VLMs with SparseGPT and Wanda

## Prerequisites ##
* #1021
* #1023
* #1068
* #1030

## Changes ##
### Interface/Features ###
* SparseGPT and Wanda now both support VLM architectures
* Added `sequential_targets` to match GPTQ and made `targets` an alias
* Support hessian offloading for `SparseGPT`
* Add customized `_LinAlgError` for `SparseGPT`

### Implementations ###
* Changed implementation styles of `SparseGPTModifier` and
`WandaPruningModifier` to match `GPTQModifier`
* Removed `LayerCompressor`, `ModuleCompressionWrapper`,
`SparseGptWrapper`, and `WandaWrapper`
* Shared implementations between SparseGPT and Wanda are implemented by
the `SparsityModifierMixin`
* Removed lines blocking `allow_tf32`
  * Maybe @rahul-tuli knows why this was originally implemented, potentially to avoid hardware issues?
  * This change was only present for wanda. Given that all other modifiers do not have this change, I see no reason why it should stay
* Updated sparsegpt tests to reflect new implementation

### Tests ###
* Updated obcq tests to reflect new implementations
* Removed `test_sgpt_defaults.py` since this test doesn't test anything
new or novel about this modifier

## Testing ##
* `grep -r
"LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper"
src/ examples/ tests/`
* Modified `test_invalid_layerwise_recipes_raise_exceptions` and
`test_successful_layerwise_recipe` pass
* `llama3_8b_2of4.py` passes and was evaluated with both SparseGPT and
Wanda

## Potential Follow-ups ##
* Add module `targets` and `ignore` to SparseGPT and Wanda

## Regression Testing ##
The hessian, row scalar, and compressed weight values were confirmed to
be unchanged in the case of one calibration sample. The final
evaluations differ, which is likely due to numerical imprecision
(dividing by int vs torch.int) and different pipelines (different subgraph
partitions => different imprecision from cpu offloading, potentially
different module arguments).

### Evaluation ###
Models were compressed using
`examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py`
<details><summary>sparsegpt</summary>

Main
```
hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1                                                           
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|                                                        
|----------|------:|------|-----:|------|---|-----:|---|-----:|                                                        
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|
```

Branch
```
hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|
```
</details>

To test wanda, the `SparseGPTModifier` was replaced with the `WandaPruningModifier`, roughly as sketched below.
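
The swap might look roughly like the following; the import paths and constructor arguments are assumptions for illustration, not lines copied from `llama3_8b_2of4.py`:

```python
# Illustrative only: same 2:4 pruning stage, with Wanda substituted for SparseGPT.
from llmcompressor.modifiers.obcq import SparseGPTModifier        # assumed import path
from llmcompressor.modifiers.pruning import WandaPruningModifier  # assumed import path

sparsegpt_stage = SparseGPTModifier(sparsity=0.5, mask_structure="2:4")

# the comparison run swaps in Wanda while keeping the rest of the recipe unchanged
wanda_stage = WandaPruningModifier(sparsity=0.5, mask_structure="2:4")
```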

<details><summary>wanda</summary>

Main
```
hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|
```

Branch
```
hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|
```
</details>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>