
Extend remove_hooks to remove subsets #1021

Merged: 4 commits into main on Jan 29, 2025
Conversation


@kylesayrs kylesayrs commented Dec 31, 2024

Purpose

  • Allow subsets of hooks to be removed
  • Not strictly needed, but promotes code clarity in the case of wanda, which adds and removes subsets of hooks at different times

Postrequisites

Changes

  • Change the datatype of _hooks from List to Set
  • Add a handles argument to HooksMixin.remove_hooks (see the sketch below)
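For illustration, a minimal sketch of what the extended API might look like, assuming hooks are tracked as torch RemovableHandle objects; the names _hooks and remove_hooks follow the PR, everything else is an assumption:

```python3
from typing import Optional, Set

from torch.utils.hooks import RemovableHandle


class HooksMixin:
    def __init__(self):
        # hooks now live in a set, so subsets can be removed cheaply
        self._hooks: Set[RemovableHandle] = set()

    def remove_hooks(self, handles: Optional[Set[RemovableHandle]] = None):
        """Remove all added hooks, or only the subset given by `handles`"""
        if handles is None:
            handles = set(self._hooks)  # default keeps the old remove-all behavior

        for handle in handles:
            handle.remove()  # detach the hook from its module
        self._hooks -= handles
```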

Testing

  • Added a test_remove_hooks_parameterized test (illustrated below)
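As a hedged illustration of the behavior under test (not the actual test code), using the HooksMixin sketch above:

```python3
import torch


def test_remove_hooks_parameterized():
    mixin = HooksMixin()  # from the sketch above; the real test fixture differs
    layer = torch.nn.Linear(4, 4)
    h1 = layer.register_forward_hook(lambda mod, args, out: None)
    h2 = layer.register_forward_hook(lambda mod, args, out: None)
    mixin._hooks |= {h1, h2}

    mixin.remove_hooks({h1})     # remove only a subset
    assert mixin._hooks == {h2}  # the other hook stays registered

    mixin.remove_hooks()         # no argument still removes everything
    assert mixin._hooks == set()
```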


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs kylesayrs changed the title Extend remove_hooks to remove subsets Jan 1, 2025
@kylesayrs kylesayrs self-assigned this Jan 1, 2025
@kylesayrs kylesayrs added the ready label Jan 19, 2025
@mgoin mgoin merged commit 507b1a4 into main Jan 29, 2025
7 checks passed
@mgoin mgoin deleted the kylesayrs/hooks-mixin-remove-subsets branch January 29, 2025 23:15
dsikka added a commit that referenced this pull request Feb 5, 2025
## Purpose ##
* Allow subsets of handles to remain active, as is needed in the case of
wanda

```python3
def _get_activations(self, model, dataloader, nsamples=128):
    def save_acts(module, input, name):
        ...

    # register a forward-pre hook on every Linear layer except the lm_head
    hooks = set(
        self.register_hook(mod, partial(save_acts, name=name), "forward_pre")
        for name, mod in model.named_modules()
        if isinstance(mod, torch.nn.Linear) and "lm_head" not in name
    )
    # in the future, if the user puts wanda after another modifier,
    # initialize will run after other modifiers have added hooks

    # we want to disable hooks from other modifiers, but keep the ones just added
    with HooksMixin.disable_hooks(keep=hooks):
        run_basic(model, dataloader)
    self.remove_hooks(hooks)
```


## Prerequisites ##
* #1021

## Postrequisites ##
* Layer compressor deprecation

## Changes ##
* Add a `_HOOKS_KEEP_ENABLED` class variable (sketched below)
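A minimal sketch of how `keep` could interact with the new class variable; `_HOOKS_KEEP_ENABLED` comes from this PR, while `_HOOKS_DISABLED` and the bookkeeping details are assumptions:

```python3
import contextlib
from typing import ClassVar, Set

from torch.utils.hooks import RemovableHandle


class HooksMixin:
    _HOOKS_DISABLED: ClassVar[bool] = False  # assumed existing flag
    _HOOKS_KEEP_ENABLED: ClassVar[Set[RemovableHandle]] = set()

    @classmethod
    @contextlib.contextmanager
    def disable_hooks(cls, keep: Set[RemovableHandle] = frozenset()):
        """Disable all registered hooks except those in `keep`"""
        try:
            cls._HOOKS_DISABLED = True
            cls._HOOKS_KEEP_ENABLED |= set(keep)
            yield
        finally:
            cls._HOOKS_DISABLED = False
            cls._HOOKS_KEEP_ENABLED -= set(keep)
```

Each wrapped hook would then fire only when `_HOOKS_DISABLED` is false or when its handle is in `_HOOKS_KEEP_ENABLED`.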

## Tests ##
* Added `test_disable_hooks_keep` in tests

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
dsikka added a commit that referenced this pull request Feb 5, 2025
## Purpose ##
* Remove layer compressor to decouple modifiers from data pipelines
* Reduce abstractions
* Support VLMs with SparseGPT and Wanda

## Prerequisites ##
* #1021
* #1023
* #1068
* #1030

## Changes ##
### Interface / Features ###
* SparseGPT and Wanda now both support VLM architectures
* Added `sequential_targets` to match GPTQ and made `targets` an alias
* Support hessian offloading for `SparseGPT`
* Add a customized `_LinAlgError` for `SparseGPT` (sketched below)
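For illustration only, one plausible shape for the customized error, wrapping a failed hessian inversion; `invert_hessian` and the message text are hypothetical:

```python3
import torch


class _LinAlgError(torch.linalg.LinAlgError):
    """Raised when hessian inversion fails, e.g. due to a singular hessian"""


def invert_hessian(hessian: torch.Tensor) -> torch.Tensor:
    # cholesky fails if the hessian is not positive definite, which
    # calibration dampening is normally expected to prevent
    try:
        return torch.cholesky_inverse(torch.linalg.cholesky(hessian))
    except torch.linalg.LinAlgError as exception:
        raise _LinAlgError(
            "Failed to invert hessian; consider raising dampening or using "
            "more calibration samples"
        ) from exception
```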

### Implementations ###
* Changed implementation styles of `SparseGPTModifier` and
`WandaPruningModifier` to match `GPTQModifier`
* Removed `LayerCompressor`, `ModuleCompressionWrapper`,
`SparseGptWrapper`, and `WandaWrapper`
* Shared logic between SparseGPT and Wanda is implemented by the
`SparsityModifierMixin`
* Removed lines blocking `allow_tf32`
  * Maybe @rahul-tuli knows why this was originally implemented, potentially to avoid hardware issues?
  * This change was only present for wanda; given that all other modifiers do not have this change, I see no reason why it should stay
* Updated sparsegpt tests to reflect new implementation

### Tests ###
* Updated obcq tests to reflect new implementations
* Removed `test_sgpt_defaults.py` since this test doesn't test anything
new or novel about this modifier

## Testing ##
* `grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/`
* The modified `test_invalid_layerwise_recipes_raise_exceptions` and
`test_successful_layerwise_recipe` tests pass
* `llama3_8b_2of4.py` passes and was evaluated with both SparseGPT and Wanda

## Potential Follow ups ##
* Add module `targets` and `ignore` to SparseGPT and Wanda

## Regression Testing ##
The hessian, row scalar, and compressed weight values were confirmed to
be unchanged in the case of one calibration sample. The final
evaluations differ, likely due to numerical imprecision (dividing by a
Python int vs a torch int) and to different pipelines (different subgraph
partitions mean different imprecision from cpu offloading and potentially
different module arguments).

### Evaluation ###
Models were compressed using
`examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py`
<details><summary>sparsegpt</summary>

Main
```
hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1                                                           
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|                                                        
|----------|------:|------|-----:|------|---|-----:|---|-----:|                                                        
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|
```

Branch
```
hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|
```
</details>

To test wanda, the `SparseGPTModifier` was replaced with the
`WandaPruningModifier`

<details><summary>wanda</summary>

Main
```
hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|
```

Branch
```
hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|
```
</details>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>