
Extend remove_hooks to remove subsets #1021

Merged: 4 commits into main on Jan 29, 2025
Conversation


@kylesayrs kylesayrs commented Dec 31, 2024

Purpose

  • Allow subsets of hooks to be removed
  • Not strictly needed, but promotes code clarity in the case of wanda, which adds and removes subsets of hooks at different times

Postrequisites

Changes

  • Change the datatype of _hooks from List to Set
  • Add a handles argument to HooksMixin.remove_hooks (see the sketch below)
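For illustration, a minimal sketch of what the extended API might look like, assuming hooks are tracked as torch RemovableHandle objects; the names _hooks and remove_hooks follow the PR, everything else is an assumption:

```python3
from typing import Optional, Set

from torch.utils.hooks import RemovableHandle


class HooksMixin:
    def __init__(self):
        # hooks now live in a set, so subsets can be removed cheaply
        self._hooks: Set[RemovableHandle] = set()

    def remove_hooks(self, handles: Optional[Set[RemovableHandle]] = None):
        """Remove all added hooks, or only the subset given by `handles`"""
        if handles is None:
            handles = set(self._hooks)  # default keeps the old remove-all behavior

        for handle in handles:
            handle.remove()  # detach the hook from its module
        self._hooks -= handles
```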

Testing

  • Added a test_remove_hooks_parameterized test (illustrated below)
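As a hedged illustration of the behavior under test (not the actual test code), using the HooksMixin sketch above:

```python3
import torch


def test_remove_hooks_parameterized():
    mixin = HooksMixin()  # from the sketch above; the real test fixture differs
    layer = torch.nn.Linear(4, 4)
    h1 = layer.register_forward_hook(lambda mod, args, out: None)
    h2 = layer.register_forward_hook(lambda mod, args, out: None)
    mixin._hooks |= {h1, h2}

    mixin.remove_hooks({h1})     # remove only a subset
    assert mixin._hooks == {h2}  # the other hook stays registered

    mixin.remove_hooks()         # no argument still removes everything
    assert mixin._hooks == set()
```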


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs kylesayrs changed the title Extend remove_hooks to remove subsets Jan 1, 2025
@kylesayrs kylesayrs self-assigned this Jan 1, 2025
@kylesayrs kylesayrs added the ready label Jan 19, 2025
@mgoin mgoin merged commit 507b1a4 into main Jan 29, 2025
7 checks passed
@mgoin mgoin deleted the kylesayrs/hooks-mixin-remove-subsets branch January 29, 2025 23:15
dsikka added a commit that referenced this pull request Feb 5, 2025
## Purpose ##
* Allow subsets of handles to remain active, as is needed in the case of
wanda

```python3
def _get_activations(self, model, dataloader, nsamples=128):
    def save_acts(module, input, name):
        ...

    # register a forward-pre hook on every Linear layer except the lm_head
    hooks = set(
        self.register_hook(mod, partial(save_acts, name=name), "forward_pre")
        for name, mod in model.named_modules()
        if isinstance(mod, torch.nn.Linear) and "lm_head" not in name
    )
    # in the future, if the user puts wanda after another modifier,
    # initialize will run after other modifiers have added hooks

    # we want to disable hooks from other modifiers, but keep the ones just added
    with HooksMixin.disable_hooks(keep=hooks):
        run_basic(model, dataloader)
    self.remove_hooks(hooks)
```


## Prerequisites ##
* #1021

## Postrequisites ##
* Layer compressor deprecation

## Changes ##
* Add a `_HOOKS_KEEP_ENABLED` class variable (sketched below)
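A minimal sketch of how `keep` could interact with the new class variable; `_HOOKS_KEEP_ENABLED` comes from this PR, while `_HOOKS_DISABLED` and the bookkeeping details are assumptions:

```python3
import contextlib
from typing import ClassVar, Set

from torch.utils.hooks import RemovableHandle


class HooksMixin:
    _HOOKS_DISABLED: ClassVar[bool] = False  # assumed existing flag
    _HOOKS_KEEP_ENABLED: ClassVar[Set[RemovableHandle]] = set()

    @classmethod
    @contextlib.contextmanager
    def disable_hooks(cls, keep: Set[RemovableHandle] = frozenset()):
        """Disable all registered hooks except those in `keep`"""
        try:
            cls._HOOKS_DISABLED = True
            cls._HOOKS_KEEP_ENABLED |= set(keep)
            yield
        finally:
            cls._HOOKS_DISABLED = False
            cls._HOOKS_KEEP_ENABLED -= set(keep)
```

Each wrapped hook would then fire only when `_HOOKS_DISABLED` is false or when its handle is in `_HOOKS_KEEP_ENABLED`.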

## Tests ##
* Added `test_disable_hooks_keep` in tests

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
dsikka added a commit that referenced this pull request Feb 5, 2025
## Purpose ##
* Remove layer compressor to decouple modifiers from data pipelines
* Reduce abstractions
* Support VLMs with SparseGPT and Wanda

## Prerequisites ##
* #1021
* #1023
* #1068
* #1030

## Changes ##
### Interface / Features ###
* SparseGPT and Wanda now both support VLM architectures
* Added `sequential_targets` to match GPTQ and made `targets` an alias
* Support hessian offloading for `SparseGPT`
* Add a customized `_LinAlgError` for `SparseGPT` (sketched below)
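For illustration only, one plausible shape for the customized error, wrapping a failed hessian inversion; `invert_hessian` and the message text are hypothetical:

```python3
import torch


class _LinAlgError(torch.linalg.LinAlgError):
    """Raised when hessian inversion fails, e.g. due to a singular hessian"""


def invert_hessian(hessian: torch.Tensor) -> torch.Tensor:
    # cholesky fails if the hessian is not positive definite, which
    # calibration dampening is normally expected to prevent
    try:
        return torch.cholesky_inverse(torch.linalg.cholesky(hessian))
    except torch.linalg.LinAlgError as exception:
        raise _LinAlgError(
            "Failed to invert hessian; consider raising dampening or using "
            "more calibration samples"
        ) from exception
```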

### Implementations ###
* Changed implementation styles of `SparseGPTModifier` and
`WandaPruningModifier` to match `GPTQModifier`
* Removed `LayerCompressor`, `ModuleCompressionWrapper`,
`SparseGptWrapper`, and `WandaWrapper`
* Shared logic between SparseGPT and Wanda is implemented by the
`SparsityModifierMixin`
* Removed lines blocking `allow_tf32`
  * Maybe @rahul-tuli knows why this was originally implemented, potentially to avoid hardware issues?
  * This change was only present for wanda; given that all other modifiers do not have this change, I see no reason why it should stay
* Updated sparsegpt tests to reflect new implementation

### Tests ###
* Updated obcq tests to reflect new implementations
* Removed `test_sgpt_defaults.py` since this test doesn't test anything
new or novel about this modifier

## Testing ##
* `grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/`
* The modified `test_invalid_layerwise_recipes_raise_exceptions` and
`test_successful_layerwise_recipe` tests pass
* `llama3_8b_2of4.py` passes and was evaluated with both SparseGPT and Wanda

## Potential Follow ups ##
* Add module `targets` and `ignore` to SparseGPT and Wanda

## Regression Testing ##
The hessian, row scalar, and compressed weight values were confirmed to
be unchanged in the case of one calibration sample. The final
evaluations differ, likely due to numerical imprecision (dividing by a
Python int vs a torch int) and to different pipelines (different subgraph
partitions mean different imprecision from cpu offloading and potentially
different module arguments).

### Evaluation ###
Models were compressed using
`examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py`
<details><summary>sparsegpt</summary>

Main
```
hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1                                                           
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|                                                        
|----------|------:|------|-----:|------|---|-----:|---|-----:|                                                        
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|
```

Branch
```
hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|
```
</details>

To test wanda, the `SparseGPTModifier` was replaced with the
`WandaPruningModifier`

<details><summary>wanda</summary>

Main
```
hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|
```

Branch
```
hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|
```
</details>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>