GPTQModifier Nits and Code Clarity #1068

kylesayrs · 2025-01-14T04:31:26Z

Purpose

Small nits

Changes

Do not require State to be lazily type checked
Unhoist GPTQModifier kwargs in tests
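
For reviewers unfamiliar with the first item, here is a minimal sketch of the difference between a lazily and an eagerly type-checked annotation. It is an assumption about what the change refers to, not code from this PR: `State` below is a hypothetical stand-in class, and `hypothetical_pkg`, `LazyState`, `on_initialize_lazy`, `on_initialize_eager`, and `annotations_resolve` are invented for illustration.

```python
import typing
from typing import TYPE_CHECKING


class State:
    """Hypothetical stand-in for the real State class; not llm-compressor code."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name


if TYPE_CHECKING:
    # Lazy style: only static type checkers follow this import.
    # `hypothetical_pkg` does not exist; this line never runs at runtime.
    from hypothetical_pkg import LazyState


def on_initialize_lazy(state: "LazyState") -> bool:
    """Lazy style: the annotation is a string naming a type absent at runtime."""
    return True


def on_initialize_eager(state: State) -> bool:
    """Eager style: the annotation refers to a class that exists at runtime."""
    return isinstance(state, State)


def annotations_resolve(func) -> bool:
    """Return True if every annotation on `func` can be evaluated at runtime."""
    try:
        typing.get_type_hints(func)
        return True
    except NameError:
        return False


print(annotations_resolve(on_initialize_lazy))   # False
print(annotations_resolve(on_initialize_eager))  # True
```

The eager form costs a real import but lets runtime tools that inspect annotations resolve the type; the lazy form avoids the import at the price of an annotation that only static checkers can follow.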

Testing

Ran examples/quantization_w4a16/llama3_example.py to completion

Signed-off-by: Kyle Sayers <[email protected]>

github-actions · 2025-01-14T04:31:38Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@rahul-tuli

## Purpose ##
* Remove layer compressor to decouple modifiers from data pipelines
* Reduce abstractions
* Support VLMs with SparseGPT and Wanda

## Prerequisites ##
* #1021
* #1023
* #1068
* #1030

## Changes ##
### Interface/ Features ###
* SparseGPT and Wanda now both support VLM architectures
* Added `sequential_targets` to match GPTQ and made `targets` an alias
* Support hessian offloading for `SparseGPT`
* Add customized `_LinAlgError` for `SparseGPT`

### Implementations ###
* Changed implementation styles of `SparseGPTModifier` and `WandaPruningModifier` to match `GPTQModifier`
* Removed `LayerCompressor`, `ModuleCompressionWrapper`, `SparseGptWrapper`, and `WandaWrapper`
* Shared implementations between SparseGPT and Wanda are implemented by the `SparsityModifierMixin`
* Removed lines blocking `allow_tf32`
  * Maybe @rahul-tuli knows why this was originally implemented, potentially to avoid hardware issues?
  * This change was only present for wanda. Given that all other modifiers do not have this change, I see no reason why it should stay
* Updated sparsegpt tests to reflect new implementation

### Tests ###
* Updated obcq tests to reflect new implementations
* Removed `test_sgpt_defaults.py` since this test doesn't test anything new or novel about this modifier

## Testing ##
* `grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/`
* Modified `test_invalid_layerwise_recipes_raise_exceptions` and `test_successful_layerwise_recipe` pass
* `llama3_8b_2of4.py` passes and was evaluated with both SparseGPT and Wanda

## Potential Follow ups ##
* Add module `targets` and `ignore` to SparseGPT and Wanda

## Regression Testing ##
The hessian, row scalar, and compressed weight values were confirmed to be unchanged in the case of one calibration sample. The final evaluations are different, which is likely due to numerical imprecision (dividing by int vs torch.int) and different pipelines (different subgraph partitions => different imprecision from cpu offloading, potentially different module arguments).

### Evaluation
Models were compressed using `examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py`

<details><summary>sparsegpt</summary>

Main
```
hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|
```

Branch
```
hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|
```
</details>

To test wanda, the `SparseGPTModifier` was replaced with the `WandaPruningModifier`

<details><summary>wanda</summary>

Main
```
hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|
```

Branch
```
hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|
```
</details>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
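
As a rough sanity check on the claim that the evaluation differences are within noise, the winogrande numbers above can be compared against their reported standard errors. This is only a sketch: the helper `z_score` is hypothetical, and treating the Main and Branch runs as independent with equal stderr is a simplifying assumption, not something stated in the description.

```python
import math


def z_score(main_acc: float, branch_acc: float, stderr: float) -> float:
    """Accuracy difference divided by the combined standard error.

    Assumes both runs share the same stderr and are independent, which is a
    simplification made only for this rough check.
    """
    combined = math.sqrt(2.0) * stderr
    return (branch_acc - main_acc) / combined


# Values copied from the winogrande tables above: (Main, Branch, Stderr).
results = {
    "sparsegpt": (0.5391, 0.547, 0.014),
    "wanda": (0.532, 0.5414, 0.014),
}

for name, (main_acc, branch_acc, stderr) in results.items():
    z = z_score(main_acc, branch_acc, stderr)
    print(f"{name}: z = {z:.2f}")
```

Under these assumptions both differences come out below one combined standard error, which is consistent with the numerical-imprecision explanation but does not prove it.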

nits

9f4d612

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs mentioned this pull request Jan 14, 2025

Replace LayerCompressor with HooksMixin #1038

Merged

kylesayrs self-assigned this Jan 14, 2025

horheynm approved these changes Jan 17, 2025

rahul-tuli approved these changes Jan 17, 2025

Merge branch 'main' into kylesayrs/gptq-nits

76e370e

kylesayrs added the ready When a PR is ready for review label Jan 19, 2025

dsikka approved these changes Jan 20, 2025

dsikka merged commit 8f91d2c into main Jan 20, 2025
7 of 8 checks passed

dsikka deleted the kylesayrs/gptq-nits branch January 20, 2025 17:23
