GPTQModifier Nits and Code Clarity #1068

Merged · 2 commits merged into main from kylesayrs/gptq-nits on Jan 20, 2025
Conversation

@kylesayrs (Collaborator) commented on Jan 14, 2025

Purpose

  • Small nits

Changes

  • Do not require `State` to be lazily type checked (see the sketch after this list)
  • Unhoist `GPTQModifier` kwargs in tests (pass them inline rather than hoisting them into shared setup)
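As a rough illustration of the `State` typing change (this is an assumed shape of the change, not the actual diff; the import path follows how `State` is exposed elsewhere in llm-compressor):

```python
# Sketch only: assumed shape of the change, not the actual diff from this PR.
# Before, `State` was referenced lazily (e.g. a quoted annotation behind a
# TYPE_CHECKING guard); after, it is imported directly so the annotation and
# any isinstance checks refer to the real class.
from llmcompressor.core import State


class GPTQModifierSketch:  # hypothetical stand-in for the real GPTQModifier
    def on_initialize(self, state: State, **kwargs) -> bool:
        # `state` is now checked against the actual State class, not a string
        return isinstance(state, State)
```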

Testing

  • Ran examples/quantization_w4a16/llama3_example.py to completion
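For context, a heavily simplified sketch of what that example exercises is shown below; the import paths and argument values are taken from current llm-compressor examples and should be treated as assumptions rather than the exact script contents.

```python
# Simplified sketch of the W4A16 GPTQ flow that llama3_example.py runs end to end.
# Model id, dataset, and calibration settings here are illustrative assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    max_seq_length=512,
    num_calibration_samples=256,
)
```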

Signed-off-by: Kyle Sayers <[email protected]>

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs added the `ready` label on Jan 19, 2025
@dsikka merged commit 8f91d2c into main on Jan 20, 2025
7 of 8 checks passed
@dsikka deleted the kylesayrs/gptq-nits branch on January 20, 2025 at 17:23
dsikka added a commit referencing this pull request on Feb 5, 2025:
## Purpose ##
* Remove layer compressor to decouple modifiers from data pipelines
* Reduce abstractions
* Support VLMs with SparseGPT and Wanda

## Prerequisites ##
* #1021
* #1023
* #1068
* #1030

## Changes ##
### Interface / Features ###
* SparseGPT and Wanda now both support VLM architectures
* Added `sequential_targets` to match GPTQ and made `targets` an alias (see the sketch after this list)
* Support Hessian offloading for `SparseGPT`
* Added a customized `_LinAlgError` for `SparseGPT`
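A hedged usage sketch of the updated interface (the import path and other arguments mirror existing 2:4 sparsity examples; treat the specific values as placeholders):

```python
# Illustrative only: shows the new `sequential_targets` argument on SparseGPTModifier.
from llmcompressor.modifiers.obcq import SparseGPTModifier

recipe = SparseGPTModifier(
    sparsity=0.5,
    mask_structure="2:4",
    # matches GPTQModifier's argument name; the old `targets` is kept as an alias
    sequential_targets=["LlamaDecoderLayer"],
)
```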

### Implementations ###
* Changed the implementation style of `SparseGPTModifier` and `WandaPruningModifier` to match `GPTQModifier`
* Removed `LayerCompressor`, `ModuleCompressionWrapper`, `SparseGptWrapper`, and `WandaWrapper`
* Shared implementation between SparseGPT and Wanda now lives in the `SparsityModifierMixin` (see the sketch after this list)
* Removed the lines blocking `allow_tf32`
  * Maybe @rahul-tuli knows why this was originally added, potentially to avoid hardware issues?
  * This change was only present for Wanda; since no other modifier does this, there is no reason for it to stay
* Updated the SparseGPT tests to reflect the new implementation
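The mixin split described above can be sketched roughly as follows. The class and method names are illustrative stand-ins, not the actual llm-compressor code, and the magnitude rule merely takes the place of the real SparseGPT (Hessian-based) or Wanda (activation-norm) update rules.

```python
import torch


class SparsityMixinSketch:
    """Shared driver: applies a modifier-specific mask to a weight tensor."""

    def prune(self, weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        return weight * self.compute_mask(weight, sparsity)

    def compute_mask(self, weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        raise NotImplementedError  # each modifier supplies its own rule


class MagnitudePrunerSketch(SparsityMixinSketch):
    """Stand-in for a SparseGPT/Wanda-style modifier using a simple magnitude rule."""

    def compute_mask(self, weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        n_prune = int(weight.numel() * sparsity)
        mask = torch.ones_like(weight)
        if n_prune > 0:
            # zero out the smallest-magnitude weights
            idx = weight.abs().flatten().argsort()[:n_prune]
            mask.view(-1)[idx] = 0.0
        return mask


# usage: pruner = MagnitudePrunerSketch(); sparse_w = pruner.prune(torch.randn(8, 8), 0.5)
```

The point of the pattern is that the shared plumbing lives once in the mixin, while each modifier only overrides its masking/update rule.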

### Tests ###
* Updated the obcq tests to reflect the new implementations
* Removed `test_sgpt_defaults.py`, since it did not test anything unique to this modifier

## Testing ##
* `grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/` to confirm no references remain
* The modified `test_invalid_layerwise_recipes_raise_exceptions` and `test_successful_layerwise_recipe` tests pass
* `llama3_8b_2of4.py` passes and was evaluated with both SparseGPT and Wanda

## Potential Follow-ups ##
* Add module `targets` and `ignore` to SparseGPT and Wanda

## Regression Testing ##
The Hessian, row scalar, and compressed weight values were confirmed to be unchanged in the case of a single calibration sample. The final evaluations differ, most likely due to numerical imprecision (dividing by a Python int vs. a torch int) and to the different pipelines (different subgraph partitions lead to different imprecision from CPU offloading and potentially different module arguments).

### Evaluation ###
Models were compressed using
`examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py`
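The tables below are lm-evaluation-harness output. The exact evaluation command is not recorded here; a sketch of an equivalent invocation through the harness's Python API, with the model path as a placeholder, would look roughly like this:

```python
# Sketch only: assumed to be equivalent to the CLI run that produced the tables below.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=/path/to/Llama-3.2-1B-Instruct2of4-sparse,"
        "dtype=bfloat16,add_bos_token=True"
    ),
    tasks=["winogrande"],
    num_fewshot=5,
    batch_size=1,
)
print(results["results"]["winogrande"])  # acc and stderr, as reported below
```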
<details><summary>sparsegpt</summary>

Main
```
hf (pretrained=/home/ksayers/llm-compressor/old_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1                                                           
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|                                                        
|----------|------:|------|-----:|------|---|-----:|---|-----:|                                                        
|winogrande|      1|none  |     5|acc   |↑  |0.5391|±  | 0.014|
```

Branch
```
hf (pretrained=/home/ksayers/llm-compressor/new_Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.547|±  | 0.014|
```
</details>

To test Wanda, the `SparseGPTModifier` was replaced with the `WandaPruningModifier`.

<details><summary>wanda</summary>

Main
```
hf (pretrained=/home/kyle/old_llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value|   |Stderr|
|----------|------:|------|-----:|------|---|----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.532|±  | 0.014|
```

Branch
```
hf (pretrained=/home/kyle/llm-compressor/Llama-3.2-1B-Instruct2of4-sparse,dtype=bfloat16,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|  Tasks   |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|----------|------:|------|-----:|------|---|-----:|---|-----:|
|winogrande|      1|none  |     5|acc   |↑  |0.5414|±  | 0.014|
```
</details>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>