Replace LayerCompressor with HooksMixin #1038
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.
Looks like adding this change fixed the test:

```python
weights = torch.rand(10, 4)
if is_24:
    weights = _make_24_sparse(weights)
else:
    weights[0, :] = torch.ones(4)  # guarantee not 2:4 sparse
```

The most likely explanation is that this test fails randomly, and this PR happened to be unlucky.
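For reference, a hedged sketch of what a 2:4 sparsity helper and check might look like; `make_24_sparse` and `is_24_sparse` below are illustrative stand-ins, not the repo's actual test utilities. It shows why setting a full row to ones guarantees the tensor is not 2:4 sparse.

```python
import torch

def make_24_sparse(weights: torch.Tensor) -> torch.Tensor:
    # zero out the two smallest-magnitude values in every contiguous group of 4
    groups = weights.reshape(-1, 4)
    _, drop_idx = groups.abs().topk(2, dim=-1, largest=False)
    mask = torch.ones_like(groups, dtype=torch.bool)
    mask.scatter_(1, drop_idx, False)
    return (groups * mask).reshape(weights.shape)

def is_24_sparse(weights: torch.Tensor) -> bool:
    # 2:4 sparse means every contiguous group of 4 values has at least 2 zeros;
    # a row of all ones violates this, so the tensor cannot be 2:4 sparse
    groups = weights.reshape(-1, 4)
    return bool(((groups == 0).sum(dim=-1) >= 2).all())

weights = torch.rand(10, 4)
assert is_24_sparse(make_24_sparse(weights))
weights[0, :] = torch.ones(4)
assert not is_24_sparse(weights)
```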
Regarding the regression-testing differences, do you mind posting the values you're now seeing?
This looks good overall. Happy to see the wrapper go. My only concern is the discrepancy in the regression testing; it would be good to get a sense of how much variation we're seeing with these changes.
You can see the values by expanding the collapsible sections in the PR description.
The test failure on 5fb18d9 seems to be ephemeral: it could not be replicated locally and was fixed by the subsequent merge commit.
Auto-merge is enabled; the PR should merge once testing finishes.
Purpose

Prerequisites
- `remove_hooks` to remove subsets #1021
- `disable_hooks` to keep subsets #1023

Changes
Interface / Features
- `sequential_targets` to match GPTQ, and made `targets` an alias
- `_LinAlgError` handling for `SparseGPT`

Implementations
- `SparseGPTModifier` and `WandaPruningModifier` updated to match `GPTQModifier`
- Removed `LayerCompressor`, `ModuleCompressionWrapper`, `SparseGptWrapper`, and `WandaWrapper` (see the illustrative hook sketch below)
- `SparsityModifierMixin`
- `allow_tf32`
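As a rough illustration of the direction of this change (an assumed sketch, not llm-compressor's actual `HooksMixin` API): instead of wrapping each layer in a compression wrapper, a modifier can register ordinary torch forward hooks on the unmodified modules to collect calibration statistics, then remove them when calibration finishes.

```python
from typing import Callable, Dict, List
import torch

class HookedModifier:
    """Registers torch hooks on modules and tracks the handles for later removal."""

    def __init__(self) -> None:
        self._handles: List[torch.utils.hooks.RemovableHandle] = []

    def register_hook(self, module: torch.nn.Module, fn: Callable):
        handle = module.register_forward_hook(fn)  # hook the unmodified module
        self._handles.append(handle)
        return handle

    def remove_hooks(self) -> None:
        for handle in self._handles:
            handle.remove()
        self._handles.clear()

class SketchSparsifier(HookedModifier):
    """Collects a per-layer calibration statistic (a running input norm) via hooks."""

    def __init__(self, model: torch.nn.Module) -> None:
        super().__init__()
        self.stats: Dict[str, float] = {}
        for name, module in model.named_modules():
            if isinstance(module, torch.nn.Linear):
                self.register_hook(module, self._make_hook(name))

    def _make_hook(self, name: str) -> Callable:
        def hook(module, inputs, output):
            inp = inputs[0].detach()
            self.stats[name] = self.stats.get(name, 0.0) + inp.float().pow(2).sum().item()
        return hook

# usage: run calibration data through the model, then drop the hooks
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 4))
sparsifier = SketchSparsifier(model)
model(torch.randn(2, 8))
sparsifier.remove_hooks()
print(sparsifier.stats)
```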
Tests
- Removed `test_sgpt_defaults.py`, since this test doesn't test anything new or novel about this modifier

Testing
- `grep -r "LayerCompressor\|ModuleCompressionWrapper\|SparseGptWrapper\|WandaWrapper" src/ examples/ tests/`
- `test_invalid_layerwise_recipes_raise_exceptions` and `test_successful_layerwise_recipe` pass
- `llama3_8b_2of4.py` passes and was evaluated with both SparseGPT and Wanda

Potential Follow ups
- `targets` and `ignore` to SparseGPT and Wanda

Regression Testing
The hessian, row scalar, and compressed weight values were confirmed to be unchanged in the case of one calibration sample. The final evaluations are different, which is likely due to numerical imprecision (dividing by a Python int vs a torch.int) and different pipelines (different subgraph partitions lead to different imprecision from CPU offloading, and potentially different module arguments).
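A minimal sketch (an assumed workflow, not code from this PR) of the kind of check described here: capture the hessian, row scalar, and compressed weight tensors from each branch for a single calibration sample and compare them directly.

```python
import torch

def assert_unchanged(main_run: dict, branch_run: dict, atol: float = 0.0) -> None:
    # compare same-named tensors captured from the two runs
    for name, main_val in main_run.items():
        branch_val = branch_run[name]
        assert torch.allclose(main_val, branch_val, atol=atol), f"{name} changed"

# illustrative placeholders for tensors dumped from each run
main_run = {"hessian": torch.eye(4), "row_scalar": torch.ones(4), "weight": torch.ones(4, 4)}
branch_run = {"hessian": torch.eye(4), "row_scalar": torch.ones(4), "weight": torch.ones(4, 4)}
assert_unchanged(main_run, branch_run)
```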
Evaluation

Models were compressed using `examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py`.

sparsegpt
- Main
- Branch

To test wanda, the `SparseGPTModifier` was replaced with the `WandaPruningModifier`.

wanda
- Main
- Branch
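For reference, a hedged sketch of the swap described above; the import paths and arguments are assumptions based on typical llm-compressor usage, not copied from the example script.

```python
# SparseGPT recipe used for the 2:4 run (illustrative arguments)
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.modifiers.pruning import WandaPruningModifier

sparsegpt_recipe = [SparseGPTModifier(sparsity=0.5, mask_structure="2:4")]

# to test Wanda, the SparseGPT modifier is swapped for the Wanda one
wanda_recipe = [WandaPruningModifier(sparsity=0.5, mask_structure="2:4")]
```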