
[e2e] Update vllm tests with additional datasets #1131

Merged · 7 commits merged into main on Feb 11, 2025

Conversation


@brian-dellabetta (Collaborator) commented Feb 7, 2025:

SUMMARY:
Adds a handful of additional e2e tests covering 3 more datasets:

  • neuralmagic/LLM_compression_calibration
  • garage-bAInd/Open-Platypus
  • Open-Orca/slimorca-deduped-cleaned-corrected

and one new small language model (SLM):

  • Qwen/Qwen2.5-0.5B

I also added an env var flag (SKIP_HF_UPLOAD) to skip uploads to the Hugging Face Hub, defaulting to the original behavior of uploading. I found this useful when running the tests locally.
This adds 15-20 minutes of extra testing (and 1.2GB of HF assets to download) to the nightly runs, which the team has said is fine.
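
For reference, the skip-upload flag is just an environment-variable check; a minimal sketch of the pattern (the helper name and call site below are hypothetical, not the actual test code):

import os

def should_skip_hf_upload() -> bool:
    # Any truthy-looking value ("yes", "1", "true") requests a skip;
    # unset or empty keeps the original behavior of uploading to the HF Hub.
    return os.environ.get("SKIP_HF_UPLOAD", "").lower() in ("1", "yes", "true")

# Hypothetical call site at the end of an e2e test:
# if not should_skip_hf_upload():
#     model.push_to_hub(repo_id)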

To run locally (you'll have to update the path and device ID for your setup):

CADENCE=nightly \
SKIP_HF_UPLOAD=yes \
CUDA_VISIBLE_DEVICES=4 \
TEST_DATA_FILE=~/projects/llm-compressor/tests/e2e/vLLM/configs/fp8_weight_only_channel.yaml \
pytest -s ~/projects/llm-compressor/tests/e2e/vLLM/test_vllm.py
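
For context, these environment variables drive test selection and configuration; a rough sketch of how such a test might consume them (simplified, with hypothetical names -- not the actual test_vllm.py):

import os

import pytest
import yaml

CONFIG_PATH = os.environ.get("TEST_DATA_FILE")   # path to one e2e config YAML
CADENCE = os.environ.get("CADENCE", "commit")    # e.g. "commit" or "nightly"

def load_config(path):
    # Each config YAML declares model, recipe, scheme, dataset_id, dataset_split, ...
    with open(path) as f:
        return yaml.safe_load(f)

@pytest.mark.skipif(CONFIG_PATH is None, reason="TEST_DATA_FILE not set")
def test_vllm_e2e():
    config = load_config(CONFIG_PATH)
    if config.get("cadence") != CADENCE:
        pytest.skip(f"cadence {config.get('cadence')!r} does not match {CADENCE!r}")
    # ... compress config["model"] with config["recipe"], then evaluate with vLLM ...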

TEST PLAN:
Additional config files for a broader range of datasets and an additional model.


github-actions bot commented Feb 7, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@brian-dellabetta force-pushed the bdellabe/e2e-vllm-tests-more-datasets branch from a1abe8a to 35f1d50 on February 7, 2025 21:11
@dbarbuzzi (Collaborator) commented:

If you could determine and note how much extra time these new additions would take when running the e2e tests, that would be super helpful! It would be ideal if we can preempt any required timeout changes to the nightly runs.

We have a custom test-run workflow that you can easily configure to run this specific set of tests against this PR's branch (and choose a different timeout) to do the measurements. I'm available to help you use that workflow; just ping me on Slack.

@brian-dellabetta (Collaborator, Author) commented:


@dbarbuzzi sure thing! Worth discussing at standup Monday to see how to balance coverage against time to complete. Right now it's 6 new configs on top of 22 pre-existing ones; not sure if we want/need that many. Probably 15-20 more minutes total as-is.

@brian-dellabetta marked this pull request as ready for review on February 10, 2025 15:46
@brian-dellabetta (Collaborator, Author) commented:

@dsikka responded that an additional 15-20 minutes of nightly run time will not be a concern, so I'm moving this to ready. FYI, this requires downloading the following datasets/models:

  • Open-Orca/slimorca-deduped-cleaned-corrected -- 162MB
  • neuralmagic/LLM_compression_calibration -- <5MB
  • garage-bAInd/Open-Platypus -- 15MB
  • Qwen/Qwen2.5-0.5B -- 1GB

Is there a way to run the nightly tests to confirm, or should we just merge this in today and see how nightly tests go tonight?

@brian-dellabetta force-pushed the bdellabe/e2e-vllm-tests-more-datasets branch from 8e7484f to 7b39550 on February 10, 2025 16:30
@dsikka changed the title from "Bdellabe/e2e vllm tests more datasets" to "[e2e] Update vllm tests with additional datasets" on Feb 10, 2025
dsikka previously approved these changes Feb 11, 2025
@dsikka left a comment:

great job!

kylesayrs previously approved these changes Feb 11, 2025

@kylesayrs left a comment:


It'd be nice to move some of these preprocessing and tokenization function definitions into TextGenerationDataset in the future so that others can reuse them, but no rush there.

LGTM!
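
For context, the suggestion above would look roughly like the sketch below. This is a hedged illustration only: the subclass, the column names, and the attributes assumed on TextGenerationDataset (tokenizer, max_seq_length) are assumptions, not the library's actual API.

# Hypothetical shared dataset wrapper (illustrative; the real base-class API may differ)
from llmcompressor.transformers import TextGenerationDataset  # assumed import path

class OpenPlatypusDataset(TextGenerationDataset):
    def preprocess(self, sample: dict) -> dict:
        # Collapse the raw instruction/response columns into one text field
        return {"text": f"{sample['instruction']}\n{sample['output']}"}

    def tokenize(self, sample: dict) -> dict:
        # Shared tokenization so individual e2e tests don't each redefine it
        return self.tokenizer(
            sample["text"], truncation=True, max_length=self.max_seq_length
        )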

@brian-dellabetta (Collaborator, Author) commented:

All tests are passing except tests/e2e/vLLM/configs/sparse2of4_fp8_dynamic_qwen.yaml, which we are removing because of a runtime CUTLASS error in vLLM. Once that is resolved, we can bring it back:

cadence: "nightly"
test_type: "regression"
model: Qwen/Qwen2.5-0.5B
recipe: tests/e2e/vLLM/recipes/Sparse_2of4/recipe_sparse_2of4_fp8_dynamic.yaml
scheme: sparse2of4_fp8_dynamic
dataset_id: garage-bAInd/Open-Platypus
dataset_split: train

@brian-dellabetta dismissed stale reviews from kylesayrs and dsikka via ff2cb87 on February 11, 2025 18:51
@brian-dellabetta force-pushed the bdellabe/e2e-vllm-tests-more-datasets branch from ff2cb87 to adf9210 on February 11, 2025 18:52
rahul-tuli previously approved these changes Feb 11, 2025

@rahul-tuli left a comment:


LGTM!

kylesayrs previously approved these changes Feb 11, 2025
…bration, open-elm and qwen

Signed-off-by: Brian Dellabetta <[email protected]>
@brian-dellabetta force-pushed the bdellabe/e2e-vllm-tests-more-datasets branch from 0f90d94 to 0166ffc on February 11, 2025 19:18
@brian-dellabetta added the "ready" label (When a PR is ready for review) on Feb 11, 2025
@dsikka enabled auto-merge (squash) on February 11, 2025 20:07
@dsikka merged commit 6377c30 into main on Feb 11, 2025
8 checks passed
@dsikka deleted the bdellabe/e2e-vllm-tests-more-datasets branch on February 11, 2025 20:39
dsikka added a commit that referenced this pull request Feb 19, 2025
SUMMARY:
An e2e test was removed in #1131 because it was failing in vLLM for a reason that has since been resolved by vllm-project/vllm#13198. This re-adds the test shown [here](#1131 (comment)).

I confirmed this runs with [the nightly vllm wheel built by the testing
CI/CD](https://github.com/neuralmagic/llm-compressor-testing/actions/runs/13360960551).

This adds <2 minutes to the nightly test time.

TEST PLAN:
No new source code to test.

Signed-off-by: Brian Dellabetta <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>