
[e2e] Update vllm tests with additional datasets #1131

Merged · 7 commits merged into main on Feb 11, 2025

Conversation


@brian-dellabetta (Collaborator) commented Feb 7, 2025:

SUMMARY:
Adds a handful of additional e2e tests covering 3 more datasets:

  • neuralmagic/LLM_compression_calibration
  • garage-bAInd/Open-Platypus
  • Open-Orca/slimorca-deduped-cleaned-corrected

and one new small language model (SLM):

  • Qwen/Qwen2.5-0.5B

I also added an env var flag (SKIP_HF_UPLOAD) to skip uploads to the Hugging Face Hub, defaulting to the original behavior of uploading. I found this useful when running the tests locally.
This adds 15-20 minutes of extra testing (and 1.2GB of HF assets to download) to the nightly runs, which the team has said is fine.
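
For reference, the skip-upload flag is just an environment-variable check; a minimal sketch of the pattern (the helper name and call site below are hypothetical, not the actual test code):

import os

def should_skip_hf_upload() -> bool:
    # Any truthy-looking value ("yes", "1", "true") requests a skip;
    # unset or empty keeps the original behavior of uploading to the HF Hub.
    return os.environ.get("SKIP_HF_UPLOAD", "").lower() in ("1", "yes", "true")

# Hypothetical call site at the end of an e2e test:
# if not should_skip_hf_upload():
#     model.push_to_hub(repo_id)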

To run locally (you'll have to update the path and device ID for your setup):

CADENCE=nightly \
SKIP_HF_UPLOAD=yes \
CUDA_VISIBLE_DEVICES=4 \
TEST_DATA_FILE=~/projects/llm-compressor/tests/e2e/vLLM/configs/fp8_weight_only_channel.yaml \
pytest -s ~/projects/llm-compressor/tests/e2e/vLLM/test_vllm.py
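
For context, these environment variables drive test selection and configuration; a rough sketch of how such a test might consume them (simplified, with hypothetical names -- not the actual test_vllm.py):

import os

import pytest
import yaml

CONFIG_PATH = os.environ.get("TEST_DATA_FILE")   # path to one e2e config YAML
CADENCE = os.environ.get("CADENCE", "commit")    # e.g. "commit" or "nightly"

def load_config(path):
    # Each config YAML declares model, recipe, scheme, dataset_id, dataset_split, ...
    with open(path) as f:
        return yaml.safe_load(f)

@pytest.mark.skipif(CONFIG_PATH is None, reason="TEST_DATA_FILE not set")
def test_vllm_e2e():
    config = load_config(CONFIG_PATH)
    if config.get("cadence") != CADENCE:
        pytest.skip(f"cadence {config.get('cadence')!r} does not match {CADENCE!r}")
    # ... compress config["model"] with config["recipe"], then evaluate with vLLM ...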

TEST PLAN:
Additional config files for a broader range of datasets and an additional model.


github-actions bot commented Feb 7, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@brian-dellabetta force-pushed the bdellabe/e2e-vllm-tests-more-datasets branch from a1abe8a to 35f1d50 on February 7, 2025 21:11
@dbarbuzzi (Collaborator) commented:

If you could determine and note how much extra time these new additions would take when running the e2e tests, that would be super helpful! It would be ideal if we can preempt any required timeout changes to the nightly runs.

We have a custom test-run workflow that you can easily configure to run this specific set of tests against this PR's branch (and choose a different timeout) to do the measurements. I'm available to help you use that workflow; just ping me on Slack.

@brian-dellabetta (Collaborator, Author) commented:


@dbarbuzzi sure thing! Worth discussing at standup Monday to see how to balance coverage against time to complete. Right now it's 6 new configs on top of 22 pre-existing ones; not sure if we want/need that many. Probably 15-20 more minutes total as-is.

@brian-dellabetta marked this pull request as ready for review on February 10, 2025 15:46
@brian-dellabetta (Collaborator, Author) commented:

@dsikka responded that an additional 15-20 minutes of nightly run time will not be a concern, so I'm moving this to ready. FYI, this requires downloading the following datasets/models:

  • Open-Orca/slimorca-deduped-cleaned-corrected -- 162MB
  • neuralmagic/LLM_compression_calibration -- <5MB
  • garage-bAInd/Open-Platypus -- 15MB
  • Qwen/Qwen2.5-0.5B -- 1GB

Is there a way to run the nightly tests to confirm, or should we just merge this in today and see how nightly tests go tonight?

@brian-dellabetta force-pushed the bdellabe/e2e-vllm-tests-more-datasets branch from 8e7484f to 7b39550 on February 10, 2025 16:30
@dsikka changed the title from "Bdellabe/e2e vllm tests more datasets" to "[e2e] Update vllm tests with additional datasets" on Feb 10, 2025
dsikka previously approved these changes Feb 11, 2025
@dsikka left a comment:

great job!

kylesayrs previously approved these changes Feb 11, 2025

@kylesayrs left a comment:


It'd be nice to move some of these preprocessing and tokenization function definitions into TextGenerationDataset in the future so that others can reuse them, but no rush there.

LGTM!
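
For context, the suggestion above would look roughly like the sketch below. This is a hedged illustration only: the subclass, the column names, and the attributes assumed on TextGenerationDataset (tokenizer, max_seq_length) are assumptions, not the library's actual API.

# Hypothetical shared dataset wrapper (illustrative; the real base-class API may differ)
from llmcompressor.transformers import TextGenerationDataset  # assumed import path

class OpenPlatypusDataset(TextGenerationDataset):
    def preprocess(self, sample: dict) -> dict:
        # Collapse the raw instruction/response columns into one text field
        return {"text": f"{sample['instruction']}\n{sample['output']}"}

    def tokenize(self, sample: dict) -> dict:
        # Shared tokenization so individual e2e tests don't each redefine it
        return self.tokenizer(
            sample["text"], truncation=True, max_length=self.max_seq_length
        )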

@brian-dellabetta (Collaborator, Author) commented:

All tests are passing except tests/e2e/vLLM/configs/sparse2of4_fp8_dynamic_qwen.yaml, which we are removing because of a runtime CUTLASS error in vLLM. Once that is resolved, we can bring it back:

cadence: "nightly"
test_type: "regression"
model: Qwen/Qwen2.5-0.5B
recipe: tests/e2e/vLLM/recipes/Sparse_2of4/recipe_sparse_2of4_fp8_dynamic.yaml
scheme: sparse2of4_fp8_dynamic
dataset_id: garage-bAInd/Open-Platypus
dataset_split: train

@brian-dellabetta dismissed stale reviews from kylesayrs and dsikka via ff2cb87 on February 11, 2025 18:51
@brian-dellabetta force-pushed the bdellabe/e2e-vllm-tests-more-datasets branch from ff2cb87 to adf9210 on February 11, 2025 18:52
rahul-tuli previously approved these changes Feb 11, 2025

@rahul-tuli left a comment:


LGTM!

kylesayrs previously approved these changes Feb 11, 2025
…bration, open-elm and qwen

Signed-off-by: Brian Dellabetta <[email protected]>
@brian-dellabetta force-pushed the bdellabe/e2e-vllm-tests-more-datasets branch from 0f90d94 to 0166ffc on February 11, 2025 19:18
@brian-dellabetta added the "ready" label (When a PR is ready for review) on Feb 11, 2025
@dsikka enabled auto-merge (squash) on February 11, 2025 20:07
@dsikka merged commit 6377c30 into main on Feb 11, 2025
8 checks passed
@dsikka deleted the bdellabe/e2e-vllm-tests-more-datasets branch on February 11, 2025 20:39
dsikka added a commit that referenced this pull request Feb 19, 2025
SUMMARY:
An e2e test was removed in #1131 because it was failing in vLLM for a reason that has since been resolved by vllm-project/vllm#13198. This re-adds the test shown [here](#1131 (comment)).

I confirmed this runs with [the nightly vllm wheel built by the testing
CI/CD](https://github.com/neuralmagic/llm-compressor-testing/actions/runs/13360960551).

This adds <2 minutes to the nightly test time.

TEST PLAN:
No new source code to test.

Signed-off-by: Brian Dellabetta <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>