[Core] Don't use cache during multi-modal profiling #14336

DarkLight1337 · 2025-03-06T07:48:32Z

Avoid using the cache during the profiling step so that the dummy outputs (which are intentionally constructed to take up as much memory as possible) can be discarded immediately afterwards.
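
To illustrate the idea, here is a minimal sketch, not the actual vLLM code. `CachedProcessor`, `make_dummy_output`, and the `use_cache` flag are hypothetical names invented for this example. It assumes a processor that memoizes outputs by key, and shows that bypassing the cache for the profiling input leaves no lingering reference to the large dummy output.

```python
from typing import Callable, Dict


class CachedProcessor:
    """Toy processor that memoizes outputs by key (hypothetical, not vLLM's API)."""

    def __init__(self, process_fn: Callable[[int], bytearray]) -> None:
        self._process_fn = process_fn
        self._cache: Dict[str, bytearray] = {}

    def apply(self, key: str, size: int, *, use_cache: bool = True) -> bytearray:
        if not use_cache:
            # Profiling path: compute the output without storing it anywhere.
            return self._process_fn(size)
        if key not in self._cache:
            self._cache[key] = self._process_fn(size)
        return self._cache[key]

    def cache_size(self) -> int:
        return len(self._cache)


def make_dummy_output(size: int) -> bytearray:
    # Stand-in for a processed multi-modal input of the given size in bytes.
    return bytearray(size)


processor = CachedProcessor(make_dummy_output)

# Profiling: the dummy output is deliberately large, so skip the cache.
dummy = processor.apply("profiling-dummy", 1_000_000, use_cache=False)
assert processor.cache_size() == 0
del dummy  # the cache holds no reference, so the memory can be reclaimed

# Regular requests still benefit from the cache.
real = processor.apply("image-1", 1024)
assert processor.cache_size() == 1
```

If the profiling call went through the cache instead, the cache would keep the dummy output alive after profiling finished.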

Signed-off-by: DarkLight1337 <[email protected]>

github-actions · 2025-03-06T07:48:42Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

[Core] Optimize multi-modal profiling

d1fe607

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 added 2 commits March 6, 2025 13:34

Merge branch 'main' into optimize-profiling

9f74192

Revert padding

6ba2d40

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 6, 2025

DarkLight1337 requested a review from Isotr0py March 6, 2025 13:37

DarkLight1337 marked this pull request as ready for review March 6, 2025 13:37

DarkLight1337 requested a review from ywang96 as a code owner March 6, 2025 13:37

DarkLight1337 changed the title ~~[Core] Optimize multi-modal profiling~~ [Core] Don't use cache during multi-modal profiling Mar 6, 2025

DarkLight1337 mentioned this pull request Mar 6, 2025

[Bug]: When using the VLLM framework to load visual models, CPU memory overflow occurs while continuously processing data with images. #12973

Open