[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass #10902
Conversation
vllm/compilation/reshapes.py
```python
elif is_func(node, torch.ops.aten.slice.Tensor):
    input, dim_index, start, end = node.args[:4]
    input_shape = input.meta["val"].shape
    i_dim = input_shape[dim_index]

    # A slice that spans the entire dimension is a no-op:
    # rewire its users to the input and drop the node.
    if start == 0 and self.dims_equivalent(end, i_dim):
        node.replace_all_uses_with(input)
        graph.erase_node(node)
        count += 1

elif is_func(node, torch.ops.aten.slice_scatter.default):
```
Are these always the right ops to use? e.g. is there a torch.ops.aten.slice.default or a torch.ops.aten.slice_scatter.Tensor?
I haven't seen them, so I'm not sure; I just went off what I saw. The other overloads could be added easily if we ever see them in the graph.
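For anyone wondering about the overloads discussed above, a quick way to check what actually exists (not part of this PR, just a hedged illustration) is to query the op packets directly. On the PyTorch versions I'm aware of, `slice` exposes a `Tensor` overload (among others) and `slice_scatter` a `default` overload, but the exact set may vary by release:

```python
import torch

# List the registered overloads for the two op packets discussed above.
# Expected (but version-dependent): "Tensor" for slice, "default" for slice_scatter.
print(torch.ops.aten.slice.overloads())
print(torch.ops.aten.slice_scatter.overloads())
```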
```
@@ -161,10 +162,14 @@ def apply_fp8_linear(
    # Note: we pad the input because torch._scaled_mm is more performant
    # for matrices with batch dimension > 16.
    # This could change in the future.
    # We also don't pad when using torch.compile,
    # as it breaks with dynamic shapes.
    config = get_current_vllm_config().compilation_config
```
Is this cached? It could be expensive on each forward call.
Yes, in eager mode this will get called on every forward pass, but it will only happen once when compiled. In eager mode there isn't really a better way that's still correct - the only way is to check the config context. I don't think this getter is significant but I haven't measured it.
We could pass in an `allow_input_padding` flag? I do think this is annoying though. I think it's worth it to do a quick check for performance regressions on a small-model eager-mode benchmark with `cutlass_scaled_mm` disabled?
I think we'd have to pass that flag through the whole call stack though so I don't think it's worth it. I'll run a small model.
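For readers skimming this thread, here is a minimal sketch of the gating being discussed, assuming a `compilation_config.level` field, a `pad_to` threshold of 17, and the illustrative helper name `maybe_pad_input` (none of these are asserted to match the actual vLLM code):

```python
import torch

from vllm.config import get_current_vllm_config  # getter referenced in the diff above


def maybe_pad_input(qinput: torch.Tensor, pad_to: int = 17) -> torch.Tensor:
    """Pad the batch dim for torch._scaled_mm only when running eagerly.

    Sketch only: the real decision also depends on cutlass availability;
    `config.level > 0` is an assumed stand-in for "torch.compile is enabled".
    """
    config = get_current_vllm_config().compilation_config
    if config.level > 0:
        # Under torch.compile, padding breaks dynamic shapes, so skip it.
        return qinput
    batch = qinput.shape[0]
    if batch >= pad_to:
        return qinput
    # Pad rows at the end of the batch dimension up to pad_to.
    return torch.nn.functional.pad(qinput, (0, 0, 0, pad_to - batch))
```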
Looks good overall but I had a few minor comments
- rename cutlass_fp8 test flag
- rename noop pass
- improve some comments

Signed-off-by: luka <[email protected]>
Thanks for the great work! LGTM assuming we don't see any performance regression
Yep, will post perf numbers once I have them, thanks!
This PR fixes the `fp8` case when `cutlass_mm` is not available. It contains the following fixes:

- Don't pad the input to the `fp8` `torch._scaled_mm` in the `torch.compile` case, as branch specialization might not work correctly, and it makes fusion difficult.
- Add `slice` and `slice_scatter` elimination (this is implemented in PyTorch but does not cover all cases) and rename the `RedundantReshapesPass` to `NoopEliminationPass`.

This PR is a prerequisite to #10836, which enables `torch.compile` on AMD and uses the non-cutlass-fp8 path.
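As a toy illustration of the new `slice` handling (a hypothetical, self-contained sketch, not the vLLM pass itself: the fixed size 4 and the literal `end == 4` check stand in for the metadata-based check in `NoopEliminationPass`):

```python
import torch
import torch.fx


class M(torch.nn.Module):
    def forward(self, x):
        # Slicing dim 0 from 0 to its full size (4 in this toy) is a no-op.
        y = torch.ops.aten.slice.Tensor(x, 0, 0, 4)
        return y + 1


gm = torch.fx.symbolic_trace(M())

for node in list(gm.graph.nodes):
    if node.op == "call_function" and node.target == torch.ops.aten.slice.Tensor:
        inp, dim, start, end = node.args[:4]
        if start == 0 and end == 4:
            # The no-op slice contributes nothing: rewire its users to the
            # input and drop the node from the graph.
            node.replace_all_uses_with(inp)
            gm.graph.erase_node(node)

gm.graph.lint()
gm.recompile()
print(gm.graph)  # the aten.slice.Tensor node is gone; the add uses x directly
```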