feat: add fp8 source files #50

mickaelseznec · 2025-02-19T11:38:14Z

No description provided.

LucasWilkinson · 2025-02-21T18:20:50Z

Thanks for the PR, I will try to test this soon!

LucasWilkinson · 2025-02-27T16:47:26Z

Apologies for the very long delay! The fix for the 12.1 build issues has now landed on main, and the MLA work has landed too, so we are cleared to land this.

However, are the hopper/flash_api.cpp changes strictly necessary? I'm really trying to reduce diffs with upstream so that we can sync easily, and this is an upstream file. Is there any chance we could move this logic into vllm_flash_attn/flash_attn_interface.py and use torch.expand to create a 0 stride?
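For context, a minimal sketch of the zero-stride idea suggested above. It assumes the scaling factor is held in a one-element tensor and uses the `Tensor.expand` method; the shapes and variable names are hypothetical and are not taken from this PR.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
batch_size, num_heads = 4, 8

# A single scalar scaling factor stored as a one-element tensor.
scale = torch.tensor([0.5], dtype=torch.float32)

# expand() returns a view rather than a copy: every broadcast
# dimension gets stride 0, so all positions alias the same element.
scale_per_head = scale.expand(batch_size, num_heads)

print(scale_per_head.shape)     # logical shape is (batch_size, num_heads)
print(scale_per_head.stride())  # strides of the expanded view
print(scale_per_head.data_ptr() == scale.data_ptr())  # same storage
```

Because the expanded view shares storage with the original tensor, a kernel that indexes the scale by (batch, head) with the tensor's strides would read the same scalar everywhere, without a special scalar code path in the C++ layer.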

mickaelseznec · 2025-02-28T11:12:51Z

Great, I didn't know about torch.expand(). I'll use it on the vLLM side to avoid any magic hidden in the interface layer.

Also, I've seen some API mismatches between torch.ops._vllm_fa3_C.fwd and how it's called in vllm_flash_attn/flash_attn_interface.py (alibi doesn't seem to have support in FA3)

LucasWilkinson · 2025-03-03T07:00:16Z

> Great, I didn't know about torch.expand(). I'll use it on vllm side to avoid any magic hidden in the interface layer.

Awesome, thank you! Do you mind removing these bits from the flash_api.cpp in this PR?

> Also, I've seen some API mismatches between torch.ops._vllm_fa3_C.fwd and how it's called in vllm_flash_attn/flash_attn_interface.py (alibi doesn't seem to have support in FA3)

Oh, good catch! Oof, not sure how that slipped by. PRs welcome :smiling:

mickaelseznec · 2025-03-03T09:52:41Z

Hi @LucasWilkinson, I've updated my PR accordingly. Let me know if you have any further comments :)

mickaelseznec force-pushed the mseznec/add-fp8-kernel-compilation branch from e11ba83 to db8c667 Compare February 19, 2025 12:40

mickaelseznec added 3 commits February 25, 2025 16:39

feat: add fp8 source files

fc7d9e7

Signed-off-by: Mickael Seznec <[email protected]>

fix: add qkv scales

2b48f5f

Signed-off-by: Mickael Seznec <[email protected]>

feat: scalar scaling factor

375edf5

Signed-off-by: Mickael Seznec <[email protected]>

mickaelseznec force-pushed the mseznec/add-fp8-kernel-compilation branch from b6ca541 to 375edf5 Compare February 25, 2025 16:40

mickaelseznec added 2 commits February 28, 2025 11:04

Revert "feat: scalar scaling factor"

c15ce27

This reverts commit 375edf5. Signed-off-by: Mickael Seznec <[email protected]>

fix: flash-attn interface

3884373

Signed-off-by: Mickael Seznec <[email protected]>
