As of #12093, Flash Attention 3 is now supported in vLLM for Hopper GPUs (SM 9.0). It can also be enabled for SM 8.0 and 8.7 by setting `VLLM_FLASH_ATTN_VERSION=3`. For SM 8.6 and 8.9 it is fully disabled, since those GPUs do not have enough shared memory for the current implementation; some work remains to be done there.

This issue tracks the remaining features that have yet to be implemented:

Hardware Support

- [ ] SM 8.9 Ada Lovelace (L4, L40s) Support
- [ ] SM 8.6 Ampere (A6000) Support

Optimizations

- [ ] FP8 Attention
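As a minimal sketch of how a deployment script might apply the support matrix above, the snippet below queries the GPU's compute capability with PyTorch and opts in to FA3 where the issue says it is allowed. The opt-in logic and the model name are illustrative assumptions, not vLLM's own backend-selection code:

```python
# Sketch: opt in to Flash Attention 3 based on the GPU's compute capability.
# The SM thresholds mirror the support matrix above; this is illustrative,
# not vLLM's internal backend-selection logic.
import os

import torch

major, minor = torch.cuda.get_device_capability()  # e.g. (9, 0) on H100

if (major, minor) == (9, 0):
    pass  # FA3 is used by default on Hopper as of #12093
elif (major, minor) in {(8, 0), (8, 7)}:
    # SM 8.0 (A100) and SM 8.7 (Orin) can opt in explicitly; set the
    # variable before vLLM picks its attention backend.
    os.environ["VLLM_FLASH_ATTN_VERSION"] = "3"
else:
    # SM 8.6 (A6000) and SM 8.9 (L4, L40s) lack the shared memory the
    # current FA3 kernels need, so vLLM falls back to FA2 there.
    print(f"FA3 unavailable on SM {major}.{minor}; falling back to FA2.")

from vllm import LLM  # imported after the env var is set

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # hypothetical model choice
```

The same opt-in works from the shell, e.g. `VLLM_FLASH_ATTN_VERSION=3 vllm serve <model>` on an SM 8.0 or 8.7 GPU.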