
Request: Add Flash Attention 2.0 Support for ViTMAEForPreTraining #36527

Open
noelEOS opened this issue Mar 4, 2025 · 3 comments
Labels
Feature request · Flash Attention · Good Second Issue · Vision

Comments

noelEOS commented Mar 4, 2025

Hi Hugging Face team!

I am currently working on pre-training a Foundation Model using ViTMAEForPreTraining, and I was hoping to use Flash Attention 2.0 to speed up training and reduce memory usage. However, when I attempted to enable Flash Attention, I encountered the following error:

```
ValueError: ViTMAEForPreTraining does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co//discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new
```
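For reference, this is roughly how I am trying to enable it (a minimal sketch; the facebook/vit-mae-base checkpoint and the fp16 dtype here are just illustrative choices, not our actual setup):

```python
import torch
from transformers import ViTMAEForPreTraining

# Request Flash Attention 2.0 at load time; FA2 needs fp16/bf16 weights and a CUDA GPU.
model = ViTMAEForPreTraining.from_pretrained(
    "facebook/vit-mae-base",                 # example checkpoint
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
)
```

On current releases this call raises the ValueError above.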

Since MAE pre-training is heavily dependent on the attention mechanism, adding Flash Attention support would be a valuable enhancement, especially for larger ViT models and high-resolution datasets such as the Landsat data we are working with.

Feature Request

  • Please add support for Flash Attention 2.0 to ViTMAEForPreTraining.
  • This would help make MAE pre-training more efficient in terms of speed and memory consumption.

Why This Matters

  • Many users working with large imagery datasets (like remote sensing, medical imaging, etc.) would greatly benefit from this.
  • Flash Attention has already proven useful in other ViT variants, so bringing this to MAE feels like a natural next step.

Environment Details

  • Transformers version: v4.41.0.dev0
  • PyTorch version: 2.5.1
  • Running on multi-GPU with NCCL backend
Rocketknight1 added the Good Second Issue label on Mar 4, 2025
Rocketknight1 (Member) commented:

It might take a while before we get to this, but we'd welcome a PR if anyone in the community wants to add it! You can use the attention code from other models that already support FlashAttention2 as a template.
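For anyone picking this up, a rough sketch of the usual pattern follows (my own illustrative sketch, not the code from the PR linked below): the model class opts in to the new implementation, and a Flash Attention variant of the self-attention module routes the query/key/value projections through `flash_attn_func` from the flash-attn package. All class and parameter names below other than `flash_attn_func` itself are hypothetical.

```python
# Illustrative sketch of a FlashAttention-2 self-attention variant for ViTMAE.
# Not the actual implementation from the linked PR; class and argument names are hypothetical.
import torch
import torch.nn as nn
from flash_attn import flash_attn_func  # provided by the flash-attn package


class ViTMAEFlashAttention2(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, dropout: float = 0.0):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.dropout = dropout

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # flash_attn_func expects (batch, seq_len, num_heads, head_dim) tensors in fp16/bf16.
        batch, seq_len, _ = hidden_states.shape
        q = self.query(hidden_states).view(batch, seq_len, self.num_heads, self.head_dim)
        k = self.key(hidden_states).view(batch, seq_len, self.num_heads, self.head_dim)
        v = self.value(hidden_states).view(batch, seq_len, self.num_heads, self.head_dim)
        out = flash_attn_func(
            q, k, v,
            dropout_p=self.dropout if self.training else 0.0,
            causal=False,  # ViT attention is bidirectional, not causal
        )
        return out.reshape(batch, seq_len, -1)
```

The model also needs to be registered as supporting the new backend (the eager/sdpa/flash_attention_2 dispatch that other transformers models use) so that `attn_implementation="flash_attention_2"` is accepted instead of raising the ValueError above.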

qubvel (Member) commented Mar 4, 2025

Hey @noelEOS, please check the linked PR; it should work now if you install from the branch:

```
pip install -U git+https://github.com/qubvel/transformers@refactor-vit-attention
```
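After installing from that branch, loading the model with the flash_attention_2 implementation should go through; a quick sanity check might look like this (a sketch; the checkpoint name is just an example, and flash-attn requires a CUDA GPU with fp16/bf16 weights):

```python
import torch
from transformers import ViTMAEForPreTraining

# Verify that Flash Attention 2 is now accepted for ViTMAE.
model = ViTMAEForPreTraining.from_pretrained(
    "facebook/vit-mae-base",                 # example checkpoint
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
).to("cuda")
print(model.config._attn_implementation)  # expected: "flash_attention_2"
```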

noelEOS (Author) commented Mar 5, 2025

@qubvel, thanks so much for the quick response and for adding this feature! I really appreciate the fast turnaround. 🙌

I have not had the chance to test the branch yet, but I’ll try it out soon and report back if I encounter anything unexpected. Looking forward to seeing how it performs!!
