
Request: Add Flash Attention 2.0 Support for ViTMAEForPreTraining #36527

Open
noelEOS opened this issue Mar 4, 2025 · 3 comments
Labels
Feature request · Flash Attention · Good Second Issue · Vision

Comments

noelEOS commented Mar 4, 2025

Hi Hugging Face team!

I am currently working on pre-training a Foundation Model using ViTMAEForPreTraining, and I was hoping to use Flash Attention 2.0 to speed up training and reduce memory usage. However, when I attempted to enable Flash Attention, I encountered the following error:

```
ValueError: ViTMAEForPreTraining does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co//discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new
```
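For reference, this is roughly how I am trying to enable it (a minimal sketch; the facebook/vit-mae-base checkpoint and the fp16 dtype here are just illustrative choices, not our actual setup):

```python
import torch
from transformers import ViTMAEForPreTraining

# Request Flash Attention 2.0 at load time; FA2 needs fp16/bf16 weights and a CUDA GPU.
model = ViTMAEForPreTraining.from_pretrained(
    "facebook/vit-mae-base",                 # example checkpoint
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
)
```

On current releases this call raises the ValueError above.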

Since MAE pre-training is heavily dependent on the attention mechanism, adding Flash Attention support would be a valuable enhancement, especially for larger ViT models and high-resolution datasets such as the Landsat data we are working with.

Feature Request

  • Please add support for Flash Attention 2.0 to ViTMAEForPreTraining.
  • This would help make MAE pre-training more efficient in terms of speed and memory consumption.

Why This Matters

  • Many users working with large imagery datasets (like remote sensing, medical imaging, etc.) would greatly benefit from this.
  • Flash Attention has already proven useful in other ViT variants, so bringing this to MAE feels like a natural next step.

Environment Details

  • Transformers version: v4.41.0.dev0
  • PyTorch version: 2.5.1
  • Running on multi-GPU with NCCL backend
Rocketknight1 added the Good Second Issue label on Mar 4, 2025
Rocketknight1 (Member) commented:

It might take a while before we get to this, but we'd welcome a PR if anyone in the community wants to add it! You can use the attention code from other models that already support FlashAttention2 as a template.
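For anyone picking this up, a rough sketch of the usual pattern follows (my own illustrative sketch, not the code from the PR linked below): the model class opts in to the new implementation, and a Flash Attention variant of the self-attention module routes the query/key/value projections through `flash_attn_func` from the flash-attn package. All class and parameter names below other than `flash_attn_func` itself are hypothetical.

```python
# Illustrative sketch of a FlashAttention-2 self-attention variant for ViTMAE.
# Not the actual implementation from the linked PR; class and argument names are hypothetical.
import torch
import torch.nn as nn
from flash_attn import flash_attn_func  # provided by the flash-attn package


class ViTMAEFlashAttention2(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, dropout: float = 0.0):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.dropout = dropout

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # flash_attn_func expects (batch, seq_len, num_heads, head_dim) tensors in fp16/bf16.
        batch, seq_len, _ = hidden_states.shape
        q = self.query(hidden_states).view(batch, seq_len, self.num_heads, self.head_dim)
        k = self.key(hidden_states).view(batch, seq_len, self.num_heads, self.head_dim)
        v = self.value(hidden_states).view(batch, seq_len, self.num_heads, self.head_dim)
        out = flash_attn_func(
            q, k, v,
            dropout_p=self.dropout if self.training else 0.0,
            causal=False,  # ViT attention is bidirectional, not causal
        )
        return out.reshape(batch, seq_len, -1)
```

The model also needs to be registered as supporting the new backend (the eager/sdpa/flash_attention_2 dispatch that other transformers models use) so that `attn_implementation="flash_attention_2"` is accepted instead of raising the ValueError above.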

qubvel (Member) commented Mar 4, 2025

Hey @noelEOS, please check the linked PR; it should work now if you install from the branch:

```
pip install -U git+https://github.com/qubvel/transformers@refactor-vit-attention
```
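After installing from that branch, loading the model with the flash_attention_2 implementation should go through; a quick sanity check might look like this (a sketch; the checkpoint name is just an example, and flash-attn requires a CUDA GPU with fp16/bf16 weights):

```python
import torch
from transformers import ViTMAEForPreTraining

# Verify that Flash Attention 2 is now accepted for ViTMAE.
model = ViTMAEForPreTraining.from_pretrained(
    "facebook/vit-mae-base",                 # example checkpoint
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
).to("cuda")
print(model.config._attn_implementation)  # expected: "flash_attention_2"
```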

noelEOS (Author) commented Mar 5, 2025

@qubvel, thanks so much for the quick response and for adding this feature! I really appreciate the fast turnaround. 🙌

I have not had the chance to test the branch yet, but I’ll try it out soon and report back if I encounter anything unexpected. Looking forward to seeing how it performs!!
