Possible to use rotary embedding without flash attention? #691

Open
Ph0rk0z opened this issue Feb 11, 2025 · 3 comments

@Ph0rk0z

Ph0rk0z commented Feb 11, 2025

Flash Attention requires Ampere or newer GPUs, and while Mamba compiles on Turing, the rotary embedding doesn't work there. So the Google T4, the 2080 Ti, and all those cards are locked out once again. Most of these SSM models are small and can probably afford the memory hit in exchange for compatibility.

@tridao
Collaborator

tridao commented Feb 11, 2025

The rotary implementation is only one or two files, written in PyTorch and Triton. You can copy those files.
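
For readers on pre-Ampere cards, here is a minimal pure-PyTorch sketch of the rotary rotation (non-interleaved, GPT-NeoX-style convention) with no Triton or flash-attn dependency. The function names are illustrative, not the flash-attn API, and only the forward rotation is covered; it trades the fused kernel's speed for portability.

```python
import torch


def rotary_cos_sin(seqlen, rotary_dim, base=10000.0, device=None):
    # Standard RoPE angles: one frequency per pair of channels, one angle per position.
    inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2, device=device).float() / rotary_dim))
    t = torch.arange(seqlen, device=device).float()
    freqs = torch.outer(t, inv_freq)          # (seqlen, rotary_dim // 2)
    return freqs.cos(), freqs.sin()


def rotate_half(x):
    # (x1, x2) -> (-x2, x1) on the last dimension (non-interleaved convention).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_emb_torch(x, cos, sin):
    """
    x:        (batch, seqlen, nheads, headdim)
    cos, sin: (seqlen, rotary_dim // 2)
    Rotates the first rotary_dim channels of headdim; the rest pass through unchanged.
    """
    ro_dim = cos.shape[-1] * 2
    assert ro_dim <= x.shape[-1]
    cos = torch.cat([cos, cos], dim=-1)[: x.shape[1], None, :]  # (seqlen, 1, ro_dim)
    sin = torch.cat([sin, sin], dim=-1)[: x.shape[1], None, :]
    x_rot, x_pass = x[..., :ro_dim], x[..., ro_dim:]
    x_rot = x_rot * cos + rotate_half(x_rot) * sin
    return torch.cat([x_rot, x_pass], dim=-1)


# Example: rotate queries for a 4-head model with headdim 64 and rotary_dim 32.
q = torch.randn(2, 128, 4, 64)
cos, sin = rotary_cos_sin(seqlen=128, rotary_dim=32)
q_rot = apply_rotary_emb_torch(q, cos, sin)
```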

@YellowRoseCx

The rotary implementation is only one or two files, written in PyTorch and Triton. You can copy those files.

Would you mind elaborating on how to do what you're talking about?
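
One way to wire this up, assuming the rotary module lives at flash_attn.layers.rotary (the exact path may differ between flash-attn releases), is to try the flash-attn import and fall back to a standalone PyTorch copy, e.g. the sketch above saved as a local module:

```python
try:
    # Triton-backed implementation shipped with flash-attn (module path assumed).
    from flash_attn.layers.rotary import apply_rotary_emb
except ImportError:
    # No flash-attn install (e.g. pre-Ampere GPU): fall back to a vendored copy.
    # "rotary_torch" is a hypothetical local file containing the sketch above.
    from rotary_torch import apply_rotary_emb_torch as apply_rotary_emb
```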

@tridao
Collaborator

tridao commented Feb 24, 2025
