Possible to use rotary embedding without flash attention? #691

Open
Ph0rk0z opened this issue Feb 11, 2025 · 3 comments

@Ph0rk0z

Ph0rk0z commented Feb 11, 2025

Flash Attention requires Ampere or newer GPUs, and while Mamba compiles on Turing, the rotary embedding doesn't work there. So the Google T4, the 2080 Ti, and all those cards are locked out once again. Most of these SSM models are small and can probably afford the memory hit in exchange for compatibility.

@tridao
Collaborator

tridao commented Feb 11, 2025

The rotary implementation is only one or two files, written in PyTorch and Triton. You can copy those files.
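
For readers on pre-Ampere cards, here is a minimal pure-PyTorch sketch of the rotary rotation (non-interleaved, GPT-NeoX-style convention) with no Triton or flash-attn dependency. The function names are illustrative, not the flash-attn API, and only the forward rotation is covered; it trades the fused kernel's speed for portability.

```python
import torch


def rotary_cos_sin(seqlen, rotary_dim, base=10000.0, device=None):
    # Standard RoPE angles: one frequency per pair of channels, one angle per position.
    inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2, device=device).float() / rotary_dim))
    t = torch.arange(seqlen, device=device).float()
    freqs = torch.outer(t, inv_freq)          # (seqlen, rotary_dim // 2)
    return freqs.cos(), freqs.sin()


def rotate_half(x):
    # (x1, x2) -> (-x2, x1) on the last dimension (non-interleaved convention).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_emb_torch(x, cos, sin):
    """
    x:        (batch, seqlen, nheads, headdim)
    cos, sin: (seqlen, rotary_dim // 2)
    Rotates the first rotary_dim channels of headdim; the rest pass through unchanged.
    """
    ro_dim = cos.shape[-1] * 2
    assert ro_dim <= x.shape[-1]
    cos = torch.cat([cos, cos], dim=-1)[: x.shape[1], None, :]  # (seqlen, 1, ro_dim)
    sin = torch.cat([sin, sin], dim=-1)[: x.shape[1], None, :]
    x_rot, x_pass = x[..., :ro_dim], x[..., ro_dim:]
    x_rot = x_rot * cos + rotate_half(x_rot) * sin
    return torch.cat([x_rot, x_pass], dim=-1)


# Example: rotate queries for a 4-head model with headdim 64 and rotary_dim 32.
q = torch.randn(2, 128, 4, 64)
cos, sin = rotary_cos_sin(seqlen=128, rotary_dim=32)
q_rot = apply_rotary_emb_torch(q, cos, sin)
```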

@YellowRoseCx

The rotary implementation is only one or two files, written in PyTorch and Triton. You can copy those files.

Would you mind elaborating on how to do what you're talking about?
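
One way to wire this up, assuming the rotary module lives at flash_attn.layers.rotary (the exact path may differ between flash-attn releases), is to try the flash-attn import and fall back to a standalone PyTorch copy, e.g. the sketch above saved as a local module:

```python
try:
    # Triton-backed implementation shipped with flash-attn (module path assumed).
    from flash_attn.layers.rotary import apply_rotary_emb
except ImportError:
    # No flash-attn install (e.g. pre-Ampere GPU): fall back to a vendored copy.
    # "rotary_torch" is a hypothetical local file containing the sketch above.
    from rotary_torch import apply_rotary_emb_torch as apply_rotary_emb
```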

@tridao
Collaborator

tridao commented Feb 24, 2025
