Support hdimQK=192, hdimV=128 backward #1487

Open

jacob-cursor opened this issue Feb 10, 2025 · 1 comment

jacob-cursor commented Feb 10, 2025

Very happy to see 2a20412, which added support for the DeepSeek attention configuration (qk_dim=192, v_dim=128) to FA3.

Currently, the backward pass is not supported for this configuration:

[rank1]:     dq, dk, dv, softmax_d, *rest = flash_attn_3_cuda.bwd(
[rank1]:                                    ^^^^^^^^^^^^^^^^^^^^^^
[rank1]: RuntimeError: out must have shape (batch_size, seqlen_q, num_heads, head_size)

The backward pass for this op would be really useful!
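
One plausible reading of the error, sketched below in plain Python: the attention output takes its last dimension from V (128), so a shape check that expects the Q/K head size (192) would reject it. This is an assumption about where the check goes wrong, not something confirmed in this thread. `attention_output_shape` and the concrete batch, sequence, and head counts are hypothetical and are not part of the flash-attention API.

```python
def attention_output_shape(q_shape, k_shape, v_shape):
    """Shape of softmax(Q K^T) V for (batch, seqlen, num_heads, head_dim) inputs.

    Hypothetical helper for illustration only; it assumes Q, K and V share
    the same number of heads (no GQA/MQA).
    """
    batch, seqlen_q, num_heads, head_dim_qk = q_shape
    _, seqlen_k, _, head_dim_k = k_shape
    _, seqlen_v, _, head_dim_v = v_shape
    # Q and K must agree on head dim so that Q K^T is defined.
    assert head_dim_qk == head_dim_k
    # K and V must agree on sequence length so that P V is defined.
    assert seqlen_k == seqlen_v
    # The output inherits its last dimension from V, not from Q/K.
    return (batch, seqlen_q, num_heads, head_dim_v)


# DeepSeek-style configuration from this issue: hdimQK=192, hdimV=128.
# Batch size, sequence length and head count are made-up example values.
q_shape = (2, 1024, 16, 192)
k_shape = (2, 1024, 16, 192)
v_shape = (2, 1024, 16, 128)

out_shape = attention_output_shape(q_shape, k_shape, v_shape)

# Assumed form of the failing check: `out` compared against Q's head size.
expected_by_check = q_shape
shape_check_passes = out_shape == expected_by_check

print("out shape:", out_shape)
print("check passes:", shape_check_passes)
```

If that reading is right, backward support would need the check, and the kernels behind it, to distinguish the Q/K head dimension from the V head dimension.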

Member

tridao commented Feb 11, 2025

I'm personally not planning to work on that soon, but I can review PRs if someone contributes one.
