Flash Attention 3 compile fails (able to compile 2) : TypeError: _write_ninja_file() got an unexpected keyword argument 'sycl_cflags' #1524
Comments
OK, Grok suggested this change. I tried it and got a new error.
Same here. I think it might be a torch nightly issue.
Yeah, I think it's a bug introduced in this commit: pytorch/pytorch@d27ecf8. I'm not sure how this commit really passed review... it's a whole mess. It added so much Intel-specific bloat to a very core part of pytorch.
This patch adds support for sycl kernels build via `torch.utils.cpp_extension.load`, `torch.utils.cpp_extension.load_inline` and (new) `class SyclExtension` APIs. Files having `.sycl` extension are considered to have sycl kernels and are compiled with `icpx` (dpc++ sycl compiler from Intel). Files with other extensions, `.cpp`, `.cu`, are handled as before. API supports building sycl along with other file types into single extension.

Note that `.sycl` file extension is a PyTorch convention for files containing sycl code which I propose to adopt. We did follow up with compiler team to introduce such file extension in the compiler, but they are opposed to this. At the same time discussion around sycl file extension and adding sycl language support into such tools as cmake is ongoing. Eventually cmake also considers to introduce some file extension convention for sycl. I hope we can further influence cmake and compiler communities to broader adopt `.sycl` file extension.

By default SYCL kernels are compiled for all Intel GPU devices for which pytorch native aten SYCL kernels are compiled. At the moment `pvc,xe-lpg`. This behavior can be overridden by setting `TORCH_XPU_ARCH_LIST` environment variable to the comma separated list of desired devices to compile for.

Fixes: #132944
CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5
Pull Request resolved: #132945
Approved by: https://github.com/albanD, https://github.com/guangyey, https://github.com/malfet
Co-authored-by: Nikita Shulga <[email protected]>
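For context, here is a minimal sketch of the SYCL extension flow that PR description outlines. The file names and the arch value are illustrative assumptions, not something from this thread or the PR:

```python
# Illustrative sketch only, based on the PR description quoted above.
# "my_kernel.cpp" / "my_kernel.sycl" are hypothetical file names.
import os
from torch.utils import cpp_extension

# Per the PR, SYCL kernels default to the Intel GPU targets pytorch itself
# builds for (pvc, xe-lpg); TORCH_XPU_ARCH_LIST overrides that list.
os.environ["TORCH_XPU_ARCH_LIST"] = "pvc"

ext = cpp_extension.load(
    name="my_sycl_ext",
    sources=["my_kernel.cpp", "my_kernel.sycl"],  # .sycl sources go through icpx
    verbose=True,
)
```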
Thanks, I linked this issue there. Since this is two weeks old, I don't suppose any of the cu128 torch builds would work, @cassanof?
The latest commit should hopefully fix this.
You should try compiling on Linux. Windows compilation has not been tested.
Linux is the easy part; I need Windows :D Thank you so much, but sadly it failed again. Any hints to try? After cloning I rename the directory to d since I need shorter paths.
I'm not familiar with Windows so I don't know what's wrong. Sounds like the macros in ...
Flash Attention 2 is working; I am able to compile it. I would really appreciate it if you could make FA3 compile on Windows. You can tell me to try this or that and I can do it.
I'm out of my depth on Windows, you might need to check what FA2 does with static_switch.h that FA3 does differently.
I believe this is an issue with pytorch nightly, nothing to do with Windows or FA3. I got this error on Linux as well. There was an update about two weeks ago where sycl was added to the extension build process, and it seems to require some special ninja handling. My fix was to fork pytorch nightly and revert the sycl commit. Another fix is to either not use nightly, or use a nightly from before two weeks ago.
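If it helps, here's a rough way to check which side of that change your install is on (my own sketch, not from the thread): inspect the private helper named in the traceback and see whether the `_write_ninja_file` your build actually resolves accepts the new `sycl_cflags` argument the nightly caller passes.

```python
# Diagnostic sketch. Assumption: _write_ninja_file is still the private helper
# in torch.utils.cpp_extension that the TypeError in this issue points at.
import inspect
import torch
from torch.utils import cpp_extension

params = inspect.signature(cpp_extension._write_ninja_file).parameters
print("torch version:", torch.__version__)
print("accepts sycl_cflags:", "sycl_cflags" in params)
```

If the installed torch reports that it does not accept `sycl_cflags` while the build machinery passes it, the versions are mismatched, which matches the nightly-regression explanation above.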
Can you tell me the exact working version and the pull request that broke it? Thank you so much.
Hello. I am trying to compile Flash Attention 3 and it fails.
I am able to compile Flash Attention 2.