Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ABI for f16 builtins on Intel Apple targets #675

Merged
merged 2 commits into from
Aug 25, 2024

Conversation

beetrees
Copy link
Contributor

For builtins on x86 (32-bit and 64-bit) Apple platforms, f16 is passed and returned like a u16 unless the builtin involves f128. This PR adds two #[win64_128bit_abi_hack]-like attributes (#[apple_f16_arg_abi] and #[apple_f16_ret_abi]) to the intrinsics macro to handle this. This should fix the CI failure that's blocking rust-lang/rust#128349.

@beetrees beetrees force-pushed the f16-abi-intel-apple branch from 7def335 to 54d9376 Compare August 23, 2024 12:01
@beetrees beetrees force-pushed the f16-abi-intel-apple branch from 54d9376 to 97179e4 Compare August 23, 2024 17:46
@tgross35
Copy link
Contributor

tgross35 commented Aug 23, 2024

Any clue what the history is here? Maybe f16 as GPRs make sense in some legacy aspect, but changing the ABI when f128 is involved seems like it could be an unintentional choice by LLVM. Are existing runtime libs even achieving this?

Apple's docs say it defers to the psABI, which does specify both _Float16 and __float128 in recent versions. Specifically:

  • Arguments of types _Float16, float, double, _Decimal32, _Decimal64 and __m64 are
    in class SSE.
  • Arguments of types __float128, _Decimal128 and __m128 are split into two halves.
    The least significant ones belong to class SSE, the most significant one to class
    SSEUP

So maybe a case could be made that the C runtime libraries should change?

@beetrees
Copy link
Contributor Author

The Apple docs have this paragraph:

Apple platforms typically follow the data representation and procedure call rules in the standard System V psABI for AMD64, using the LP64 programming model. However, when those rules are in conflict with the longstanding behavior of the Apple LLVM compiler (Clang) on Apple platforms, then the ABI typically diverges from the standard Processor Specific Application Binary Interface (psABI) and instead follows longstanding behavior. Several such divergences are below. If you discover a divergence not described here, please report it to Apple.

Which suggests the list of differences from the psABI standard is known to potentially be incomplete. Regardless, x86(-64) Apple platforms do actually follow the standard when it comes to passing and returning _Float16 (that is, it is passed/returned in XMM registers), it is just these builtins that have a different ABI (hence the need for this PR). Given that Clang has supported _Float16 on x86-64 Apple targets for a while now, it's unlikely to be possible to change this.

Given that the x86(-64) Apple codegen is specifically different from other x86(-64) platforms, it seems unlikely this difference in ABI is accidental. Discussion I've managed to find from when _Float16 support was being expanded to all SSE2-and-later CPUs seems to indicate there were backwards-compatibility concerns due to the already-existing __fp16 type; before _Float16 support was expanded, the LLVM builtins would have been built using uint16_t to represent f16 (Clang won't let __fp16 be passed or returned by value, so the regular C ABI for non-builtins functions wouldn't have been an issue). My guess is that to ensure that existing uses of the builtins weren't broken (I'm presuming that compiler-rt is used to provide the system builtins on Apple platforms), Clang was changed to continue to use the old ABI for the relevant builtins. As Clang on Apple platforms has never supported f128, the builtins involving f128 were unaffected.

Unfortunately I don't personally have access to an Apple system to test this on.

@tgross35
Copy link
Contributor

tgross35 commented Aug 23, 2024

Thanks for the background. I have access to an arm64 mac, I'll try to double check this locally a bit later.

There are apple systems on cfarm if you have access to that, just not x86 either https://portal.cfarm.net/machines/list/ (If you don't have access, it is pretty quick. I think I got mine in about one day. Wish I knew about it when I was doing the powerpc debugging...)

@tgross35
Copy link
Contributor

I didn't get the chance to test this but it looks reasonable to me with everything in mind. I can't really think of a straightforward way to test this in crate, but I think we will be covered by rust-lang/rust.

@tgross35 tgross35 enabled auto-merge (rebase) August 25, 2024 04:01
@tgross35 tgross35 merged commit a113c7c into rust-lang:master Aug 25, 2024
24 checks passed
@tgross35
Copy link
Contributor

#677 should release 0.1.122. rust-lang/rust#129481 is already enqueued but rust-lang/rust can be bumped again after that, and probably enable the x86 apple tests.

@beetrees beetrees deleted the f16-abi-intel-apple branch August 26, 2024 03:12
tgross35 added a commit to tgross35/rust that referenced this pull request Sep 27, 2024
These were disabled because Apple uses a special ABI for `f16`.
`compiler-builtins` merged a fix for this in [1], which has since
propagated to rust-lang/rust. Enable tests since there should be no
remaining issues on these platforms.

[1]: rust-lang/compiler-builtins#675
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 27, 2024
Enable `f16` tests on x86 Apple platforms

These were disabled because Apple uses a special ABI for `f16`. `compiler-builtins` merged a fix for this in [1], which has since propagated to rust-lang/rust. Enable tests since there should be no remaining issues on these platforms.

[1]: rust-lang/compiler-builtins#675

try-job: x86_64-apple-1
try-job: x86_64-apple-2
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 27, 2024
Enable `f16` tests on x86 Apple platforms

These were disabled because Apple uses a special ABI for `f16`. `compiler-builtins` merged a fix for this in [1], which has since propagated to rust-lang/rust. Enable tests since there should be no remaining issues on these platforms.

[1]: rust-lang/compiler-builtins#675

try-job: x86_64-apple-1
try-job: x86_64-apple-2
tgross35 added a commit to tgross35/rust that referenced this pull request Sep 28, 2024
These were disabled because Apple uses a special ABI for `f16`.
`compiler-builtins` merged a fix for this in [1], which has since
propagated to rust-lang/rust. Enable tests since there should be no
remaining issues on these platforms.

[1]: rust-lang/compiler-builtins#675
tgross35 added a commit to tgross35/rust that referenced this pull request Sep 30, 2024
Enable `f16` tests on x86 Apple platforms

These were disabled because Apple uses a special ABI for `f16`. `compiler-builtins` merged a fix for this in [1], which has since propagated to rust-lang/rust. Enable tests since there should be no remaining issues on these platforms.

[1]: rust-lang/compiler-builtins#675

try-job: x86_64-apple-1
try-job: x86_64-apple-2
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Oct 1, 2024
Rollup merge of rust-lang#130961 - tgross35:f16-x86-apple, r=thomcc

Enable `f16` tests on x86 Apple platforms

These were disabled because Apple uses a special ABI for `f16`. `compiler-builtins` merged a fix for this in [1], which has since propagated to rust-lang/rust. Enable tests since there should be no remaining issues on these platforms.

[1]: rust-lang/compiler-builtins#675

try-job: x86_64-apple-1
try-job: x86_64-apple-2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants