Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vector types are passed incorrectly on s390x #135744

Closed
folkertdev opened this issue Jan 19, 2025 · 7 comments · Fixed by #135785
Closed

vector types are passed incorrectly on s390x #135744

folkertdev opened this issue Jan 19, 2025 · 7 comments · Fixed by #135785
Labels
A-ABI Area: Concerning the application binary interface (ABI) C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code O-SystemZ Target: SystemZ processors (s390x) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@folkertdev
Copy link
Contributor

I tried this code:

#![feature(s390x_target_feature, portable_simd, simd_ffi, link_llvm_intrinsics)]
use std::simd::*;

#[allow(improper_ctypes)]
extern "C" {
    #[link_name = "llvm.smax.v2i64"]
    fn vmxg(a: i64x2, b: i64x2) -> i64x2;
}

#[no_mangle]
#[target_feature(enable = "vector")]
pub unsafe fn bar(a: i64x2, b: i64x2) -> i64x2 {
    vmxg(a, b)
}

I expected to see this happen: the vmxg instruction is emitted

Instead, this happened: the vmxb instruction is emitted


I did some digging, and I think I have the problem, and maybe a solution.

The problem

https://godbolt.org/z/YxK3Eo66x

in the godbolt, we see that the llvm.smax.v2i64 function gets an incorrect signature

declare <2 x i64> @llvm.smax.v2i64(<16 x i8>, <16 x i8>) unnamed_addr #1

(we would expect this instead)

declare <2 x i64> @llvm.smax.v2i64(<2 x i64>, <2 x i64>) unnamed_addr #1

So, somewhere along the way, type information is lost.

Where the problem shows up

We first see the incorrect argument type here

impl LlvmType for Reg {
fn llvm_type<'ll>(&self, cx: &CodegenCx<'ll, '_>) -> &'ll Type {
match self.kind {
RegKind::Integer => cx.type_ix(self.size.bits()),
RegKind::Float => match self.size.bits() {
16 => cx.type_f16(),
32 => cx.type_f32(),
64 => cx.type_f64(),
128 => cx.type_f128(),
_ => bug!("unsupported float: {:?}", self),
},
RegKind::Vector => cx.type_vector(cx.type_i8(), self.size.bytes()),
}
}
}

The Reg type does not have enough information to recompute the element type of the vector, and hence it defaults to type_i8. That is wrong!

Why Reg

That Reg (with insufficient type information) is used because of this logic in the calling convention code:

let size = arg.layout.size;
if size.bits() <= 128 && arg.layout.is_single_vector_element(cx, size) {
arg.cast_to(Reg { kind: RegKind::Vector, size });
return;

The PassMode for the arguments will be PassMode::Cast.

Solutions

I'm not (yet) familiar with the details of the s390x calling convention, so maybe this Cast has an essential function. But I think it would be fine to pass vector arguments using PassMode::Direct. That works for the example at hand anyway.

If not, then the element type information must be stored somewhere so that it can be retrieved later

cc @taiki-e @uweigand

@rustbot label O-SystemZ

@folkertdev folkertdev added the C-bug Category: This is a bug. label Jan 19, 2025
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. O-SystemZ Target: SystemZ processors (s390x) labels Jan 19, 2025
@uweigand
Copy link
Contributor

Ah, that does indeed look wrong. You're right that vector types themselves could be passed "Direct". But this case also handles the case of an aggregate that has a single element which is of vector type - such aggregates should be transparently passed as their single element according to our ABI.

@taiki-e is there some way to extract the original type of that element and use it to perform the cast?

@folkertdev
Copy link
Contributor Author

Currently you can do slightly better by passing the non-wrapped vector types directly:

    // pass vectors (that are not wrapped in another structure) directly.
    // Using the `cast_to` approach below loses type information that is important to LLVM.
    if let BackendRepr::Vector { .. } = arg.layout.backend_repr {
        return;
    }

    let size = arg.layout.size;
    if size.bits() <= 128 && arg.layout.is_single_vector_element(cx, size) {
        // handles non-transparant wrappers around a vector
        arg.cast_to(Reg { kind: RegKind::Vector, size });
        return;
    }

That solves the problem with the LLVM intrinsics.

However, a wrapped vector would still have its element type erased. That still seems technically incorrect, but I'm not sure that would be observable (maybe when stitching LLVM together manually? sounds cursed). In any case, there is currently insufficient information to recover the element type, but the element type could be added to PassMode::Vector (together with the available total size of the argument, the full vector type could then be recovered).

@saethlin saethlin added A-ABI Area: Concerning the application binary interface (ABI) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-miscompile Issue: Correct Rust code lowers to incorrect machine code and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 20, 2025
@bjorn3
Copy link
Member

bjorn3 commented Jan 20, 2025

<16 x i8> and <2 x i64> should be indistinguishable on an ABI level, right? It is only LLVM intrinsics having different behavior depending on the exact type being passed that is the issue here. The easy fix would be using extern "unadjusted" when declaring the intrinsics. This causes all arguments and the return value to use PassMode::Direct no matter what the actual type is. It normally shouldn't be used, but calling LLVM intrinsics are the one use case where using them is fine.

@folkertdev
Copy link
Contributor Author

Cool, I didn't know about that trick.

We could still special-case the vectors though to get rid of some casting noise

    if let BackendRepr::Vector { .. } = arg.layout.backend_repr {
        return;
    }

which would get rid of the bitcasts here (and presumable at the call site too)

define <2 x i64> @bar(<16 x i8> %0, <16 x i8> %1) unnamed_addr {
start:
  %2 = bitcast <16 x i8> %0 to <2 x i64>
  %3 = bitcast <16 x i8> %1 to <2 x i64>
  %_0 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %2, <2 x i64> %3)
  ret <2 x i64> %_0
}

LLVM cleans that right up of course, but why not make rustc's and LLVM's job a little easier.

@bjorn3
Copy link
Member

bjorn3 commented Jan 20, 2025

    let size = arg.layout.size;
    if size.bits() <= 128 && arg.layout.is_single_vector_element(cx, size) {
        // handles non-transparant wrappers around a vector
        arg.cast_to(Reg { kind: RegKind::Vector, size });
        return;
    }

should be enough. is_single_vector_element already handles BackendRepr::Vector. And plain if let BackendRepr::Vector { .. } = arg.layout.backend_repr will attempt to use PassMode::Direct even when the vector is too big to be passed in a vector register.

FWIW clang also uses PassMode::Direct rather than PassMode::Cast for vectors: https://github.com/llvm/llvm-project/blob/5a7a3242639a17b049d70ee00798957ea21eb182/clang/lib/CodeGen/Targets/SystemZ.cpp#L426-L433

@folkertdev
Copy link
Contributor Author

is_single_vector_element already handles BackendRepr::Vector

It handles it, and it's not incorrect, but will emit a cast (and all of the noise that that brings, the llvm IR that rustc emits for it is quite verbose). My suggestion is to special-case non-wrapped vector types (where size.bits() <= 128) so that they are passed directly, so that would be

let size = arg.layout.size;
if size.bits() <= 128 { 
    if let BackendRepr::Vector { .. } = arg.layout.backend_repr {
        // pass non-wrapped vector types using `PassMode::Direct`
        return;
    }

    if arg.layout.is_single_vector_element(cx, size) {
        // pass non-transparant wrappers around a vector as `PassMode::Cast`
        arg.cast_to(Reg { kind: RegKind::Vector, size });
        return;
    }
}

Currently, passing everything caught by size.bits() <= 128 && arg.layout.is_single_vector_element(cx, size) as direct causes some test failures in tests/assembly/s390x-vector-abi.rs.

E.g. vector_wrapper_arg_small with PassMode::Direct uses lg instead of the currently expected vlgvg.

 vector_arg_small: 
           177:  vlgvg %r2, %v24, 0 
           178:  br %r14 
 vector_wrapper_arg_small: 
           213:  lg %r2, 0(%r2) 
           214:  br %r14 
#[repr(C)]
pub struct Wrapper<T>(T);

// CHECK-LABEL: vector_arg_small:
// CHECK: vlgvg %r2, %v24, 0
// CHECK-NEXT: br %r14
#[cfg_attr(no_vector, target_feature(enable = "vector"))]
#[no_mangle]
unsafe extern "C" fn vector_arg_small(x: i8x8) -> i64 {
    unsafe { *(&x as *const i8x8 as *const i64) }
}

// CHECK-LABEL: vector_wrapper_arg_small:
// CHECK: vlgvg %r2, %v24, 0
// CHECK-NEXT: br %r14
#[cfg_attr(no_vector, target_feature(enable = "vector"))]
#[no_mangle]
unsafe extern "C" fn vector_wrapper_arg_small(x: Wrapper<i8x8>) -> i64 {
    unsafe { *(&x as *const Wrapper<i8x8> as *const i64) }
}

@bjorn3
Copy link
Member

bjorn3 commented Jan 20, 2025

I guess it is fine to add a special case.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 25, 2025
…irect, r=bjorn3

use `PassMode::Direct` for vector types on `s390x`

closes rust-lang#135744
tracking issue: rust-lang#130869

Previously, all vector types were type erased to `Ni8`, now we pass non-wrapped vector types directly. That skips emitting a bunch of casting logic in rustc, that LLVM then has to clean up. The initial LLVM IR is also a bit more readable.

This calling convention is tested extensively in `tests/assembly/s390x-vector-abi.rs`, showing that this change has no impact on the ABI in practice.

r? `@taiki-e`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 25, 2025
…irect, r=bjorn3

use `PassMode::Direct` for vector types on `s390x`

closes rust-lang#135744
tracking issue: rust-lang#130869

Previously, all vector types were type erased to `Ni8`, now we pass non-wrapped vector types directly. That skips emitting a bunch of casting logic in rustc, that LLVM then has to clean up. The initial LLVM IR is also a bit more readable.

This calling convention is tested extensively in `tests/assembly/s390x-vector-abi.rs`, showing that this change has no impact on the ABI in practice.

r? ``@taiki-e``
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 25, 2025
…irect, r=bjorn3

use `PassMode::Direct` for vector types on `s390x`

closes rust-lang#135744
tracking issue: rust-lang#130869

Previously, all vector types were type erased to `Ni8`, now we pass non-wrapped vector types directly. That skips emitting a bunch of casting logic in rustc, that LLVM then has to clean up. The initial LLVM IR is also a bit more readable.

This calling convention is tested extensively in `tests/assembly/s390x-vector-abi.rs`, showing that this change has no impact on the ABI in practice.

r? ```@taiki-e```
@bors bors closed this as completed in 61e572b Jan 26, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jan 26, 2025
Rollup merge of rust-lang#135785 - folkertdev:s390x-vector-passmode-direct, r=bjorn3

use `PassMode::Direct` for vector types on `s390x`

closes rust-lang#135744
tracking issue: rust-lang#130869

Previously, all vector types were type erased to `Ni8`, now we pass non-wrapped vector types directly. That skips emitting a bunch of casting logic in rustc, that LLVM then has to clean up. The initial LLVM IR is also a bit more readable.

This calling convention is tested extensively in `tests/assembly/s390x-vector-abi.rs`, showing that this change has no impact on the ABI in practice.

r? ````@taiki-e````
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ABI Area: Concerning the application binary interface (ABI) C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code O-SystemZ Target: SystemZ processors (s390x) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants