
allow unaligned reads with _mm_loadl_epi64 #584

Merged
merged 2 commits into from
Oct 27, 2018

Conversation

brian-armstrong
Contributor

@brian-armstrong brian-armstrong commented Oct 27, 2018

As discussed in #582, it is "safe" to read unaligned memory locations with _mm_loadl_epi64. This code was suggested by scottmcm on the Rust Discord - I'm not familiar enough with this part of Rust to offer much commentary on it.

To note, I have tried this with my code that uses this intrinsic, and I am seeing the behavior I'd expect.
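As a minimal sketch of the behavior this PR enables (assuming x86_64; the buffer and names here are illustrative, not from the PR), a caller can hand _mm_loadl_epi64 a pointer with no 16-byte alignment, matching the movq instruction, which has no alignment requirement:

```rust
// Sketch (assumes x86_64): after this fix, _mm_loadl_epi64 may be given a
// pointer that is not 16-byte aligned, just like the underlying movq.
#[cfg(target_arch = "x86_64")]
fn main() {
    use std::arch::x86_64::*;

    // Force a known alignment so that offset 1 is definitely misaligned.
    #[repr(align(16))]
    struct Aligned([u8; 24]);
    let buf = Aligned([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
                       12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]);

    // `buf.0[1]` is 1 byte past a 16-byte boundary: misaligned for __m128i.
    let p = buf.0[1..].as_ptr() as *const __m128i;
    let lo = unsafe { _mm_cvtsi128_si64(_mm_loadl_epi64(p)) };

    // Little-endian: bytes 1..=8 assemble into 0x0807060504030201.
    assert_eq!(lo as u64, 0x0807060504030201);
    println!("low 64 bits: {:#018x}", lo);
}
#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```

Before this patch, the `(*mem_addr)` dereference in the intrinsic's body required 16-byte alignment, so the call above was undefined behavior and could segfault.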

```diff
@@ -1145,7 +1145,7 @@ pub unsafe fn _mm_setzero_si128() -> __m128i {
 )]
 #[stable(feature = "simd_x86", since = "1.27.0")]
 pub unsafe fn _mm_loadl_epi64(mem_addr: *const __m128i) -> __m128i {
-    _mm_set_epi64x(0, simd_extract((*mem_addr).as_i64x2(), 0))
+    _mm_set_epi64x(0, simd_extract(ptr::read_unaligned::<__m128i>(mem_addr).as_i64x2(), 0))
```
Contributor

The type annotation here (::<__m128i>) shouldn't be necessary.

Maybe one can just use _mm_loadu_si128 here?

Contributor Author

Depends - in my case, I need to do a shuffle afterwards anyway, so it comes out to be about the same. If you just wanted to load the lower 64 bits though, then adding in an unnecessary shuffle afterwards would likely cost some performance. At any rate, the fact that the instruction, when seemingly used correctly, yields a segfault is fairly surprising.

Contributor

@gnzlbg gnzlbg Oct 27, 2018

The ptr::read_unaligned reads 128 bits, and so does _mm_loadu_si128 - or what am I missing? With a shuffle for the first 64 bits afterwards, LLVM should optimize this to a 64-bit load in both cases.
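To make that equivalence concrete (a sketch, assuming x86_64; the function is illustrative, not from the PR): both forms perform a full 16-byte unaligned read, so they necessarily agree on the low 64 bits:

```rust
// Sketch (assumes x86_64): ptr::read_unaligned::<__m128i> and
// _mm_loadu_si128 both read all 16 bytes from a possibly-unaligned address.
#[cfg(target_arch = "x86_64")]
fn main() {
    use std::arch::x86_64::*;
    use std::ptr;

    let mut bytes = [0u8; 17];
    for (i, b) in bytes.iter_mut().enumerate() {
        *b = i as u8;
    }
    // Offset by 1 so the pointer is not guaranteed 16-byte aligned.
    let p = bytes[1..].as_ptr() as *const __m128i;

    unsafe {
        let a: __m128i = ptr::read_unaligned(p); // 16-byte unaligned read
        let b = _mm_loadu_si128(p);              // also a 16-byte unaligned read
        // The low 64 bits (what _mm_loadl_epi64 keeps) are identical.
        assert_eq!(_mm_cvtsi128_si64(a), _mm_cvtsi128_si64(b));
    }
    println!("low halves match");
}
#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```

Note that both forms read 16 bytes even though the intrinsic only promises to use 8, which is the crux of the later discussion in this thread.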

Contributor Author

Oh, yeah, you're right. I actually don't think I know how to write this PR, now that I think about it.

Contributor

You are doing great! I think you can replace the ptr::read_unaligned(mem_addr) with just _mm_loadu_si128(mem_addr) and that should just work.

Contributor

@gnzlbg gnzlbg left a comment

Thanks for the PR. There is only one nitpick, otherwise this LGTM.

@brian-armstrong
Contributor Author

So I cleaned up my example and got the right flags so that godbolt shows what I think is the right instructions. I'm not sure how to translate this into Rust though.

https://gcc.godbolt.org/z/EZv0qv

Notably,

_mm_loadl_epi64(...);
_mm_storel_epi64(...);

becomes

        mov     rax, qword ptr [rdi + 24]
        mov     qword ptr [rdi + 40], rax

It's weird that it takes a qword ptr, because I think it is really only loading a dword here, though I guess I have nothing to base that on.
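A hedged Rust translation of that load/store pair (a sketch, assuming x86_64; the buffer contents are illustrative, not from the godbolt link): the _mm_loadl_epi64 / _mm_storel_epi64 pair copies exactly 8 bytes between possibly-unaligned locations, which LLVM can lower to a single 64-bit mov as in the assembly above:

```rust
// Sketch (assumes x86_64): translating the C snippet's two intrinsics,
// which together copy the low 8 bytes from one location to another.
#[cfg(target_arch = "x86_64")]
fn main() {
    use std::arch::x86_64::*;

    let src: [u8; 16] = *b"abcdefgh________";
    let mut dst = [0u8; 16];

    unsafe {
        let v = _mm_loadl_epi64(src.as_ptr() as *const __m128i);
        _mm_storel_epi64(dst.as_mut_ptr() as *mut __m128i, v);
    }
    assert_eq!(&dst[..8], b"abcdefgh");
    assert_eq!(&dst[8..], &[0u8; 8]); // storel writes only the low 8 bytes
    println!("copied: {:?}", std::str::from_utf8(&dst[..8]).unwrap());
}
#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```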

@gnzlbg
Contributor

gnzlbg commented Oct 27, 2018

It's weird that it takes a qword ptr, because I think it is really only loading a dword here, though I guess I have nothing to base that on.

So the Intrinsics Guide says these should lower to a movq, but these intrinsics are so simple that the code they lower to will depend heavily on what the code at the call site around them is doing, because LLVM can easily reason about them.

When we merge this, you can try them in your app, and if you see unexpected code generation we can revise their implementation.

@gnzlbg gnzlbg merged commit 635a995 into rust-lang:master Oct 27, 2018
@gnzlbg
Contributor

gnzlbg commented Oct 27, 2018

Thanks!

@brian-armstrong
Contributor Author

So I'm looking at the trace in my actual program, and I believe the C version which uses _mm_loadl_epi64 is emitting movq while Rust (with this patch) is emitting vmovdqu. It's unclear if this is because LLVM is unable to optimize the Rust version for some reason or if the C version has just correctly determined that only a quadword is needed, not a double quadword.

@brian-armstrong
Contributor Author

I wonder if the right code here isn't just

    _mm_set_epi64x(0, ptr::read_unaligned(mem_addr as *const i64))

@gnzlbg
Contributor

gnzlbg commented Oct 27, 2018

That could work too. Could you modify your stdsimd copy locally, use it in your project, and report whether that works better?

@brian-armstrong
Contributor Author

Yes, it does emit movq. It's hard to say whether that's generally faster, but it does seem closer to the original spirit of the intrinsic.
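For reference, the implementation shape settled on here can be sketched as a standalone function (assumptions: x86_64, and the free function `loadl_epi64_unaligned` is a hypothetical stand-in for the intrinsic's body, not the stdsimd source). Reading exactly 8 bytes with ptr::read_unaligned::<i64> matches what movq promises: no alignment requirement and no access beyond the low 64 bits:

```rust
// Sketch (assumes x86_64): a hypothetical standalone version of the body
// discussed above, reading exactly 8 bytes rather than 16.
#[cfg(target_arch = "x86_64")]
fn main() {
    use std::arch::x86_64::*;
    use std::ptr;

    unsafe fn loadl_epi64_unaligned(mem_addr: *const __m128i) -> __m128i {
        _mm_set_epi64x(0, ptr::read_unaligned(mem_addr as *const i64))
    }

    // Only 9 bytes are valid here; a 16-byte read from offset 1 would
    // run past the end of the buffer, but an 8-byte read is in bounds.
    let bytes: [u8; 9] = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    let p = bytes[1..].as_ptr() as *const __m128i;
    let lo = unsafe { _mm_cvtsi128_si64(loadl_epi64_unaligned(p)) };

    // Little-endian: bytes 2..=9 assemble into 0x0908070605040302.
    assert_eq!(lo as u64, 0x0908070605040302);
    println!("{:#x}", lo);
}
#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```

This is also why the 8-byte read is preferable to _mm_loadu_si128 or ptr::read_unaligned::<__m128i> here: those read 16 bytes, which the caller of _mm_loadl_epi64 never promised to make valid.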
