-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow unaligned reads with _mm_loadl_epi64 #584
allow unaligned reads with _mm_loadl_epi64 #584
Conversation
coresimd/x86/sse2.rs
Outdated
@@ -1145,7 +1145,7 @@ pub unsafe fn _mm_setzero_si128() -> __m128i { | |||
)] | |||
#[stable(feature = "simd_x86", since = "1.27.0")] | |||
pub unsafe fn _mm_loadl_epi64(mem_addr: *const __m128i) -> __m128i { | |||
_mm_set_epi64x(0, simd_extract((*mem_addr).as_i64x2(), 0)) | |||
_mm_set_epi64x(0, simd_extract(ptr::read_unaligned::<__m128i >(mem_addr).as_i64x2(), 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type annotation here (::<__m128i>
) shouldn't be necessary.
Maybe one can just use _mm_loadu_si128
here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends - in my case, I need to do a shuffle afterwards anyway, so it comes out to be about the same. If you just wanted to load the lower 64 bits though, then adding in an unnecessary shuffle afterwards would likely cost some performance. At any rate, the fact that the instruction, when seemingly used correctly, yields a segfault is fairly surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ptr::read_unaligned
reads 128bits, and so does _mm_loadu_si128
, or what am I missing? With a shuffle for the first 64 bits afterwards, LLVM should optimize this to a 64-bit load in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, you're right. I actually don't think I know how to write this PR, now that I think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are doing great! I think you can replace the ptr::read_unaligned(mem_addr)
with just _mm_loadu_si128(mem_addr)
and that should just work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. There is only one nitpick, otherwise this LGTM.
So I cleaned up my example and got the right flags so that godbolt shows what I think is the right instructions. I'm not sure how to translate this into Rust though. https://gcc.godbolt.org/z/EZv0qv Notably,
becomes
It's weird that it takes a qword ptr because I think it really only is loading a dword here, though I guess I have nothing to base that on. |
So the Intrinsics Guide says these should lower to a When we merge this, you can try them in your app, and if you see unexpected code generation we can revise their implementation. |
Thanks! |
So I'm looking at the trace in my actual program, and I believe the C version which uses |
I wonder if the right code here isn't just
|
That could work too. Could you modify your stdsimd copy locally, and use it in your project, and report whether that works better? |
Yes, it does emit |
As discussed in #582 it is "safe" to read unaligned memory locations with
_mm_loadl_epi64
. This code was suggested by scottmcm in the Rust Discord - I'm not familiar enough with this part of Rust to offer much commentary on it.But to note, I have tried this with my code that uses this intrinsic and am seeing the behavior I'd expect to see.