-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should we / can we make MaybeUninit<T> always preserve all bytes of T (including padding)? #518
Comments
I'll add that a typed copy of an uninitialized variable is UB in C, so there's no need to promise any ABI for FFI compatibility, |
In C I think this is not true for char types. But yeah for most types you cannot pass them uninit by value.
|
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
The PR that introduced the guarantees does not talk about padding, and it seems like that wasn't really understood back then. The t-lang minutes discussing this are lost to time and reorganizations, but it seems doubtful that such a consideration was raised. Discussions from 2018 raise a lack of real-world use cases for ABI compatibility, and I agree with such a sentiment in the present. I don't this this would be approved nowadays, but I am incredibly apprehensive about removing it. There are few places in the Rust documentation that use always for guarantees like this, and the use cases for some weird FFI thunks or bindings would be nigh-impossible to properly test with crater or similar... |
This comment was marked as off-topic.
This comment was marked as off-topic.
@carbotaniuman I think we should consider removing it. If we can't come up with any legitimate usecase, I think we should definitely remove it. I don't like going back on a promise like this, but if we don't have a usecase that could be broken by taking back this promise, then the chances that someone is affected should be very slim. @Diggsey thanks for explaining why you think this belongs in this thread. But I disagree. "MaybeUninit preserves provenance" is not relevant here. You will note that provenance does not appear in the issue description. Furthermore, provenance on CHERI works like it does everywhere else, so even if provenance were relevant, CHERI wouldn't change anything. It is true that you can write code with |
If we do agree to remove the guarantee, I expect it to break 0 uses in practice. My only other concern would be the performance impact of having to copy more bytes. It probably won't affect SIMD or buffers though, so I don't really think that's it's really an issue. |
I think we should preserve the memory layout compatibility, but drop the calling convention compatibility. That could be done using |
FTR, I use However, for compatibility with gcc/clang, they have to expose an ABI equal to the rountines using primitives. |
(And in general, I agree with @carbotaniuman - unless crater is testing all kinds of targets, I'm betting it primarily tests x86_64, where aggregate-of-one-field will get passed the same way as that one field*, so without using miri-crater, the ABI checks won't be found by crater. If the code is used on something like arm32 though, it's going to be very visibility broken) |
Yes, this is super hard to test for. I wonder if it's worth having a blog post asking people whether they need this guarantee...
Yes, concretely the proposal would be:
Or maybe "no padding" should be restricted a bit further, like "if |
Is the only motivation to backing up on that promise is the fact that this is a frequent source of confusion? Which benefits except clarity can Rust gain? |
We never intended |
I'm pretty sure this is still the case, but it might be worth it to enumerate things that ARE still allowed for this wrt FFI/ABI concerns. My primary use of |
Yes, ABI compatibility is about the "by-value" part of a function argument or return type. That's how we've consistently been using this term for a while now, also see our glossary and the documentation on ABI compatibility. In public communication we'll obviously spell out the details more than in internal discussion. ("Internal" not as in "private" but as in "among the team members and anyone else who's willing to participate".) |
I at the very least need target simd types as well - for floating-point types that aren't directly supported by rust (e.g. |
Since that is a compiler-internal concern, you could also do this by providing more ABI guarantees than what Rust provides in general. But that case would be covered by "types without padding", or we could explicitly mention the stdarch SIMD types (since they are all powers of 2). |
Not fully - you don't necessarily need to compile the rtlibs with lccc themselves, they're written in mostly portable rust, and quite deliberately. I'd like to be able to continue providing that guarantee.
You can also now see a formalization in reference#1545, as a note. |
This has caused an actual soundness bug now: rust-lang/rust#134713. I think we should seriously consider restricting the ABI compatibility guarantee to scalar and SIMD types. |
The issue with that is that will make it hard to represent "Maybe Initialized" aggregate types in ABIs. Footnotes
|
Yeah I don't think a by-value ABI-compatible "maybe init" aggregate is common enough to justify the constant stream of surprises and UB that this problem causes. I would suggest not designing your OS around such a facility. |
I had a longer post here that I since felt was too confrontational, but my thoughts have changed and I do not believe that solely changing this is justifiable given that this is not a breaking change for soundness, but merely to make the use of the API better for users. Unsafe Rust already has multiple sharp edges (SB/TB, Box noalias, provenance), and I feel like this is not a particularly sharp one once users understand it. I would also like to echo the alternative of a new type |
I might be missing something obvious, but isn't |
What properties does this type have? |
|
The issue is that the design isn't simply "Whether or not to use |
You would probably also want alignment based on T, at least if the goal is to match C struct layout. |
That's also expressible today AFAICT: type AbstractBytes<T> = [MaybeUninit<u8>; size_of::<T>()];
#[repr(C)]
struct AlignedAbstractBytes<T>
where
[(); size_of::<T>()]:,
{
aligned: [T; 0],
abstract_bytes: AbstractBytes<T>,
} The idea is that Rust already provides a solution for "bag" (array, possibly within a C struct for alignment) and "bytes" ( So forget my previous comment and this one as well. This issue is not about language expressivity, it's about vocabulary types. Do we want to fix |
Most C types must be initialized, so I really don't understand your example here. Can you give the simplest self-contained C signature that you think cannot be bound on Rust without this ABI guarantee?
Do we have an example of in-the-wild usage that relies on the ABI compatibility? |
In general, I already use this to represent the signature of (As mentioned, |
It is not currently possible to implement the semantics of |
@RalfJung Where is the documented guarantee that MaybeUninit preserves padding bytes? |
Sorry, I don't follow. Please strip all the detail that's unnecessary for this thread and focus on the actual question: passing some uninitialized data across the ABI. You seem to be saying you want the If it's just about leaving the data the pointer points to uninitialized, then that's entirely off-topic for this discussion as the ABI does not care about that.
We don't document this. But people still think it is true, I've myself thought it is true, and it is quite reasonable to think it is true. It's also a much more useful guarantee than the ABI compatibility one. We're still looking for even a single use-case that needs the ABI-compatibility for non-scalar types, whereas we already had a standard library correctness bug due to the lack of padding byte preservation. |
So the signature I need to actually call looks more like |
Uh, then, I'd suggest you call it with a Rust function pointer / declaration that has the matching number of parameters? It's anyway UB if caller and callee disagree on the number of parameters. |
Its the same vein as calling |
If you are anyway already causing technical UB by having the signature not match, you shouldn't be worried about venturing into "unspecified" land here either. Just have a version of |
Did these standard library implementations surface this bug before or after you recommended changing the implementations of these things to use MaybeUninit? |
That sounds like you are trying to insinuate something -- I don't know exactly what you are referring I don't know what I recommended when, but using Let's not discuss the history of rust-lang/rust#134713 here please; there's already a separate issue for that. The fact is that that implementation made it in (neither authored nor reviewed by me), which is a pretty clear sign that "bag of bytes" semantics are (a) useful, and (b) easy to assume to already be the current semantics even if they are not. |
Ffor the record here - the call that I'm worried about is going to be to the Rust trampoline - which does always have the same parameters. Then I call the actual But in the case of handler, the ABI level is very well-defined (even if the Rust Level isn't). Whereas wrapping |
Also I speak of a position of being involved in the discussion here - a third party having absolutely zero idea this is happening may have just as much cause to rely on a stable language guarantee. And given much of this code just won't even run in miri (miri won't even run winter-lily, which is my current project touching Lilium), this is probably in the realm of "Breaks silently, until it doesn't". |
Changing how pointers to MaybeUninit works does appear to be in the proposal, and MaybeUninit integers would also specifically not be affected, so I guess you're concerned that people are passing |
Yes I am proposing to take back a documented language guarantee, and replace it with a different, more useful guarantee. I think long-term this will cause less harm. So I was asking if there's any cases where the ABI guarantee is useful or even needed. I am still extremely confused about your example. You keep bringing up more and more concepts you're not explaining and there's too many parties calling each other so it's not even clear which call you are talking about when. When I think I understand is that there's a particular function call where the caller uses signature Could you achieve your goals without the ABI guarantee (and without worsening performance)? And if yes, would that solution be any less "natural" than what you are currently doing? Frankly, based on what you said so far, any possible alternative seems more natural to me. ;) |
(sorry for any confusion Ralf, but my own question was directed at chorman) |
Ok, I'll restate what the flow is:
|
@RalfJung You are the Pope of Rust, or at least the Pope of Rust Safety Models. Whenever you say something, even if you say something blatantly wrong, almost no one calls you out on it. You are the proxy author and reviewer of all std code because everyone is reading everything you are writing and thinking about it when writing unsafe code. You are repeatedly cited in these discussions. If you repeatedly say something wrong, you can convince other people it's true, simply by repeating the wrong thing. |
I do hope people call me out for my nonsense when I lose my marbles. ;) I am confident our libs reviewers don't just take my word as gospel.
Or should I make the argument that if the semantics are too surprising for the Pope, maybe we should change them? ;)
|
That is a reasonable stance, honestly. I just am not surprised incorrect code is written based on something you say again and again, and don't think it should be taken as evidence the confusion is that widespread if it might instead be your confusion spreading widely. |
I don't think I said this "again and again". But maybe I purged that from my memory?
|
Okay. It sounds to me like even if
However, I think we can do better. Even if That should be enough of a guarantee that you could keep doing what you're currently doing. Since you're setting up the call manually, all that matters is that there is some stable ABI, not what it is exactly. The only issue would be if (a) you are worried about breaking an existing stable ABI boundary (for your apparently still-in-development OS), and (b) the stable ABI rustc adopts for Beyond that, if …With all that said, I am definitely sympathetic to the alternative view that |
I think this breaking change is not justified by the stated goal, which is expressly not any inherent unsoundness, but just a desire to reduce a paper cut for unsafe Rust users. I think that's a great goal (and to be clear, I support it myself), but unsafe code is tricky, with many corner cases. Box noalias remains on the books today despite being a far larger footgun. We have not yet ruled out Stacked Borrows as an aliasing model! In addition, this creates a special case in the ABI compatibility rules. Windows until a few months ago, passed Making I think it's also disingenuous to act like the use cases here are contrived and useless - this is a documented language feature, not I am also confused as to why my compatibility ideas seemed to have been (intentionally?) ignored. To me, |
I think you will be hard-pressed to find a case where this makes a measurable difference, even if you specifically microbenchmark the function call. I'm neutral regarding the rest of your post. |
It is a frequent source of confusion that
MaybeUninit<T>
is not just preserving all the underlying bytes of storage, but actually ifT
has padding then those bytes are lost on copies/moves ofMaybeUninit<T>
.This is currently pretty much a necessary consequence of the promise that
MaybeUninit<T>
is ABI-compatible withT
: some ABIs don't preserve the padding ofT
when it is passed to a function. However, this was not part of the intention withMaybeUninit
at all, it is something we discovered later.Maybe we should try to take this back, and make the guarantee only for types without padding?
I am not even sure why we made this a guarantee. We made the type
repr(transparent)
because for performance it is quite important thatMaybeUninit<$int>
becomes just aniN
in LLVM. But that doesn't require a stable guarantee. And in fact it seems like it would almost always be a bug if the caller and callee disagree about whether the value has to be initialized. So I would be curious about real-world examples where this guarantee is needed.The text was updated successfully, but these errors were encountered: