Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Acceptable costs for the removal of unsafe code? #648

Open
dilr opened this issue Jan 20, 2025 · 3 comments
Open

Acceptable costs for the removal of unsafe code? #648

dilr opened this issue Jan 20, 2025 · 3 comments

Comments

@dilr
Copy link
Contributor

dilr commented Jan 20, 2025

I would like to remove instances of unsafe in the code-base that aren't essential for performance or functionality. As per the contributor guidelines, this isn't a bug fix, so I'd like to ask what sort of changes would be acceptable in a PR before I do too much work. I've already done a bit just to see if it is even worthwhile, so I have some idea of what I might ask for:

  1. May I turn transitive dependencies into direct dependencies? See example 1 if this is unclear
  2. May I pull in small direct dependencies that Uiua does not currently transitively depend on? See example 3 for why.
  3. How much may I change to the structure of the code? I ask as I realize this is mostly a one person project, and changes to the structure of the code imposes a burden on you as a reviewer. See example 2 for the sort of thing I am thinking of.
  4. How important is debug mode performance? I fully intend to keep or improve release mode performance by checking that the llvm-ir or assembly does the same thing as the old unsafe code. (for structural changes at least) However, the debug mode code won't always work the same. See example 2 again for details. I can benchmark on the tests to make sure performance doesn't completely tank in debug mode, but I don't think all functionality and usage patterns are covered by the tests.

Examples of changes I've tried or want to try:

  1. bytemuck is directly or transitively used by 'cosmic-text', 'skrifra', 'arboard', and 'viuer'. All of these are optional dependencies either for features or for display to the terminal or graphical window. If I pull bytemuck in as a direct dependency, I can safely remove all but one instance of mem::transmute. This requires no structural changes to the code.
  2. I can avoid the need to do a pointer cast from a value::Value to a value::Repr which is currently done in Value::shape, Value::meta_mut, and several similar functions. I can do this by moving the logic for the management of array::Array.meta into a wrapper object ArrayMetaHolder (to monomorphize the meta pointer management code), and then using val_as_array!(self, |arr| &mut arr.meta).meta_mut() and friends to get the shape and metadata from the array inside the value. I've checked the llvm-ir emitted by the rust compiler in release mode for several of the functions which inline Value::meta_mut and friends. I can confirm that in release mode the compiler realizes that checking the discriminant of Value is unnecessary and unconditionally offsets the Value pointer to the location of the shape or meta fields, which I assume was the point of casting Value to Repr in the first place. However, in debug mode what is generated is a switch on the discriminant to a bunch of identical blocks of code which all set the pointer value to the same thing. After the switch statement ArrayMetaHolder::meta_mut and friends are called if needed.
  3. The last transmute bytemuck cannot eliminate is in Array::convert. It checks if we are converting to an array of the same type via TypeId, and transmutes Array<T> to Array<U> if T == U. This is essentially a specialization of the convert function when T == U. If I could pull in the castaway crate, I could replace this specialization with safe code. (To be clear, I haven't tried this yet) It also means that further specializations of generic functions for concrete types could be added to the codebase without needing to worry that much if the specialization is sound. See the discussion in the rust issue tracker about the 'min-specialization' and 'specialization' unstable features for why this sort of thing is tricky to do soundly in general. The castaway crate implements a known safe subset of specializations.
@kaikalii
Copy link
Member

For example 1: Can you point to where this would allow the removal of unsafe code? It's unclear to me.

For example 2: The whole point of value::Repr is to not do a check of the Value's type in order to get its shape and metadata. Moving that to use var_as_array! would end up doing that check.

In general, while I try to avoid unsafe code, I'm not against using it when it's useful. I don't really see the need to pull in dependencies just to eliminate some uses of it. Having absolutely no uses of unsafe in the codebase is not a goal of mine.

@dilr
Copy link
Contributor Author

dilr commented Jan 20, 2025

First of all, I agree that it probably doesn't make sense to eliminate all uses of unsafe. My interest is more in reducing the number to make the remaining unsafe code easier to audit.

For example 1:

  1. There are 3 float constants which use transmute for creation.
  2. The fft functionality uses transmute to convert slices of Uiua's Complex type to num_complex's Complex64 type.
  3. derive_undicies uses a transmute to convert vecs from i8 to u8.

Again, I think these are sound (as long as num_complex doesn't change), so the sole purpose is just to eliminate the keyword. bytemuck is already used by several libraries Uiua uses, so it kind of is and kind of isn't already a dependency of Uiua. When you say you don't want to take on a dependency, do you mean that you don't want to add previously unvetted code to the supply chain? or do you mean you don't want to think about more APIs? or something else?

For example 2:

In terms of code semantics, the change I described does add an extra switch statement. This is indeed present in the llvm-ir for debug mode. However, in release mode the compiler sees that all branches result in the same instructions, and optimize out the branching. I'd expect this to be stable across compiler versions since it doesn't rely on inlining triggering, and the number of instructions involved is quite small.

So I would expect it to have some small impact in debug mode, and none in release mode. This is why I was asking how important debug mode performance is.

@kaikalii
Copy link
Member

If you find some good uses of bytemuck, I'd likely accept a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants