Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[proof of concept] give String support for CoW wrapping of literals #46993

Closed
wants to merge 4 commits into from

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Dec 25, 2017

This is a proof of concept implementation for people who want to test this out. It isn't fully tested, and the RawVec changes probably need more work. In addition the implementation isn't as optimized as it could be (see FIXME's).

This adds a String::literally constructor that wraps up a literal without allocating. It in turn makes Vec partially support this (with more robust capacity checking) and RawVec partially support this (by copying the source if len > cap). String also gains some make_unique guards for methods which don't naturally do this themselves (ones which make mutable views or mutate the middle of the buffer).

@rust-highfive
Copy link
Collaborator

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

@Gankra
Copy link
Contributor Author

Gankra commented Dec 25, 2017

cc @withoutboats @aturon

@Gankra
Copy link
Contributor Author

Gankra commented Dec 25, 2017

How to test this: replace instances of:

  • String::from("...")
  • "...".to_string()
  • "...".to_owned()

with String::literally("...")

@kennytm kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Dec 25, 2017
Ok(ptr) => (new_cap, ptr),
Err(e) => self.a.oom(e),
if used_cap == 0 {
// skip to 4 because tiny Vec's are dumb; but not if that
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it "Vecs", not "Vec's"?

You only need the apostrophe if the "Vec" is possessing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a pre-existing comment, but yes

@@ -384,6 +384,38 @@ impl String {
String { vec: Vec::new() }
}

#[inline]
#[unstable(feature = "lit_strings", reason = "its TOO lit", issue = "42069")]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to change this to something more informative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't intended to be landed any time soon, so I opted for a joke :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't intended to be landed any time soon

Do you say this because there is more implementation work to be done, or because it hasn't been decided that this is a good thing? (I'm pretty out of the loop.) If the former, cool. If the latter, has it been discussed somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a thread on internals about landing this feature as a coercion (rather than a method): https://internals.rust-lang.org/t/pre-rfc-allowing-string-literals-to-be-either-static-str-or-string-similar-to-numeric-literals/5029

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hey, that's pretty cool! Thanks.

@Manishearth
Copy link
Member

Manishearth commented Dec 25, 2017

https://github.com/manishearth/rust/tree/literally-sed has it all sed-ed. The sed commands are in each commit (gsed is GNU sed, which is default on most distros but not on OSX)

There's a ^\([^\/]*\)" in it so that we don't do replacements on doc comments.

@shepmaster
Copy link
Member

I don't really know how to triage this. You uh... hmm. Uh, what do you think we should do?

@aturon
Copy link
Member

aturon commented Jan 6, 2018

@shepmaster I'm gonna mark as waiting on (libs) team. cc @rust-lang/libs

@aturon aturon added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 6, 2018
@aturon
Copy link
Member

aturon commented Jan 9, 2018

@gankro

The libs team discussed this PR some today, and the following points came up:

  • What happened to concerns around unsafe code making assumptions about the capacity? Wasn't that the killer last time around?
  • People are generally fine with the DerefMut implementation potentially allocating, but this would mean that String would not implement DerefPure, if that becomes a thing. It was unclear exactly what the implications would be.
  • More generally, are we concerned about code assuming a stable address?

Modulo these concerns, no one saw a blocking objection, and we'd love to see this proceed to an RFC.

@Gankra
Copy link
Contributor Author

Gankra commented Jan 9, 2018

We only found examples of people assuming vec's capacity. We halted progress because it was possible to use String in the same way. It still is. That's one of the reasons this PR exists: so we can test this change out on other crates.

Sadly it's hard to cargo-bomb this since nothing in the ecosystem uses it, so nothing would trip it!

I am vaguely considered about stable address assumptions, but also we could reasonably just phase it out of the ecosystem, like we've done for other implementation-defined assumptions that we've broken (heap::EMPTY).

@ghost
Copy link

ghost commented Jan 9, 2018

What happened to concerns around unsafe code making assumptions about the capacity? Wasn't that the killer last time around?

Unless I misunderstood something, the assumption in question is s.len() <= s.capacity().

This PR breaks the assumption, but I believe it's easy to avoid breaking it. The trick is to implement String::capacity as:

fn capacity(&self) -> usize {
    cmp::max(self.vec.capacity(), self.vec.len())
}

This way the assumption is still valid, and we can still internally disambiguate owned and literal strings by checking self.vec.capacity() == 0.

@Gankra
Copy link
Contributor Author

Gankra commented Jan 9, 2018

No, this will break anyone using the _raw apis.

@withoutboats
Copy link
Contributor

withoutboats commented Jan 9, 2018

More generally, are we concerned about code assuming a stable address?

Rental assumes that string's derefmut derefs to a stable address. I spoke briefly with the author once; rental has a StableDeref trait that would just need to gain a StableDerefMut pair that isn't implemented for String. This would break rental's current assumptions - but its only relevant for the ascii mutation in-place methods, because they're the only methods using &mut str. In other words, there's probably no code in existence which actually makes use of rental's broken assumption about String.

I'm really eager to see this change, but I think it needs to be epoch gated.

@hanna-kruppe
Copy link
Contributor

I thought library changes can't be epoch-specific, because crates of all epochs share the same standard library (and other libraries, for that matter)?

@withoutboats
Copy link
Contributor

withoutboats commented Jan 9, 2018

I thought library changes can't be epoch-specific, because crates of all epochs share the same standard library (and other libraries, for that matter)?

We can introduce new APIs only in a new epoch; in this case String::literally. So it wouldn't be possible to construct a String that violates the current assumptions on the 2015 epoch (even though the internal implementation would still support that possibility).

@CryZe
Copy link
Contributor

CryZe commented Jan 9, 2018

Can't such a String get moved into a library that is written against the 2015 epoch though?

@withoutboats
Copy link
Contributor

Can't such a String get moved into a library that is written against the 2015 epoch though?

Absolutely.

To be clearer, I don't think we need an epoch to change this (from a legalistic perspective). Rental is currently making an assumption that String upholds a particular invariant that we haven't really guaranteed. However, rental - and possibly other unsafe code - is making that assumption, and we need to decide what to do about it.

Often, we will say our hands are tied by the unsafe code that exists and is relying on this thing, so we now guarantee this thing we didn't originally say we would. I don't think that needs to be the case here, since the guarantee failing can only be observed in very particular edge cases - assuming that calling to_ascii_(upper|lower)case will not change the address of the string from what it was before. Rental can make a breaking change for this little detail and all will be well.

However, I'd like to very clearly and loudly message to users about the fact that these assumptions about String are now invalidated. One way to do that is have it be one of the changes we're making in the epoch transition. So I'd like to use the epoch as a social force in amplifying information about this potential incorrect assumption that as far as we know no one is actually relying on.

@bors
Copy link
Contributor

bors commented Jan 20, 2018

☔ The latest upstream changes (presumably #46952) made this pull request unmergeable. Please resolve the merge conflicts.

@carols10cents carols10cents added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Jan 22, 2018
@shepmaster shepmaster added the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label Jan 26, 2018
@shepmaster shepmaster added I-nominated and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 26, 2018
@shepmaster
Copy link
Member

shepmaster commented Jan 26, 2018

I'm going to unilaterally move this to be nominated for discussion by the team, otherwise it seems like this will just languish forever.

(or you know, what @aturon said in response to me earlier that I forgot about...)

@aturon
Copy link
Member

aturon commented Jan 27, 2018

@shepmaster I'm going to go a step further and close this PR -- not because we don't want it, but because the next step is an RFC.

I don't have the bandwidth to write such an RFC myself, but I'd really like to see this at least attempted. Is there anyone who would like to take on an RFC with me mentoring/serving as backup?

@aturon aturon closed this Jan 27, 2018
@Manishearth
Copy link
Member

Manishearth commented Jan 28, 2018 via email

@Gankra
Copy link
Contributor Author

Gankra commented Jan 28, 2018

I don't have any interest in working on an RFC. I made this PoC so ergonomics people couldn't complain to me about implementation issues anymore. Yer problem now!

@withoutboats
Copy link
Contributor

I'm interested in shepherding (but not writing) an RFC that adds this as a coercion (rather than the String::literally constructor), gated on the 2018 epoch.

@SimonSapin
Copy link
Contributor

Why not both a coercion and an explicit contructor? Coercion sounds like it’d work from any &'static str value, is that what you had in mind or only from literals like integers? Why epoch-gated?

@withoutboats
Copy link
Contributor

I don't really have an objection to having a constructor. I laid out an argument in favor of epoch gating a few posts up. More generally these seem like good questions for an RFC thread 😉.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.