Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the width function of char return u32 #23539

Closed
wants to merge 1 commit into from

Conversation

tbu-
Copy link
Contributor

@tbu- tbu- commented Mar 20, 2015

The width of a Unicode codepoint is not related to some in-memory buffer.

@rust-highfive
Copy link
Collaborator

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@@ -7603,13 +7603,13 @@ pub mod charwidth {
}
}

pub fn width(c: char, is_cjk: bool) -> Option<usize> {
pub fn width(c: char, is_cjk: bool) -> Option<u32> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these files should be changed by changing src/etc/unicode.py, not by hand-editing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@tbu- tbu- force-pushed the pr_unicode_width branch from 5f7ac0f to 66a98ec Compare March 20, 2015 01:13
@@ -77,7 +77,7 @@ impl UnicodeStr for str {
fn is_alphanumeric(&self) -> bool { self.chars().all(|c| c.is_alphanumeric()) }

#[inline]
fn width(&self, is_cjk: bool) -> usize {
fn width(&self, is_cjk: bool) -> u32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this seems like somewhat of an edge case: the width of a string will typically be fairly similar (at least, proportional) to its size in memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh, maybe this is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that calls for u64, for integers that do not clearly fall in the range of u32?

@petrochenkov
Copy link
Contributor

IMO, it's a mistake.
Yes character width is small and can fit even in u8 but it is represents string length in some sense and it is always used in context of string lengths, so people will have to white as usize pretty much every time they use this function.
The same mistake was made with constants like u64::BYTES, which are certainly small but always used in usize context, so their change of the type to u32 led only to as usize noise.

@tbu-
Copy link
Contributor Author

tbu- commented Mar 20, 2015

@petrochenkov I believe that this is different from things like u64::BYTES – this is some logical length and not some length in memory. The one use that was in rustc was better off with u32 anyway, they counted columns, not something in memory. The need for as usize came only from the fact that iterators only accept usize for the .take() parameter (which is btw entirely unrelated to memory size too).

@bors
Copy link
Contributor

bors commented Mar 28, 2015

☔ The latest upstream changes (presumably #23796) made this pull request unmergeable. Please resolve the merge conflicts.

@huonw
Copy link
Member

huonw commented Apr 6, 2015

r? @aturon (transferring reviewership, don't have the bandwidth right now.)

@rust-highfive rust-highfive assigned aturon and unassigned huonw Apr 6, 2015
The width of a Unicode codepoint is not related to some in-memory buffer.
@tbu- tbu- force-pushed the pr_unicode_width branch from 66a98ec to b476bd2 Compare April 6, 2015 15:54
@tbu-
Copy link
Contributor Author

tbu- commented Apr 6, 2015

@aturon Rebased.

@aturon
Copy link
Member

aturon commented Apr 8, 2015

@tbu- I feel like this is definitely an edge case, and probably ergonomic concerns (about the integer contexts in which you might be using this) apply.

Can you think of any way to gather data on this? It's very difficult to make this decision in isolation.

@bors
Copy link
Contributor

bors commented Apr 18, 2015

☔ The latest upstream changes (presumably #24428) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton
Copy link
Member

This function is now deprecated in favor of an external crate, so I'm going to close this for now. Feel free to open a PR against that crate though!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants