Number types' FromStr impl should recognize Unicode minus #130315
Comments
I don't think this would weigh in as a positive factor for us adding it to FromStr for the signed numeric types. The opposite, actually.

It means you see this character in the wild. In #49746, someone said:
This could also happen with input users paste into Rust apps. Why would it even be an argument for the opposite?!
Because if you copy it from a diff, then for a diff that looks like

    - 20
    + 45

these should not be parsed as integers.

You have spaces there between sign and number. Even with a hyphen as a minus sign, this currently gives […]. In any case, the end user would need to have a basic sense of what they're copying and where they're pasting it.
@Enyium That was only for the sake of readability; a diff can also include

    -20
    +45
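The behavior discussed in this exchange can be checked against today's standard library. The following is a minimal check that uses only `str::parse::<i32>`, and each assertion encodes one of the cases mentioned. Unspaced ASCII signs, as in the diff lines above, already parse. A space after the sign is rejected. U+2212 is not recognized at all.

```rust
fn main() {
    // Unspaced ASCII signs are already accepted, so diff lines such as
    // "-20" and "+45" parse today without any Unicode support.
    assert_eq!("-20".parse::<i32>(), Ok(-20));
    assert_eq!("+45".parse::<i32>(), Ok(45));

    // A space between the sign and the digits is rejected.
    assert!("- 20".parse::<i32>().is_err());
    assert!("+ 45".parse::<i32>().is_err());

    // U+2212 MINUS SIGN is not recognized, with or without a space.
    assert!("\u{2212}20".parse::<i32>().is_err());
    assert!("\u{2212} 20".parse::<i32>().is_err());

    // Show the error value for the spaced form.
    println!("{:?}", "- 20".parse::<i32>());
}
```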
I am only making this observation because I do think your request is reasonable, and I am slightly perplexed why you included extraneous data that seems like it could undermine the strength of your proposal.
There are many alternative numeric notations. There are many alternative "commercial minus signs", not just that one. Almost invariably, such graphemes have many subtle variations or reuses. Extending […] I think it would be inappropriate for Rust to attempt to guess what exact cultural context that the […]
I have no problem with
I can't follow you there. Nobody talked about the percent sign. You'd only see
At least supporting
That's what my issue is about. I don't know whether ⁒ is used as a sign for negative numbers or only an operator between operands. In the first case, and if it's never set off from the number with a space, maybe support would be warranted.
This stood out to me on SoundCloud. The font they use for the remaining play time has a relatively long dash, but code-point-wise it's just a hyphen. Still, in my perception, my statement holds true for the majority of fonts.

Does it? I copied it out of the GitHub UI.
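The SoundCloud observation can be reproduced from the Rust side. The parser sees the code point, not the glyph width. The sketch below checks a hand-picked set of dash-like characters. The selection is an illustrative assumption, not an exhaustive list. Under `str::parse::<i32>`, only U+002D should come out as accepted.

```rust
// Returns true if `sign` followed by "5" parses as the i32 value -5.
fn accepted_as_minus(sign: char) -> bool {
    format!("{}5", sign).parse::<i32>() == Ok(-5)
}

fn main() {
    // A few characters that can render as a similar horizontal stroke
    // (illustrative selection, not exhaustive).
    let candidates = [
        ('\u{002D}', "HYPHEN-MINUS"),
        ('\u{2010}', "HYPHEN"),
        ('\u{2013}', "EN DASH"),
        ('\u{2212}', "MINUS SIGN"),
    ];
    for &(c, name) in candidates.iter() {
        println!("U+{:04X} {:<12} accepted: {}", c as u32, name, accepted_as_minus(c));
    }
}
```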
Ah, I see. I suppose I misunderstood, then. Anyway, the problem with the "commercial minus sign" is that the glyphs that semantically mean commercial minus sign include, e.g., △ and ▲, if Wikipedia is to be believed. But I know that, above and beyond such meanings, those glyphs definitely have a wide variety of other meanings attributed to them, including in the language that supposedly uses them as commercial minus signs (Japanese). And Wikipedia goes on to state this about the obelus-like symbol in question:
So regarding this:
I, personally, would hesitate to suggest that the Finnish deal in gibberish.

Okay, it's rather strange that something defined as […] But could I win you over regarding the support for […]
If that's the goal, it might make sense to use the Unicode compatibility equivalence relation between characters. UAX #15 §1.1:
That sounds like the property you're describing, and means we don't have to determine our own set. Instead, we would essentially parse from the NFKC normalization of the input string. I didn't check with any implementation, but visually searching UnicodeData.txt (I did not look at the context-sensitive mappings in SpecialCasing.txt) I believe:
Although further inspection shows that “smart quotes” aren't considered compatible with "straight quotes" either, so despite my first thought, maybe NFKC isn't the right data to consider for this purpose after all.

As a minimal bar, ICU does permit changing the character used as the negative affix for its number formatter, so asking to recognize alternate negative signs is within the reality of what Unicode recognizes (but the default is still U+002D). I was, however, unable to locate information on which alternate minus-sign affixes are actually in use in locale data, or Unicode information on parsing numbers from text rather than formatting numbers to text. The information probably exists and should be referenced here, but I ran out of time to continue looking for it.
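ICU's approach makes the negative affix configurable instead of guessing a fixed set. Something similar can be done on the parsing side without changing std. The following is a minimal sketch. `parse_i64_with_extra_minus` is a hypothetical helper, not a std or ICU API. Which extra characters to pass in is deliberately left to the caller, for example from locale data if such data turns out to exist.

```rust
use std::num::ParseIntError;

// Hypothetical helper (not a std API): U+002D keeps working through the
// standard parser, and any character in `extra_minus` is additionally
// accepted as a leading minus sign.
fn parse_i64_with_extra_minus(s: &str, extra_minus: &[char]) -> Result<i64, ParseIntError> {
    let mut chars = s.chars();
    match chars.next() {
        // Re-prefix with ASCII '-' so std still handles digits, overflow and i64::MIN.
        Some(c) if extra_minus.contains(&c) => format!("-{}", chars.as_str()).parse::<i64>(),
        _ => s.parse::<i64>(),
    }
}

fn main() {
    // Caller-chosen set of additional minus signs; here only U+2212.
    let extra = ['\u{2212}'];
    for input in ["\u{2212}42", "-42", "\u{2212} 42", "\u{2212}\u{2212}42"].iter() {
        println!("{:?} -> {:?}", input, parse_i64_with_extra_minus(input, &extra));
    }
}
```

Everything after the sign is delegated to `str::parse`. As a result, a space after the alternate sign is still rejected, just as `"- 20"` is today.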
In that regard I cannot help but agree. Human behavior is very strange.
You can say the following about the only character currently recognized as a minus sign by the FromStr impls:

U+002D HYPHEN-MINUS: hyphen, dash, minus sign (copied from BabelMap)

In comparison, U+2212 is a dedicated minus sign:

U+2212 MINUS SIGN
−
[…] for it.

Benefits of adding support:
- If the FromStr implementations of number types (i32, f64, etc.) supported U+2212 MINUS SIGN in addition to U+002D HYPHEN-MINUS as a minus sign, UI frameworks, for example, would have an easier time implementing text boxes that display the typographically more pleasing real minus sign: they could simply convert the text content to the corresponding number.
- […] Result, wouldn't confuse end users anymore when they pasted a number with a Unicode minus into the app and the app showed an error.

I'm not familiar with this, but I want to point out that the Wikipedia article "Plus and minus signs" also talks about ⁒ as a minus sign (U+2052 COMMERCIAL MINUS SIGN). Perhaps this should also be supported, but I don't know whether it's regularly set off from the number with some space character.
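Unless std changes, the text-box scenario from the first benefit can be handled in application code. The following is a minimal sketch. It assumes the widget only ever displays U+2212 in the leading position. `parse_display_number` is a hypothetical name, not an existing API.

```rust
use std::str::FromStr;

// Hypothetical helper: map a leading U+2212 MINUS SIGN to ASCII '-' and
// defer to the target type's FromStr for everything else.
fn parse_display_number<T: FromStr>(text: &str) -> Result<T, T::Err> {
    match text.strip_prefix('\u{2212}') {
        Some(rest) => format!("-{}", rest).parse::<T>(),
        None => text.parse::<T>(),
    }
}

fn main() {
    let n: i32 = parse_display_number("\u{2212}17").unwrap();
    let x: f64 = parse_display_number("\u{2212}2.5").unwrap();
    println!("{} {}", n, x); // prints "-17 -2.5"
}
```

Only a leading U+2212 is mapped. A U+2212 inside a float exponent, as in `1e−5`, is therefore still rejected. U+2052 COMMERCIAL MINUS SIGN is left out on purpose, given the open question about its spacing.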