-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Count char width at most once in Formatter::pad
#136662
Conversation
2069815
to
1d26e00
Compare
Hello! Thanks for the ping, but it's also been ~12 years since I last wrote this so I'm probably no longer any better than anyone else per se to take a look at this. I'm going to reroll this to someone else on the libs team: r? libs |
Thanks for the gracious reply, Alex. @Amanieu, would you mind taking a look at this? |
r? @m-ou-se |
library/core/src/fmt/mod.rs
Outdated
let mut iter = s.chars(); | ||
let char_count = iter.by_ref().take(max_char_count).count(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s.chars().count()
has a special implementation (see https://doc.rust-lang.org/1.84.0/src/core/str/iter.rs.html#46 and https://doc.rust-lang.org/1.84.0/src/core/str/count.rs.html) that is much faster than iterating through all the chars one by one.
s.chars().by_ref().take(_).count()
on the other hand goes through the Take
adapter, which does not use this optimized counting implementation.
This means that with this change, even a simple println!("{}", some_string)
will now pull in the code for UTF8 decoding, blowing up binary size. And it'll be a bit slower, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though perhaps s.char_indices().nth(max)
already pulled in that code anyway. That's still an opportunity for improvement.
Regardless, we should use the optimized char counting algorithm when there is no precision (max length).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Chars
has an optimized implementation of Iterator::advance_by
, but CharIndices
does not, I added an implementation to take advantage of that. This has the nice benefit of not bumping the offset as an induction variable, which I was trying to avoid manually before by just using Chars
.
Using .nth()
introduces an unnecessary .next()
and .take(n).count()
does a fold and iterates by one, so I use .char_indices().advance_by(n)
, which I think does the minimum possible with this API.
I've changed it to switch between .char_indices().advance_by(n)
and .chars().count()
, depending on whether truncation is needed. It would be interesting to benchmark .char_indices().advance_by(usize::MAX)
against .chars().count()
.
1d26e00
to
73030e7
Compare
I added a benchmark, and Also, whether the
|
When both width and precision flags are specified, then the character width is counted twice. Instead, record the character width when truncating it to the precision, so it does not need to be recomputed. Simplify control flow so the cases are more clear.
e5f6852
to
0ca1c9c
Compare
@m-ou-se Hey, friendly ping. Would you mind taking a look at the changes since your review? Thanks! |
Looks good. Thanks for working on this! @bors r+ |
…nt, r=m-ou-se Count char width at most once in `Formatter::pad` When both width and precision flags are specified, then `Formatter::pad` counts the character width twice. Instead, record the character width when truncating it to the precision, so it does not need to be recomputed. Simplify control flow so the cases are more clear. Related: - 6c9e708 (`fmt::Formatter::pad`: don't call chars().count() more than one time, 2021-09-01): Reduce counting chars from thrice to twice in worst case - ede39ae (feat: reinterpret `precision` field for strings, 2016-06-29): Change meaning of precision for strings - b820748 (Implement formatting arguments for strings and integers, 2013-08-10): Implement `Formatter::pad`
Rollup of 10 pull requests Successful merges: - rust-lang#134063 (dec2flt: Clean up float parsing modules) - rust-lang#136662 (Count char width at most once in `Formatter::pad`) - rust-lang#137011 (Promote ohos targets to tier2 with host tools.) - rust-lang#137077 (Postprocess bootstrap metrics into GitHub job summary) - rust-lang#137327 (Undeprecate env::home_dir) - rust-lang#137373 (Compile run-make-support and run-make tests with the bootstrap compiler) - rust-lang#137463 ([illumos] attempt to use posix_spawn to spawn processes) - rust-lang#137477 (uefi: Add Service Binding Protocol abstraction) - rust-lang#137569 (Stablize `string_extend_from_within`) - rust-lang#137667 (Add `dist::Gcc` build step) r? `@ghost` `@rustbot` modify labels: rollup
…nt, r=m-ou-se Count char width at most once in `Formatter::pad` When both width and precision flags are specified, then `Formatter::pad` counts the character width twice. Instead, record the character width when truncating it to the precision, so it does not need to be recomputed. Simplify control flow so the cases are more clear. Related: - 6c9e708 (`fmt::Formatter::pad`: don't call chars().count() more than one time, 2021-09-01): Reduce counting chars from thrice to twice in worst case - ede39ae (feat: reinterpret `precision` field for strings, 2016-06-29): Change meaning of precision for strings - b820748 (Implement formatting arguments for strings and integers, 2013-08-10): Implement `Formatter::pad`
…kingjubilee Rollup of 25 pull requests Successful merges: - rust-lang#134063 (dec2flt: Clean up float parsing modules) - rust-lang#136581 (Retire the legacy `Makefile`-based `run-make` test infra) - rust-lang#136662 (Count char width at most once in `Formatter::pad`) - rust-lang#136798 (Added documentation for flushing per rust-lang#74348) - rust-lang#137240 (Slightly reformat `std::fs::remove_dir_all` error docs) - rust-lang#137303 (Remove `MaybeForgetReturn` suggestion) - rust-lang#137327 (Undeprecate env::home_dir) - rust-lang#137463 ([illumos] attempt to use posix_spawn to spawn processes) - rust-lang#137477 (uefi: Add Service Binding Protocol abstraction) - rust-lang#137565 (Try to point of macro expansion from resolver and method errors if it involves macro var) - rust-lang#137569 (Stabilize `string_extend_from_within`) - rust-lang#137612 (Update bootstrap to edition 2024) - rust-lang#137633 (Only use implied bounds hack if bevy, and use deeply normalize in implied bounds hack) - rust-lang#137643 (Add DWARF test case for non-C-like `repr128` enums) - rust-lang#137679 (Various coretests improvements) - rust-lang#137723 (Make `rust.description` more general-purpose and pass `CFG_VER_DESCRIPTION`) - rust-lang#137758 (fix usage of ty decl macro fragments in attributes) - rust-lang#137764 (Ensure that negative auto impls are always applicable) - rust-lang#137772 (Fix char count in `Display` for `ByteStr`) - rust-lang#137798 (ci: use ubuntu 24 on arm large runner) - rust-lang#137805 (adjust Layout debug printing to match the internal field name) - rust-lang#137808 (Do not require that unsafe fields lack drop glue) - rust-lang#137820 (Clarify why InhabitedPredicate::instantiate_opt exists) - rust-lang#137825 (Provide more context on resolve error caused from incorrect RTN) - rust-lang#138028 (compiler: add `ExternAbi::is_rustic_abi`) r? `@ghost` `@rustbot` modify labels: rollup
Rollup of 20 pull requests Successful merges: - rust-lang#134063 (dec2flt: Clean up float parsing modules) - rust-lang#136581 (Retire the legacy `Makefile`-based `run-make` test infra) - rust-lang#136662 (Count char width at most once in `Formatter::pad`) - rust-lang#136764 (Make `ptr_cast_add_auto_to_object` lint into hard error) - rust-lang#136798 (Added documentation for flushing per rust-lang#74348) - rust-lang#136865 (Perform deeper compiletest path normalization for `$TEST_BUILD_DIR` to account for compare-mode/debugger cases, and normalize long type file filename hashes) - rust-lang#136975 (Look for `python3` first on MacOS, not `py`) - rust-lang#136977 (Upload Datadog metrics with citool) - rust-lang#137240 (Slightly reformat `std::fs::remove_dir_all` error docs) - rust-lang#137298 (Check signature WF when lowering MIR body) - rust-lang#137463 ([illumos] attempt to use posix_spawn to spawn processes) - rust-lang#137477 (uefi: Add Service Binding Protocol abstraction) - rust-lang#137569 (Stabilize `string_extend_from_within`) - rust-lang#137633 (Only use implied bounds hack if bevy, and use deeply normalize in implied bounds hack) - rust-lang#137679 (Various coretests improvements) - rust-lang#137723 (Make `rust.description` more general-purpose and pass `CFG_VER_DESCRIPTION`) - rust-lang#137728 (Remove unsizing coercions for tuples) - rust-lang#137731 (Resume one waiter at once in deadlock handler) - rust-lang#137875 (mir_build: Integrate "simplification" steps into match-pair-tree creation) - rust-lang#138028 (compiler: add `ExternAbi::is_rustic_abi`) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#136662 - thaliaarchi:formatter-pad-char-count, r=m-ou-se Count char width at most once in `Formatter::pad` When both width and precision flags are specified, then `Formatter::pad` counts the character width twice. Instead, record the character width when truncating it to the precision, so it does not need to be recomputed. Simplify control flow so the cases are more clear. Related: - 6c9e708 (`fmt::Formatter::pad`: don't call chars().count() more than one time, 2021-09-01): Reduce counting chars from thrice to twice in worst case - ede39ae (feat: reinterpret `precision` field for strings, 2016-06-29): Change meaning of precision for strings - b820748 (Implement formatting arguments for strings and integers, 2013-08-10): Implement `Formatter::pad`
When both width and precision flags are specified, then
Formatter::pad
counts the character width twice. Instead, record the character width when truncating it to the precision, so it does not need to be recomputed. Simplify control flow so the cases are more clear.Related:
fmt::Formatter::pad
: don't call chars().count() more than one time, 2021-09-01): Reduce counting chars from thrice to twice in worst caseprecision
field for strings, 2016-06-29): Change meaning of precision for stringsFormatter::pad