Count char width at most once in Formatter::pad #136662

Merged
bors merged 1 commit into rust-lang:master from thaliaarchi:formatter-pad-char-count
Mar 5, 2025

thaliaarchi
Contributor

@thaliaarchi thaliaarchi commented Feb 6, 2025

When both width and precision are specified, Formatter::pad counts the string's chars twice: once to truncate it to the precision and once to compute the padding. Instead, record the char count while truncating to the precision, so it does not need to be recomputed. Simplify the control flow so the cases are clearer.

Related:

  • 6c9e708 (fmt::Formatter::pad: don't call chars().count() more than one time, 2021-09-01): Reduce counting chars from thrice to twice in worst case
  • ede39ae (feat: reinterpret precision field for strings, 2016-06-29): Change meaning of precision for strings
  • b820748 (Implement formatting arguments for strings and integers, 2013-08-10): Implement Formatter::pad
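
For readers less familiar with the two options, here is a small illustration of how precision and width interact for strings in the public formatting API: precision caps the number of chars kept, and width pads based on the number of chars that remain. The helper name `fmt_width_precision` is made up for this example, and the snippet only exercises observable behavior; it says nothing about how `Formatter::pad` is written internally.

```rust
/// Hypothetical helper: format `s` left-aligned with the given width and precision.
fn fmt_width_precision(s: &str, width: usize, precision: usize) -> String {
    format!("{:w$.p$}", s, w = width, p = precision)
}

fn main() {
    // Precision alone truncates to at most 3 chars (chars, not bytes).
    assert_eq!(format!("{:.3}", "héllo"), "hél");
    // Width alone pads to at least 7 chars; strings are left-aligned by default.
    assert_eq!(format!("{:7}", "héllo"), "héllo  ");
    // Both: truncate to 3 chars first, then pad the result to 6 chars.
    assert_eq!(fmt_width_precision("héllo", 6, 3), "hél   ");
    // Right alignment with a custom fill char.
    assert_eq!(format!("{:*>6.3}", "héllo"), "***hél");
    println!("ok");
}
```

The case with both options is the one this PR targets, since the padding amount depends on the char count of the already truncated string.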

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 6, 2025
@thaliaarchi thaliaarchi force-pushed the formatter-pad-char-count branch 2 times, most recently from 2069815 to 1d26e00 Compare February 7, 2025 06:28
@alexcrichton
Member

Hello! Thanks for the ping, but it's also been ~12 years since I last wrote this so I'm probably no longer any better than anyone else per se to take a look at this. I'm going to reroll this to someone else on the libs team:

r? libs

@rustbot rustbot assigned Amanieu and unassigned alexcrichton Feb 9, 2025
@thaliaarchi
Contributor Author

Thanks for the gracious reply, Alex.

@Amanieu, would you mind taking a look at this?

@Amanieu
Member

Amanieu commented Feb 13, 2025

r? @m-ou-se

@rustbot rustbot assigned m-ou-se and unassigned Amanieu Feb 13, 2025
Comment on lines 1704 to 1705
let mut iter = s.chars();
let char_count = iter.by_ref().take(max_char_count).count();
Member

s.chars().count() has a special implementation (see https://doc.rust-lang.org/1.84.0/src/core/str/iter.rs.html#46 and https://doc.rust-lang.org/1.84.0/src/core/str/count.rs.html) that is much faster than iterating through all the chars one by one.

s.chars().by_ref().take(_).count() on the other hand goes through the Take adapter, which does not use this optimized counting implementation.

This means that with this change, even a simple println!("{}", some_string) will now pull in the code for UTF8 decoding, blowing up binary size. And it'll be a bit slower, too.
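
A minimal sketch of the two counting forms being compared, for illustration only. Both produce the same number; the difference raised in the review is which code path does the work. The function names `count_all` and `count_up_to` are hypothetical, and the prefix recovery via `as_str()` is an assumption about how the consumed part could be obtained, not a quote from the PR.

```rust
/// Count all chars via the whole-string path.
fn count_all(s: &str) -> usize {
    s.chars().count()
}

/// Count at most `max` chars through `Take`, as in the two lines under review,
/// and also return the prefix that was consumed.
fn count_up_to(s: &str, max: usize) -> (&str, usize) {
    let mut iter = s.chars();
    let char_count = iter.by_ref().take(max).count();
    // Whatever `iter` has not consumed is the tail; the rest is the prefix.
    let prefix_len = s.len() - iter.as_str().len();
    (&s[..prefix_len], char_count)
}

fn main() {
    let s = "naïve café";
    assert_eq!(count_all(s), 10);
    // With no effective limit, the `Take` path yields the same count.
    assert_eq!(count_up_to(s, usize::MAX), (s, 10));
    // With a limit, it stops early and the prefix ends on a char boundary.
    assert_eq!(count_up_to(s, 3), ("naï", 3));
    println!("ok");
}
```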

Member

Though perhaps s.char_indices().nth(max) already pulled in that code anyway. That's still an opportunity for improvement.

Regardless, we should use the optimized char counting algorithm when there is no precision (max length).

Contributor Author

Since Chars has an optimized implementation of Iterator::advance_by, but CharIndices does not, I added an implementation to take advantage of that. This has the nice benefit of not bumping the offset as an induction variable, which I was trying to avoid manually before by just using Chars.

Using .nth() introduces an unnecessary .next(), and .take(n).count() does a fold that iterates one char at a time, so I use .char_indices().advance_by(n), which I think does the minimum possible with this API.

I've changed it to switch between .char_indices().advance_by(n) and .chars().count(), depending on whether truncation is needed. It would be interesting to benchmark .char_indices().advance_by(usize::MAX) against .chars().count().
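
The switch described above could look roughly like the following sketch. It is not the code from the PR: as far as I know, Iterator::advance_by is not available on stable Rust, so a plain loop stands in for it here, and the helper names `truncate_and_count` and `pad_left` are invented for the example. Only left alignment with a space fill is handled.

```rust
/// Hypothetical helper: truncate `s` to `precision` chars (if given) and
/// return the char count of the result, walking the string only once.
fn truncate_and_count(s: &str, precision: Option<usize>) -> (&str, usize) {
    match precision {
        Some(max) => {
            let mut iter = s.char_indices();
            let mut count = 0;
            // Stand-in for `advance_by(max)`: step until `max` chars or the end.
            while count < max && iter.next().is_some() {
                count += 1;
            }
            // Bytes consumed so far = total length minus the unconsumed tail.
            let end = s.len() - iter.as_str().len();
            (&s[..end], count)
        }
        // No truncation needed: use the whole-string count.
        None => (s, s.chars().count()),
    }
}

/// Hypothetical left-aligned, space-filled padding built on the helper above.
fn pad_left(s: &str, width: Option<usize>, precision: Option<usize>) -> String {
    let (s, count) = truncate_and_count(s, precision);
    let fill = width.map_or(0, |w| w.saturating_sub(count));
    let mut out = String::with_capacity(s.len() + fill);
    out.push_str(s);
    out.extend(std::iter::repeat(' ').take(fill));
    out
}

fn main() {
    // The sketch agrees with the standard formatter on these inputs.
    assert_eq!(pad_left("héllo", Some(6), Some(3)), format!("{:6.3}", "héllo"));
    assert_eq!(pad_left("hi", Some(4), None), format!("{:4}", "hi"));
    assert_eq!(pad_left("héllo", None, Some(10)), format!("{:.10}", "héllo"));
    println!("ok");
}
```

In both branches the chars are counted once, and the count is reused for the padding computation.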

@thaliaarchi thaliaarchi force-pushed the formatter-pad-char-count branch from 1d26e00 to 73030e7 Compare February 20, 2025 00:10
@thaliaarchi
Contributor Author

thaliaarchi commented Feb 20, 2025

I added a benchmark. .chars().advance_by(usize::MAX) beats .chars().count() on the small inputs and is mixed on the tiny ones, but is roughly 1.2x to 2.1x slower on the medium, large, and huge inputs.

Also, whether the advance_by argument is a constant usize::MAX or black_boxed makes no difference in performance.

x bench library/coretests --stage 1 --test-args str::char_count
str::char_count::emoji_huge::case00_chars_count                26895.51ns/iter  +/- 69143.99
str::char_count::emoji_huge::case01_chars_advance_by           39908.10ns/iter   +/- 4735.18
str::char_count::emoji_huge::case02_filter_count_cont_bytes   143432.06ns/iter   +/- 6447.94
str::char_count::emoji_huge::case03_iter_chars_increment      294685.87ns/iter  +/- 19699.59
str::char_count::emoji_huge::case04_manual_char_len           294636.37ns/iter  +/- 20001.07
str::char_count::emoji_large::case00_chars_count                 374.25ns/iter     +/- 66.28
str::char_count::emoji_large::case01_chars_advance_by            586.73ns/iter    +/- 135.90
str::char_count::emoji_large::case02_filter_count_cont_bytes    2226.21ns/iter    +/- 104.03
str::char_count::emoji_large::case03_iter_chars_increment       4584.05ns/iter    +/- 270.57
str::char_count::emoji_large::case04_manual_char_len            4585.11ns/iter    +/- 258.49
str::char_count::emoji_medium::case00_chars_count                 60.87ns/iter     +/- 26.87
str::char_count::emoji_medium::case01_chars_advance_by            78.37ns/iter      +/- 6.77
str::char_count::emoji_medium::case02_filter_count_cont_bytes    288.41ns/iter     +/- 25.85
str::char_count::emoji_medium::case03_iter_chars_increment       583.22ns/iter     +/- 21.35
str::char_count::emoji_medium::case04_manual_char_len            589.36ns/iter     +/- 33.22
str::char_count::emoji_small::case00_chars_count                  21.08ns/iter      +/- 1.09
str::char_count::emoji_small::case01_chars_advance_by             12.49ns/iter      +/- 0.75
str::char_count::emoji_small::case02_filter_count_cont_bytes      29.74ns/iter      +/- 2.08
str::char_count::emoji_small::case03_iter_chars_increment         36.68ns/iter      +/- 2.66
str::char_count::emoji_small::case04_manual_char_len              37.47ns/iter      +/- 1.30
str::char_count::emoji_tiny::case00_chars_count                    7.77ns/iter      +/- 1.00
str::char_count::emoji_tiny::case01_chars_advance_by               5.81ns/iter      +/- 0.68
str::char_count::emoji_tiny::case02_filter_count_cont_bytes        6.24ns/iter      +/- 0.71
str::char_count::emoji_tiny::case03_iter_chars_increment           3.44ns/iter      +/- 0.16
str::char_count::emoji_tiny::case04_manual_char_len                3.87ns/iter      +/- 0.80
str::char_count::en_huge::case00_chars_count                   25270.06ns/iter   +/- 3517.67
str::char_count::en_huge::case01_chars_advance_by              36994.25ns/iter   +/- 8664.94
str::char_count::en_huge::case02_filter_count_cont_bytes      136098.38ns/iter   +/- 3237.48
str::char_count::en_huge::case03_iter_chars_increment         190540.45ns/iter +/- 375994.04
str::char_count::en_huge::case04_manual_char_len              286710.43ns/iter  +/- 33275.76
str::char_count::en_large::case00_chars_count                    355.26ns/iter     +/- 20.44
str::char_count::en_large::case01_chars_advance_by               571.21ns/iter     +/- 17.40
str::char_count::en_large::case02_filter_count_cont_bytes       2123.16ns/iter    +/- 503.87
str::char_count::en_large::case03_iter_chars_increment          2921.19ns/iter    +/- 356.64
str::char_count::en_large::case04_manual_char_len               4431.56ns/iter    +/- 536.13
str::char_count::en_medium::case00_chars_count                    55.92ns/iter      +/- 8.05
str::char_count::en_medium::case01_chars_advance_by               73.93ns/iter      +/- 4.66
str::char_count::en_medium::case02_filter_count_cont_bytes       282.48ns/iter    +/- 139.73
str::char_count::en_medium::case03_iter_chars_increment          377.55ns/iter     +/- 72.04
str::char_count::en_medium::case04_manual_char_len               556.65ns/iter     +/- 66.79
str::char_count::en_small::case00_chars_count                     18.46ns/iter      +/- 0.93
str::char_count::en_small::case01_chars_advance_by                12.49ns/iter      +/- 6.43
str::char_count::en_small::case02_filter_count_cont_bytes         18.91ns/iter      +/- 0.52
str::char_count::en_small::case03_iter_chars_increment            19.19ns/iter      +/- 1.00
str::char_count::en_small::case04_manual_char_len                 41.81ns/iter     +/- 20.10
str::char_count::en_tiny::case00_chars_count                       7.78ns/iter      +/- 3.34
str::char_count::en_tiny::case01_chars_advance_by                 13.17ns/iter      +/- 0.98
str::char_count::en_tiny::case02_filter_count_cont_bytes           6.23ns/iter      +/- 0.24
str::char_count::en_tiny::case03_iter_chars_increment              5.34ns/iter      +/- 0.79
str::char_count::en_tiny::case04_manual_char_len                   8.90ns/iter      +/- 0.50
str::char_count::ru_huge::case00_chars_count                   23521.49ns/iter   +/- 4525.93
str::char_count::ru_huge::case01_chars_advance_by              34957.96ns/iter  +/- 16442.47
str::char_count::ru_huge::case02_filter_count_cont_bytes      128222.60ns/iter   +/- 9503.80
str::char_count::ru_huge::case03_iter_chars_increment         176097.38ns/iter  +/- 26276.12
str::char_count::ru_huge::case04_manual_char_len              231660.15ns/iter  +/- 28604.21
str::char_count::ru_large::case00_chars_count                    337.69ns/iter     +/- 32.97
str::char_count::ru_large::case01_chars_advance_by               565.93ns/iter     +/- 45.96
str::char_count::ru_large::case02_filter_count_cont_bytes       1989.00ns/iter     +/- 53.45
str::char_count::ru_large::case03_iter_chars_increment          2709.66ns/iter    +/- 241.61
str::char_count::ru_large::case04_manual_char_len               3510.50ns/iter    +/- 617.17
str::char_count::ru_medium::case00_chars_count                    56.29ns/iter      +/- 3.25
str::char_count::ru_medium::case01_chars_advance_by              117.97ns/iter      +/- 3.20
str::char_count::ru_medium::case02_filter_count_cont_bytes       262.44ns/iter     +/- 69.23
str::char_count::ru_medium::case03_iter_chars_increment          341.12ns/iter    +/- 157.75
str::char_count::ru_medium::case04_manual_char_len               395.95ns/iter    +/- 730.95
str::char_count::ru_small::case00_chars_count                     16.23ns/iter      +/- 0.95
str::char_count::ru_small::case01_chars_advance_by                 7.02ns/iter      +/- 0.23
str::char_count::ru_small::case02_filter_count_cont_bytes         16.08ns/iter      +/- 1.56
str::char_count::ru_small::case03_iter_chars_increment            17.23ns/iter      +/- 1.29
str::char_count::ru_small::case04_manual_char_len                 17.67ns/iter      +/- 1.08
str::char_count::ru_tiny::case00_chars_count                       8.52ns/iter      +/- 0.40
str::char_count::ru_tiny::case01_chars_advance_by                  8.51ns/iter      +/- 1.06
str::char_count::ru_tiny::case02_filter_count_cont_bytes           7.39ns/iter      +/- 0.40
str::char_count::ru_tiny::case03_iter_chars_increment              6.33ns/iter      +/- 0.82
str::char_count::ru_tiny::case04_manual_char_len                   6.79ns/iter      +/- 0.59
str::char_count::zh_huge::case00_chars_count                   21915.29ns/iter   +/- 1553.56
str::char_count::zh_huge::case01_chars_advance_by              32533.42ns/iter   +/- 2794.91
str::char_count::zh_huge::case02_filter_count_cont_bytes      119467.06ns/iter   +/- 2970.87
str::char_count::zh_huge::case03_iter_chars_increment         319110.20ns/iter  +/- 15128.61
str::char_count::zh_huge::case04_manual_char_len              316619.37ns/iter  +/- 14316.98
str::char_count::zh_large::case00_chars_count                    316.41ns/iter     +/- 45.45
str::char_count::zh_large::case01_chars_advance_by               492.66ns/iter     +/- 20.85
str::char_count::zh_large::case02_filter_count_cont_bytes       1850.94ns/iter     +/- 70.57
str::char_count::zh_large::case03_iter_chars_increment          4927.22ns/iter    +/- 145.58
str::char_count::zh_large::case04_manual_char_len               4944.76ns/iter    +/- 480.41
str::char_count::zh_medium::case00_chars_count                    55.57ns/iter      +/- 5.57
str::char_count::zh_medium::case01_chars_advance_by               67.27ns/iter      +/- 1.85
str::char_count::zh_medium::case02_filter_count_cont_bytes       242.38ns/iter     +/- 20.20
str::char_count::zh_medium::case03_iter_chars_increment          599.00ns/iter     +/- 57.02
str::char_count::zh_medium::case04_manual_char_len               593.71ns/iter     +/- 19.71
str::char_count::zh_small::case00_chars_count                     19.19ns/iter      +/- 0.66
str::char_count::zh_small::case01_chars_advance_by                 9.03ns/iter      +/- 3.14
str::char_count::zh_small::case02_filter_count_cont_bytes         17.37ns/iter      +/- 1.16
str::char_count::zh_small::case03_iter_chars_increment            23.59ns/iter      +/- 0.93
str::char_count::zh_small::case04_manual_char_len                 24.84ns/iter      +/- 1.11
str::char_count::zh_tiny::case00_chars_count                       8.12ns/iter      +/- 0.32
str::char_count::zh_tiny::case01_chars_advance_by                  7.63ns/iter      +/- 0.48
str::char_count::zh_tiny::case02_filter_count_cont_bytes           6.65ns/iter      +/- 0.44
str::char_count::zh_tiny::case03_iter_chars_increment              5.00ns/iter      +/- 1.01
str::char_count::zh_tiny::case04_manual_char_len                   5.26ns/iter      +/- 0.96
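
The benchmark source is not shown in this thread, so the following is only a guess, based on the case names, at what the stable-Rust variants compute. It is meant to show that they are different ways of arriving at the same char count; the function bodies are assumptions, and the `advance_by` case is omitted because that method is not available on stable Rust as far as I know.

```rust
/// Whole-string count (what `chars_count` presumably measures).
fn chars_count(s: &str) -> usize {
    s.chars().count()
}

/// Count the bytes that are not UTF-8 continuation bytes (0b10xx_xxxx).
fn filter_count_cont_bytes(s: &str) -> usize {
    s.bytes().filter(|&b| (b & 0xC0) != 0x80).count()
}

/// Decode each char and bump a counter.
fn iter_chars_increment(s: &str) -> usize {
    let mut n = 0;
    for _ in s.chars() {
        n += 1;
    }
    n
}

/// Hop from leading byte to leading byte using the encoded length.
/// Relies on `s` being valid UTF-8, which `&str` guarantees.
fn manual_char_len(s: &str) -> usize {
    let bytes = s.as_bytes();
    let (mut i, mut n) = (0, 0);
    while i < bytes.len() {
        i += match bytes[i] {
            0x00..=0x7F => 1,
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            _ => 4,
        };
        n += 1;
    }
    n
}

fn main() {
    for s in ["", "hello", "привет", "你好", "🦀🦀"] {
        let expected = chars_count(s);
        assert_eq!(filter_count_cont_bytes(s), expected);
        assert_eq!(iter_chars_increment(s), expected);
        assert_eq!(manual_char_len(s), expected);
    }
    println!("ok");
}
```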

@thaliaarchi thaliaarchi requested a review from m-ou-se February 26, 2025 19:21
When both width and precision flags are specified, then the character
width is counted twice. Instead, record the character width when
truncating it to the precision, so it does not need to be recomputed.
Simplify control flow so the cases are more clear.
@thaliaarchi thaliaarchi force-pushed the formatter-pad-char-count branch from e5f6852 to 0ca1c9c Compare February 28, 2025 00:42
@thaliaarchi
Contributor Author

At @orlp's suggestion, I've split out the CharIndices::advance_by optimization into #137761.

@thaliaarchi
Contributor Author

@m-ou-se Hey, friendly ping. Would you mind taking a look at the changes since your review? Thanks!

@m-ou-se
Member

m-ou-se commented Mar 4, 2025

Looks good. Thanks for working on this!

@bors r+

@bors
Contributor

bors commented Mar 4, 2025

📌 Commit 0ca1c9c has been approved by m-ou-se

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 4, 2025
jieyouxu added a commit to jieyouxu/rust that referenced this pull request Mar 4, 2025
…nt, r=m-ou-se

Count char width at most once in `Formatter::pad`

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 4, 2025
Rollup of 10 pull requests

Successful merges:

 - rust-lang#134063 (dec2flt: Clean up float parsing modules)
 - rust-lang#136662 (Count char width at most once in `Formatter::pad`)
 - rust-lang#137011 (Promote ohos targets to tier2 with host tools.)
 - rust-lang#137077 (Postprocess bootstrap metrics into GitHub job summary)
 - rust-lang#137327 (Undeprecate env::home_dir)
 - rust-lang#137373 (Compile run-make-support and run-make tests with the bootstrap compiler)
 - rust-lang#137463 ([illumos] attempt to use posix_spawn to spawn processes)
 - rust-lang#137477 (uefi: Add Service Binding Protocol abstraction)
 - rust-lang#137569 (Stabilize `string_extend_from_within`)
 - rust-lang#137667 (Add `dist::Gcc` build step)

r? `@ghost`
`@rustbot` modify labels: rollup
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Mar 5, 2025
…nt, r=m-ou-se

Count char width at most once in `Formatter::pad`

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 5, 2025
…kingjubilee

Rollup of 25 pull requests

Successful merges:

 - rust-lang#134063 (dec2flt: Clean up float parsing modules)
 - rust-lang#136581 (Retire the legacy `Makefile`-based `run-make` test infra)
 - rust-lang#136662 (Count char width at most once in `Formatter::pad`)
 - rust-lang#136798 (Added documentation for flushing per rust-lang#74348)
 - rust-lang#137240 (Slightly reformat `std::fs::remove_dir_all` error docs)
 - rust-lang#137303 (Remove `MaybeForgetReturn` suggestion)
 - rust-lang#137327 (Undeprecate env::home_dir)
 - rust-lang#137463 ([illumos] attempt to use posix_spawn to spawn processes)
 - rust-lang#137477 (uefi: Add Service Binding Protocol abstraction)
 - rust-lang#137565 (Try to point of macro expansion from resolver and method errors if it involves macro var)
 - rust-lang#137569 (Stabilize `string_extend_from_within`)
 - rust-lang#137612 (Update bootstrap to edition 2024)
 - rust-lang#137633 (Only use implied bounds hack if bevy, and use deeply normalize in implied bounds hack)
 - rust-lang#137643 (Add DWARF test case for non-C-like `repr128` enums)
 - rust-lang#137679 (Various coretests improvements)
 - rust-lang#137723 (Make `rust.description` more general-purpose and pass `CFG_VER_DESCRIPTION`)
 - rust-lang#137758 (fix usage of ty decl macro fragments in attributes)
 - rust-lang#137764 (Ensure that negative auto impls are always applicable)
 - rust-lang#137772 (Fix char count in `Display` for `ByteStr`)
 - rust-lang#137798 (ci: use ubuntu 24 on arm large runner)
 - rust-lang#137805 (adjust Layout debug printing to match the internal field name)
 - rust-lang#137808 (Do not require that unsafe fields lack drop glue)
 - rust-lang#137820 (Clarify why InhabitedPredicate::instantiate_opt exists)
 - rust-lang#137825 (Provide more context on resolve error caused from incorrect RTN)
 - rust-lang#138028 (compiler: add `ExternAbi::is_rustic_abi`)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 5, 2025
Rollup of 20 pull requests

Successful merges:

 - rust-lang#134063 (dec2flt: Clean up float parsing modules)
 - rust-lang#136581 (Retire the legacy `Makefile`-based `run-make` test infra)
 - rust-lang#136662 (Count char width at most once in `Formatter::pad`)
 - rust-lang#136764 (Make `ptr_cast_add_auto_to_object` lint into hard error)
 - rust-lang#136798 (Added documentation for flushing per rust-lang#74348)
 - rust-lang#136865 (Perform deeper compiletest path normalization for `$TEST_BUILD_DIR` to account for compare-mode/debugger cases, and normalize long type file filename hashes)
 - rust-lang#136975 (Look for `python3` first on MacOS, not `py`)
 - rust-lang#136977 (Upload Datadog metrics with citool)
 - rust-lang#137240 (Slightly reformat `std::fs::remove_dir_all` error docs)
 - rust-lang#137298 (Check signature WF when lowering MIR body)
 - rust-lang#137463 ([illumos] attempt to use posix_spawn to spawn processes)
 - rust-lang#137477 (uefi: Add Service Binding Protocol abstraction)
 - rust-lang#137569 (Stabilize `string_extend_from_within`)
 - rust-lang#137633 (Only use implied bounds hack if bevy, and use deeply normalize in implied bounds hack)
 - rust-lang#137679 (Various coretests improvements)
 - rust-lang#137723 (Make `rust.description` more general-purpose and pass `CFG_VER_DESCRIPTION`)
 - rust-lang#137728 (Remove unsizing coercions for tuples)
 - rust-lang#137731 (Resume one waiter at once in deadlock handler)
 - rust-lang#137875 (mir_build: Integrate "simplification" steps into match-pair-tree creation)
 - rust-lang#138028 (compiler: add `ExternAbi::is_rustic_abi`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 1b9b515 into rust-lang:master Mar 5, 2025
6 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Mar 5, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Mar 5, 2025
Rollup merge of rust-lang#136662 - thaliaarchi:formatter-pad-char-count, r=m-ou-se

Count char width at most once in `Formatter::pad`

@thaliaarchi thaliaarchi deleted the formatter-pad-char-count branch March 5, 2025 19:07