Count char width at most once in `Formatter::pad` #136662

thaliaarchi · 2025-02-06T21:39:08Z

When both width and precision flags are specified, `Formatter::pad` counts the character width twice. Instead, record the character width when truncating the string to the precision, so it does not need to be recomputed. Simplify control flow so the cases are clearer.

Related:
- 6c9e708 (`fmt::Formatter::pad`: don't call chars().count() more than one time, 2021-09-01): Reduce counting chars from thrice to twice in worst case
- ede39ae (feat: reinterpret `precision` field for strings, 2016-06-29): Change meaning of precision for strings
- b820748 (Implement formatting arguments for strings and integers, 2013-08-10): Implement `Formatter::pad`
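
As a rough illustration of the idea in the description, here is a minimal sketch that truncates to the precision and records the char count in the same pass, then reuses that count for padding. `pad_once` is a hypothetical stand-in written for this example, not the actual `Formatter::pad` code, and it only handles left alignment with a space fill.

```rust
/// Hypothetical stand-in for `Formatter::pad` (not the real implementation):
/// truncates `s` to at most `precision` chars, then left-aligns the result
/// in a field of `width` chars, counting chars only once.
fn pad_once(s: &str, width: Option<usize>, precision: Option<usize>) -> String {
    let (truncated, char_count) = match precision {
        // With a precision, find the byte offset of the cut and count the
        // chars that are kept in the same walk over the string.
        Some(max) => {
            let mut end = s.len();
            let mut count = 0;
            for (offset, _) in s.char_indices() {
                if count == max {
                    end = offset;
                    break;
                }
                count += 1;
            }
            (&s[..end], count)
        }
        // Without a precision, count the whole string once.
        None => (s, s.chars().count()),
    };

    // The recorded count is reused here instead of counting again.
    let fill = width.map_or(0, |w| w.saturating_sub(char_count));
    let mut out = String::with_capacity(truncated.len() + fill);
    out.push_str(truncated);
    out.extend(std::iter::repeat(' ').take(fill));
    out
}
```

For example, `pad_once("héllo", Some(6), Some(3))` should agree with `format!("{:6.3}", "héllo")`.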

alexcrichton · 2025-02-09T03:01:22Z

Hello! Thanks for the ping, but it's been ~12 years since I wrote this, so I'm probably no better placed than anyone else to take a look at it. I'm going to reroll this to someone else on the libs team:

r? libs

thaliaarchi · 2025-02-13T19:30:40Z

Thanks for the gracious reply, Alex.

@Amanieu, would you mind taking a look at this?

Amanieu · 2025-02-13T23:05:46Z

r? @m-ou-se

m-ou-se · 2025-02-19T11:35:49Z

library/core/src/fmt/mod.rs

+        let mut iter = s.chars();
+        let char_count = iter.by_ref().take(max_char_count).count();


s.chars().count() has a special implementation (see https://doc.rust-lang.org/1.84.0/src/core/str/iter.rs.html#46 and https://doc.rust-lang.org/1.84.0/src/core/str/count.rs.html) that is much faster than iterating through all the chars one by one.

s.chars().by_ref().take(_).count() on the other hand goes through the Take adapter, which does not use this optimized counting implementation.

This means that with this change, even a simple `println!("{}", some_string)` will now pull in the code for UTF-8 decoding, blowing up binary size. And it'll be a bit slower, too.

Though perhaps s.char_indices().nth(max) already pulled in that code anyway. That's still an opportunity for improvement.

Regardless, we should use the optimized char counting algorithm when there is no precision (max length).
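
A minimal sketch of the distinction drawn in this review comment, under the assumption that the helper names below (`full_count`, `bounded_count`, `char_width`) are made up for illustration. Both paths return the same number when the string is short enough; the point is only that the unbounded case can call `Chars::count` directly rather than going through `Take`.

```rust
/// Counts every char by calling `count()` directly on `Chars`, which is
/// the call the review says has a specialized implementation.
fn full_count(s: &str) -> usize {
    s.chars().count()
}

/// Counts at most `max` chars. This goes through the `Take` adapter, as in
/// the diff under review.
fn bounded_count(s: &str, max: usize) -> usize {
    let mut iter = s.chars();
    iter.by_ref().take(max).count()
}

/// Hypothetical dispatcher: only pay for the bounded walk when a precision
/// (max length) is actually present.
fn char_width(s: &str, precision: Option<usize>) -> usize {
    match precision {
        Some(max) => bounded_count(s, max),
        None => full_count(s),
    }
}
```

With `"héllo"`, a precision of 3 gives 3, while no precision or a precision of 10 gives 5.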

Since Chars has an optimized implementation of Iterator::advance_by, but CharIndices does not, I added an implementation to take advantage of that. This has the nice benefit of not bumping the offset as an induction variable, which I was trying to avoid manually before by just using Chars.

`.nth()` introduces an unnecessary `.next()`, and `.take(n).count()` does a fold that iterates one char at a time, so I use `.char_indices().advance_by(n)`, which I think does the minimum possible with this API.

I've changed it to switch between .char_indices().advance_by(n) and .chars().count(), depending on whether truncation is needed. It would be interesting to benchmark .char_indices().advance_by(usize::MAX) against .chars().count().
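
To make the `advance_by` approach concrete, here is a hedged sketch that runs on stable Rust. `Iterator::advance_by` is unstable, so `advance_by_stable` below is a hypothetical one-char-at-a-time stand-in with none of the optimizations discussed above; it only mirrors the idea that the number of steps that could not be taken tells you how many chars were actually skipped. `truncate_and_count` is likewise an illustrative name, not a function from the PR.

```rust
/// Hypothetical stable stand-in for the unstable `Iterator::advance_by`:
/// skips up to `n` items and returns how many steps could NOT be taken.
fn advance_by_stable<I: Iterator>(iter: &mut I, n: usize) -> usize {
    for skipped in 0..n {
        if iter.next().is_none() {
            return n - skipped;
        }
    }
    0
}

/// Truncates `s` to at most `max` chars and returns the truncated slice
/// together with its char count, walking the string once.
fn truncate_and_count(s: &str, max: usize) -> (&str, usize) {
    let mut iter = s.char_indices();
    let remaining = advance_by_stable(&mut iter, max);
    // Whatever the iterator has not consumed is the tail to cut off.
    let end = s.len() - iter.as_str().len();
    (&s[..end], max - remaining)
}
```

Calling `truncate_and_count(s, usize::MAX)` never truncates, so its second component equals `s.chars().count()`, which is the comparison the benchmark idea above is about.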

thaliaarchi · 2025-02-20T01:25:52Z

I added a benchmark, and .chars().advance_by(usize::MAX) compares favorably against .chars().count().

Also, whether the advance_by argument is a constant usize::MAX or black_boxed makes no difference in performance.

x bench library/coretests --stage 1 --test-args str::char_count

str::char_count::emoji_huge::case00_chars_count                26895.51ns/iter  +/- 69143.99
str::char_count::emoji_huge::case01_chars_advance_by           39908.10ns/iter   +/- 4735.18
str::char_count::emoji_huge::case02_filter_count_cont_bytes   143432.06ns/iter   +/- 6447.94
str::char_count::emoji_huge::case03_iter_chars_increment      294685.87ns/iter  +/- 19699.59
str::char_count::emoji_huge::case04_manual_char_len           294636.37ns/iter  +/- 20001.07
str::char_count::emoji_large::case00_chars_count                 374.25ns/iter     +/- 66.28
str::char_count::emoji_large::case01_chars_advance_by            586.73ns/iter    +/- 135.90
str::char_count::emoji_large::case02_filter_count_cont_bytes    2226.21ns/iter    +/- 104.03
str::char_count::emoji_large::case03_iter_chars_increment       4584.05ns/iter    +/- 270.57
str::char_count::emoji_large::case04_manual_char_len            4585.11ns/iter    +/- 258.49
str::char_count::emoji_medium::case00_chars_count                 60.87ns/iter     +/- 26.87
str::char_count::emoji_medium::case01_chars_advance_by            78.37ns/iter      +/- 6.77
str::char_count::emoji_medium::case02_filter_count_cont_bytes    288.41ns/iter     +/- 25.85
str::char_count::emoji_medium::case03_iter_chars_increment       583.22ns/iter     +/- 21.35
str::char_count::emoji_medium::case04_manual_char_len            589.36ns/iter     +/- 33.22
str::char_count::emoji_small::case00_chars_count                  21.08ns/iter      +/- 1.09
str::char_count::emoji_small::case01_chars_advance_by             12.49ns/iter      +/- 0.75
str::char_count::emoji_small::case02_filter_count_cont_bytes      29.74ns/iter      +/- 2.08
str::char_count::emoji_small::case03_iter_chars_increment         36.68ns/iter      +/- 2.66
str::char_count::emoji_small::case04_manual_char_len              37.47ns/iter      +/- 1.30
str::char_count::emoji_tiny::case00_chars_count                    7.77ns/iter      +/- 1.00
str::char_count::emoji_tiny::case01_chars_advance_by               5.81ns/iter      +/- 0.68
str::char_count::emoji_tiny::case02_filter_count_cont_bytes        6.24ns/iter      +/- 0.71
str::char_count::emoji_tiny::case03_iter_chars_increment           3.44ns/iter      +/- 0.16
str::char_count::emoji_tiny::case04_manual_char_len                3.87ns/iter      +/- 0.80
str::char_count::en_huge::case00_chars_count                   25270.06ns/iter   +/- 3517.67
str::char_count::en_huge::case01_chars_advance_by              36994.25ns/iter   +/- 8664.94
str::char_count::en_huge::case02_filter_count_cont_bytes      136098.38ns/iter   +/- 3237.48
str::char_count::en_huge::case03_iter_chars_increment         190540.45ns/iter +/- 375994.04
str::char_count::en_huge::case04_manual_char_len              286710.43ns/iter  +/- 33275.76
str::char_count::en_large::case00_chars_count                    355.26ns/iter     +/- 20.44
str::char_count::en_large::case01_chars_advance_by               571.21ns/iter     +/- 17.40
str::char_count::en_large::case02_filter_count_cont_bytes       2123.16ns/iter    +/- 503.87
str::char_count::en_large::case03_iter_chars_increment          2921.19ns/iter    +/- 356.64
str::char_count::en_large::case04_manual_char_len               4431.56ns/iter    +/- 536.13
str::char_count::en_medium::case00_chars_count                    55.92ns/iter      +/- 8.05
str::char_count::en_medium::case01_chars_advance_by               73.93ns/iter      +/- 4.66
str::char_count::en_medium::case02_filter_count_cont_bytes       282.48ns/iter    +/- 139.73
str::char_count::en_medium::case03_iter_chars_increment          377.55ns/iter     +/- 72.04
str::char_count::en_medium::case04_manual_char_len               556.65ns/iter     +/- 66.79
str::char_count::en_small::case00_chars_count                     18.46ns/iter      +/- 0.93
str::char_count::en_small::case01_chars_advance_by                12.49ns/iter      +/- 6.43
str::char_count::en_small::case02_filter_count_cont_bytes         18.91ns/iter      +/- 0.52
str::char_count::en_small::case03_iter_chars_increment            19.19ns/iter      +/- 1.00
str::char_count::en_small::case04_manual_char_len                 41.81ns/iter     +/- 20.10
str::char_count::en_tiny::case00_chars_count                       7.78ns/iter      +/- 3.34
str::char_count::en_tiny::case01_chars_advance_by                 13.17ns/iter      +/- 0.98
str::char_count::en_tiny::case02_filter_count_cont_bytes           6.23ns/iter      +/- 0.24
str::char_count::en_tiny::case03_iter_chars_increment              5.34ns/iter      +/- 0.79
str::char_count::en_tiny::case04_manual_char_len                   8.90ns/iter      +/- 0.50
str::char_count::ru_huge::case00_chars_count                   23521.49ns/iter   +/- 4525.93
str::char_count::ru_huge::case01_chars_advance_by              34957.96ns/iter  +/- 16442.47
str::char_count::ru_huge::case02_filter_count_cont_bytes      128222.60ns/iter   +/- 9503.80
str::char_count::ru_huge::case03_iter_chars_increment         176097.38ns/iter  +/- 26276.12
str::char_count::ru_huge::case04_manual_char_len              231660.15ns/iter  +/- 28604.21
str::char_count::ru_large::case00_chars_count                    337.69ns/iter     +/- 32.97
str::char_count::ru_large::case01_chars_advance_by               565.93ns/iter     +/- 45.96
str::char_count::ru_large::case02_filter_count_cont_bytes       1989.00ns/iter     +/- 53.45
str::char_count::ru_large::case03_iter_chars_increment          2709.66ns/iter    +/- 241.61
str::char_count::ru_large::case04_manual_char_len               3510.50ns/iter    +/- 617.17
str::char_count::ru_medium::case00_chars_count                    56.29ns/iter      +/- 3.25
str::char_count::ru_medium::case01_chars_advance_by              117.97ns/iter      +/- 3.20
str::char_count::ru_medium::case02_filter_count_cont_bytes       262.44ns/iter     +/- 69.23
str::char_count::ru_medium::case03_iter_chars_increment          341.12ns/iter    +/- 157.75
str::char_count::ru_medium::case04_manual_char_len               395.95ns/iter    +/- 730.95
str::char_count::ru_small::case00_chars_count                     16.23ns/iter      +/- 0.95
str::char_count::ru_small::case01_chars_advance_by                 7.02ns/iter      +/- 0.23
str::char_count::ru_small::case02_filter_count_cont_bytes         16.08ns/iter      +/- 1.56
str::char_count::ru_small::case03_iter_chars_increment            17.23ns/iter      +/- 1.29
str::char_count::ru_small::case04_manual_char_len                 17.67ns/iter      +/- 1.08
str::char_count::ru_tiny::case00_chars_count                       8.52ns/iter      +/- 0.40
str::char_count::ru_tiny::case01_chars_advance_by                  8.51ns/iter      +/- 1.06
str::char_count::ru_tiny::case02_filter_count_cont_bytes           7.39ns/iter      +/- 0.40
str::char_count::ru_tiny::case03_iter_chars_increment              6.33ns/iter      +/- 0.82
str::char_count::ru_tiny::case04_manual_char_len                   6.79ns/iter      +/- 0.59
str::char_count::zh_huge::case00_chars_count                   21915.29ns/iter   +/- 1553.56
str::char_count::zh_huge::case01_chars_advance_by              32533.42ns/iter   +/- 2794.91
str::char_count::zh_huge::case02_filter_count_cont_bytes      119467.06ns/iter   +/- 2970.87
str::char_count::zh_huge::case03_iter_chars_increment         319110.20ns/iter  +/- 15128.61
str::char_count::zh_huge::case04_manual_char_len              316619.37ns/iter  +/- 14316.98
str::char_count::zh_large::case00_chars_count                    316.41ns/iter     +/- 45.45
str::char_count::zh_large::case01_chars_advance_by               492.66ns/iter     +/- 20.85
str::char_count::zh_large::case02_filter_count_cont_bytes       1850.94ns/iter     +/- 70.57
str::char_count::zh_large::case03_iter_chars_increment          4927.22ns/iter    +/- 145.58
str::char_count::zh_large::case04_manual_char_len               4944.76ns/iter    +/- 480.41
str::char_count::zh_medium::case00_chars_count                    55.57ns/iter      +/- 5.57
str::char_count::zh_medium::case01_chars_advance_by               67.27ns/iter      +/- 1.85
str::char_count::zh_medium::case02_filter_count_cont_bytes       242.38ns/iter     +/- 20.20
str::char_count::zh_medium::case03_iter_chars_increment          599.00ns/iter     +/- 57.02
str::char_count::zh_medium::case04_manual_char_len               593.71ns/iter     +/- 19.71
str::char_count::zh_small::case00_chars_count                     19.19ns/iter      +/- 0.66
str::char_count::zh_small::case01_chars_advance_by                 9.03ns/iter      +/- 3.14
str::char_count::zh_small::case02_filter_count_cont_bytes         17.37ns/iter      +/- 1.16
str::char_count::zh_small::case03_iter_chars_increment            23.59ns/iter      +/- 0.93
str::char_count::zh_small::case04_manual_char_len                 24.84ns/iter      +/- 1.11
str::char_count::zh_tiny::case00_chars_count                       8.12ns/iter      +/- 0.32
str::char_count::zh_tiny::case01_chars_advance_by                  7.63ns/iter      +/- 0.48
str::char_count::zh_tiny::case02_filter_count_cont_bytes           6.65ns/iter      +/- 0.44
str::char_count::zh_tiny::case03_iter_chars_increment              5.00ns/iter      +/- 1.01
str::char_count::zh_tiny::case04_manual_char_len                   5.26ns/iter      +/- 0.96
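
For readers unfamiliar with the case names, the following is a guess at what the slower counting strategies look like, based only on the names in the output above. The function bodies are assumptions for illustration and may differ from the actual benchmarks in library/coretests.

```rust
/// Assumed shape of `filter_count_cont_bytes`: every char has exactly one
/// byte that is not a UTF-8 continuation byte (0b10xx_xxxx).
fn filter_count_cont_bytes(s: &str) -> usize {
    s.bytes().filter(|&b| (b & 0xC0) != 0x80).count()
}

/// Assumed shape of `iter_chars_increment`: decode each char and bump a
/// counter.
fn iter_chars_increment(s: &str) -> usize {
    let mut count = 0;
    for _ in s.chars() {
        count += 1;
    }
    count
}

/// Assumed shape of `manual_char_len`: hop from leading byte to leading
/// byte using the sequence width encoded in each leading byte.
fn manual_char_len(s: &str) -> usize {
    let bytes = s.as_bytes();
    let (mut i, mut count) = (0, 0);
    while i < bytes.len() {
        // A `str` is always valid UTF-8, so `bytes[i]` is a leading byte.
        i += match bytes[i] {
            0x00..=0x7F => 1,
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            _ => 4,
        };
        count += 1;
    }
    count
}
```

All three should agree with `s.chars().count()` on any valid string; they differ only in how much work they do per byte.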


thaliaarchi · 2025-02-28T00:43:05Z

At @orlp's suggestion, I've split out the CharIndices::advance_by optimization into #137761.

thaliaarchi · 2025-03-03T20:01:20Z

@m-ou-se Hey, friendly ping. Would you mind taking a look at the changes since your review? Thanks!

m-ou-se · 2025-03-04T15:17:47Z

Looks good. Thanks for working on this!

@bors r+

bors · 2025-03-04T15:17:50Z

📌 Commit 0ca1c9c has been approved by m-ou-se

It is now in the queue for this repository.

…nt, r=m-ou-se Count char width at most once in `Formatter::pad` When both width and precision flags are specified, then `Formatter::pad` counts the character width twice. Instead, record the character width when truncating it to the precision, so it does not need to be recomputed. Simplify control flow so the cases are more clear. Related: - 6c9e708 (`fmt::Formatter::pad`: don't call chars().count() more than one time, 2021-09-01): Reduce counting chars from thrice to twice in worst case - ede39ae (feat: reinterpret `precision` field for strings, 2016-06-29): Change meaning of precision for strings - b820748 (Implement formatting arguments for strings and integers, 2013-08-10): Implement `Formatter::pad`

Rollup of 10 pull requests Successful merges: - rust-lang#134063 (dec2flt: Clean up float parsing modules) - rust-lang#136662 (Count char width at most once in `Formatter::pad`) - rust-lang#137011 (Promote ohos targets to tier2 with host tools.) - rust-lang#137077 (Postprocess bootstrap metrics into GitHub job summary) - rust-lang#137327 (Undeprecate env::home_dir) - rust-lang#137373 (Compile run-make-support and run-make tests with the bootstrap compiler) - rust-lang#137463 ([illumos] attempt to use posix_spawn to spawn processes) - rust-lang#137477 (uefi: Add Service Binding Protocol abstraction) - rust-lang#137569 (Stablize `string_extend_from_within`) - rust-lang#137667 (Add `dist::Gcc` build step) r? `@ghost` `@rustbot` modify labels: rollup


…kingjubilee Rollup of 25 pull requests Successful merges: - rust-lang#134063 (dec2flt: Clean up float parsing modules) - rust-lang#136581 (Retire the legacy `Makefile`-based `run-make` test infra) - rust-lang#136662 (Count char width at most once in `Formatter::pad`) - rust-lang#136798 (Added documentation for flushing per rust-lang#74348) - rust-lang#137240 (Slightly reformat `std::fs::remove_dir_all` error docs) - rust-lang#137303 (Remove `MaybeForgetReturn` suggestion) - rust-lang#137327 (Undeprecate env::home_dir) - rust-lang#137463 ([illumos] attempt to use posix_spawn to spawn processes) - rust-lang#137477 (uefi: Add Service Binding Protocol abstraction) - rust-lang#137565 (Try to point of macro expansion from resolver and method errors if it involves macro var) - rust-lang#137569 (Stabilize `string_extend_from_within`) - rust-lang#137612 (Update bootstrap to edition 2024) - rust-lang#137633 (Only use implied bounds hack if bevy, and use deeply normalize in implied bounds hack) - rust-lang#137643 (Add DWARF test case for non-C-like `repr128` enums) - rust-lang#137679 (Various coretests improvements) - rust-lang#137723 (Make `rust.description` more general-purpose and pass `CFG_VER_DESCRIPTION`) - rust-lang#137758 (fix usage of ty decl macro fragments in attributes) - rust-lang#137764 (Ensure that negative auto impls are always applicable) - rust-lang#137772 (Fix char count in `Display` for `ByteStr`) - rust-lang#137798 (ci: use ubuntu 24 on arm large runner) - rust-lang#137805 (adjust Layout debug printing to match the internal field name) - rust-lang#137808 (Do not require that unsafe fields lack drop glue) - rust-lang#137820 (Clarify why InhabitedPredicate::instantiate_opt exists) - rust-lang#137825 (Provide more context on resolve error caused from incorrect RTN) - rust-lang#138028 (compiler: add `ExternAbi::is_rustic_abi`) r? `@ghost` `@rustbot` modify labels: rollup

Rollup of 20 pull requests Successful merges: - rust-lang#134063 (dec2flt: Clean up float parsing modules) - rust-lang#136581 (Retire the legacy `Makefile`-based `run-make` test infra) - rust-lang#136662 (Count char width at most once in `Formatter::pad`) - rust-lang#136764 (Make `ptr_cast_add_auto_to_object` lint into hard error) - rust-lang#136798 (Added documentation for flushing per rust-lang#74348) - rust-lang#136865 (Perform deeper compiletest path normalization for `$TEST_BUILD_DIR` to account for compare-mode/debugger cases, and normalize long type file filename hashes) - rust-lang#136975 (Look for `python3` first on MacOS, not `py`) - rust-lang#136977 (Upload Datadog metrics with citool) - rust-lang#137240 (Slightly reformat `std::fs::remove_dir_all` error docs) - rust-lang#137298 (Check signature WF when lowering MIR body) - rust-lang#137463 ([illumos] attempt to use posix_spawn to spawn processes) - rust-lang#137477 (uefi: Add Service Binding Protocol abstraction) - rust-lang#137569 (Stabilize `string_extend_from_within`) - rust-lang#137633 (Only use implied bounds hack if bevy, and use deeply normalize in implied bounds hack) - rust-lang#137679 (Various coretests improvements) - rust-lang#137723 (Make `rust.description` more general-purpose and pass `CFG_VER_DESCRIPTION`) - rust-lang#137728 (Remove unsizing coercions for tuples) - rust-lang#137731 (Resume one waiter at once in deadlock handler) - rust-lang#137875 (mir_build: Integrate "simplification" steps into match-pair-tree creation) - rust-lang#138028 (compiler: add `ExternAbi::is_rustic_abi`) r? `@ghost` `@rustbot` modify labels: rollup

Rollup merge of rust-lang#136662 - thaliaarchi:formatter-pad-char-count, r=m-ou-se Count char width at most once in `Formatter::pad` When both width and precision flags are specified, then `Formatter::pad` counts the character width twice. Instead, record the character width when truncating it to the precision, so it does not need to be recomputed. Simplify control flow so the cases are more clear. Related: - 6c9e708 (`fmt::Formatter::pad`: don't call chars().count() more than one time, 2021-09-01): Reduce counting chars from thrice to twice in worst case - ede39ae (feat: reinterpret `precision` field for strings, 2016-06-29): Change meaning of precision for strings - b820748 (Implement formatting arguments for strings and integers, 2013-08-10): Implement `Formatter::pad`

rustbot assigned alexcrichton Feb 6, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 6, 2025

thaliaarchi force-pushed the formatter-pad-char-count branch 2 times, most recently from 2069815 to 1d26e00 Compare February 7, 2025 06:28

thaliaarchi mentioned this pull request Feb 7, 2025

Fix Display for invalid UTF-8 in OsStr/Path #136677

Open

rustbot assigned Amanieu and unassigned alexcrichton Feb 9, 2025

rustbot assigned m-ou-se and unassigned Amanieu Feb 13, 2025

m-ou-se reviewed Feb 19, 2025


thaliaarchi force-pushed the formatter-pad-char-count branch from 1d26e00 to 73030e7 Compare February 20, 2025 00:10

thaliaarchi requested a review from m-ou-se February 26, 2025 19:21

thaliaarchi force-pushed the formatter-pad-char-count branch from e5f6852 to 0ca1c9c Compare February 28, 2025 00:42

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 4, 2025

jieyouxu mentioned this pull request Mar 4, 2025

Rollup of 10 pull requests #138006

Closed

workingjubilee mentioned this pull request Mar 5, 2025

Rollup of 25 pull requests #138044

Closed

jieyouxu mentioned this pull request Mar 5, 2025

Rollup of 20 pull requests #138058

Merged

bors merged commit 1b9b515 into rust-lang:master Mar 5, 2025
6 checks passed

rustbot added this to the 1.87.0 milestone Mar 5, 2025

thaliaarchi deleted the formatter-pad-char-count branch March 5, 2025 19:07
