Batch proc_macro RPC for TokenStream iteration and combination operations #98186
Conversation
r? @eddyb

@bors try @rust-timer queue

Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf

⌛ Trying commit 8494bf02bcef1af99174294c22e520e82af5a18e with merge a6e406c9dd73d5556dcb04c2d7471ddc2c1faf0e...
… and iterate TokenStreams

This significantly reduces the cost of common interactions with TokenStream when running with the CrossThread execution strategy, by reducing the number of RPC calls required.
…tend impls

This is an experimental patch to try to reduce the codegen complexity of TokenStream's FromIterator and Extend implementations for downstream crates, by moving the core logic into a helper type. This might help improve build performance of crates which depend on proc_macro, as iterators are used less and the compiler may take less time to do things like attempt specializations or other iterator optimizations.

The change intentionally sacrifices some optimization opportunities, such as using the specializations for collecting iterators derived from Vec::into_iter() into Vec. This is one of the simpler potential approaches to reducing the amount of code generated in crates depending on proc_macro, so it seems worth trying before other, more involved changes.
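The helper-type approach described in that commit message can be sketched roughly as follows, using `String` in place of `TokenStream` so the example is self-contained. `ConcatHelper` and `Stream` are hypothetical names for illustration, not the actual proc_macro internals.

```rust
// Hypothetical sketch: accumulate pieces and concatenate once at the end, so
// the FromIterator/Extend code monomorphized into downstream crates is just a
// simple loop of `push` calls rather than iterator-specialization machinery.
struct ConcatHelper {
    parts: Vec<String>,
}

impl ConcatHelper {
    fn with_capacity(n: usize) -> Self {
        ConcatHelper { parts: Vec::with_capacity(n) }
    }
    fn push(&mut self, part: String) {
        self.parts.push(part);
    }
    fn finish(self) -> Stream {
        // One concatenation at the end, instead of repeated pairwise joins.
        Stream(self.parts.concat())
    }
}

pub struct Stream(pub String);

impl FromIterator<String> for Stream {
    fn from_iter<I: IntoIterator<Item = String>>(iter: I) -> Self {
        let iter = iter.into_iter();
        let mut helper = ConcatHelper::with_capacity(iter.size_hint().0);
        iter.for_each(|part| helper.push(part));
        helper.finish()
    }
}

impl Extend<String> for Stream {
    fn extend<I: IntoIterator<Item = String>>(&mut self, iter: I) {
        for part in iter {
            self.0.push_str(&part);
        }
    }
}
```

This intentionally forgoes the `Vec::into_iter()`-to-`Vec` collect specialization the commit message mentions, trading a little runtime optimization for less generated code.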
Force-pushed from 8494bf0 to 4d45af9.
Ugh, did the force push break bors/perf? @bors try @rust-timer queue

Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf

⌛ Trying commit 4d45af9 with merge dfb02cb3ced64e8fe898f03c414d3654c4b59d88...
run_client(bridge, |input| {
    f(crate::TokenStream(Some(input)))
        .0
        .unwrap_or_else(|| TokenStream::concat_streams(None, vec![]))
It's not quite clear to me which close paren matches which open paren. Could you please use a temp variable somewhere to break this expression up?
Pushed a new commit which moves this empty stream handling to the server side of the bridge while keeping the interface to callers the same, which should be more performant and hopefully easier to read.
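The idea of handling the empty stream on the server side of the bridge can be sketched like this, with `Vec<i32>` standing in for a token stream. The function name mirrors the `concat_streams` call in the excerpt above; the simplified types are assumptions for illustration.

```rust
// Hedged sketch: the client hands back an Option, where None means "empty
// stream", and the server-side concatenation treats None as the empty
// element. The client then never has to materialize an empty stream itself.
fn concat_streams(base: Option<Vec<i32>>, trees: Vec<Vec<i32>>) -> Vec<i32> {
    let mut out = base.unwrap_or_default();
    for t in trees {
        out.extend(t);
    }
    out
}
```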
☀️ Try build successful - checks-actions

Queued dfb02cb3ced64e8fe898f03c414d3654c4b59d88 with parent 0423e06, future comparison URL.
Finished benchmarking commit (dfb02cb3ced64e8fe898f03c414d3654c4b59d88): comparison url.

Results: Instruction count, Max RSS (memory usage), Cycles.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so, since this PR may lead to changes in compiler perf. @bors rollup=never
btw the lone regression mentioned by perfbot is noise: the …
LGTM, most of the comments are nits, pretty much r=me when you run out of actionable/agreeable ones.
library/proc_macro/src/bridge/mod.rs (outdated)
impl<T: Mark> Mark for Vec<T> {
    type Unmarked = Vec<T::Unmarked>;
    fn mark(unmarked: Self::Unmarked) -> Self {
        // Should be a no-op due to std's in-place collect optimizations.
        unmarked.into_iter().map(T::mark).collect()
    }
}

impl<T: Unmark> Unmark for Vec<T> {
    type Unmarked = Vec<T::Unmarked>;
    fn unmark(self) -> Self::Unmarked {
        // Should be a no-op due to std's in-place collect optimizations.
        self.into_iter().map(T::unmark).collect()
    }
}
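For context, the Mark/Unmark pattern in that excerpt can be made self-contained with a sketch like the following. The trait shapes mirror the excerpt; the concrete `Marked(u32)` newtype is an invented stand-in for the bridge's marked handle types.

```rust
// Sketch of the Mark/Unmark round trip: a same-layout newtype plus blanket
// Vec impls. Because the newtype has the same layout as its unmarked form,
// std's in-place collect optimization can reuse the Vec allocation.
trait Mark {
    type Unmarked;
    fn mark(unmarked: Self::Unmarked) -> Self;
}
trait Unmark {
    type Unmarked;
    fn unmark(self) -> Self::Unmarked;
}

#[derive(Debug, PartialEq)]
struct Marked(u32);

impl Mark for Marked {
    type Unmarked = u32;
    fn mark(unmarked: u32) -> Self {
        Marked(unmarked)
    }
}
impl Unmark for Marked {
    type Unmarked = u32;
    fn unmark(self) -> u32 {
        self.0
    }
}

impl<T: Mark> Mark for Vec<T> {
    type Unmarked = Vec<T::Unmarked>;
    fn mark(unmarked: Self::Unmarked) -> Self {
        unmarked.into_iter().map(T::mark).collect()
    }
}
impl<T: Unmark> Unmark for Vec<T> {
    type Unmarked = Vec<T::Unmarked>;
    fn unmark(self) -> Self::Unmarked {
        self.into_iter().map(T::unmark).collect()
    }
}
```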
This marking stuff is sad, I keep forgetting to look into whether the RPC traits can get another parameter (defaulting to `Self`) to allow dispatching on more than one type (assuming that's the problem in the first place; maybe long term such a reshuffle could also allow passing sequences without allocating a `Vec`, etc.).
trees: Vec<
    bridge::TokenTree<
        bridge::client::Group,
        bridge::client::Punct,
        bridge::client::Ident,
        bridge::client::Literal,
    >,
>,
Random note: we should do some perf experiments using `SmallVec` (or a hacked-up copy of it) with various sizes. Since this is only ever on the stack in a "leaf" (client) function, which then passes that data to RPC, we might be able to set some large inline capacity like 32, and have it be an overall win.
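The `SmallVec` idea from that note can be sketched minimally as below. This is not the real `smallvec` crate API, just the shape of the technique: a buffer with a fixed inline capacity `N` that only spills to the heap when it grows past `N` elements (a real implementation would avoid the `Option` slots and use uninitialized storage).

```rust
// Minimal inline-capacity buffer: stays on the stack up to N elements,
// then moves everything into a heap Vec on the first overflowing push.
enum SmallBuf<T, const N: usize> {
    Inline { buf: [Option<T>; N], len: usize },
    Heap(Vec<T>),
}

impl<T, const N: usize> SmallBuf<T, N> {
    fn new() -> Self {
        SmallBuf::Inline { buf: std::array::from_fn(|_| None), len: 0 }
    }

    fn push(&mut self, value: T) {
        match self {
            SmallBuf::Inline { buf, len } if *len < N => {
                buf[*len] = Some(value);
                *len += 1;
            }
            SmallBuf::Inline { buf, len } => {
                // Spill: move the inline elements to the heap, then push.
                let mut v: Vec<T> = buf.iter_mut().filter_map(Option::take).collect();
                debug_assert_eq!(v.len(), *len);
                v.push(value);
                *self = SmallBuf::Heap(v);
            }
            SmallBuf::Heap(v) => v.push(value),
        }
    }

    fn into_vec(self) -> Vec<T> {
        match self {
            SmallBuf::Inline { mut buf, .. } => {
                buf.iter_mut().filter_map(Option::take).collect()
            }
            SmallBuf::Heap(v) => v,
        }
    }
}
```

With a large `N` (such as the 32 suggested above), typical token batches would never touch the heap at all, at the cost of a bigger stack frame in the leaf function.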
Quite possibly. It would be really nice if we could easily depend on crates in `proc_macro`, so we don't need to make a small copy of `SmallVec` to use it, but that's a problem for another PR.
  run_client(bridge, |(input, input2)| {
-     f(crate::TokenStream(input), crate::TokenStream(input2)).0
+     f(crate::TokenStream(Some(input)), crate::TokenStream(Some(input2))).0
Would it make sense to pass `Option` on the RPC to the client as well, not just back to the server?

(Also, I'm imagining we might be able to get rid of the split between single-input and dual-input entry points, and `assert!(input2.is_none())` somewhere seems like it would be easier than dealing with handles.)
We could, I suppose, but it's mildly inconvenient and introduces a bunch of extra code without much benefit, so I'm inclined not to for now.
// FIXME: This is a raw port of the previous approach, and can probably
// be optimized.
let mut cursor = stream.into_trees();
let mut stack = Vec::new();
I was going to say that this can be `SmallVec` with an inline capacity of 2 or so, but it looks like what's actually happening is `TokenTree::from_internal` could actually be returning `ArrayVec<_, 3>` (I suppose at that point it's not even a `FromInternal` impl, heh).

That is, despite being ominously named "stack" (as if it's some kind of vertical tree traversal), it's a weird FILO queue of pending additional `TokenTree`s (which your implementation could consume right away with e.g. a nested `for` loop).

(you can add a comment noting this, or just ignore it as a note to self)
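The pop-or-advance loop being discussed can be sketched with integers standing in for token trees. Here `from_internal` may expand one raw item into several output items, parking the extras on the "stack" (pushed in reverse so they pop in order); the main loop drains that FILO queue before advancing the cursor. The expansion rule is invented purely for illustration.

```rust
// One raw item may expand to up to three outputs; extras wait on `stack`.
fn from_internal(raw: i32, stack: &mut Vec<i32>) -> i32 {
    if raw < 0 {
        // A "compound" item expands to three outputs, e.g. -2 -> 200, 201, 202.
        let base = -raw * 100;
        stack.push(base + 2); // pushed in reverse so they pop in order
        stack.push(base + 1);
        base
    } else {
        raw
    }
}

fn expand(input: &[i32]) -> Vec<i32> {
    let mut cursor = input.iter().copied();
    let mut stack = Vec::new();
    let mut out = Vec::new();
    // Drain pending expansions before pulling the next raw item.
    while let Some(next) = stack.pop().or_else(|| {
        let raw = cursor.next()?;
        Some(from_internal(raw, &mut stack))
    }) {
        out.push(next);
    }
    out
}
```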
    continue;

let next = stack.pop().or_else(|| {
    let next = cursor.next_with_spacing()?;
    Some(TokenTree::from_internal((next, &mut stack, self)))
Random note: these from/to "internal" things should probably be s/internal/rustc, for clarity.
@bors r+

📌 Commit df925fd has been approved by

☀️ Test successful - checks-actions
Finished benchmarking commit (0182fd9): comparison url.

Results: Instruction count, Max RSS (memory usage), Cycles.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression
proc_macro: use crossbeam channels for the proc_macro cross-thread bridge

This is done by having the crossbeam dependency inserted into the `proc_macro` server code from the server side, to avoid adding a dependency to `proc_macro`.

In addition, this introduces a -Z command-line option which will switch rustc to run proc-macros using this cross-thread executor. With the changes to the bridge in rust-lang#98186, rust-lang#98187, rust-lang#98188 and rust-lang#98189, the performance of the executor should be much closer to same-thread execution.

In local testing, the crossbeam executor was substantially more performant than either of the two existing `CrossThread` strategies, so they have been removed to keep things simple.

r? `@eddyb`
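The cross-thread executor shape described there can be sketched with std's `mpsc` channels in place of crossbeam, so the example is dependency-free. The request/response types and the `Double` operation are invented stand-ins for the bridge's real RPC messages.

```rust
use std::sync::mpsc;
use std::thread;

// Hedged sketch of a cross-thread bridge: the "client" (proc macro) runs on
// its own thread and sends requests over one channel; the "server" (compiler)
// loops answering over another until the client reports its final result.
enum Request {
    Double(i32),
    Done(i32),
}

fn run_cross_thread(input: i32) -> i32 {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    let (resp_tx, resp_rx) = mpsc::channel::<i32>();

    // Client side: issues an RPC call for each operation it needs.
    let client = thread::spawn(move || {
        req_tx.send(Request::Double(input)).unwrap();
        let doubled = resp_rx.recv().unwrap();
        req_tx.send(Request::Done(doubled + 1)).unwrap();
    });

    // Server side: services requests until the client finishes.
    let result = loop {
        match req_rx.recv().unwrap() {
            Request::Double(n) => resp_tx.send(n * 2).unwrap(),
            Request::Done(n) => break n,
        }
    };
    client.join().unwrap();
    result
}
```

Every round trip here costs two channel sends plus thread wakeups, which is exactly why the batching in this PR matters so much for the cross-thread strategy.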
This is the first part of #86822, split off as requested in #86822 (review). It reduces the number of RPC calls required for common operations such as iterating over and concatenating TokenStreams.
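The effect of batching on call counts can be illustrated with a toy model (all names invented; the counter stands in for per-call bridge overhead): iterating a stream one token per RPC versus fetching every tree in a single call.

```rust
use std::cell::Cell;

thread_local! {
    // Counts simulated cross-bridge calls.
    static RPC_CALLS: Cell<usize> = Cell::new(0);
}

// Wraps an operation as if it crossed the bridge, counting the call.
fn rpc<T>(f: impl FnOnce() -> T) -> T {
    RPC_CALLS.with(|c| c.set(c.get() + 1));
    f()
}

// Unbatched: one RPC per token while iterating.
fn iterate_per_token(stream: &[i32]) -> Vec<i32> {
    (0..stream.len()).map(|i| rpc(|| stream[i])).collect()
}

// Batched: a single RPC returns every token at once.
fn iterate_batched(stream: &[i32]) -> Vec<i32> {
    rpc(|| stream.to_vec())
}

// Runs `f` and reports (result, number of RPC calls it made).
fn calls_used<T>(f: impl FnOnce() -> T) -> (T, usize) {
    RPC_CALLS.with(|c| c.set(0));
    let out = f();
    (out, RPC_CALLS.with(|c| c.get()))
}
```

The per-token version makes O(n) calls where the batched one makes exactly one, which is the kind of reduction this PR targets for iteration and concatenation.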