Batch proc_macro RPC for TokenStream iteration and combination operations #98186
Conversation
r? @eddyb

@bors try @rust-timer queue

Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf

⌛ Trying commit 8494bf02bcef1af99174294c22e520e82af5a18e with merge a6e406c9dd73d5556dcb04c2d7471ddc2c1faf0e...
… and iterate TokenStreams

This significantly reduces the cost of common interactions with TokenStream when running with the CrossThread execution strategy, by reducing the number of RPC calls required.
…tend impls

This is an experimental patch to try to reduce the codegen complexity of TokenStream's FromIterator and Extend implementations for downstream crates, by moving the core logic into a helper type. This might help improve build performance of crates which depend on proc_macro, as iterators are used less and the compiler may take less time to do things like attempt specializations or other iterator optimizations.

The change intentionally sacrifices some optimization opportunities, such as using the specializations for collecting iterators derived from Vec::into_iter() into Vec. This is one of the simpler potential approaches to reducing the amount of code generated in crates depending on proc_macro, so it seems worth trying before other, more involved changes.
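The helper-type approach described in that commit message can be sketched roughly as follows, using `String` in place of `TokenStream` so the example is self-contained. `ConcatHelper` and `Stream` are hypothetical names for illustration, not the actual proc_macro internals.

```rust
// Hypothetical sketch: accumulate pieces and concatenate once at the end, so
// the FromIterator/Extend code monomorphized into downstream crates is just a
// simple loop of `push` calls rather than iterator-specialization machinery.
struct ConcatHelper {
    parts: Vec<String>,
}

impl ConcatHelper {
    fn with_capacity(n: usize) -> Self {
        ConcatHelper { parts: Vec::with_capacity(n) }
    }
    fn push(&mut self, part: String) {
        self.parts.push(part);
    }
    fn finish(self) -> Stream {
        // One concatenation at the end, instead of repeated pairwise joins.
        Stream(self.parts.concat())
    }
}

pub struct Stream(pub String);

impl FromIterator<String> for Stream {
    fn from_iter<I: IntoIterator<Item = String>>(iter: I) -> Self {
        let iter = iter.into_iter();
        let mut helper = ConcatHelper::with_capacity(iter.size_hint().0);
        iter.for_each(|part| helper.push(part));
        helper.finish()
    }
}

impl Extend<String> for Stream {
    fn extend<I: IntoIterator<Item = String>>(&mut self, iter: I) {
        for part in iter {
            self.0.push_str(&part);
        }
    }
}
```

This intentionally forgoes the `Vec::into_iter()`-to-`Vec` collect specialization the commit message mentions, trading a little runtime optimization for less generated code.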
Force-pushed from 8494bf0 to 4d45af9.
Ugh, did the force push break bors/perf? @bors try @rust-timer queue

Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf

⌛ Trying commit 4d45af9 with merge dfb02cb3ced64e8fe898f03c414d3654c4b59d88...
run_client(bridge, |input| {
    f(crate::TokenStream(Some(input)))
        .0
        .unwrap_or_else(|| TokenStream::concat_streams(None, vec![]))
It's not quite clear to me which close paren matches which open paren. Could you please use a temp variable somewhere to break this expression up?
Pushed a new commit which moves this empty stream handling to the server side of the bridge while keeping the interface to callers the same, which should be more performant and hopefully easier to read.
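The idea of handling the empty stream on the server side of the bridge can be sketched like this, with `Vec<i32>` standing in for a token stream. The function name mirrors the `concat_streams` call in the excerpt above; the simplified types are assumptions for illustration.

```rust
// Hedged sketch: the client hands back an Option, where None means "empty
// stream", and the server-side concatenation treats None as the empty
// element. The client then never has to materialize an empty stream itself.
fn concat_streams(base: Option<Vec<i32>>, trees: Vec<Vec<i32>>) -> Vec<i32> {
    let mut out = base.unwrap_or_default();
    for t in trees {
        out.extend(t);
    }
    out
}
```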
☀️ Try build successful - checks-actions

Queued dfb02cb3ced64e8fe898f03c414d3654c4b59d88 with parent 0423e06, future comparison URL.
Finished benchmarking commit (dfb02cb3ced64e8fe898f03c414d3654c4b59d88): comparison url.

Results: Instruction count, Max RSS (memory usage), Cycles.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so, since this PR may lead to changes in compiler perf. @bors rollup=never
btw the lone regression mentioned by perfbot is noise: the …
LGTM, most of the comments are nits, pretty much r=me when you run out of actionable/agreeable ones.
library/proc_macro/src/bridge/mod.rs (outdated)
impl<T: Mark> Mark for Vec<T> {
    type Unmarked = Vec<T::Unmarked>;
    fn mark(unmarked: Self::Unmarked) -> Self {
        // Should be a no-op due to std's in-place collect optimizations.
        unmarked.into_iter().map(T::mark).collect()
    }
}

impl<T: Unmark> Unmark for Vec<T> {
    type Unmarked = Vec<T::Unmarked>;
    fn unmark(self) -> Self::Unmarked {
        // Should be a no-op due to std's in-place collect optimizations.
        self.into_iter().map(T::unmark).collect()
    }
}
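For context, the Mark/Unmark pattern in that excerpt can be made self-contained with a sketch like the following. The trait shapes mirror the excerpt; the concrete `Marked(u32)` newtype is an invented stand-in for the bridge's marked handle types.

```rust
// Sketch of the Mark/Unmark round trip: a same-layout newtype plus blanket
// Vec impls. Because the newtype has the same layout as its unmarked form,
// std's in-place collect optimization can reuse the Vec allocation.
trait Mark {
    type Unmarked;
    fn mark(unmarked: Self::Unmarked) -> Self;
}
trait Unmark {
    type Unmarked;
    fn unmark(self) -> Self::Unmarked;
}

#[derive(Debug, PartialEq)]
struct Marked(u32);

impl Mark for Marked {
    type Unmarked = u32;
    fn mark(unmarked: u32) -> Self {
        Marked(unmarked)
    }
}
impl Unmark for Marked {
    type Unmarked = u32;
    fn unmark(self) -> u32 {
        self.0
    }
}

impl<T: Mark> Mark for Vec<T> {
    type Unmarked = Vec<T::Unmarked>;
    fn mark(unmarked: Self::Unmarked) -> Self {
        unmarked.into_iter().map(T::mark).collect()
    }
}
impl<T: Unmark> Unmark for Vec<T> {
    type Unmarked = Vec<T::Unmarked>;
    fn unmark(self) -> Self::Unmarked {
        self.into_iter().map(T::unmark).collect()
    }
}
```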
This marking stuff is sad, I keep forgetting to look into whether the RPC traits can get another parameter (defaulting to `Self`) to allow dispatching on more than one type (assuming that's the problem in the first place; maybe long term such a reshuffle could also allow passing sequences without allocating a `Vec`, etc.).
trees: Vec<
    bridge::TokenTree<
        bridge::client::Group,
        bridge::client::Punct,
        bridge::client::Ident,
        bridge::client::Literal,
    >,
>,
Random note: we should do some perf experiments using `SmallVec` (or a hacked-up copy of it) with various sizes. Since this is only ever on the stack in a "leaf" (client) function, which then passes that data to RPC, we might be able to set some large inline capacity like 32, and have it be an overall win.
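The `SmallVec` idea from that note can be sketched minimally as below. This is not the real `smallvec` crate API, just the shape of the technique: a buffer with a fixed inline capacity `N` that only spills to the heap when it grows past `N` elements (a real implementation would avoid the `Option` slots and use uninitialized storage).

```rust
// Minimal inline-capacity buffer: stays on the stack up to N elements,
// then moves everything into a heap Vec on the first overflowing push.
enum SmallBuf<T, const N: usize> {
    Inline { buf: [Option<T>; N], len: usize },
    Heap(Vec<T>),
}

impl<T, const N: usize> SmallBuf<T, N> {
    fn new() -> Self {
        SmallBuf::Inline { buf: std::array::from_fn(|_| None), len: 0 }
    }

    fn push(&mut self, value: T) {
        match self {
            SmallBuf::Inline { buf, len } if *len < N => {
                buf[*len] = Some(value);
                *len += 1;
            }
            SmallBuf::Inline { buf, len } => {
                // Spill: move the inline elements to the heap, then push.
                let mut v: Vec<T> = buf.iter_mut().filter_map(Option::take).collect();
                debug_assert_eq!(v.len(), *len);
                v.push(value);
                *self = SmallBuf::Heap(v);
            }
            SmallBuf::Heap(v) => v.push(value),
        }
    }

    fn into_vec(self) -> Vec<T> {
        match self {
            SmallBuf::Inline { mut buf, .. } => {
                buf.iter_mut().filter_map(Option::take).collect()
            }
            SmallBuf::Heap(v) => v,
        }
    }
}
```

With a large `N` (such as the 32 suggested above), typical token batches would never touch the heap at all, at the cost of a bigger stack frame in the leaf function.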
Quite possibly. It would be really nice if we could easily depend on crates in `proc_macro`, so we don't need to make a small copy of `SmallVec` to use it, but that's a problem for another PR.
  run_client(bridge, |(input, input2)| {
-     f(crate::TokenStream(input), crate::TokenStream(input2)).0
+     f(crate::TokenStream(Some(input)), crate::TokenStream(Some(input2))).0
Would it make sense to pass `Option` on the RPC to the client as well, not just back to the server?

(Also, I'm imagining we might be able to get rid of the split between single-input and dual-input entry points, and `assert!(input2.is_none())` somewhere seems like it would be easier than dealing with handles.)
We could, I suppose, but it's mildly inconvenient and introduces a bunch of extra code without much benefit, so I'm inclined not to for now.
// FIXME: This is a raw port of the previous approach, and can probably
// be optimized.
let mut cursor = stream.into_trees();
let mut stack = Vec::new();
I was going to say that this can be `SmallVec` with an inline capacity of 2 or so, but it looks like what's actually happening is `TokenTree::from_internal` could actually be returning `ArrayVec<_, 3>` (I suppose at that point it's not even a `FromInternal` impl, heh).

That is, despite being ominously named "stack" (as if it's some kind of vertical tree traversal), it's a weird FILO queue of pending additional `TokenTree`s (which your implementation could consume right away with e.g. a nested `for` loop).

(you can add a comment noting this, or just ignore it as a note to self)
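The pop-or-advance loop being discussed can be sketched with integers standing in for token trees. Here `from_internal` may expand one raw item into several output items, parking the extras on the "stack" (pushed in reverse so they pop in order); the main loop drains that FILO queue before advancing the cursor. The expansion rule is invented purely for illustration.

```rust
// One raw item may expand to up to three outputs; extras wait on `stack`.
fn from_internal(raw: i32, stack: &mut Vec<i32>) -> i32 {
    if raw < 0 {
        // A "compound" item expands to three outputs, e.g. -2 -> 200, 201, 202.
        let base = -raw * 100;
        stack.push(base + 2); // pushed in reverse so they pop in order
        stack.push(base + 1);
        base
    } else {
        raw
    }
}

fn expand(input: &[i32]) -> Vec<i32> {
    let mut cursor = input.iter().copied();
    let mut stack = Vec::new();
    let mut out = Vec::new();
    // Drain pending expansions before pulling the next raw item.
    while let Some(next) = stack.pop().or_else(|| {
        let raw = cursor.next()?;
        Some(from_internal(raw, &mut stack))
    }) {
        out.push(next);
    }
    out
}
```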
    continue;

let next = stack.pop().or_else(|| {
    let next = cursor.next_with_spacing()?;
    Some(TokenTree::from_internal((next, &mut stack, self)))
Random note: these from/to "internal" things should probably be s/internal/rustc, for clarity.
@bors r+

📌 Commit df925fd has been approved by

☀️ Test successful - checks-actions
Finished benchmarking commit (0182fd9): comparison url.

Results: Instruction count, Max RSS (memory usage), Cycles.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression
proc_macro: use crossbeam channels for the proc_macro cross-thread bridge

This is done by having the crossbeam dependency inserted into the `proc_macro` server code from the server side, to avoid adding a dependency to `proc_macro`.

In addition, this introduces a -Z command-line option which will switch rustc to run proc-macros using this cross-thread executor. With the changes to the bridge in rust-lang#98186, rust-lang#98187, rust-lang#98188 and rust-lang#98189, the performance of the executor should be much closer to same-thread execution.

In local testing, the crossbeam executor was substantially more performant than either of the two existing `CrossThread` strategies, so they have been removed to keep things simple.

r? `@eddyb`
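The cross-thread executor shape described there can be sketched with std's `mpsc` channels in place of crossbeam, so the example is dependency-free. The request/response types and the `Double` operation are invented stand-ins for the bridge's real RPC messages.

```rust
use std::sync::mpsc;
use std::thread;

// Hedged sketch of a cross-thread bridge: the "client" (proc macro) runs on
// its own thread and sends requests over one channel; the "server" (compiler)
// loops answering over another until the client reports its final result.
enum Request {
    Double(i32),
    Done(i32),
}

fn run_cross_thread(input: i32) -> i32 {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    let (resp_tx, resp_rx) = mpsc::channel::<i32>();

    // Client side: issues an RPC call for each operation it needs.
    let client = thread::spawn(move || {
        req_tx.send(Request::Double(input)).unwrap();
        let doubled = resp_rx.recv().unwrap();
        req_tx.send(Request::Done(doubled + 1)).unwrap();
    });

    // Server side: services requests until the client finishes.
    let result = loop {
        match req_rx.recv().unwrap() {
            Request::Double(n) => resp_tx.send(n * 2).unwrap(),
            Request::Done(n) => break n,
        }
    };
    client.join().unwrap();
    result
}
```

Every round trip here costs two channel sends plus thread wakeups, which is exactly why the batching in this PR matters so much for the cross-thread strategy.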
This is the first part of #86822, split off as requested in #86822 (review). It reduces the number of RPC calls required for common operations such as iterating over and concatenating TokenStreams.
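The effect of batching on call counts can be illustrated with a toy model (all names invented; the counter stands in for per-call bridge overhead): iterating a stream one token per RPC versus fetching every tree in a single call.

```rust
use std::cell::Cell;

thread_local! {
    // Counts simulated cross-bridge calls.
    static RPC_CALLS: Cell<usize> = Cell::new(0);
}

// Wraps an operation as if it crossed the bridge, counting the call.
fn rpc<T>(f: impl FnOnce() -> T) -> T {
    RPC_CALLS.with(|c| c.set(c.get() + 1));
    f()
}

// Unbatched: one RPC per token while iterating.
fn iterate_per_token(stream: &[i32]) -> Vec<i32> {
    (0..stream.len()).map(|i| rpc(|| stream[i])).collect()
}

// Batched: a single RPC returns every token at once.
fn iterate_batched(stream: &[i32]) -> Vec<i32> {
    rpc(|| stream.to_vec())
}

// Runs `f` and reports (result, number of RPC calls it made).
fn calls_used<T>(f: impl FnOnce() -> T) -> (T, usize) {
    RPC_CALLS.with(|c| c.set(0));
    let out = f();
    (out, RPC_CALLS.with(|c| c.get()))
}
```

The per-token version makes O(n) calls where the batched one makes exactly one, which is the kind of reduction this PR targets for iteration and concatenation.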