-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store ContextId in ParseState instead of &Context #190
Store ContextId in ParseState instead of &Context #190
Conversation
This means ParseState doesn't contain a reference to SyntaxSet anymore and doesn't need a lifetime. This makes it simpler for code that wants to cache it, but it also means parse_line needs to be called with the SyntaxSet as an argument so that it can look up contexts.
@trishume Separate PR to make it easier to review. Feel free to merge into other PR if you decide to go ahead with this :). I'm now also in favor of doing this. |
(Haven't had a chance to benchmark yet, ran into an issue with criterion: bheisler/criterion.rs#182) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing all this rejiggering, this looks great!
I'll wait for the benchmarks but my perf intuition tells me it won't make a big difference, and I'm willing to accept a small regression for not saddling people with borrow checker pain.
/// | ||
/// The `SyntaxSet` has to be the one that contained the syntax that was | ||
/// used to construct this `ParseState`, or an extended version of it. | ||
/// Otherwise the parsing would return the wrong result or even panic. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the docs!
One way we could mitigate this is by creating a "token" for each syntax set builder as well as a generation that gets bumped on each syntax added. The SyntaxReference
would also keep the token and generation from its creation. parse_line
would then assert!
that the token of the SyntaxSet
matches and the generation was greater or equal. This would lead to better error messages and never mysterious incorrect results, and the cost should be negligible.
The token could either be a random number (requiring a rand
dependency) or a Box<u8>
in the SyntaxSet
and a *const u8
in the SyntaxReference
because comparing pointers is safe and doesn't need lifetimes, and the box guarantees uniqueness and non-movingness. The u8
is just because I'm unsure if Box<()>
actually allocates a different address for each one.
You very much don't have to do this. I think the current state of dropping the SyntaxSet
causing panics is easier to get into and worse than this. Just bonus if you feel like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think this is something we can easily do after we merge the big PR :).
As for the implementation, why wouldn't we just store a copy of the same u8
on the SyntaxReference
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the u8
would actually always be 0
, it's just there to make it nonzero sized. The Box
would be doing all the work by using the system allocator to allocate a globally unique value. Actually a static AtomicUsize
would also work for allocating unique tokens and probably make more sense.
&mut res) { | ||
} | ||
while self.parse_next_token( | ||
line, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I also prefer this style, but only because my editor isn't set up to do the deep indents properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's the new standard rustfmt style too. I think we should run rustfmt
on the whole codebase once the big PR is in.
} | ||
|
||
#[test] | ||
fn can_compare_parse_states() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
So I ran the benchmark with the base branch and this, here's the results:
Note that I ran the jquery one again (so both "before" and "after" is from this branch and it changed):
So not very conclusive unfortunately. But yeah, given that the overwhelming amount of time is spent doing regex matching, I'm not concerned either. Will merge this! |
This means ParseState doesn't contain a reference to SyntaxSet anymore
and doesn't need a lifetime.
This makes it simpler for code that wants to cache it, but it also means
parse_line needs to be called with the SyntaxSet as an argument so that
it can look up contexts.