
Store ContextId in ParseState instead of &Context #190

Merged

Conversation

robinst (Collaborator) commented Jul 27, 2018

This means ParseState doesn't contain a reference to SyntaxSet anymore
and doesn't need a lifetime.

This makes it simpler for code that wants to cache it, but it also means
parse_line needs to be called with the SyntaxSet as an argument so that
it can look up contexts.

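The description above can be sketched as a minimal, self-contained example of the pattern (hypothetical types and method bodies, not syntect's actual definitions): `ParseState` stores plain `ContextId` indices instead of `&Context` references, so it has no lifetime parameter and can be cached freely, at the cost of passing the `SyntaxSet` to `parse_line` for lookups.

```rust
/// A plain index into the set's context arena; `Copy` and `'static`.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ContextId(usize);

struct Context {
    name: String,
}

struct SyntaxSet {
    contexts: Vec<Context>,
}

impl SyntaxSet {
    fn get_context(&self, id: ContextId) -> &Context {
        &self.contexts[id.0]
    }
}

// No lifetime parameter: the state holds ids, not references.
#[derive(Clone, PartialEq, Eq)]
struct ParseState {
    stack: Vec<ContextId>,
}

impl ParseState {
    fn new(initial: ContextId) -> Self {
        ParseState { stack: vec![initial] }
    }

    // The set is passed in per call instead of being borrowed at
    // construction time; here we just report the top context's name.
    fn parse_line(&mut self, _line: &str, set: &SyntaxSet) -> String {
        let top = *self.stack.last().unwrap();
        set.get_context(top).name.clone()
    }
}

fn main() {
    let set = SyntaxSet {
        contexts: vec![Context { name: "main".to_string() }],
    };
    let mut state = ParseState::new(ContextId(0));
    // ParseState can now be cached and compared without lifetime juggling.
    let cached = state.clone();
    assert!(state == cached);
    assert_eq!(state.parse_line("fn x() {}", &set), "main");
    println!("top context: {}", state.parse_line("fn x() {}", &set));
}
```

The trade-off is exactly the one named above: the caller must hand the same (or an extended) `SyntaxSet` to every `parse_line` call, since the state alone can no longer resolve its ids.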
robinst (Collaborator, Author) commented Jul 27, 2018

@trishume Separate PR to make it easier to review. Feel free to merge into other PR if you decide to go ahead with this :). I'm now also in favor of doing this.

robinst mentioned this pull request Jul 27, 2018
robinst (Collaborator, Author) commented Jul 27, 2018

(Haven't had a chance to benchmark yet, ran into an issue with criterion: bheisler/criterion.rs#182)

trishume (Owner) left a comment


Thanks for doing all this rejiggering, this looks great!

I'll wait for the benchmarks but my perf intuition tells me it won't make a big difference, and I'm willing to accept a small regression for not saddling people with borrow checker pain.

///
/// The `SyntaxSet` has to be the one that contained the syntax that was
/// used to construct this `ParseState`, or an extended version of it.
/// Otherwise the parsing would return the wrong result or even panic. The
trishume (Owner):

Thanks for the docs!

One way we could mitigate this is by creating a "token" for each syntax set builder, as well as a generation that gets bumped on each syntax added. The SyntaxReference would also keep the token and generation from its creation. parse_line would then assert! that the token of the SyntaxSet matches and that its generation is greater or equal. This would lead to better error messages and never to mysteriously incorrect results, and the cost should be negligible.

The token could either be a random number (requiring a rand dependency) or a Box<u8> in the SyntaxSet and a *const u8 in the SyntaxReference because comparing pointers is safe and doesn't need lifetimes, and the box guarantees uniqueness and non-movingness. The u8 is just because I'm unsure if Box<()> actually allocates a different address for each one.

You very much don't have to do this. I think the current state of dropping the SyntaxSet causing panics is easier to get into and worse than this. Just bonus if you feel like it.

robinst (Collaborator, Author):

Yeah I think this is something we can easily do after we merge the big PR :).

As for the implementation, why wouldn't we just store a copy of the same u8 on the SyntaxReference directly?

trishume (Owner):

Because the u8 would actually always be 0; it's just there to make the type nonzero-sized. The Box would be doing all the work by using the system allocator to allocate a globally unique address. Actually, a static AtomicUsize would also work for allocating unique tokens and would probably make more sense.
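The token-plus-generation idea discussed above can be sketched like this (hypothetical names, not part of syntect; a minimal sketch assuming the `AtomicUsize` variant). Each set draws a globally unique token from a static counter, every added syntax bumps the generation, and a `parse_line`-style check asserts that the set matches the reference's token and is at least as new:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Global counter handing out a unique token per SyntaxSet.
static NEXT_TOKEN: AtomicUsize = AtomicUsize::new(0);

struct SyntaxSet {
    token: usize,
    generation: usize,
}

struct SyntaxReference {
    token: usize,
    generation: usize,
}

impl SyntaxSet {
    fn new() -> Self {
        SyntaxSet {
            token: NEXT_TOKEN.fetch_add(1, Ordering::Relaxed),
            generation: 0,
        }
    }

    // Adding a syntax bumps the generation; the returned reference
    // remembers both the token and the generation it was created at.
    fn add_syntax(&mut self) -> SyntaxReference {
        self.generation += 1;
        SyntaxReference {
            token: self.token,
            generation: self.generation,
        }
    }

    // The check parse_line would assert!: same token, and the set's
    // generation is >= the reference's (an extended set is still valid).
    fn check(&self, r: &SyntaxReference) -> bool {
        self.token == r.token && self.generation >= r.generation
    }
}

fn main() {
    let mut a = SyntaxSet::new();
    let r = a.add_syntax();
    assert!(a.check(&r));
    a.add_syntax(); // extending the set keeps old references valid
    assert!(a.check(&r));
    let b = SyntaxSet::new(); // a different set: token mismatch
    assert!(!b.check(&r));
    println!("token checks behave as expected");
}
```

With this in place, passing the wrong `SyntaxSet` would fail a cheap `assert!` with a clear message instead of panicking on an out-of-bounds lookup or silently returning wrong results.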

&mut res) {
}
while self.parse_next_token(
line,
trishume (Owner):

👍 I also prefer this style, but only because my editor isn't set up to do the deep indents properly.

robinst (Collaborator, Author):

Yeah, that's the new standard rustfmt style too. I think we should run rustfmt on the whole codebase once the big PR is in.

}

#[test]
fn can_compare_parse_states() {
trishume (Owner):

❤️

robinst (Collaborator, Author) commented Jul 30, 2018

So I ran the benchmark with the base branch and this, here's the results:

parse/"highlight_test.erb"
                        time:   [1.4563 ms 1.4663 ms 1.4828 ms]
                        change: [-29.224% -18.677% -7.9199%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) high mild
  1 (5.00%) high severe
parse/"InspiredGitHub.tmTheme"
                        time:   [17.825 ms 18.104 ms 18.336 ms]
                        change: [-12.909% -10.030% -7.3582%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
parse/"Ruby.sublime-syntax"
                        time:   [73.421 ms 77.776 ms 82.579 ms]
                        change: [-16.879% -11.979% -6.6834%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high severe
parse/"jquery.js"       time:   [605.69 ms 667.48 ms 756.20 ms]
                        change: [+1.7598% +11.291% +22.238%] (p = 0.02 < 0.05)
                        Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) high mild
  1 (5.00%) high severe
parse/"parser.rs"       time:   [363.98 ms 374.33 ms 388.59 ms]
                        change: [+4.2024% +7.3198% +11.463%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) high mild
  1 (5.00%) high severe
parse/"scope.rs"        time:   [31.097 ms 31.270 ms 31.466 ms]
                        change: [-9.1224% -4.2079% -0.7480%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high severe

Note that I ran the jquery one again (so both "before" and "after" are from this branch, and it still changed):

parse/"jquery.js"       time:   [554.73 ms 557.23 ms 560.57 ms]
                        change: [-19.530% -12.552% -6.4291%] (p = 0.00 < 0.05)
                        Performance has improved.

So not very conclusive unfortunately.

But yeah, given that the overwhelming amount of time is spent doing regex matching, I'm not concerned either. Will merge this!

@robinst robinst merged commit 913aaf5 into use-arena-for-contexts Jul 30, 2018
@robinst robinst deleted the use-arena-for-contexts-parsestate-ids branch July 30, 2018 04:19