
Store ContextId in ParseState instead of &Context #190

Merged

Conversation

robinst (Collaborator) commented Jul 27, 2018

This means ParseState doesn't contain a reference to SyntaxSet anymore
and doesn't need a lifetime.

This makes it simpler for code that wants to cache it, but it also means
parse_line needs to be called with the SyntaxSet as an argument so that
it can look up contexts.

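The description above can be sketched as a minimal, self-contained example of the pattern (hypothetical types and method bodies, not syntect's actual definitions): `ParseState` stores plain `ContextId` indices instead of `&Context` references, so it has no lifetime parameter and can be cached freely, at the cost of passing the `SyntaxSet` to `parse_line` for lookups.

```rust
/// A plain index into the set's context arena; `Copy` and `'static`.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ContextId(usize);

struct Context {
    name: String,
}

struct SyntaxSet {
    contexts: Vec<Context>,
}

impl SyntaxSet {
    fn get_context(&self, id: ContextId) -> &Context {
        &self.contexts[id.0]
    }
}

// No lifetime parameter: the state holds ids, not references.
#[derive(Clone, PartialEq, Eq)]
struct ParseState {
    stack: Vec<ContextId>,
}

impl ParseState {
    fn new(initial: ContextId) -> Self {
        ParseState { stack: vec![initial] }
    }

    // The set is passed in per call instead of being borrowed at
    // construction time; here we just report the top context's name.
    fn parse_line(&mut self, _line: &str, set: &SyntaxSet) -> String {
        let top = *self.stack.last().unwrap();
        set.get_context(top).name.clone()
    }
}

fn main() {
    let set = SyntaxSet {
        contexts: vec![Context { name: "main".to_string() }],
    };
    let mut state = ParseState::new(ContextId(0));
    // ParseState can now be cached and compared without lifetime juggling.
    let cached = state.clone();
    assert!(state == cached);
    assert_eq!(state.parse_line("fn x() {}", &set), "main");
    println!("top context: {}", state.parse_line("fn x() {}", &set));
}
```

The trade-off is exactly the one named above: the caller must hand the same (or an extended) `SyntaxSet` to every `parse_line` call, since the state alone can no longer resolve its ids.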
robinst (Collaborator, Author) commented Jul 27, 2018

@trishume Separate PR to make it easier to review. Feel free to merge into other PR if you decide to go ahead with this :). I'm now also in favor of doing this.

robinst mentioned this pull request Jul 27, 2018
robinst (Collaborator, Author) commented Jul 27, 2018

(Haven't had a chance to benchmark yet, ran into an issue with criterion: bheisler/criterion.rs#182)

trishume (Owner) left a comment


Thanks for doing all this rejiggering, this looks great!

I'll wait for the benchmarks but my perf intuition tells me it won't make a big difference, and I'm willing to accept a small regression for not saddling people with borrow checker pain.

///
/// The `SyntaxSet` has to be the one that contained the syntax that was
/// used to construct this `ParseState`, or an extended version of it.
/// Otherwise the parsing would return the wrong result or even panic. The
trishume (Owner):

Thanks for the docs!

One way we could mitigate this is by creating a "token" for each syntax set builder, as well as a generation that gets bumped on each syntax added. The SyntaxReference would also keep the token and generation from its creation. parse_line would then assert! that the token of the SyntaxSet matches and that its generation is greater or equal. This would lead to better error messages and never to mysteriously incorrect results, and the cost should be negligible.

The token could either be a random number (requiring a rand dependency) or a Box<u8> in the SyntaxSet and a *const u8 in the SyntaxReference because comparing pointers is safe and doesn't need lifetimes, and the box guarantees uniqueness and non-movingness. The u8 is just because I'm unsure if Box<()> actually allocates a different address for each one.

You very much don't have to do this. I think the current state of dropping the SyntaxSet causing panics is easier to get into and worse than this. Just bonus if you feel like it.

robinst (Collaborator, Author):

Yeah I think this is something we can easily do after we merge the big PR :).

As for the implementation, why wouldn't we just store a copy of the same u8 on the SyntaxReference directly?

trishume (Owner):

Because the u8 would actually always be 0; it's just there to make the type nonzero-sized. The Box would be doing all the work by using the system allocator to allocate a globally unique address. Actually, a static AtomicUsize would also work for allocating unique tokens and would probably make more sense.
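The token-plus-generation idea discussed above can be sketched like this (hypothetical names, not part of syntect; a minimal sketch assuming the `AtomicUsize` variant). Each set draws a globally unique token from a static counter, every added syntax bumps the generation, and a `parse_line`-style check asserts that the set matches the reference's token and is at least as new:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Global counter handing out a unique token per SyntaxSet.
static NEXT_TOKEN: AtomicUsize = AtomicUsize::new(0);

struct SyntaxSet {
    token: usize,
    generation: usize,
}

struct SyntaxReference {
    token: usize,
    generation: usize,
}

impl SyntaxSet {
    fn new() -> Self {
        SyntaxSet {
            token: NEXT_TOKEN.fetch_add(1, Ordering::Relaxed),
            generation: 0,
        }
    }

    // Adding a syntax bumps the generation; the returned reference
    // remembers both the token and the generation it was created at.
    fn add_syntax(&mut self) -> SyntaxReference {
        self.generation += 1;
        SyntaxReference {
            token: self.token,
            generation: self.generation,
        }
    }

    // The check parse_line would assert!: same token, and the set's
    // generation is >= the reference's (an extended set is still valid).
    fn check(&self, r: &SyntaxReference) -> bool {
        self.token == r.token && self.generation >= r.generation
    }
}

fn main() {
    let mut a = SyntaxSet::new();
    let r = a.add_syntax();
    assert!(a.check(&r));
    a.add_syntax(); // extending the set keeps old references valid
    assert!(a.check(&r));
    let b = SyntaxSet::new(); // a different set: token mismatch
    assert!(!b.check(&r));
    println!("token checks behave as expected");
}
```

With this in place, passing the wrong `SyntaxSet` would fail a cheap `assert!` with a clear message instead of panicking on an out-of-bounds lookup or silently returning wrong results.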

&mut res) {
}
while self.parse_next_token(
line,
trishume (Owner):

👍 I also prefer this style, but only because my editor isn't set up to do the deep indents properly.

robinst (Collaborator, Author):

Yeah, that's the new standard rustfmt style too. I think we should run rustfmt on the whole codebase once the big PR is in.

}

#[test]
fn can_compare_parse_states() {
trishume (Owner):

❤️

robinst (Collaborator, Author) commented Jul 30, 2018

So I ran the benchmark with the base branch and this, here's the results:

parse/"highlight_test.erb"
                        time:   [1.4563 ms 1.4663 ms 1.4828 ms]
                        change: [-29.224% -18.677% -7.9199%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) high mild
  1 (5.00%) high severe
parse/"InspiredGitHub.tmTheme"
                        time:   [17.825 ms 18.104 ms 18.336 ms]
                        change: [-12.909% -10.030% -7.3582%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
parse/"Ruby.sublime-syntax"
                        time:   [73.421 ms 77.776 ms 82.579 ms]
                        change: [-16.879% -11.979% -6.6834%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high severe
parse/"jquery.js"       time:   [605.69 ms 667.48 ms 756.20 ms]
                        change: [+1.7598% +11.291% +22.238%] (p = 0.02 < 0.05)
                        Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) high mild
  1 (5.00%) high severe
parse/"parser.rs"       time:   [363.98 ms 374.33 ms 388.59 ms]
                        change: [+4.2024% +7.3198% +11.463%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) high mild
  1 (5.00%) high severe
parse/"scope.rs"        time:   [31.097 ms 31.270 ms 31.466 ms]
                        change: [-9.1224% -4.2079% -0.7480%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high severe

Note that I ran the jquery one again (so both "before" and "after" are from this branch, and it still changed):

parse/"jquery.js"       time:   [554.73 ms 557.23 ms 560.57 ms]
                        change: [-19.530% -12.552% -6.4291%] (p = 0.00 < 0.05)
                        Performance has improved.

So not very conclusive unfortunately.

But yeah, given that the overwhelming amount of time is spent doing regex matching, I'm not concerned either. Will merge this!

@robinst robinst merged commit 913aaf5 into use-arena-for-contexts Jul 30, 2018
@robinst robinst deleted the use-arena-for-contexts-parsestate-ids branch July 30, 2018 04:19