Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slow output, with pauses, with c# source code #677

Closed
vvuk opened this issue Oct 2, 2019 · 17 comments · Fixed by #912
Closed

slow output, with pauses, with c# source code #677

vvuk opened this issue Oct 2, 2019 · 17 comments · Fixed by #912
Assignees
Labels
help wanted Extra attention is needed syntax-highlighting upstream-error A bug in an upstream component
Milestone

Comments

@vvuk
Copy link

vvuk commented Oct 2, 2019

Something strange is going on. On a 1500-line .cs file, bat takes ~7s to list the file. --pager none, both with and without -p (not much difference with -p). I can see a pause after every1-2 dozen lines are output. --colors never also doesn't make a difference (bench via hyperfine):

Command Mean [ms] Min [ms] Max [ms] Relative
cmd /c type EntityComponentStore.cs 86.1 ± 7.3 75.9 101.1 1.0
bat --pager none EntityComponentStore.cs 7734.8 ± 586.6 6857.4 8700.1 89.9
bat --pager none -p EntityComponentStore.cs 6982.9 ± 464.3 6290.5 7889.4 81.1
bat --color never --pager none EntityComponentStore.cs 7155.5 ± 460.3 6586.1 7804.1 83.1
bat --color never --pager none -p EntityComponentStore.cs 7053.2 ± 492.3 6409.1 8076.1 82.0

bat 0.12.1, msvc build on Windows, in a regular cmd.exe console window.

I'd expect some slowdown but it seems like something buggy is happening. If I do the same on a cpp file, also ~1500 lines:

Command Mean [ms] Min [ms] Max [ms] Relative
cmd /c type ApiDispatchers.cpp 98.4 ± 10.0 83.6 115.9 1.0
bat --pager none ApiDispatchers.cpp 768.5 ± 88.2 677.9 995.4 7.8
bat --pager none -p ApiDispatchers.cpp 683.8 ± 128.9 549.4 959.0 6.9
bat --color never --pager none ApiDispatchers.cpp 480.1 ± 46.7 440.2 584.2 4.9
bat --color never --pager none -p ApiDispatchers.cpp 447.6 ± 42.5 414.6 542.0 4.5

Which is much more in line with what I'd expect. Is there something different about how the syntax highlighting for C# is handled? Is it all regexp or something else expensive based?

@vvuk
Copy link
Author

vvuk commented Oct 2, 2019

This isn't the C# file I used for benchmarking, but it's a long one (~3k loc) and behaves the same. It doesn't seem to be file dependent, just more obvious on longer files. CSharpCompilation.cs

@vvuk
Copy link
Author

vvuk commented Oct 2, 2019

I should say both of these are with --show-output to hyperfine. Without it, things are very different, unless I force --color always:

Command Mean [ms] Min [ms] Max [ms] Relative
cmd /c type EntityComponentStore.cs 14.6 ± 0.9 13.8 22.1 1.0
bat --pager none EntityComponentStore.cs 77.5 ± 1.4 75.8 83.1 5.3
bat --color always --pager none EntityComponentStore.cs 6209.7 ± 582.2 5519.8 7555.6 418.2

@eth-p
Copy link
Collaborator

eth-p commented Oct 2, 2019

I can confirm this bug on Windows 10 with bat 0.12.1.
I can also confirm this on macOS.

The following behaviours have been identified with bat and the file provided:

Command Choppy
bat file.cs Yes
bat file.cs -pp Yes
bat file.cs --color=never Yes
bat file.cs --color=never > NUL No
bat file.cs --color=never --decorations=never Yes
bat file.cs --color=never --decorations=never > file.txt Yes
bat file.cs --color=always --decorations=always > file.txt Yes

Trying with a large file of a different language (e.g. the saved HTML from here), I don't notice this issue in any of the above cases.

If we consider the commands used for the C# file in the above table, the only times where the issue wasn't present is when the simple printer is used (i.e. when the highlighter isn't used at all).

Given the above, I agree that this appears to be an issue specifically with C# syntax highlighting. Since I'm not too familiar with the Syntect library, @sharkdp is likely going to need to be the one to figure out the reason why this is happening.

@eth-p eth-p added bug Something isn't working syntax-highlighting labels Oct 2, 2019
Repository owner deleted a comment from parikshitgupta1 Oct 6, 2019
Repository owner deleted a comment from parikshitgupta1 Oct 6, 2019
@sharkdp
Copy link
Owner

sharkdp commented Oct 6, 2019

@vvuk Thank you for the detailed bug report!

If we consider the commands used for the C# file in the above table, the only times where the issue wasn't present is when the simple printer is used (i.e. when the highlighter isn't used at all).

Yes, exactly. Thank you for looking into this.

Given the above, I agree that this appears to be an issue specifically with C# syntax highlighting. Since I'm not too familiar with the Syntect library, @sharkdp is likely going to need to be the one to figure out the reason why this is happening.

There have been issues with the C# syntax in the past (e.g. trishume/syntect#63), but this seems to be a new issue (we already include the updated C# syntax - at least on master, where the problem is still present).

This should probably be reported upstream. The same thing happens when using syncat, which is bundled with syntect.

@keith-hall Notifying you, just in case you are interested (please let me know if you are not, in which case I am sorry and will stop pinging you!)

@keith-hall
Copy link
Collaborator

Most likely it is another instance of a regex pattern used (perhaps often) in the C# syntax definition which has poor performance under the Oniguruma regex engine with many (perhaps simple) inputs. Probably the best way to start debugging this would be to play around in syntect to temporarily add some timers to see which patterns are problematic and then identify if there is a way to tweak those for better performance and make a PR to the sublimehq/Packages repository with those improvements. I will endeavor to do this when I am able to make time, but it may not be soon, so anyone else whom wishes to experiment/investigate: please do :)

otherwise, potentially relevant (currently open) upstream issues are:

@vvuk
Copy link
Author

vvuk commented Oct 7, 2019

Ah, so I was going to look into this in syntect, however current master there (3.3.0) doesn't show the problem with syncat (this is on a good bit slower machine than the earlier benchmarks):

Command Mean [s] Min [s] Max [s] Relative
cargo run --example syncat C:\proj\tmp\CSharpCompilation.cs 2.122 ± 0.013 2.110 2.151 1.0
bat --color always --pager none c:\proj\tmp\CSharpCompilation.cs 12.697 ± 0.886 11.395 14.000 6.0

I'm not sure what version of syntect my bat was built with, but either this was fixed upstream already, or there's actually something that bat is doing that's causing the slowdown?

@vvuk
Copy link
Author

vvuk commented Oct 7, 2019

Oh I forgot to mention -- the above was a debug build of syntec. A release build takes 500ms.

@keith-hall
Copy link
Collaborator

keith-hall commented Oct 7, 2019

@vvuk just to check, are you testing syntect/syncat with the submodules as of the master branch on that repo, or have you updated them to match the version/state/commit which bat is using? (and rebuilt the syntaxset "dump"?)

@vvuk
Copy link
Author

vvuk commented Oct 7, 2019

@keith-hall with the submodules as of the master branch.

 870081c447a61df93290932e307ba5d1890f7e8e syntect
 18ddb271179e118cfc2dd83abf88b915b7328a25 testdata/InspiredGitHub.tmtheme (v1.3.0)
 fa6b8629c95041bf262d4c1dab95c456a0530122 testdata/Packages (v3154-4-gfa6b8629)
 bcd6234b4f5f96d3fd27db079268b5757053072a testdata/Solarized (1.5.11)
 2703e93f559e212ef3895edd10d861a4383ce93d testdata/spacegray (1.3.2-1-g2703e93)

That syntect commit (870081c) is tagged as v3.3.0, which is what bat is using. I didn't confirm that they're using the exact same bits though, my bat build was pulling syntect from cargo. I'll rebuild locally for a sanity check.

I didn't rebuild the syntaxset (just cloned syntect, and did a cargo build --examples)

@keith-hall
Copy link
Collaborator

syntect v3.3.0 and the master branch are both referencing a 2 year old Packages submodule (where bat is now using a recent version), so likely some change to the C# syntax definition since then has caused this difference

@vvuk
Copy link
Author

vvuk commented Oct 9, 2019

Definitely an issue with the C# syntax files. Using the C# syntaxes that bat is using (via -e -s), I'm getting the 12s times with syncat. There are a bunch of changes in the syntax file; I can attempt to narrow down which in particular is causing the problem.

@sharkdp sharkdp added upstream-error A bug in an upstream component and removed bug Something isn't working labels Nov 24, 2019
@sharkdp sharkdp added the help wanted Extra attention is needed label Dec 22, 2019
@vvuk
Copy link
Author

vvuk commented Jan 29, 2020

(I haven't had a chance to do the narrowing down unfortunately.)

For upstream reporting, should this be reported to the sublimehq/Packages repo, or to syntect? If sublime had the same performance issues I'd guess someone would have complained/noticed, so maybe this is a syntect issue?

@sharkdp sharkdp added this to the v0.14 milestone Apr 11, 2020
@sharkdp
Copy link
Owner

sharkdp commented Apr 11, 2020

I've looked into this today.

First, I tried to minimize the example file above. I found that lines like this:

return a(bbbbbbbbbbbbbbbbbbbbbbbbbbbb.c(dddddddddddddddddddddddddddddddddddd));

take a very long time to parse. This particular one takes ~500 ms and the parsing time depends on the length of the b…b and d…d strings (in a somewhat complex way).

Next, I took a look at the C#.sublime-syntax. The problematic pattern is in the line_of_code_in_no_semicolon context:

    - match: \((?=(?:[^,)(]*|\([^\)]*\))*,)
      scope: punctuation.section.group.begin.cs
      push:
        # [...]

My (completely non-expert) guess would be that the nested (something*)* pattern causes some kind of combinatorial explosion.

Maybe @keith-hall could take a look at this (only if you are interested, of course)? 😊

If this pattern is disabled, C# files render reasonably fast:

bat CSharpCompilation.cs --color=always
Command Mean [s] Min [s] Max [s] Relative
master 9.555 ± 0.038 9.487 9.615 1.00
modified syntax 0.306 ± 0.006 0.297 0.319 1.00

@sharkdp
Copy link
Owner

sharkdp commented Apr 11, 2020

If I enter this pattern into regex101 and use the appropriate part of the line above as a test string, regex101 actually says "Catastrophic backtracking has been detected".

https://regex101.com/r/O2AcN7/1

@keith-hall
Copy link
Collaborator

keith-hall commented Apr 11, 2020

may I suggest changing that pattern to \((?=(?:[^,)(]|\([^\)]*\))*,), it has the same functionality but without the catastrophic backtracking. I'm not sure when I'll get time to open a PR on the sublimehq/Packages repo, so if this change works for you, feel free to beat me to it :)

(I proved that the C# syntax tests aren't adversely affected, nor is the performance in Sublime Text)

@sharkdp
Copy link
Owner

sharkdp commented Apr 11, 2020

Fantastic, thank you very much for the fast response!

This leads to the same time (slightly faster here because I was using a user-specified cache-file above - here, the patched C# syntax is built into the bat binary):

Command Mean [s] Min [s] Max [s] Relative
bat CSharpCompilation.cs --color=always 0.256 ± 0.003 0.251 0.263 1.00

I'm going to open a PR in sublimehq/Packages.

Edit: done: sublimehq/Packages#2331

sharkdp added a commit to sharkdp/Packages that referenced this issue Apr 11, 2020
This simplifies a pattern in the C# syntax while leaving it functionally unchanged.

The old pattern susceptible to catastrophic backtracking due to a combinatorial explosion caused by
the two equivalent `*` modifiers.

Sublime text didn't really show any performance problems but we did experience problems downstream
in syntect/bat: sharkdp/bat#677

This change was originally proposed by @keith-hall here:
sharkdp/bat#677 (comment)

Performance in Sublime text is not adversely affected.
sharkdp added a commit to sharkdp/Packages that referenced this issue Apr 11, 2020
This simplifies a pattern in the C# syntax while leaving it functionally unchanged.

The old is pattern susceptible to catastrophic backtracking due to a combinatorial explosion caused
by the inner and outer `*` modifier.

Sublime text didn't really show any performance problems, but we did experience problems downstream
in syntect/bat: sharkdp/bat#677

This change was originally proposed by @keith-hall here:
sharkdp/bat#677 (comment)

Performance in Sublime text is not adversely affected.
sharkdp added a commit to sharkdp/Packages that referenced this issue Apr 11, 2020
This simplifies a pattern in the C# syntax while leaving it functionally unchanged.

The old pattern is susceptible to catastrophic backtracking due to a combinatorial explosion caused
by the inner and outer `*` modifier.

Sublime text didn't really show any performance problems, but we did experience problems downstream
in syntect/bat: sharkdp/bat#677

This change was originally proposed by @keith-hall here:
sharkdp/bat#677 (comment)

Performance in Sublime text is not adversely affected.
@sharkdp
Copy link
Owner

sharkdp commented Apr 22, 2020

This has been fixed in bat v0.14.

wbond pushed a commit to sublimehq/Packages that referenced this issue Jul 17, 2020
This simplifies a pattern in the C# syntax while leaving it functionally unchanged.

The old pattern is susceptible to catastrophic backtracking due to a combinatorial explosion caused
by the inner and outer `*` modifier.

Sublime text didn't really show any performance problems, but we did experience problems downstream
in syntect/bat: sharkdp/bat#677

This change was originally proposed by @keith-hall here:
sharkdp/bat#677 (comment)

Performance in Sublime text is not adversely affected.
mitranim pushed a commit to mitranim/Packages that referenced this issue Mar 25, 2022
This simplifies a pattern in the C# syntax while leaving it functionally unchanged.

The old pattern is susceptible to catastrophic backtracking due to a combinatorial explosion caused
by the inner and outer `*` modifier.

Sublime text didn't really show any performance problems, but we did experience problems downstream
in syntect/bat: sharkdp/bat#677

This change was originally proposed by @keith-hall here:
sharkdp/bat#677 (comment)

Performance in Sublime text is not adversely affected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed syntax-highlighting upstream-error A bug in an upstream component
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants