-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
slow output, with pauses, with c# source code #677
Comments
This isn't the C# file I used for benchmarking, but it's a long one (~3k loc) and behaves the same. It doesn't seem to be file dependent, just more obvious on longer files. CSharpCompilation.cs |
I should say both of these are with
|
I can confirm this bug on Windows 10 with bat The following behaviours have been identified with bat and the file provided:
Trying with a large file of a different language (e.g. the saved HTML from here), I don't notice this issue in any of the above cases. If we consider the commands used for the C# file in the above table, the only times where the issue wasn't present is when the simple printer is used (i.e. when the highlighter isn't used at all). Given the above, I agree that this appears to be an issue specifically with C# syntax highlighting. Since I'm not too familiar with the Syntect library, @sharkdp is likely going to need to be the one to figure out the reason why this is happening. |
@vvuk Thank you for the detailed bug report!
Yes, exactly. Thank you for looking into this.
There have been issues with the C# syntax in the past (e.g. trishume/syntect#63), but this seems to be a new issue (we already include the updated C# syntax - at least on This should probably be reported upstream. The same thing happens when using @keith-hall Notifying you, just in case you are interested (please let me know if you are not, in which case I am sorry and will stop pinging you!) |
Most likely it is another instance of a regex pattern used (perhaps often) in the C# syntax definition which has poor performance under the Oniguruma regex engine with many (perhaps simple) inputs. Probably the best way to start debugging this would be to play around in syntect to temporarily add some timers to see which patterns are problematic and then identify if there is a way to tweak those for better performance and make a PR to the sublimehq/Packages repository with those improvements. I will endeavor to do this when I am able to make time, but it may not be soon, so anyone else whom wishes to experiment/investigate: please do :) otherwise, potentially relevant (currently open) upstream issues are: |
Ah, so I was going to look into this in syntect, however current master there (3.3.0) doesn't show the problem with
I'm not sure what version of |
Oh I forgot to mention -- the above was a debug build of |
@vvuk just to check, are you testing syntect/syncat with the submodules as of the master branch on that repo, or have you updated them to match the version/state/commit which |
@keith-hall with the submodules as of the master branch.
That syntect commit (870081c) is tagged as v3.3.0, which is what bat is using. I didn't confirm that they're using the exact same bits though, my bat build was pulling syntect from cargo. I'll rebuild locally for a sanity check. I didn't rebuild the syntaxset (just cloned syntect, and did a cargo build --examples) |
syntect v3.3.0 and the master branch are both referencing a 2 year old Packages submodule (where bat is now using a recent version), so likely some change to the C# syntax definition since then has caused this difference |
Definitely an issue with the C# syntax files. Using the C# syntaxes that |
(I haven't had a chance to do the narrowing down unfortunately.) For upstream reporting, should this be reported to the |
I've looked into this today. First, I tried to minimize the example file above. I found that lines like this: return a(bbbbbbbbbbbbbbbbbbbbbbbbbbbb.c(dddddddddddddddddddddddddddddddddddd)); take a very long time to parse. This particular one takes ~500 ms and the parsing time depends on the length of the Next, I took a look at the - match: \((?=(?:[^,)(]*|\([^\)]*\))*,)
scope: punctuation.section.group.begin.cs
push:
# [...] My (completely non-expert) guess would be that the nested Maybe @keith-hall could take a look at this (only if you are interested, of course)? 😊 If this pattern is disabled, C# files render reasonably fast: bat CSharpCompilation.cs --color=always
|
If I enter this pattern into regex101 and use the appropriate part of the line above as a test string, regex101 actually says "Catastrophic backtracking has been detected". |
may I suggest changing that pattern to (I proved that the C# syntax tests aren't adversely affected, nor is the performance in Sublime Text) |
Fantastic, thank you very much for the fast response! This leads to the same time (slightly faster here because I was using a user-specified cache-file above - here, the patched C# syntax is built into the
I'm going to open a PR in sublimehq/Packages. Edit: done: sublimehq/Packages#2331 |
This simplifies a pattern in the C# syntax while leaving it functionally unchanged. The old pattern susceptible to catastrophic backtracking due to a combinatorial explosion caused by the two equivalent `*` modifiers. Sublime text didn't really show any performance problems but we did experience problems downstream in syntect/bat: sharkdp/bat#677 This change was originally proposed by @keith-hall here: sharkdp/bat#677 (comment) Performance in Sublime text is not adversely affected.
This simplifies a pattern in the C# syntax while leaving it functionally unchanged. The old is pattern susceptible to catastrophic backtracking due to a combinatorial explosion caused by the inner and outer `*` modifier. Sublime text didn't really show any performance problems, but we did experience problems downstream in syntect/bat: sharkdp/bat#677 This change was originally proposed by @keith-hall here: sharkdp/bat#677 (comment) Performance in Sublime text is not adversely affected.
This simplifies a pattern in the C# syntax while leaving it functionally unchanged. The old pattern is susceptible to catastrophic backtracking due to a combinatorial explosion caused by the inner and outer `*` modifier. Sublime text didn't really show any performance problems, but we did experience problems downstream in syntect/bat: sharkdp/bat#677 This change was originally proposed by @keith-hall here: sharkdp/bat#677 (comment) Performance in Sublime text is not adversely affected.
This has been fixed in bat v0.14. |
This simplifies a pattern in the C# syntax while leaving it functionally unchanged. The old pattern is susceptible to catastrophic backtracking due to a combinatorial explosion caused by the inner and outer `*` modifier. Sublime text didn't really show any performance problems, but we did experience problems downstream in syntect/bat: sharkdp/bat#677 This change was originally proposed by @keith-hall here: sharkdp/bat#677 (comment) Performance in Sublime text is not adversely affected.
This simplifies a pattern in the C# syntax while leaving it functionally unchanged. The old pattern is susceptible to catastrophic backtracking due to a combinatorial explosion caused by the inner and outer `*` modifier. Sublime text didn't really show any performance problems, but we did experience problems downstream in syntect/bat: sharkdp/bat#677 This change was originally proposed by @keith-hall here: sharkdp/bat#677 (comment) Performance in Sublime text is not adversely affected.
Something strange is going on. On a 1500-line .cs file, bat takes ~7s to list the file.
--pager none
, both with and without-p
(not much difference with -p). I can see a pause after every1-2 dozen lines are output.--colors never
also doesn't make a difference (bench viahyperfine
):cmd /c type EntityComponentStore.cs
bat --pager none EntityComponentStore.cs
bat --pager none -p EntityComponentStore.cs
bat --color never --pager none EntityComponentStore.cs
bat --color never --pager none -p EntityComponentStore.cs
bat 0.12.1, msvc build on Windows, in a regular cmd.exe console window.
I'd expect some slowdown but it seems like something buggy is happening. If I do the same on a cpp file, also ~1500 lines:
cmd /c type ApiDispatchers.cpp
bat --pager none ApiDispatchers.cpp
bat --pager none -p ApiDispatchers.cpp
bat --color never --pager none ApiDispatchers.cpp
bat --color never --pager none -p ApiDispatchers.cpp
Which is much more in line with what I'd expect. Is there something different about how the syntax highlighting for C# is handled? Is it all regexp or something else expensive based?
The text was updated successfully, but these errors were encountered: