Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler service: file parsing should use caching. #14848

Open
safesparrow opened this issue Mar 4, 2023 · 15 comments
Open

Compiler service: file parsing should use caching. #14848

safesparrow opened this issue Mar 4, 2023 · 15 comments

Comments

@safesparrow
Copy link
Contributor

safesparrow commented Mar 4, 2023

Is your feature request related to a problem? Please describe.

IDE interactions are slow, partly because the same files with unmodified contents are parsed multiple times, with no caching mechanism.

Given a project with three files: A.fs, B.fs, C.fs, every time I:

  1. Modify A.fs,
  2. Open C.fs,
    file B.fs is parsed again. Files A.fs and C.fs are probably parsed again too, although I'm not sure exactly when they are, and the main point is that all files for which typechecking is performed, are parsed again.

Note that I've only tested this in Rider.

Describe the solution you'd like

Introduce a caching mechanism with size configurable by consumers of FCS, which avoids parsing the same files again if nothing relevant has changed.

Additional context

Note that there is a parsing cache, but it's only utilised by the dedicated file parsing endpoint, and is not used when the typechecking endpoint is called instead.

Also note that there seem to be multiple ways in which parsing is called, with different stack traces.
The cache should address all of them if possible.

Also note that if this ends up using SourceText.GetHashCode, that method is quite slow as it has to scan the whole source.
It might be worth considering calculating the hash once upon creation of SourceText.

Related discussion I started: #14199

Related: Idea of parallel parsing in the FCS.
I briefly looked at the possibility of parsing project files in parallel in FCS. However, the complexity of IncrementalBuilder makes it rather difficult.
Also, once we have a caching mechanism with a large-enough size, the marginal gain of parallel parsing would be limited to speeding up the first-time typechecking after starting the IDE.

@majocha
Copy link
Contributor

majocha commented Mar 5, 2023

Hi @safesparrow, can you try to run my draft #14849? This is VS only now.

@safesparrow
Copy link
Contributor Author

safesparrow commented Mar 5, 2023

@majocha Thanks for looking.

  1. I'm using Rider with a locally built FCS, so can't easily test with VS.
  2. AFAICS the PR only increases size of existing caches, but doesn't change the way they are used. As mentioned above, the existing parsing cache is only used in one case, when a dedicated parseFile request is received by the checker. However, when type-checking a file is requested, that will trigger type-checking of all preceeding files, which will in turn trigger parsing of those files - and that mechanism does not involve any caching. The parsing happens in this line: https://github.com/JetBrains/fsharp/blob/35e742b334353de45b11d1d5c4cfecb02e6e58fe/src/Compiler/Service/IncrementalBuild.fs#L484 . I think there is also a third way parsing can be triggered, but haven't looked that closely.
  3. I understand that there was some worry about large(r) cache sizes vs memory usage and GC pressure. I don't necessarily think that 100 ASTs take up a significant amount of memory, but would be nice to do some tests and make an informed decision. Maybe size of 1000 is fine too, which would be great - but if not, leaving it to be configured by end-users/IDEs might be best.

@majocha
Copy link
Contributor

majocha commented Mar 5, 2023

AFAICS the PR only increases size of existing caches, but doesn't change the way they are used.

It does do a little bit more 😉, cache sizes are increased just to test it.

Anyway, looks like Rider doesn't use FCS endpoints like TryGetRecentCheckResultsForFile but maintains an internal version of the service and does it's own thing with it?

@majocha
Copy link
Contributor

majocha commented Mar 5, 2023

It should reuse cached parse results throughout now when TryGetRecentCheckResultsForFile receives an already parsed file version.

@safesparrow
Copy link
Contributor Author

safesparrow commented Mar 5, 2023

It does do a little bit more 😉, cache sizes are increased just to test it.

Can you elaborate on whether it affects this particular callsite from point 2. above:
https://github.com/JetBrains/fsharp/blob/35e742b334353de45b11d1d5c4cfecb02e6e58fe/src/Compiler/Service/IncrementalBuild.fs#L484
and deeper down this piece of code: https://github.com/JetBrains/fsharp/blob/35e742b334353de45b11d1d5c4cfecb02e6e58fe/src/Compiler/Service/IncrementalBuild.fs#L157

As I understand and as tested in Rider, this callsite is responsible for most indirect parsing actions when type-checking of a chain of files is triggered.
Note that I'm describing a situation when the IDE asks to type-check a file somewhere down in the project - through the BoundModel infrastructure this effectively does sequential parsing and type-checking of all files above it that are deemed no longer valid.
So again, given three files A.fs, B.fs, C.fs:

  1. a change to A.fs invalidates all three files.
  2. When ParseAndCheckFileInProject(C.fs) is then called, it will go through BoundModels of all the invalidated files and parse (here ) and type-check them.

Maybe I'm missing something, but I don't see anything in the PR that would use the cache in 2.

@majocha
Copy link
Contributor

majocha commented Mar 5, 2023

@safesparrow I'll have to look into it when I'm back at my PC 🙂. From what I see , what I tinkered with was much closer to the surface..

@majocha
Copy link
Contributor

majocha commented Mar 5, 2023

So IIUC the parse cache we have in service.fs is almost useless, because GetCheckResultsBeforeFileInProject sends us on a path that doesn't use it, and we need caching somewhere at a deeper level?

@safesparrow
Copy link
Contributor Author

So IIUC the parse cache we have in service.fs is almost useless, because GetCheckResultsBeforeFileInProject sends us on a path that doesn't use it, and we need caching somewhere at a deeper level?

Almost, yes. Specifically, Rider uses the ParseAndCheckFileInProject endpoint instead and that triggers the cache-free parsing. Not sure if GetCheckResultsBeforeFileInProject results in the same, probably yes.

@majocha
Copy link
Contributor

majocha commented Mar 5, 2023

This caching mechanism does not seem to be working at all:
Compiler/Service/IncrementalBuild.fs#L116

This is never ever hit
Compiler/Service/IncrementalBuild.fs#L174

Instead a new SyntaxTree is being instantiated on every edit.

Some parse calls are there for diagnostics and there is a lot of setting things up involved (side-effects, too, I believe). It does not look practical to cache every call in a centralized way.

@safesparrow
Copy link
Contributor Author

safesparrow commented Mar 6, 2023

Instead a new SyntaxTree is being instantiated on every edit.
Some parse calls are there for diagnostics and there is a lot of setting things up involved (side-effects, too, I believe). It does not look practical to cache every call in a centralized way.

I think we need to get on the same page.
This issue is primarily about parsing files as part of a chain of parse+typecheck operations. Please have a look at the example in the description with the three file project.

Parsing on edit is different. There is little point caching those requests as the chance of seeing exactly the same source code upon multiple edits is small.
Compare it to parsing all files in the project after only one of them has changed - all but one request could be cached.
I believe the "active file edit" scenario is also why some endpoints have the cache argument which is sometimes set to false by the caller.

@auduchinok
Copy link
Member

It would be nice if ParseAndCheckFileInProject could cache previous files trees in some cache indeed.

I believe the "active file edit" scenario is also why some endpoints have the cache argument which is sometimes set to false by the caller.

Yes, exactly. We've added it when started doing additional parsing that we didn't want to be cached anywhere.

@majocha
Copy link
Contributor

majocha commented Mar 6, 2023

This issue is primarily about parsing files as part of a chain of parse+typecheck operations.

@safesparrow that's exactly the scenario I stepped through in the debugger. I modify file A and see SyntaxTree ctored for file B each time. weakCache is always None in Parse method, although I'd expect it to be used as it is being set for some reason.

@safesparrow
Copy link
Contributor Author

SyntaxTree object might be constructed, but hopefully the actual parsing (SyntaxTree.Parse) happens only if I open B.fs or C.fs and doesn't happen on every edit of A.fs - right? That's what I observed. Parsing all the files every time A.fs is modified would be even worse.

I don't think the weakCache mechanism is relevant here.
It's not really a cache if I understand it correctly. It can only be used if nothing above changed as it is not keyed in any way and the whole SyntaxTree object, alongside the weakCache, is destroyed when things above change.

You'll want the SyntaxTree.Parse method to use a cache similar to the one in service.fs, keyed only by file's contents and certain parsing options, but not dependent upon contents of files above it.

@majocha
Copy link
Contributor

majocha commented Mar 6, 2023

SyntaxTree object might be constructed, but hopefully the actual parsing (SyntaxTree.Parse) happens only if I open B.fs or C.fs and doesn't happen on every edit of A.fs - right? That's what I observed. Parsing all the files every time A.fs is modified would be even worse.

Yep, that's what I meant.

You'll want the SyntaxTree.Parse method to use a cache similar to the one in service.fs, keyed only by file's contents and certain parsing options, but not dependent upon contents of files above it.

Yes, I just tried this. I was hoping to use FSharpSource.TimeStamp to avoid hashing file contents, but it's useless. It changes on every call, not on actual source change like one would expect.

@majocha
Copy link
Contributor

majocha commented Mar 6, 2023

@safesparrow I did a quick and dirty draft to start something. This should already work.

OK, maybe it's placebo but editing service.fs no longer feels like wading through mud.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: New
Development

No branches or pull requests

4 participants