feat(linter): a new multi-file analysis runtime #9383

Open, wants to merge 5 commits into base: main

branchseer

@branchseer branchseer commented Feb 26, 2025

fixes #9077, #7996, #8631.

Results of running oxlint -c oxlint.json on AFFiNE@97cc814a

1.3x slower, takes 0.86x memory.

Before

$ /usr/bin/time -al ../oxc/target/release/oxlint_main -c oxlint.json
[lint error messages omitted]
Found 0 warnings and 3 errors.
Finished in 588ms on 4990 files with 135 rules using 10 threads.
        0.61 real         3.69 user         0.94 sys
           402784256  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               31315  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   2  voluntary context switches
                6127  involuntary context switches
         42060793125  instructions retired
         13307821784  cycles elapsed
           398427968  peak memory footprint

After

$ /usr/bin/time -al ../oxc/target/release/oxlint -c oxlint.json
[lint error messages omitted]
Found 0 warnings and 6 errors.
Finished in 764ms on 4990 files with 135 rules using 10 threads.
        0.77 real         4.07 user         1.49 sys
           348061696  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               31412  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   5  voluntary context switches
                6625  involuntary context switches
         44886241869  instructions retired
         15977031204  cycles elapsed
           343606976  peak memory footprint

@github-actions github-actions bot added A-linter Area - Linter C-bug Category - Bug labels Feb 26, 2025

codspeed-hq bot commented Feb 26, 2025

CodSpeed Performance Report

Merging #9383 will not alter performance

Comparing branchseer:fix-linter-module-graph (49c7f18) with main (bd90ce6)

Summary

✅ 39 untouched benchmarks

dedup pending paths

Use OsStr instead of Path for faster hashing

move file reads to io thread pool

Revert "move file reads to io thread pool"

This reverts commit f4deb10759bd252b0d4bf20c630ed517fd2e582d.

simplify module task scheduling

group

wip

wip

wip

code cleanup
@branchseer branchseer force-pushed the fix-linter-module-graph branch from 5774fc2 to ca69203 Compare March 8, 2025 03:57
@branchseer branchseer changed the title fix(lint): build module graph before running lints fix(linter): build module graph in groups Mar 8, 2025
@github-actions github-actions bot added the A-cli Area - CLI label Mar 8, 2025
@branchseer branchseer marked this pull request as ready for review March 8, 2025 04:10
@branchseer branchseer requested a review from Boshen March 8, 2025 04:11
@Boshen Boshen self-assigned this Mar 8, 2025
// deeper paths are more likely to be leaf modules (src/very/deep/path/baz.js is likely to have
// fewer dependencies than src/index.js).
// This heuristic is not always true, but it works well enough for real world codebases.
self.paths.par_sort_unstable_by(|a, b| Path::new(b).cmp(Path::new(a)));

Sorting by Path is going to be slow. Maybe sort by the number of / or \\ separators with sort_by_cached_key?
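
A minimal sketch of that suggestion, assuming paths are held as plain strings rather than the `OsStr` values the PR uses. The helper names `path_depth` and `sort_deepest_first` are hypothetical and do not come from the PR. Note that this orders strictly by separator count, which is not the same order as the reversed `Path` comparison in the hunk above.

```rust
use std::cmp::Reverse;

/// Counts path separators (`/` or `\`) as a cheap proxy for directory depth.
/// Hypothetical helper, not part of the PR.
fn path_depth(path: &str) -> usize {
    path.bytes().filter(|b| matches!(b, b'/' | b'\\')).count()
}

/// Sorts the deepest paths first. `sort_by_cached_key` computes each key
/// once per element instead of on every comparison, and the sort is stable,
/// so paths of equal depth keep their original relative order.
fn sort_deepest_first(paths: &mut [String]) {
    paths.sort_by_cached_key(|p| Reverse(path_depth(p)));
}
```

Calling `sort_deepest_first` on `src/index.js`, `src/very/deep/path/baz.js` and `src/a/b.js` would place `src/very/deep/path/baz.js` first and `src/index.js` last. rayon's `par_sort_by_cached_key` should be a drop-in parallel counterpart if the sort itself turns out to matter.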

/// A module ready for linting. A `EntryModule` is generated for each path in `runtime.paths`
///
/// It's basically the same as `ProcessedModule`, except `content` is non-Option.
struct EntryModule {

Entry is a really vague term ... We need a longer and more descriptive name.

// only stores entry modules in current group
let mut entry_modules: Vec<EntryModule> = Vec::with_capacity(entry_group_size);

// downgrade self to immutable reference so it can be shared among spawned tasks.

The wording "downgrade" feels weird, maybe "set to ..."?

///
/// Modules with special extensions such as .vue could contain multiple source sections (see `PartialLoader::PartialLoader`).
/// Plain ts/js modules have one section. Using `SmallVec` to avoid allocations for plain modules.
section_module_records: SmallVec<[Result<ResolvedModuleRecord, Vec<OxcDiagnostic>>; 1]>,

The term "section" is vague. What are source sections?

let mut encountered_paths =
FxHashSet::<Arc<OsStr>>::with_capacity_and_hasher(me.paths.len(), FxBuildHasher);

let mut module_relationships =

What is module_relationships? Probably need a better naming and a comment.
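
One possible way to address this, sketched under assumptions: the declaration quoted further down in this review pairs a module path with, per source section, a list of `(specifier, resolved path)` tuples. Named type aliases with doc comments could make that shape readable. The alias names below are hypothetical, and `String` and `Vec` stand in for the `CompactStr` and `SmallVec` types the PR actually uses.

```rust
use std::ffi::OsStr;
use std::sync::Arc;

/// One import: the specifier as written in source, and the path it resolved to.
type ResolvedImport = (String, Arc<OsStr>);

/// All resolved imports of a single source section.
type SectionImports = Vec<ResolvedImport>;

/// A module path together with the resolved imports of each of its sections.
type ModuleImports = (Arc<OsStr>, Vec<SectionImports>);

/// Builds one illustrative entry: `src/index.js` importing `./util`.
fn example_entry() -> ModuleImports {
    let importer: Arc<OsStr> = Arc::from(OsStr::new("src/index.js"));
    let dependency: Arc<OsStr> = Arc::from(OsStr::new("src/util.js"));
    (importer, vec![vec![("./util".to_string(), dependency)]])
}
```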

{
let mut loaded_modules = record.loaded_modules.write().unwrap();
for (specifier, dep_path) in requested_module_paths {
// TODO: revise how to store multiple sections in loaded_modules

Let's figure out the TODO now; it's unlikely we'll revisit this part of the code and fix it later.

.par_bridge()
.map_with(self.resolver.as_ref().unwrap(), |resolver, specifier| {
resolver.resolve(dir, specifier).ok().map(|r| (specifier, r))
.par_iter()

par_iter is probably not required anymore, since there's no further processing other than resolver.resolve. Spawning a new thread here may be more expensive than resolver.resolve itself. We need the threads for heavy duties such as parse + lint.

@Boshen Boshen changed the title fix(linter): build module graph in groups feat(linter): a new multi-file analysis runtime Mar 8, 2025
@github-actions github-actions bot added the C-enhancement Category - New feature or request label Mar 8, 2025
let mut module_relationships =
Vec::<(Arc<OsStr>, SmallVec<[Vec<(/*specifier*/ CompactStr, Arc<OsStr>)>; 1]>)>::new();

let (tx_resolve_output, rx_resolve_output) = mpsc::channel::<ModuleProcessOutput>();

I think it's best to write an algorithm overview here, because there are so many loops and so much multi-threaded logic.
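
As a rough illustration of what such an overview might describe, here is a single-threaded sketch of import-graph discovery with deduplication of already-seen paths. It is not the PR's runtime: the real code is multi-threaded, channel-based and works in groups, and everything named here (`discover_modules`, `imports_of`) is hypothetical. The `imports_of` callback stands in for "parse the file and resolve its import specifiers".

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Walks the import graph starting from `entries`, visiting each path once.
/// Returns, for every discovered module, the list of paths it imports.
fn discover_modules<F>(entries: &[&str], imports_of: F) -> HashMap<String, Vec<String>>
where
    F: Fn(&str) -> Vec<String>,
{
    // Paths that were already queued, so no module is processed twice.
    let mut encountered: HashSet<String> = HashSet::new();
    // Paths queued for processing but not yet processed.
    let mut pending: VecDeque<String> = VecDeque::new();
    for entry in entries {
        if encountered.insert(entry.to_string()) {
            pending.push_back(entry.to_string());
        }
    }

    let mut graph = HashMap::new();
    while let Some(path) = pending.pop_front() {
        let deps = imports_of(&path);
        for dep in &deps {
            // `insert` returns true only the first time a path is seen,
            // which also keeps import cycles from looping forever.
            if encountered.insert(dep.clone()) {
                pending.push_back(dep.clone());
            }
        }
        graph.insert(path, deps);
    }
    graph
}
```

An overview comment in the PR would additionally need to explain how entry paths are split into groups and where work is handed to the thread pool, which this sketch leaves out.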

}

// Writing to `loaded_modules` based on `module_relationships`
module_relationships.par_drain(..).for_each(|(path, requested_module_paths)| {

Why par_drain? I'm not seeing any heavy processing; spawning threads may be slower than the actual work done here.

return;
}
let records = &modules_by_path[&path];
assert_eq!(records.len(), requested_module_paths.len());

Using assert_eq will still panic in release builds, so we probably need a descriptive panic message here.
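
A minimal sketch of an assertion with a message, assuming the invariant is "one dependency list per section record". The helper `check_section_counts` and its parameters are hypothetical and only illustrate the message format. `assert_eq!` accepts a trailing format string and, unlike `debug_assert_eq!`, stays active in release builds.

```rust
/// Hypothetical helper: panics with a descriptive message when the number of
/// section records for `path` differs from the number of dependency lists.
fn check_section_counts(path: &str, record_count: usize, requested_count: usize) {
    assert_eq!(
        record_count, requested_count,
        "section count mismatch for {path}: {record_count} module records \
         but {requested_count} dependency lists"
    );
}
```

If a mismatch can legitimately occur on malformed input, returning a diagnostic instead of panicking may be the better design; the message mainly helps when the invariant is violated by a bug.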

Labels
A-cli Area - CLI A-linter Area - Linter C-bug Category - Bug C-enhancement Category - New feature or request
Development

Successfully merging this pull request may close these issues.

linter: no-cycle rule is unstable
2 participants