JS: Add ECMAScript 2024 v Flag Operators for Regex Parsing #18899

Open
wants to merge 19 commits into
base: main
from

Conversation

Napalys
Contributor

@Napalys Napalys commented Feb 28, 2025

This pull request adds support for parsing ECMAScript 2024 v flag operators, including:

  • Nested Classes: Enables using nested character classes in regexes.
    Example: /[[abc][cz]]/v
  • Intersection (&&): Matches characters common to both sets.
    Example: /[[abc]&&[cz]]/v
  • Subtraction (--): Removes characters from a set.
    Example: /[[abc]--[cz]]/v
    Mixing operations at the same level is not allowed:
    • Invalid: /[[abc]&&[cz]--[zz]]/v
    • Valid: /[[abc]&&[[cz]--[zz]]]/v
  • Union: Combines multiple sets.
    Example: /[[abc][cz]]/v
  • Quoted Strings (\q{}): Allows matching exact sequences.
    Example: /[\q{ab|cb|db}]/v

Commit-by-commit review is encouraged.

Useful links:

With correct parsing, this no longer produces a false positive. Closes #18854.

@github-actions github-actions bot added the JS label Feb 28, 2025
@Napalys Napalys force-pushed the js/ecma-2024-regex branch from 84fddf1 to 94adaf8 Compare March 2, 2025 15:56
@Napalys Napalys force-pushed the js/ecma-2024-regex branch 2 times, most recently from 605456f to f93419e Compare March 2, 2025 18:24
@Napalys Napalys changed the title from "JS: WIP: Ecma 2024 regex" to "JS: Add ECMAScript 2024 v Flag Operators for Regex Parsing" Mar 3, 2025
@Napalys Napalys force-pushed the js/ecma-2024-regex branch from 6fe7753 to 430514b Compare March 3, 2025 12:00
@Napalys Napalys marked this pull request as ready for review March 3, 2025 13:17
@Copilot Copilot bot review requested due to automatic review settings March 3, 2025 13:17
@Napalys Napalys requested a review from a team as a code owner March 3, 2025 13:17
Contributor

@Copilot Copilot AI left a comment

PR Overview

This pull request introduces support for ECMAScript 2024 regex constructs under the new "v" flag. Key changes include:

  • New AST node classes for character class operations (Subtraction, QuotedString, Intersection, Union)
  • Enhancements to RegExpParser to conditionally enable nested character classes, new operators, and quoted string parsing with a fallback mechanism when errors are encountered
  • New test inputs covering quoted strings, unions, intersections, subtractions, and nested character classes
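How a parser might map these constructs onto separate AST node kinds can be illustrated with a toy recursive-descent parser. This is only a sketch under stated assumptions, not the extractor's `RegExpParser`: the function `parseClass` and the node shapes (`Union`, `Intersection`, `Subtraction`, `QuotedString`, `Char`) are hypothetical, and ranges, negation, empty classes, and escapes other than `\q{...}` are left out.

```javascript
// Toy parser for a subset of v-flag character classes (hypothetical, for illustration).
function parseClass(src) {
  let pos = 0;

  function fail(msg) {
    throw new SyntaxError(`${msg} at offset ${pos}`);
  }

  // Consume `token` if it occurs at the current position.
  function eat(token) {
    if (src.startsWith(token, pos)) {
      pos += token.length;
      return true;
    }
    return false;
  }

  // operand := nested class | \q{a|b|...} | single character
  function operand() {
    if (pos >= src.length || src[pos] === "]") fail("expected operand");
    if (src[pos] === "[") return characterClass();
    if (eat("\\q{")) {
      const end = src.indexOf("}", pos);
      if (end < 0) fail("unterminated \\q{");
      const strings = src.slice(pos, end).split("|");
      pos = end + 1;
      return { type: "QuotedString", strings };
    }
    return { type: "Char", value: src[pos++] };
  }

  // class := '[' operand ( ('&&' operand)+ | ('--' operand)+ | operand* ) ']'
  function characterClass() {
    if (!eat("[")) fail("expected '['");
    const elements = [operand()];
    let operator = null; // the single operator allowed at this nesting level
    while (!eat("]")) {
      const next = eat("&&") ? "&&" : eat("--") ? "--" : null;
      if (next !== null) {
        // The first operator fixes the kind of this level; anything else is an error.
        if (operator === null && elements.length > 1) fail("cannot mix union with " + next);
        if (operator !== null && operator !== next) fail("cannot mix && and --");
        operator = next;
      } else if (operator !== null) {
        fail("expected " + operator);
      }
      elements.push(operand());
    }
    const type =
      operator === "&&" ? "Intersection" : operator === "--" ? "Subtraction" : "Union";
    return { type, elements };
  }

  const ast = characterClass();
  if (pos !== src.length) fail("trailing input");
  return ast;
}
```

Calling `parseClass("[[abc]&&[[cz]--[zz]]]")` yields an `Intersection` node whose second element is a `Subtraction`, while `parseClass("[[abc]&&[cz]--[zz]]")` throws a `SyntaxError` because two different operators appear at the same level.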

Reviewed Changes

| File | Description |
| --- | --- |
| javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassSubtraction.java | New AST node for subtraction operator in character classes |
| javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassQuotedString.java | New AST node for handling quoted string escapes |
| javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassIntersection.java | New AST node for intersection operator in character classes |
| javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassUnion.java | New AST node for union operator in character classes |
| javascript/extractor/src/com/semmle/js/parser/RegExpParser.java | Extended parser functionality to support the new "v" flag and corresponding regex operations |
| javascript/extractor/src/com/semmle/js/extractor/ASTExtractor.java and RegExpExtractor.java | Updated extraction logic to accommodate new AST node types and conditional flag handling |

Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.

@Napalys Napalys force-pushed the js/ecma-2024-regex branch from 78aa5dc to 9e1f050 Compare March 3, 2025 13:38
Contributor

@asgerf asgerf left a comment

Excellent work! I have a couple of comments to keep you busy during the week 😄

@Napalys Napalys force-pushed the js/ecma-2024-regex branch from d6df34e to 8558ead Compare March 5, 2025 08:33
@Napalys Napalys force-pushed the js/ecma-2024-regex branch from 8558ead to 8099423 Compare March 5, 2025 08:34
@Napalys Napalys requested a review from asgerf March 5, 2025 11:10
Development

Successfully merging this pull request may close these issues.

JavaScript: false positive with unicode sets for character classes that contain brackets
2 participants