New regex tester (runs in a Docker container) #6

Merged: 20 commits, May 4, 2020

aureliojargas (Owner)

Txt2regex needs to know regex-related information for each program it
supports. For example: the list of metacharacters, how to escape a
metacharacter to match it literally and the availability of POSIX
character classes.

Instead of relying on documentation to get that information, there's a
new tests/regex-tester.sh script that calls the real programs with
specially crafted regexes and sample texts, verifying how those programs
behave in "real life".

To have a trackable and public record, the output of this tester is also
saved to this repository, in a readable and grepable plain text file.
This way we can detect changes in behavior when a program version is
updated.

To avoid having to install specific software on the developer's machine,
a Docker image is used to isolate all the necessary software and this
script is run inside that image (via make test-regex).

  • Remove all the obsoleted files from the old tester:

    • test-suite/javascript.html
    • test-suite/procmail-re-test.sh
    • test-suite/result.txt
    • test-suite/test-suite.sh
  • New tests/regex-tester.sh script

  • New tests/regex-tester.txt script output record

  • New tests/Dockerfile image with all the txt2regex-supported programs
    installed and ready to be used

  • New make target: test-regex to run the tester and save to the output
    file

  • New make target: test-regex-shell to enter the interactive shell
    inside the test container.
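
The idea of probing a real program instead of trusting its documentation
can be sketched as follows. This is only an illustration, not the actual
tests/regex-tester.sh: the `probe` function name and the choice of `grep`
as the program under test are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper, not taken from tests/regex-tester.sh.
# Usage: probe REGEX STRING [extra grep options]
# Prints "OK" if grep finds a match for REGEX in STRING, "FAIL" otherwise.
probe() {
    regex=$1
    string=$2
    shift 2
    if printf '%s\n' "$string" | grep -q "$@" -e "$regex"; then
        echo OK
    else
        echo FAIL
    fi
}

# Are POSIX character classes available?
probe '^[[:digit:]]$' '5'

# Is + a quantifier or a literal character?
probe '^a+$' 'aa'        # basic syntax (BRE)
probe '^a+$' 'aa' -E     # extended syntax (ERE)
```

Saving the output of such probes to a plain text file is what makes a
behavior change visible as a diff when a program version is updated.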

Txt2regex needs to know regex-related information for each program it
supports. For example: the list of metacharacters, how to escape a
metacharacter to match it literally, availability of POSIX character
classes.

Instead of relying on documentation to get that information, there's a
new `tests/regex-tester.sh` script that calls the real programs with
specially crafted regexes and sample texts, verifying how those programs
behave in "real life".

To have a permanent record, the output of this script is also saved to
this repository. This way we can detect changes in behavior when a
program version is updated.

To avoid having to install specific software on the developer's machine,
a Docker image is used to isolate all the necessary software and this
script is run inside that image (via `make test-regex`).

- Remove all the obsoleted files from the old tester:
  - test-suite/javascript.html
  - test-suite/procmail-re-test.sh
  - test-suite/result.txt
  - test-suite/test-suite.sh

- New `tests/regex-tester.sh` script

- New `tests/regex-tester.txt` script output record

- New `tests/Dockerfile` image with all the txt2regex-supported programs
  installed and ready to be used

- New make target: `test-regex` to run the tester and save to the output
  file

In this commit, the new regex tester supports all the programs that
the previous `test-suite.sh` script used to support. New programs will
be added in the following commits.

This new option accepts a program name to be skipped.

Useful to test "all but one". This will be handy in the next commit,
when adding support for vi.
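
A minimal sketch of the "all but one" idea follows. The option name and
the real program list are not shown in this commit message, so the
`programs_to_test` function and the three program names below are
purely illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: function name and program list are illustrative,
# not taken from tests/regex-tester.sh.
# Usage: programs_to_test PROGRAM_TO_SKIP
programs_to_test() {
    skip=$1
    for program in awk grep sed; do
        # Leave out the one program the caller asked to skip
        if [ "$program" = "$skip" ]; then
            continue
        fi
        echo "$program"
    done
}

# List every program except grep
programs_to_test grep
```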

Thanks Mario Domenech Goulart for the magical command and guidance.

An all-new topic about the new regex tester.

Now the list of program versions is the output of a command, and that
command is checked to be correct by clitest (which is run in the CI). In
other words: that list will always be up to date now.

This could mask problems, since it could be expanded to some special
char on both sides (regex and string), giving false positives.

This is also a metacharacter for border in some tools.

- Always use raw strings for the "string" argument, if the program
  supports it.
- As a fallback, use the new `escape()` function to escape the '\'
  chars.

Note that the "regex" argument should not be escaped, since the goal
of the "brute force" tests is exactly discovering how many '\' are
necessary to properly match a pattern.
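
The commit names an `escape()` function but its body is not shown here.
One possible way to write it, assuming it only needs to double every
backslash in its argument, is sketched below. The `sed` based
implementation is a guess and the real function may differ.

```shell
#!/bin/sh
# Assumed implementation: the real escape() in tests/regex-tester.sh
# may be written differently.
# Doubles every backslash so the string survives one level of
# backslash interpretation in the program that receives it.
escape() {
    printf '%s\n' "$1" | sed 's/\\/\\\\/g'
}

# Only the "string" argument is escaped; the regex is passed untouched.
escape 'a\.b'
```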

It was a mistake not to enforce full matches from the start. Partial
matches are a problem when test_type=match.

This is just an intermediary step to make the .txt diff easier to see
if there are behavior changes (none detected on visual inspection).

The next commit will change all the actual regexes to be fully
anchored (explicit is better than implicit) and the .txt file should
not change in behavior (only the regexes will have the $ added).

Now add $ to the end of all the test regexes.
See parent commit for details.

Now add ^ to the start of all the test regexes.
See "part 1" commit for details.

Now that all the regexes are anchored, some tests became irrelevant.
Also remove the tests for ^ and $ being at regex start/end, since they
do not fit in the new "^...$" format for all regexes.
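
Why partial matches are a problem can be seen in a small sketch. It is
illustrative only: `grep -E` stands in for any tested program and the
`matches` helper is a hypothetical name. The question being tested is
whether `?` is matched literally, using the sample string `a?`.

```shell
#!/bin/sh
# Hypothetical helper, with grep -E standing in for a tested program.
# Usage: matches REGEX STRING
matches() {
    if printf '%s\n' "$2" | grep -E -q -e "$1"; then
        echo yes
    else
        echo no
    fi
}

# Unanchored: the regex matches just the "a", so the test wrongly
# suggests that ? was matched as a literal character.
matches 'a?' 'a?'

# Anchored: the whole string must match, which fails because ? is a
# quantifier here, so the false positive is gone.
matches '^a?$' 'a?'
```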

aureliojargas merged commit 3def062 into master on May 4, 2020.
aureliojargas deleted the regex-tester branch on May 4, 2020, 22:20.
aureliojargas added a commit that referenced this pull request on Sep 13, 2022:

Since #6, the regex rules are based on the results of actually running
the programs in a Docker container, using a special script.

So now it's mandatory to first add the new program to that container
and properly test it. The process became even more complex than before,
but now it's reliable and tested.