Remove duplicated chars from user input in []
Since `[aabbcc]` is the same as `[abc]`, txt2regex will remove the
repeated characters the user may have typed to put inside a list.

It's safe to do this because `getCharList()` only expects a list of
literal characters, not ranges or the negated-list identifier `^`.
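Both halves of this argument can be checked with bash's own ERE matching. In this minimal sketch, `matches` is a hypothetical helper written only for this illustration. Repeating a literal character inside `[]` changes nothing. If the list could hold a range, though, dropping the "duplicate" `-` from `a-z0-9` would silently turn `0-9` into just `0` and `9`:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of txt2regex): print which sample
# strings are matched by the bracket expression given in $1.
matches() {
    local re="^$1\$" out='' s
    for s in a b c d 0 5 9 -; do
        # Unquoted $re on the right of =~ is treated as an ERE
        [[ $s =~ $re ]] && out+="$s"
    done
    printf '%s: %s\n' "$1" "$out"
}

matches '[aabbcc]' # [aabbcc]: abc
matches '[abc]'    # [abc]: abc
matches '[a-z0-9]' # [a-z0-9]: abcd059
matches '[a-z09]'  # [a-z09]: abcd09  (what deduplicating "a-z0-9" gives)
```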
aureliojargas committed May 10, 2020
1 parent 4b41298 commit 4b98e2b
Showing 3 changed files with 31 additions and 1 deletion.
CHANGELOG.md (2 changes: 2 additions, 0 deletions)
@@ -65,6 +65,8 @@ The format is based on [Keep a Changelog].
- lex regexes: now using GNU flex
- PHP regexes: switch from old `ereg` to `preg` (PCRE)
- Changed the default programs: +egrep +grep -perl -php -postgres
- Remove repeated characters inside a list `[]` (if the user has typed
`abbbca`, make it `[abc]`)
- Now `--showmeta` also shows the version for each program
- Now the "!! not supported" legend only appears when there are
unsupported metacharacters in the current regex

tests/features.md (11 changes: 11 additions, 0 deletions)
@@ -24,6 +24,17 @@ $ txt2regex --prog egrep --history '215¤a5!6'
$
```

## User input: Remove duplicated chars from [] — getCharList()

When the user types literal characters to be put inside a `[]` list, txt2regex deduplicates them, keeping the first occurrence of each, because repetition in this case is not meaningful (`[aabbcc]` is the same as `[abc]`).

```console
$ txt2regex --prog egrep --history '24¤aabbbcab'
Regex egrep: [abc]

$
```

## User input: Rearrange [] special elements — getCharList()

When the user types literal characters to be put inside a `[]` list, some special cases have to be handled:

txt2regex.sh (19 changes: 18 additions, 1 deletion)
@@ -621,6 +621,19 @@ charInText() {
    return 1
}

# Remove all duplicated chars from the $1 text
uniqChars() {
    local text="$1"
    local text_uniq=''
    local i

    for ((i = 0; i < ${#text}; i++)); do
        charInText "${text:$i:1}" "$text_uniq" ||
            text_uniq="$text_uniq${text:$i:1}"
    done
    printf '%s\n' "$text_uniq"
}
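To try `uniqChars()` outside txt2regex, it can be paired with a stand-in `charInText()`. The real body of that function is not part of this diff, so the version here is an assumption: it only honors the contract implied by the call site (status 0 if char `$1` occurs in text `$2`, 1 otherwise).

```bash
#!/usr/bin/env bash

# Assumed stand-in for txt2regex's charInText() (body not shown in
# this diff): return 0 if char $1 occurs in text $2, else return 1.
charInText() {
    local char="$1"
    local text="$2"
    local i

    for ((i = 0; i < ${#text}; i++)); do
        # Quoted right-hand side: compare literally, not as a glob
        [[ ${text:i:1} == "$char" ]] && return 0
    done
    return 1
}

# uniqChars() as added by this commit: keep the first occurrence of
# each char, in the order the chars were typed.
uniqChars() {
    local text="$1"
    local text_uniq=''
    local i

    for ((i = 0; i < ${#text}; i++)); do
        charInText "${text:$i:1}" "$text_uniq" ||
            text_uniq="$text_uniq${text:$i:1}"
    done
    printf '%s\n' "$text_uniq"
}

uniqChars 'aabbbcab' # abc
uniqChars 'abbbca'   # abc (the CHANGELOG example)
uniqChars 'bbaacc'   # bac (order of first appearance, not sorted)
```

The first occurrence of each character wins and the typed order is kept, so `bbaacc` becomes `bac`, not `abc`; both are equivalent inside `[]`. The nested scan is quadratic, which is harmless for the handful of characters typed at a prompt.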

# Escape each $1 in $2 using $3
escapeChars() {
    local special_chars="$1"
@@ -987,7 +1000,6 @@ getChar() {
F_ESCCHAR=1
}

#TODO 1st of all, take out repeated chars
getCharList() {
    gotoxy $x_prompt2 $y_prompt

@@ -999,6 +1011,10 @@ getCharList() {
        doNextHistArg
        uin=$histarg
    fi

    # dedup is safe because $uin contains only literal chars (no ranges)
    uin="$(uniqChars "$uin")"

    uins="${uins}¤$uin"

    # putting not special chars in not special places: [][^-]
@@ -1009,6 +1025,7 @@ getCharList() {
    # if any $1, negated list
    [ -n "$1" ] && uin="^$uin"

    # make it a list
    uin="[$uin]"
    F_ESCCHARLIST=1
}
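The order of steps in `getCharList()` is what makes the deduplication safe: the input is deduplicated while it still holds only literal characters, and the negation `^` and the brackets are added afterwards. A minimal sketch of that sequence follows. `build_char_list` and `uniq_chars` are hypothetical names, and the rearranging of the special characters `]`, `^` and `-` done by the real function is left out, so the input is assumed not to contain them.

```bash
#!/usr/bin/env bash

# Compact stand-in for uniqChars(): keep the first occurrence of each char.
uniq_chars() {
    local text="$1" out='' c i
    for ((i = 0; i < ${#text}; i++)); do
        c="${text:i:1}"
        [[ $out == *"$c"* ]] || out+="$c"
    done
    printf '%s\n' "$out"
}

# Hypothetical helper mirroring the order of steps in getCharList().
# $1: literal chars typed by the user (assumed free of ] ^ -)
# $2: if non-empty, build a negated list
build_char_list() {
    local uin
    uin="$(uniq_chars "$1")"   # 1. dedup while the input is literal-only
    [ -n "$2" ] && uin="^$uin" # 2. negation marker added after dedup
    printf '[%s]\n' "$uin"     # 3. wrap it in brackets
}

build_char_list 'aabbbcab'     # [abc]
build_char_list 'aabbbcab' neg # [^abc]
```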