grep: honor whitespace escapes in BRE #37

Closed
wondr-wclabs wants to merge 1 commit into uutils:main from wondr-wclabs:codex/bre-whitespace-escapes

@wondr-wclabs
Contributor

Fixes #31.

This enables Oniguruma's \s/\S whitespace shorthand operator for basic-regexp mode. The rest of the BRE syntax stays unchanged; this only fills the documented gap where \w/\W already worked in BRE but \s/\S were treated as literals.

@github-actions

github-actions Bot commented Jun 5, 2026

GNU grep testsuite comparison:

Test results comparison:
  Current:   TOTAL: 128 / PASSED: 71 / FAILED: 36 / SKIPPED: 21
  Reference: TOTAL: 128 / PASSED: 70 / FAILED: 37 / SKIPPED: 21

Changes from main branch:
  TOTAL: +0
  PASSED: +1
  FAILED: -1

New test failures (1):
  - backslash-s-vs-invalid-multibyte

Test improvements (2):
  + backslash-s-and-repetition-operators
  + multibyte-white-space

@codspeed-hq

codspeed-hq Bot commented Jun 5, 2026

Merging this PR will not alter performance

✅ 10 untouched benchmarks
⏩ 17 skipped benchmarks [1]


Comparing wondr-wclabs:codex/bre-whitespace-escapes (b2dd516) with main (d28bf76)

Footnotes

  1. 17 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@wondr-wclabs force-pushed the codex/bre-whitespace-escapes branch from 2a164b2 to b2dd516 on June 5, 2026 08:32

@github-actions

github-actions Bot commented Jun 5, 2026

GNU grep testsuite comparison:

Test results comparison:
  Current:   TOTAL: 128 / PASSED: 74 / FAILED: 33 / SKIPPED: 21
  Reference: TOTAL: 128 / PASSED: 72 / FAILED: 35 / SKIPPED: 21

Changes from main branch:
  TOTAL: +0
  PASSED: +2
  FAILED: -2

Test improvements (2):
  + backslash-s-and-repetition-operators
  + multibyte-white-space

@wondr-wclabs
Contributor Author

Closing this older duplicate in favor of #55. The replacement PR keeps the same GNU \s/\S BRE compatibility goal, but its final implementation accounts for the GNU invalid-multibyte test: it expands BRE whitespace shorthands to POSIX space classes instead of enabling Oniguruma's shorthand operator directly, so \S does not match invalid UTF-8 bytes.
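
For readers following the thread, the expansion approach described above can be sketched roughly as follows. This is not the code from #55; it is a minimal illustration that assumes the rewrite happens on the pattern string before it reaches the regex engine. The names `expand_bre_whitespace` and `copy_bracket` are hypothetical, and the bracket-expression handling is deliberately simplified.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical helper: copy a bracket expression verbatim.
/// The opening '[' has already been consumed and pushed by the caller.
fn copy_bracket(chars: &mut Peekable<Chars<'_>>, out: &mut String) {
    // A leading '^' negates; a ']' in first position is a literal.
    if chars.peek() == Some(&'^') {
        out.push('^');
        chars.next();
    }
    if chars.peek() == Some(&']') {
        out.push(']');
        chars.next();
    }
    while let Some(c) = chars.next() {
        out.push(c);
        match c {
            ']' => return, // end of the bracket expression
            '[' => {
                // Nested [:class:], [.coll.] or [=equiv=]: copy through its close.
                if let Some(&d) = chars.peek() {
                    if matches!(d, ':' | '.' | '=') {
                        out.push(d);
                        chars.next();
                        let mut prev = '\0';
                        while let Some(e) = chars.next() {
                            out.push(e);
                            if prev == d && e == ']' {
                                break;
                            }
                            prev = e;
                        }
                    }
                }
            }
            _ => {}
        }
    }
}

/// Hypothetical helper: rewrite `\s` and `\S` in a BRE pattern into POSIX
/// space classes, leaving every other escape and all bracket expressions
/// untouched.
fn expand_bre_whitespace(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next() {
                Some('s') => out.push_str("[[:space:]]"),
                Some('S') => out.push_str("[^[:space:]]"),
                // Any other escape (including `\\`) is passed through as-is.
                Some(other) => {
                    out.push('\\');
                    out.push(other);
                }
                None => out.push('\\'),
            },
            '[' => {
                out.push('[');
                copy_bracket(&mut chars, &mut out);
            }
            _ => out.push(c),
        }
    }
    out
}
```

With this sketch, `expand_bre_whitespace(r"a\s*b")` yields `a[[:space:]]*b`, while `[\s]` and `\\s` come back unchanged, since a backslash inside a bracket expression and an escaped backslash are both left alone here.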

Development

Successfully merging this pull request may close these issues.

\s and \S are not honored in basic-regexp (-G) mode like GNU