grep: honor GNU buffer anchors #56

Merged: lhecker merged 2 commits into uutils:main from wondr-wclabs:codex/gnu-buffer-anchors on Jun 5, 2026

@wondr-wclabs
Contributor

Fixes #33.

GNU grep recognizes \` and \' as start/end buffer anchors in both BRE and ERE mode. In this implementation, grep searches one record/line at a time, so those GNU buffer anchors behave like the record-local start/end anchors for the cases in this issue. The current Syntax::grep() and Syntax::gnu_regex() setup did not enable Oniguruma's GNU buffer-anchor operator, so the escapes were treated as literal backtick/apostrophe characters.
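
As a rough illustration of why record-at-a-time searching makes the buffer anchors coincide with record-local anchors, here is a toy model in plain Rust. It does not use Oniguruma or any code from this PR; `anchored_at_start`, `anchored_at_end`, and `matching_records` are hypothetical helpers that stand in for \` and \' applied to a plain literal.

```rust
/// Toy stand-in for a pattern with a leading buffer-start anchor:
/// the literal must begin the haystack.
fn anchored_at_start(haystack: &str, literal: &str) -> bool {
    haystack.starts_with(literal)
}

/// Toy stand-in for a pattern with a trailing buffer-end anchor:
/// the literal must end the haystack.
fn anchored_at_end(haystack: &str, literal: &str) -> bool {
    haystack.ends_with(literal)
}

/// Apply a predicate to each newline-delimited record, the way a
/// line-oriented search would hand records to the matcher one at a time.
fn matching_records<'a>(input: &'a str, pred: impl Fn(&str) -> bool) -> Vec<&'a str> {
    input.lines().filter(|&record| pred(record)).collect()
}

fn main() {
    let input = "dog\ncat\n";

    // Treated as one buffer, "c" is not at the start of the input.
    assert!(!anchored_at_start(input, "c"));

    // Treated record by record, each record is its own buffer, so the
    // buffer start is the record start.
    let start_hits = matching_records(input, |r| anchored_at_start(r, "c"));
    let end_hits = matching_records(input, |r| anchored_at_end(r, "t"));
    println!("{:?} {:?}", start_hits, end_hits);
}
```

The sketch only models the cases in this issue, where the searched buffer is a single record; it says nothing about multi-line records.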

This enables SYNTAX_OPERATOR_ESC_GNU_BUF_ANCHOR only for RegexMode::Basic and RegexMode::Extended. I left Fixed untouched because escapes are literals there, and left Perl untouched because -P should follow the PCRE-style syntax path rather than GNU BRE/ERE extensions.
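
The mode gating described above can be sketched as a small decision function. This is a standalone model, not the code in this PR: the `RegexMode` enum is redeclared here with variant names taken from the description, and `gnu_buffer_anchors_enabled` is a hypothetical helper. In the real change the decision is applied by enabling the Oniguruma operator on the syntax object rather than by returning a boolean.

```rust
/// Local redeclaration of the modes named in the description (assumed shape).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RegexMode {
    Fixed,
    Basic,
    Extended,
    Perl,
}

/// Hypothetical helper: should the GNU buffer-anchor escapes be parsed
/// as anchors in this mode?
fn gnu_buffer_anchors_enabled(mode: RegexMode) -> bool {
    match mode {
        // GNU BRE and ERE both recognize the buffer-anchor escapes.
        RegexMode::Basic | RegexMode::Extended => true,
        // Fixed: escapes are literals. Perl: follows the PCRE-style path.
        RegexMode::Fixed | RegexMode::Perl => false,
    }
}

fn main() {
    let modes = [
        RegexMode::Fixed,
        RegexMode::Basic,
        RegexMode::Extended,
        RegexMode::Perl,
    ];
    for mode in modes {
        println!("{:?}: {}", mode, gnu_buffer_anchors_enabled(mode));
    }
}
```

Using an exhaustive `match` means a mode added later has to be classified explicitly before the code compiles.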

Validation:

  • cargo fmt --all -- --check
  • cargo test
  • cargo clippy --all-targets --workspace -puu_grep -- -D warnings
  • git diff --check
  • printf 'cat\ndog\n' | cargo run --quiet -- -e "t\\'" now prints cat
  • printf 'cat\ndog\n' | cargo run --quiet -- -e '\`c' now prints cat

@codspeed-hq

codspeed-hq Bot commented Jun 5, 2026

Merging this PR will not alter performance

✅ 10 untouched benchmarks
⏩ 17 skipped benchmarks [1]


Comparing wondr-wclabs:codex/gnu-buffer-anchors (0ec43c8) with main (f2f86ef)

Footnotes

  1. 17 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Comment thread tests/test_grep.rs
Comment on lines +88 to +113
#[test]
fn gnu_buffer_anchors() {
    let (_s, mut c) = ucmd();
    c.args(&[r"\`c"])
        .pipe_in("cat\nscat\ndog\n")
        .succeeds()
        .stdout_only("cat\n");

    let (_s, mut c) = ucmd();
    c.args(&[r"t\'"])
        .pipe_in("cat\ntar\ndog\n")
        .succeeds()
        .stdout_only("cat\n");

    let (_s, mut c) = ucmd();
    c.args(&["-E", r"\`c"])
        .pipe_in("cat\nscat\ndog\n")
        .succeeds()
        .stdout_only("cat\n");

    let (_s, mut c) = ucmd();
    c.args(&["-E", r"t\'"])
        .pipe_in("cat\ntar\ndog\n")
        .succeeds()
        .stdout_only("cat\n");
}
Collaborator

This seems like it could be expressed in just two tests, no?

Contributor Author

@wondr-wclabs commented Jun 5, 2026

Agreed. I collapsed this to two command invocations: one BRE case and one ERE case. Each pattern now covers both GNU buffer anchors in one command:

BRE: \`c\|r\'
ERE: \`c|r\'

That keeps the mode distinction explicit while avoiding four near-identical command setups. Focused validation after the change: cargo test --test test_grep gnu_buffer_anchors.
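
To make the two combined patterns easier to read, here is a toy model of how each one splits into two alternatives, each carrying one anchor. It is plain Rust with no regex engine and is not the test code from this PR: `alternative_matches` and `pattern_matches` are hypothetical helpers, they only understand a literal with an optional leading \` and trailing \', and the input records are made up for illustration.

```rust
/// Toy matcher for one alternative: a plain literal with an optional
/// leading buffer-start escape and an optional trailing buffer-end escape.
fn alternative_matches(alt: &str, record: &str) -> bool {
    let (at_start, rest) = match alt.strip_prefix("\\`") {
        Some(rest) => (true, rest),
        None => (false, alt),
    };
    let (at_end, literal) = match rest.strip_suffix("\\'") {
        Some(literal) => (true, literal),
        None => (false, rest),
    };
    match (at_start, at_end) {
        (true, true) => record == literal,
        (true, false) => record.starts_with(literal),
        (false, true) => record.ends_with(literal),
        (false, false) => record.contains(literal),
    }
}

/// Split on the mode's alternation operator (`\|` in BRE, `|` in ERE)
/// and accept the record if any alternative matches.
fn pattern_matches(pattern: &str, extended: bool, record: &str) -> bool {
    let separator = if extended { "|" } else { "\\|" };
    pattern
        .split(separator)
        .any(|alt| alternative_matches(alt, record))
}

fn main() {
    // Hypothetical input records, not taken from the PR's tests.
    let records = ["cat", "scat", "tar", "dog"];
    let bre = r"\`c\|r\'";
    let ere = r"\`c|r\'";

    let bre_hits: Vec<&str> = records
        .iter()
        .copied()
        .filter(|&r| pattern_matches(bre, false, r))
        .collect();
    let ere_hits: Vec<&str> = records
        .iter()
        .copied()
        .filter(|&r| pattern_matches(ere, true, r))
        .collect();

    println!("BRE: {:?}", bre_hits);
    println!("ERE: {:?}", ere_hits);
}
```

In this model both patterns select the records that start with `c` or end with `r`, which is why one command per mode is enough to exercise both anchors.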

@wondr-wclabs force-pushed the codex/gnu-buffer-anchors branch from e20d12f to 0ec43c8 on June 5, 2026 15:20
@wondr-wclabs
Contributor Author

I also rebased this branch onto current upstream/main after the earlier CI run. The previous branch was still stacked on already-merged grep work, so the PR diff and CI were carrying unrelated binary-path changes.

After the rebase, this PR is back to the intended scope: enabling Oniguruma's GNU buffer-anchor operator for BRE/ERE plus the two compact anchor tests. Local validation on the cleaned branch:

cargo fmt --all -- --check
cargo test --verbose

Collaborator

@lhecker left a comment

Thanks, anthropic/claude-sonnet-4-6! 🥲

@lhecker merged commit e925ff4 into uutils:main on Jun 5, 2026
16 of 17 checks passed
Development

Successfully merging this pull request may close these issues.

GNU start/end-of-buffer anchors are treated as literals instead of anchors

2 participants