perf: buffer-at-a-time search for literal patterns by sylvestre · Pull Request #16 · uutils/grep

sylvestre · 2026-05-31T08:58:21Z

Literal searches were ~50-70x slower than GNU grep because every line paid per-line costs (terminator scan, NUL scan, dispatch) even when a buffer held no match. Add a buffer-at-a-time driver that scans whole chunks with a substring searcher and only locates line boundaries around the matches it finds; a chunk with no match costs a single vectorized sweep and no per-line work.
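The match-first idea described above can be sketched roughly as follows. This is an illustrative, std-only sketch rather than the PR's code: `find` is a naive stand-in for a SIMD substring searcher, and the names `find` and `matching_lines` are hypothetical. It assumes a non-empty needle that contains no newline.

```rust
/// Naive stand-in for a vectorized substring searcher (assumption: the real
/// driver uses a SIMD searcher instead of this scan).
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        // Empty patterns are out of scope for this sketch.
        return None;
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Returns the byte range (start, end) of every line in `buf` that contains
/// `needle`; `end` excludes the line terminator. Line boundaries are only
/// computed around actual hits, so a buffer with no hit costs one `find`.
fn matching_lines(buf: &[u8], needle: &[u8]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let hit = match find(&buf[pos..], needle) {
            Some(rel) => pos + rel,
            None => break, // no further match: no per-line work at all
        };
        // Walk back to the previous newline (or the start of the buffer).
        let start = buf[..hit]
            .iter()
            .rposition(|&b| b == b'\n')
            .map_or(0, |i| i + 1);
        // Walk forward to the next newline (or the end of the buffer).
        let end = buf[hit..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(buf.len(), |i| hit + i);
        out.push((start, end));
        // Resume after this line so it is reported only once.
        pos = end + 1;
    }
    out
}
```

Resuming the search after the end of the matched line is what keeps a line with several occurrences from being reported more than once.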

codspeed-hq · 2026-05-31T09:01:17Z

Merging this PR will improve performance by ×19

⚡ 3 improved benchmarks
✅ 7 untouched benchmarks
⏩ 17 skipped benchmarks¹

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	`literal_no_match`	27.6 ms	1.3 ms	×21
⚡	`fixed_string`	29.5 ms	1.6 ms	×18
⚡	`search_pattern`	29.5 ms	1.6 ms	×18

Comparing literal-fast-path (b3d70c0) with main (c614a57)

¹ 17 benchmarks were skipped, so the baseline results were used instead.

codecov · 2026-05-31T09:02:12Z

Codecov Report

❌ Patch coverage is 98.51632% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.67%. Comparing base (c614a57) to head (b3d70c0).

Files with missing lines	Patch %	Lines
src/searcher.rs	97.02%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   95.28%   95.67%   +0.38%     
==========================================
  Files           6        6              
  Lines        1422     1758     +336     
  Branches      140      188      +48     
==========================================
+ Hits         1355     1682     +327     
- Misses         66       75       +9     
  Partials        1        1

Flag	Coverage Δ
macOS_latest	`96.50% <99.40%> (+0.40%)`	⬆️
ubuntu_latest	`96.50% <99.40%> (+0.40%)`	⬆️
windows_latest	`0.00% <0.00%> (ø)`


Literal searches were ~50-70x slower than GNU grep because every line paid per-line costs (terminator scan, NUL scan, dispatch) even when a buffer held no match. Add a buffer-at-a-time driver that scans whole chunks with a substring searcher and only locates line boundaries around the matches it finds; a chunk with no match costs a single vectorized sweep and no per-line work.

The driver activates only for plain ASCII literal patterns (case sensitive, no metacharacters) in the simpler output modes: -c, -l, -L, -q, and plain line printing with -n/-b/filename/-m. Anything needing match positions, context, inversion, color, or special binary handling falls back to the unchanged line-at-a-time path. Output stays byte-identical to that path, including binary/invalid-UTF-8 behavior.

- line_buffer: read_chunk() yields the largest span of complete lines.
- matcher: expose per-pattern memmem searchers when every pattern is a plain literal (plain_literal()).
- searcher: eligible_for_fast_path(), fast_locate(), fast_print().

All scanning rides on the memchr crate (SIMD memchr/memrchr/memmem). Unit tests for read_chunk and plain_literal; integration tests for prefixes, -m, and multi-chunk line-number correctness.

Benchmarks (31 MB corpus) vs prior release:

-F (no match): 232ms -> 15ms (15.9x; now faster than GNU)
-c literal: 229ms -> 15ms (15.2x)
plain print: 248ms -> 18ms (13.5x)

Regex and -i paths are unchanged (still the line-at-a-time engine).
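To illustrate what "the largest span of complete lines" could mean for read_chunk(), here is a minimal std-only sketch. The `ChunkReader` type, its fields, and the `capacity` parameter are hypothetical and are not taken from the PR; the real line_buffer may manage its memory quite differently. The sketch assumes `capacity` is greater than zero.

```rust
use std::io::{self, Read};

/// Hypothetical reader that hands out chunks ending on a line boundary.
struct ChunkReader<R> {
    inner: R,
    buf: Vec<u8>,
    /// Bytes at the front of `buf` that the previous call already returned.
    consumed: usize,
    /// Upper bound on the bytes requested per refill (assumed > 0).
    capacity: usize,
    eof: bool,
}

impl<R: Read> ChunkReader<R> {
    fn new(inner: R, capacity: usize) -> Self {
        ChunkReader { inner, buf: Vec::new(), consumed: 0, capacity, eof: false }
    }

    /// Returns the longest buffered prefix that ends in '\n', or the final
    /// unterminated tail at end of input, or `None` when nothing is left.
    fn read_chunk(&mut self) -> io::Result<Option<&[u8]>> {
        // Drop what the caller has seen; keep the partial trailing line.
        self.buf.drain(..self.consumed);
        self.consumed = 0;
        loop {
            // The last newline marks the end of the largest complete-line span.
            if let Some(i) = self.buf.iter().rposition(|&b| b == b'\n') {
                self.consumed = i + 1;
                return Ok(Some(&self.buf[..i + 1]));
            }
            if self.eof {
                if self.buf.is_empty() {
                    return Ok(None);
                }
                // Final line without a terminator.
                self.consumed = self.buf.len();
                return Ok(Some(&self.buf[..]));
            }
            // No complete line buffered yet: append up to `capacity` bytes.
            let n = self
                .inner
                .by_ref()
                .take(self.capacity as u64)
                .read_to_end(&mut self.buf)?;
            self.eof = n == 0;
        }
    }
}
```

Because every chunk ends on a line boundary, a substring match can never straddle two chunks, which is what lets the caller search each chunk independently. The sketch rescans the buffer after each refill, which a production version would likely avoid for very long lines.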

The buffer-at-a-time fast path now serves the literal patterns that the existing -l/-L/-q and binary tests used, leaving the line-at-a-time engine's equivalents uncovered. Add bracket-class (non-literal) tests for -l/-L/-q and binary handling (notice, -a text, without-match bail, and the finalize-time notice), plus a fast-path test for a NUL that is only discovered after a line was already printed. No dead code was found: the remaining uncovered lines are writer I/O error-propagation arms and pre-existing filesystem error handlers.
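The bracket-class tests rely on a pattern such as `[f]oo` not qualifying as a plain literal, so it is routed to the line-at-a-time engine. A rough sketch of such an eligibility check is shown below. The function name, its parameters, and the exact metacharacter set are assumptions made for illustration; the PR's plain_literal() and eligible_for_fast_path() may apply different rules. The sketch is deliberately conservative and also rejects empty patterns.

```rust
/// Hypothetical check: is `pattern` a plain ASCII literal that a substring
/// searcher can handle? Not the PR's actual implementation.
fn is_plain_literal(pattern: &str, fixed_strings: bool, ignore_case: bool) -> bool {
    // Case-insensitive, empty, and non-ASCII patterns are left to the
    // regular engine in this sketch.
    if ignore_case || pattern.is_empty() || !pattern.is_ascii() {
        return false;
    }
    // With -F every byte is literal. Otherwise any byte that could be a
    // regex metacharacter disqualifies the pattern (conservative set).
    fixed_strings || !pattern.bytes().any(|b| b"\\.[]*^$+?|(){}".contains(&b))
}
```

Being conservative here only costs speed: a pattern wrongly classified as non-literal still produces correct output through the fallback path.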

sylvestre force-pushed the literal-fast-path branch from 0c9bbbe to 0846294 · May 31, 2026 09:17

sylvestre requested a review from lhecker · May 31, 2026 09:28

sylvestre force-pushed the literal-fast-path branch from 0846294 to c3840c2 · May 31, 2026 15:43

sylvestre force-pushed the literal-fast-path branch from c3840c2 to b3d70c0 · May 31, 2026 16:13

sylvestre wants to merge 2 commits into main from literal-fast-path
