
perf: buffer-at-a-time search for literal patterns #16

Open: sylvestre wants to merge 2 commits into main from literal-fast-path


@sylvestre (Contributor)

Literal searches were ~50-70x slower than GNU grep because every line paid per-line costs (terminator scan, NUL scan, dispatch) even when a buffer held no match. Add a buffer-at-a-time driver that scans whole chunks with a substring searcher and only locates line boundaries around the matches it finds; a chunk with no match costs a single vectorized sweep and no per-line work.
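
The buffer-at-a-time idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `find` is a naive standard-library stand-in for the SIMD substring searcher, and `matching_lines` is a hypothetical name. It assumes the needle contains no newline. Line boundaries are located only around a hit, so a chunk without any hit costs one search and no per-line work.

```rust
/// Naive stand-in for a vectorized substring searcher (assumption: the real
/// driver uses a SIMD searcher; this version only illustrates the control flow).
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Return every line of `chunk` that contains `needle` (without the trailing '\n').
/// `pos` always sits at the start of a line, so only the bytes between `pos`
/// and the hit need to be inspected to find where the matching line begins.
fn matching_lines<'a>(chunk: &'a [u8], needle: &[u8]) -> Vec<&'a [u8]> {
    let mut lines = Vec::new();
    let mut pos = 0;
    while pos < chunk.len() {
        // One sweep over the rest of the chunk; no match means we are done.
        let hit = match find(&chunk[pos..], needle) {
            Some(off) => pos + off,
            None => break,
        };
        // Line start: the byte after the last '\n' before the hit.
        let start = chunk[pos..hit]
            .iter()
            .rposition(|&b| b == b'\n')
            .map_or(pos, |i| pos + i + 1);
        // Line end: the next '\n' at or after the hit, or the end of the chunk.
        let end = chunk[hit..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(chunk.len(), |i| hit + i);
        lines.push(&chunk[start..end]);
        // Resume after this line so a line with several hits is reported once.
        pos = end + 1;
    }
    lines
}
```

Calling `matching_lines(b"alpha\nbeta needle\n", b"needle")` yields the single line `beta needle`; a chunk with no occurrence returns an empty vector after one search.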

@codspeed-hq (Bot) commented May 31, 2026

Merging this PR will improve performance by ×19

⚡ 3 improved benchmarks
✅ 7 untouched benchmarks
⏩ 17 skipped benchmarks [1]

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| literal_no_match | 27.6 ms | 1.3 ms | ×21 |
| fixed_string | 29.5 ms | 1.6 ms | ×18 |
| search_pattern | 29.5 ms | 1.6 ms | ×18 |


Comparing literal-fast-path (b3d70c0) with main (c614a57)

Footnotes

  1. 17 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

@codecov (Bot) commented May 31, 2026

Codecov Report

❌ Patch coverage is 98.51632% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.67%. Comparing base (c614a57) to head (b3d70c0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/searcher.rs | 97.02% | 5 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   95.28%   95.67%   +0.38%     
==========================================
  Files           6        6              
  Lines        1422     1758     +336     
  Branches      140      188      +48     
==========================================
+ Hits         1355     1682     +327     
- Misses         66       75       +9     
  Partials        1        1              

| Flag | Coverage Δ |
|---|---|
| macOS_latest | 96.50% <99.40%> (+0.40%) ⬆️ |
| ubuntu_latest | 96.50% <99.40%> (+0.40%) ⬆️ |
| windows_latest | 0.00% <0.00%> (ø) |

@sylvestre force-pushed the literal-fast-path branch from 0c9bbbe to 0846294 on May 31, 2026 09:17
@sylvestre requested a review from lhecker on May 31, 2026 09:28

Literal searches were ~50-70x slower than GNU grep because every line
paid per-line costs (terminator scan, NUL scan, dispatch) even when a
buffer held no match. Add a buffer-at-a-time driver that scans whole
chunks with a substring searcher and only locates line boundaries
around the matches it finds; a chunk with no match costs a single
vectorized sweep and no per-line work.

The driver activates only for plain ASCII literal patterns
(case-sensitive, no metacharacters) in the simpler output modes: -c, -l, -L,
-q, and plain line printing with -n/-b/filename/-m. Anything needing
match positions, context, inversion, color, or special binary handling
falls back to the unchanged line-at-a-time path. Output stays
byte-identical to that path, including binary/invalid-UTF-8 behavior.
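
A rough sketch of such an eligibility check, under stated assumptions. The
`Options` struct, its fields, the function names, and the metacharacter set
are all hypothetical and chosen for illustration; the PR's actual
eligible_for_fast_path() may test different conditions. The sketch treats
an empty pattern as ineligible, which is a conservative guess.

```rust
/// Hypothetical subset of command-line options relevant to the decision.
#[derive(Default)]
struct Options {
    ignore_case: bool,    // -i
    invert_match: bool,   // -v
    only_matching: bool,  // -o (needs match positions)
    color: bool,          // --color (needs match positions)
    context_lines: usize, // -A/-B/-C
}

/// Assumed conservative metacharacter set covering basic and extended syntax.
const META: &[u8] = b"\\.[]*^$+?(){}|";

/// True when the pattern is non-empty ASCII with no metacharacters,
/// so it can be searched as raw bytes.
fn is_plain_literal(pattern: &str) -> bool {
    !pattern.is_empty()
        && pattern.bytes().all(|b| b.is_ascii() && !META.contains(&b))
}

/// The fast path applies only when nothing requires per-line or
/// per-match bookkeeping; everything else takes the line-at-a-time path.
fn eligible(pattern: &str, opts: &Options) -> bool {
    is_plain_literal(pattern)
        && !opts.ignore_case
        && !opts.invert_match
        && !opts.only_matching
        && !opts.color
        && opts.context_lines == 0
}
```

Keeping the predicate conservative means a false "not eligible" only costs
speed, while a false "eligible" could change output, so any doubtful case
should fall back.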

- line_buffer: read_chunk() yields the largest span of complete lines.
- matcher: expose per-pattern memmem searchers when every pattern is a
  plain literal (plain_literal()).
- searcher: eligible_for_fast_path(), fast_locate(), fast_print().
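
The "largest span of complete lines" idea behind read_chunk() can be
sketched as a split at the last newline in the filled part of a buffer.
This is an illustration only: `split_complete_lines` is a hypothetical
helper, not the PR's API, and the real read_chunk() also has to deal with
refilling the buffer and with a final line that lacks a terminator.

```rust
/// Split `buf` into (complete lines, trailing partial line).
/// The first slice ends right after the last '\n'; the second holds the
/// bytes that must be carried over and completed by the next read.
fn split_complete_lines(buf: &[u8]) -> (&[u8], &[u8]) {
    match buf.iter().rposition(|&b| b == b'\n') {
        // Everything up to and including the last newline is searchable now.
        Some(i) => buf.split_at(i + 1),
        // No newline at all: nothing is complete yet.
        None => (&buf[..0], buf),
    }
}
```

Handing only complete lines to the searcher means a match can never
straddle two chunks, so line boundaries found around a hit are always
valid within the chunk.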

All scanning rides on the memchr crate (SIMD memchr/memrchr/memmem).
Unit tests for read_chunk and plain_literal; integration tests for
prefixes, -m, and multi-chunk line-number correctness.
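
One way to keep -n line numbers correct across chunks is to carry a running
newline count, sketched below. `LineCounter` and its methods are
hypothetical names for illustration, not code from this PR; the sketch
assumes each chunk ends on a line boundary and counts newlines with a plain
iterator where a real implementation would likely use a vectorized count.

```rust
/// Tracks how many lines precede the current chunk.
struct LineCounter {
    lines_before: u64,
}

impl LineCounter {
    fn new() -> Self {
        LineCounter { lines_before: 0 }
    }

    /// 1-based line number of the line starting at `line_start` in `chunk`.
    fn line_number(&self, chunk: &[u8], line_start: usize) -> u64 {
        self.lines_before + count_newlines(&chunk[..line_start]) + 1
    }

    /// Call once a chunk has been fully processed.
    fn finish_chunk(&mut self, chunk: &[u8]) {
        self.lines_before += count_newlines(chunk);
    }
}

fn count_newlines(bytes: &[u8]) -> u64 {
    bytes.iter().filter(|&&b| b == b'\n').count() as u64
}
```

For chunks `a\nb\n` followed by `c\nneedle\n`, the line starting at offset 2
of the second chunk is reported as line 4 once the first chunk has been
finished.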

Benchmarks (31 MB corpus) vs prior release:
  -F (no match):  232ms -> 15ms  (15.9x; now faster than GNU)
  -c literal:     229ms -> 15ms  (15.2x)
  plain print:    248ms -> 18ms  (13.5x)
Regex and -i paths are unchanged (still the line-at-a-time engine).

@sylvestre force-pushed the literal-fast-path branch from 0846294 to c3840c2 on May 31, 2026 15:43

The buffer-at-a-time fast path now serves the literal patterns that the
existing -l/-L/-q and binary tests used, leaving the line-at-a-time
engine's equivalents uncovered. Add bracket-class (non-literal) tests
for -l/-L/-q and binary handling (notice, -a text, without-match bail,
and the finalize-time notice), plus a fast-path test for a NUL that is
only discovered after a line was already printed.

No dead code was found: the remaining uncovered lines are writer I/O
error-propagation arms and pre-existing filesystem error handlers.

@sylvestre force-pushed the literal-fast-path branch from c3840c2 to b3d70c0 on May 31, 2026 16:13