perf: buffer-at-a-time search for literal patterns #16
Merging this PR will improve performance by ×19
Codecov Report❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| | main | #16 | +/- |
|---|---|---|---|
| Coverage | 95.28% | 95.67% | +0.38% |
| Files | 6 | 6 | |
| Lines | 1422 | 1758 | +336 |
| Branches | 140 | 188 | +48 |
| Hits | 1355 | 1682 | +327 |
| Misses | 66 | 75 | +9 |
| Partials | 1 | 1 | |
Force-pushed from 0c9bbbe to 0846294.
Literal searches were ~50-70x slower than GNU grep because every line paid per-line costs (terminator scan, NUL scan, dispatch) even when a buffer held no match.

Add a buffer-at-a-time driver that scans whole chunks with a substring searcher and only locates line boundaries around the matches it finds; a chunk with no match costs a single vectorized sweep and no per-line work.

The driver activates only for plain ASCII literal patterns (case sensitive, no metacharacters) in the simpler output modes: -c, -l, -L, -q, and plain line printing with -n/-b/filename/-m. Anything needing match positions, context, inversion, color, or special binary handling falls back to the unchanged line-at-a-time path. Output stays byte-identical to that path, including binary/invalid-UTF-8 behavior.

- line_buffer: read_chunk() yields the largest span of complete lines.
- matcher: expose per-pattern memmem searchers when every pattern is a plain literal (plain_literal()).
- searcher: eligible_for_fast_path(), fast_locate(), fast_print().

All scanning rides on the memchr crate (SIMD memchr/memrchr/memmem).

Unit tests for read_chunk and plain_literal; integration tests for prefixes, -m, and multi-chunk line-number correctness.

Benchmarks (31 MB corpus) vs prior release:

- -F (no match): 232ms -> 15ms (15.9x; now faster than GNU)
- -c literal: 229ms -> 15ms (15.2x)
- plain print: 248ms -> 18ms (13.5x)

Regex and -i paths are unchanged (still the line-at-a-time engine).
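For readers skimming the thread, here is a minimal sketch of the buffer-at-a-time idea described above. It is not the code in this PR. The names `find`, `split_complete_lines`, and `matching_lines` are hypothetical, and `find` is a naive std-only stand-in for a SIMD searcher such as the memchr crate's memmem. It assumes the needle contains no newline.

```rust
/// Naive substring search. Hypothetical stand-in for a SIMD searcher
/// (the PR uses the memchr crate; this sketch uses only std).
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0); // an empty pattern matches at the start of any input
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Split a buffer into (largest span of complete lines, trailing partial line).
/// The partial line would be carried over to the next read.
fn split_complete_lines(buf: &[u8]) -> (&[u8], &[u8]) {
    match buf.iter().rposition(|&b| b == b'\n') {
        Some(i) => buf.split_at(i + 1),
        None => (&buf[..0], buf),
    }
}

/// Return every line of `chunk` that contains `needle`.
/// Line boundaries are located only around hits; a chunk with no hit
/// costs one substring sweep and no per-line work.
fn matching_lines<'a>(chunk: &'a [u8], needle: &[u8]) -> Vec<&'a [u8]> {
    let mut lines = Vec::new();
    let mut pos = 0; // invariant: `pos` is always the start of a line
    while pos < chunk.len() {
        let hit = match find(&chunk[pos..], needle) {
            Some(off) => pos + off,
            None => break,
        };
        // Walk back from the hit to the start of its line.
        let line_start = match chunk[pos..hit].iter().rposition(|&b| b == b'\n') {
            Some(i) => pos + i + 1,
            None => pos,
        };
        // Walk forward from the hit to the end of its line (newline included).
        let line_end = match chunk[hit..].iter().position(|&b| b == b'\n') {
            Some(i) => hit + i + 1,
            None => chunk.len(),
        };
        lines.push(&chunk[line_start..line_end]);
        pos = line_end; // resume after this line so it is reported once
    }
    lines
}

fn main() {
    let buf = b"alpha\nbeta needle\ngamma\nneedle needle\npartial nee";
    let (chunk, rest) = split_complete_lines(buf);
    let lines = matching_lines(chunk, b"needle");
    for line in &lines {
        print!("{}", String::from_utf8_lossy(line));
    }
    println!("matching lines: {}, bytes carried over: {}", lines.len(), rest.len());
}
```

In this sketch a count mode like -c only needs `lines.len()`, and a mode like -q could stop at the first hit. Line numbers, byte offsets, -m limits, and binary detection are left out on purpose.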
Force-pushed from 0846294 to c3840c2.
The buffer-at-a-time fast path now serves the literal patterns that the existing -l/-L/-q and binary tests used, leaving the line-at-a-time engine's equivalents uncovered. Add bracket-class (non-literal) tests for -l/-L/-q and binary handling (notice, -a text, without-match bail, and the finalize-time notice), plus a fast-path test for a NUL that is only discovered after a line was already printed. No dead code was found: the remaining uncovered lines are writer I/O error-propagation arms and pre-existing filesystem error handlers.
Force-pushed from c3840c2 to b3d70c0.
Literal searches were ~50-70x slower than GNU grep because every line paid per-line costs (terminator scan, NUL scan, dispatch) even when a buffer held no match. Add a buffer-at-a-time driver that scans whole chunks with a substring searcher and only locates line boundaries around the matches it finds; a chunk with no match costs a single vectorized sweep and no per-line work.