perf(scanner): cache rare-byte anchor in CompiledPattern #83
parse_aob() now stores the rarest literal byte's index on CompiledPattern::anchor so find_pattern() reads it as a single load instead of re-running the selection loop on every scan. Manually constructed patterns fall back to inline selection via a sentinel default. Also hoists cpu_has_avx2() out of the memchr hit loop and replaces the AVX2 mismatch sentinel with std::optional<size_t>. Adds tests/bench_scanner.cpp that contrasts the rare-byte anchor against a first-literal-byte anchor on an 8 MiB code-like buffer; on an AVX2 host the rare-byte strategy is 24x to 28x faster on patterns whose first literal is a common opcode, and within 1% noise when the first literal is already rare.
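For reviewers who want the idea in isolation, here is a minimal sketch of the cached-anchor scheme described above. It is not the PR's code. Apart from `CompiledPattern::anchor` and `find_pattern`, every name here (`kNoAnchor`, `bytes`, `literal`, `commonness`, `select_anchor`) is an assumption made for illustration. The frequency table is a toy, and the verify step is a plain scalar loop rather than AVX2.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical stand-in for the PR's CompiledPattern. Only `anchor` is named
// in the PR; the other fields are assumptions for this sketch.
struct CompiledPattern {
    static constexpr std::size_t kNoAnchor =
        std::numeric_limits<std::size_t>::max();
    std::vector<std::uint8_t> bytes;  // pattern bytes (wildcard slots ignored)
    std::vector<bool> literal;        // true where the byte must match exactly
    std::size_t anchor = kNoAnchor;   // sentinel default: nothing cached yet
};

// Toy commonness score (higher = more frequent in x64 code). A real table
// would be measured; these few opcodes are just an illustrative guess.
inline unsigned commonness(std::uint8_t b) {
    switch (b) {
        case 0x00: case 0x0F: case 0x48: case 0x89:
        case 0x8B: case 0xCC: case 0xE8: case 0xFF:
            return 100;
        default:
            return 1;
    }
}

// The selection loop: index of the rarest literal byte, or kNoAnchor when
// the pattern has no literal bytes. Ties keep the earliest index.
inline std::size_t select_anchor(const CompiledPattern& p) {
    std::size_t best = CompiledPattern::kNoAnchor;
    unsigned best_score = std::numeric_limits<unsigned>::max();
    for (std::size_t i = 0; i < p.bytes.size(); ++i) {
        if (!p.literal[i]) continue;
        const unsigned score = commonness(p.bytes[i]);
        if (score < best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}

// Returns the offset of the first match, or std::nullopt if there is none.
inline std::optional<std::size_t> find_pattern(const std::uint8_t* data,
                                               std::size_t len,
                                               const CompiledPattern& p) {
    const std::size_t n = p.bytes.size();
    if (n == 0 || len < n) return std::nullopt;

    // Fast path: one load when an anchor is cached. The sentinel (or any
    // out-of-range value) falls back to inline selection.
    const std::size_t a = (p.anchor < n) ? p.anchor : select_anchor(p);
    if (a == CompiledPattern::kNoAnchor) return 0;  // all wildcards match at 0

    // The anchor byte of a match can only sit in [a, len - n + a].
    const std::uint8_t* cur = data + a;
    const std::uint8_t* end = data + (len - n) + a + 1;
    while (cur < end) {
        const void* found =
            std::memchr(cur, p.bytes[a], static_cast<std::size_t>(end - cur));
        if (found == nullptr) break;
        const std::uint8_t* hit = static_cast<const std::uint8_t*>(found);
        const std::uint8_t* start = hit - a;  // candidate match start
        bool ok = true;
        for (std::size_t i = 0; i < n; ++i) {
            if (p.literal[i] && start[i] != p.bytes[i]) {
                ok = false;
                break;
            }
        }
        if (ok) return static_cast<std::size_t>(start - data);
        cur = hit + 1;
    }
    return std::nullopt;
}
```

In this sketch a parser would set `p.anchor = select_anchor(p)` once at compile time. A pattern built by hand keeps the sentinel and still scans correctly, paying for the selection loop on each call.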
Walkthrough
The pull request implements a rare-byte anchor heuristic for pattern matching in the AOB scanner.
Summary
- parse_aob() now caches the rarest literal byte's index on CompiledPattern::anchor so find_pattern() reads it as a single load instead of re-running the selection loop on every scan.
- CompiledPattern::compile_anchor() lets callers opt in to the cached fast path.
- cpu_has_avx2() is hoisted out of the per-memchr-hit loop, and the AVX2 mismatch sentinel is replaced with std::optional<size_t> for clearer code.
- Adds tests/bench_scanner.cpp, a standalone microbench that contrasts the rare-byte anchor against a first-literal-byte anchor on an 8 MiB code-like buffer. Build with -DDMK_BUILD_BENCHMARKS=ON.

Benchmark
8 MiB synthetic buffer tuned to x64 .text byte frequencies, AVX2 verify, 200 scans per sample, 11-sample median. Both runs use the exact same find_pattern code path; only CompiledPattern::anchor differs.

Patterns benchmarked:
- common_first_rare_buried_8 (48 8B 05 37 DE AD BE EF)
- common_first_rare_buried_16
- all_common_first_no_match
- rare_first_short_no_match (37 6B C1 BA 5E 71)
- long_mostly_wildcards
- verify_heavy_32B_match (32 bytes)

When the first literal byte is common (0x48 REX.W, 0x8B MOV, etc.), the rare-byte heuristic produces a 24x to 28x speedup. When the first byte is already rare, both strategies are within 1% noise; the heuristic did not regress in any measured case.

Full methodology and reproduction steps live in docs/analysis/scanner_bench_v3.x/README.md.

Test plan
- compile_anchor() idempotency, the manual-construction fallback, and the empty-pattern boundary.
- mingw-debug.
- mingw-release with LTO.

Summary by CodeRabbit
Release Notes
New Features
- compile_anchor() method for explicit anchor selection in compiled patterns.
Documentation
Tests