
perf(eval): batch rule loading to avoid O(N²) index rebuilds #119

Merged: mostafa merged 1 commit into main from fix/quadratic-rule-load on May 16, 2026


@mostafa mostafa commented May 16, 2026

Summary

Engine::add_rule rebuilds the inverted index and per-field bloom on every call, so loading a rule set with add_rule in a loop is O(N²) in the rule count. The CLI validate command exercises this exact path, and on the 3120-rule SigmaHQ corpus it appeared to hang because it was paying ~3K full index rebuilds. CorrelationEngine::add_collection hit the same trap via the inner engine, so rsigma eval, the daemon, and every RuntimeEngine caller shared the bug.

This PR adds batched-load primitives to rsigma-eval and routes the slow callers through them.
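The difference between the two load paths can be illustrated with a toy model. This is a minimal sketch, not rsigma code: `ToyEngine`, its fields, and the `load_per_rule` / `load_batched` helpers are hypothetical names, and the "index" is just a vector rebuilt from scratch, on the assumption that a rebuild has to visit every rule currently loaded.

```rust
/// Toy stand-in for an engine whose index is rebuilt from all loaded rules.
/// Hypothetical type; it only models the rebuild pattern described above.
#[derive(Default)]
struct ToyEngine {
    rules: Vec<String>,
    index: Vec<(String, usize)>, // (rule text, position in `rules`)
    rebuilds: usize,             // how many full rebuilds ran
    rules_swept: usize,          // total rules visited across all rebuilds
}

impl ToyEngine {
    /// Full rebuild: visits every rule currently loaded.
    fn rebuild_index(&mut self) {
        self.index = self
            .rules
            .iter()
            .enumerate()
            .map(|(pos, rule)| (rule.clone(), pos))
            .collect();
        self.rebuilds += 1;
        self.rules_swept += self.rules.len();
    }

    /// Single-rule path: one rebuild per call.
    fn add_rule(&mut self, rule: &str) {
        self.rules.push(rule.to_string());
        self.rebuild_index();
    }

    /// Batched path: push everything, then rebuild exactly once.
    fn add_rules<'a, I: IntoIterator<Item = &'a str>>(&mut self, rules: I) {
        self.rules.extend(rules.into_iter().map(|r| r.to_string()));
        self.rebuild_index();
    }
}

/// Loads `n` generated rules by calling `add_rule` in a loop.
fn load_per_rule(n: usize) -> ToyEngine {
    let mut engine = ToyEngine::default();
    for i in 0..n {
        engine.add_rule(&format!("rule-{i}"));
    }
    engine
}

/// Loads the same `n` generated rules through one `add_rules` call.
fn load_batched(n: usize) -> ToyEngine {
    let rules: Vec<String> = (0..n).map(|i| format!("rule-{i}")).collect();
    let mut engine = ToyEngine::default();
    engine.add_rules(rules.iter().map(String::as_str));
    engine
}
```

Both helpers end with the same index contents; only the number of rebuilds and the total number of rules swept differ, which is the quantity that grows quadratically in the loop.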

Impact

rsigma validate on the full 3120-rule SigmaHQ corpus on a MacBook M4 Pro:

| Build | Before | After |
|---|---|---|
| Debug | killed at 60s+ (never completed) | 2.2s |
| Release | ~25-30s (extrapolated from 2390-rule subset) | 0.5s |

The win (O(N²) → O(N)) comes from collapsing N per-rule rebuilds into a single rebuild: an N× reduction in RuleIndex::build and FieldBloomIndex::build calls (3120 → 1). The bloom builder is the dominant cost, because on every rebuild it sweeps every detection to collect needles, trigram-dedup them, and size each per-field filter. The same fix applies to the correlation engine, so daemon and eval rule loads also drop from O(N²) to O(N).
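A rough cost model for the index work, under the assumption (not stated in the PR) that the k-th rebuild sweeps all k rules loaded so far at a uniform per-rule cost c:

$$
T_{\text{loop}}(N) = \sum_{k=1}^{N} c\,k = c\,\frac{N(N+1)}{2},
\qquad
T_{\text{batch}}(N) = c\,N
$$

$$
N = 3120:\quad
\frac{T_{\text{loop}}}{c} = \frac{3120 \cdot 3121}{2} = 4{,}868{,}760,
\qquad
\frac{T_{\text{batch}}}{c} = 3120,
\qquad
\frac{T_{\text{loop}}}{T_{\text{batch}}} = \frac{N+1}{2} = 1560.5
$$

This model covers index rebuilds only. Parsing and compilation cost the same in both paths, so the ratio bounds the index work rather than predicting wall-clock time.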

Changes

  • Engine::add_rules<'a, I: IntoIterator<Item = &'a SigmaRule>>(&mut self, rules: I) -> Vec<(usize, EvalError)> compiles each rule (applying configured pipelines), collects per-rule compile errors without aborting the batch, and rebuilds the engine indexes exactly once.
  • Engine::extend_compiled_rules<I: IntoIterator<Item = CompiledRule>> does the same for pre-compiled rules, used by the correlation engine to avoid double pipeline application.
  • Engine::add_rule and Engine::add_collection share a new compile_with_pipelines helper so single- and batch-add paths stay behaviourally identical.
  • rsigma validate now sends the whole parsed collection through add_rules while preserving the existing per-rule compile-error reporting.
  • CorrelationEngine::add_collection compiles each rule sequentially (with its own pipelines, custom-attribute handling, and rule_ids tracking), then pushes the resulting batch through extend_compiled_rules once. The single-rule CorrelationEngine::add_rule is unchanged.
  • crates/rsigma-eval/README.md API table now lists the two new methods.
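The error-collecting shape of the batched API can be sketched as follows. This is an illustration under stated assumptions, not the rsigma implementation: `ToyEngine`, `Compiled`, `CompileError`, and the "empty rule is invalid" check are all hypothetical, and rebuilding even when every rule fails is a choice made for this sketch. Only the return shape, a list of `(index, error)` pairs, mirrors the signature quoted above.

```rust
/// Hypothetical compile error for the sketch.
#[derive(Debug, PartialEq)]
struct CompileError(String);

/// Hypothetical compiled form of a rule.
#[derive(Debug, PartialEq)]
struct Compiled {
    pattern: String,
}

#[derive(Default)]
struct ToyEngine {
    compiled: Vec<Compiled>,
    index_builds: usize,
}

impl ToyEngine {
    /// Shared compile step, so the single and batch paths cannot drift apart.
    fn compile(&self, raw: &str) -> Result<Compiled, CompileError> {
        let trimmed = raw.trim();
        if trimmed.is_empty() {
            Err(CompileError("empty rule".to_string()))
        } else {
            Ok(Compiled { pattern: trimmed.to_lowercase() })
        }
    }

    /// Stand-in for the expensive index rebuild; here it only counts calls.
    fn rebuild_index(&mut self) {
        self.index_builds += 1;
    }

    /// Single-rule path: compile, push, rebuild.
    fn add_rule(&mut self, raw: &str) -> Result<(), CompileError> {
        let compiled = self.compile(raw)?;
        self.compiled.push(compiled);
        self.rebuild_index();
        Ok(())
    }

    /// Batch path: a failing rule is recorded with its position in the
    /// input and skipped; the index is rebuilt once after the loop.
    fn add_rules<'a, I>(&mut self, rules: I) -> Vec<(usize, CompileError)>
    where
        I: IntoIterator<Item = &'a str>,
    {
        let mut errors = Vec::new();
        for (position, raw) in rules.into_iter().enumerate() {
            match self.compile(raw) {
                Ok(compiled) => self.compiled.push(compiled),
                Err(err) => errors.push((position, err)),
            }
        }
        self.rebuild_index();
        errors
    }
}
```

Returning positions rather than aborting lets a caller such as a validate command report each bad rule while still loading the good ones, which is the behaviour the list above describes.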

Tests

  • test_add_rules_matches_per_rule_loop: batched and per-rule paths produce identical evaluation verdicts across a mix of indexable and substring rules.
  • test_add_rules_collects_errors_without_aborting: a bad rule between two good ones is reported via the returned (index, error) list while both good rules end up loaded.
  • test_add_rules_scales_linearly_on_large_corpus: loads 2000 generated rules and asserts the call completes inside a deliberately generous 30s ceiling; the previous per-rule path would have timed out.
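The generous-ceiling pattern used by the scaling test can be sketched like this. The helper names `timed` and `build_index_once` are hypothetical, and the workload is a plain `HashMap` build rather than the real engine; the point is only the shape of the guard: measure one batched load and compare it against a ceiling loose enough to survive slow CI machines.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Runs `work` and returns its result together with the elapsed wall time.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = work();
    (out, start.elapsed())
}

/// Hypothetical workload: generate `rule_count` rule names and index them
/// in a single pass, returning the number of indexed entries.
fn build_index_once(rule_count: usize) -> usize {
    let rules: Vec<String> = (0..rule_count).map(|i| format!("rule-{i}")).collect();
    let index: HashMap<&str, usize> = rules
        .iter()
        .enumerate()
        .map(|(pos, rule)| (rule.as_str(), pos))
        .collect();
    index.len()
}
```

A test would then call `timed(|| build_index_once(2000))` and assert that the duration stays below `Duration::from_secs(30)`. A ceiling this loose cannot detect small slowdowns; it is meant to catch only a return to quadratic behaviour.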

Test plan

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo test --workspace --all-features
  • rsigma validate ../sigma/rules/ against the full SigmaHQ corpus (3120 rules) in both debug and release builds.

`Engine::add_rule` rebuilds the inverted index and per-field bloom on
every call, so loading a rule set with `add_rule` in a loop is O(N²) in
the rule count. On the 3120-rule SigmaHQ corpus the validate command
appeared to hang; in practice it was paying ~3K full index rebuilds.

Add a batched `Engine::add_rules` API that compiles each rule (applying
configured pipelines), collects per-rule compile errors without aborting
the batch, and rebuilds the engine indexes exactly once at the end. Add
`Engine::extend_compiled_rules` for the pre-compiled equivalent.

Route the slow callers through the new APIs:

- `rsigma validate` now batches the whole collection through
  `add_rules` while preserving per-rule error reporting.
- `CorrelationEngine::add_collection` compiles rules sequentially (with
  its own pipelines and metadata tracking) and pushes the resulting
  batch through `extend_compiled_rules`, so `rsigma eval`, the daemon,
  and every `RuntimeEngine` caller share the same one-rebuild path.

Tests cover behavioural equivalence with the per-rule loop, mid-batch
compile errors, and a 2K-rule load that guards against quadratic
regressions.
@mostafa mostafa merged commit 7fcc8e0 into main May 16, 2026
10 checks passed
@mostafa mostafa deleted the fix/quadratic-rule-load branch May 16, 2026 16:32
SecurityEnthusiast pushed a commit to SecurityEnthusiast/rsigma that referenced this pull request May 17, 2026
Replaces the placeholder Unreleased section with a full release-notes
draft following the format of the v0.11.0 / v0.10.0 / v0.9.0 entries.
Covers every PR merged to main since v0.11.0:

- Daemon and CLI observability (PR timescale#107) - tower-http access logs,
  per-request OTLP tracing, batch spans, source resolution spans, DLQ
  visibility, NATS/sink lifecycle, correlation eviction warnings, rule
  load diagnostics, daemon lifecycle, global `--log-format` flag.
- Eval rule loading performance (PRs timescale#119, timescale#121, timescale#122, timescale#123) - batched
  loaders rebuild indexes once per batch via `Engine::add_rules` /
  `extend_compiled_rules` / `add_collection`; single-rule path
  amortized O(1) via `RuleIndex::append_rule` and a doubling-watermark
  `FieldBloomIndex`. SigmaHQ corpus (~3,120 rules) now loads in ~120 ms.
- CLI command groups (PR timescale#124) - the noun-led `engine` / `rule` /
  `backend` / `pipeline` / `attack` grouping with the existing
  migration table preserved verbatim.
- Test reliability (PRs timescale#115, timescale#123) - cli_daemon_http and
  cli_daemon_otlp E2E suites de-flaked on macOS under load; eval bloom
  test made deterministic against random AHash seeds.
- Dependency and CI bumps.

All command-name references within the draft already use the new
noun-led paths (`engine eval`, `rule validate`, etc.) so the next
release ships with consistent terminology throughout the notes.
@mostafa mostafa mentioned this pull request May 19, 2026
5 tasks
SecurityEnthusiast pushed a commit to SecurityEnthusiast/rsigma that referenced this pull request May 20, 2026
The "operability, performance, and documentation" release.

* Workspace bumped 0.11.0 -> 0.12.0; all 10 inter-crate dep pins
  refreshed; Cargo.lock regenerated under --locked.
* CHANGELOG.md [Unreleased] section flipped to [0.12.0] - 2026-05-19;
  comparison link updated to v0.11.0...v0.12.0; tag reference added
  to the bottom-of-file link block.
* CHANGELOG also gained a Documentation site (PR timescale#129) section under
  the existing observability / eval-perf / CLI-groups / test-reliability
  / dependencies headings, and the TL;DR theme moved from "operations
  and load performance" to "operability, performance, and documentation"
  to reflect the new docs site as a top-line deliverable.

Covers all 13 PRs merged since v0.11.0: timescale#107 (observability),
timescale#111/timescale#113/timescale#114/timescale#120 (dependency batches), timescale#115/timescale#123 (test
reliability), timescale#119/timescale#121/timescale#122/timescale#123 (eval rule loading perf), timescale#124
(CLI command groups), timescale#127 (CLI docs followup), timescale#129 (documentation
site).