
feat(cli): expose bloom prefilter via --bloom-prefilter on eval and daemon#104

Merged
mostafa merged 1 commit into main from feat/bloom-prefilter-cli
May 13, 2026

Conversation

@mostafa
Member

@mostafa mostafa commented May 13, 2026

Summary

Engine::set_bloom_prefilter was only reachable programmatically after #102 merged. This PR wires the toggle through the daemon and eval subcommands so operators can flip it at deploy time without touching code, and adds a paired --bloom-max-bytes knob for the per-engine memory budget.

Engine API additions

  • Engine::set_bloom_max_bytes(usize) overrides the 1 MB default for the per-field bloom budget. The setter rebuilds the index immediately if rules are already loaded; otherwise the budget applies on the next add_collection / add_rule call.
  • Engine::bloom_max_bytes() -> Option<usize> exposes the override (None means crate default).
  • CorrelationEngine gains forwarding setters for both bloom knobs so the daemon's correlation path picks them up too.
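The setter semantics described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `Engine` struct here is a hypothetical stand-in, `rebuilds` is a counter added only to make the rebuild observable, and the 1 MB default is assumed to mean 1024 * 1024 bytes.

```rust
/// Assumed value of the crate default (the PR only says "1 MB").
const DEFAULT_BLOOM_MAX_BYTES: usize = 1024 * 1024;

/// Hypothetical stand-in for the real `Engine`; only the bloom budget is modeled.
#[derive(Default)]
struct Engine {
    rules: Vec<String>,
    bloom_max_bytes: Option<usize>,
    /// Counts index rebuilds so the behavior is visible in this sketch.
    rebuilds: usize,
}

impl Engine {
    /// Adding a rule rebuilds the index, which picks up the current budget.
    fn add_rule(&mut self, rule: &str) {
        self.rules.push(rule.to_string());
        self.rebuild_index();
    }

    /// Store the override; rebuild right away only if rules are already loaded.
    fn set_bloom_max_bytes(&mut self, max_bytes: usize) {
        self.bloom_max_bytes = Some(max_bytes);
        if !self.rules.is_empty() {
            self.rebuild_index();
        }
    }

    /// `None` means the crate default is in use.
    fn bloom_max_bytes(&self) -> Option<usize> {
        self.bloom_max_bytes
    }

    /// Budget the index would actually be built with.
    fn effective_bloom_max_bytes(&self) -> usize {
        self.bloom_max_bytes.unwrap_or(DEFAULT_BLOOM_MAX_BYTES)
    }

    fn rebuild_index(&mut self) {
        // A real engine would rebuild its bloom index here.
        self.rebuilds += 1;
    }
}
```

In this sketch, calling `set_bloom_max_bytes` on an empty engine only records the value, and the first `add_rule` call applies it.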

Runtime + CLI wiring

  • RuntimeEngine carries bloom_prefilter: bool and bloom_max_bytes: Option<usize>. Both setters apply on every load_rules() (detection-only AND correlation variants), so hot reloads keep the toggle.
  • DaemonConfig adds matching fields, populated from new CLI flags.
  • cmd_eval and its detection-only / correlation helpers accept the toggle and forward to the inner engines.
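To illustrate why hot reloads keep the toggle, here is a rough sketch of a runtime wrapper that re-applies both knobs every time it builds a fresh engine. Everything except the names `RuntimeEngine`, `CorrelationEngine`, `load_rules`, and the two setters is an assumption made for illustration (including `LoadedEngine`, `use_correlation`, and the struct layouts); the real types carry far more state.

```rust
/// Hypothetical minimal engine; only the two bloom knobs are modeled.
#[derive(Debug, Default, Clone, PartialEq)]
struct Engine {
    bloom_prefilter: bool,
    bloom_max_bytes: Option<usize>,
}

impl Engine {
    fn set_bloom_prefilter(&mut self, enabled: bool) {
        self.bloom_prefilter = enabled;
    }
    fn set_bloom_max_bytes(&mut self, max_bytes: usize) {
        self.bloom_max_bytes = Some(max_bytes);
    }
}

/// Hypothetical correlation wrapper that forwards both knobs to its inner engine.
#[derive(Debug, Default)]
struct CorrelationEngine {
    inner: Engine,
}

impl CorrelationEngine {
    fn set_bloom_prefilter(&mut self, enabled: bool) {
        self.inner.set_bloom_prefilter(enabled);
    }
    fn set_bloom_max_bytes(&mut self, max_bytes: usize) {
        self.inner.set_bloom_max_bytes(max_bytes);
    }
}

/// The two engine variants a reload can produce (name is an assumption).
#[derive(Debug)]
enum LoadedEngine {
    DetectionOnly(Engine),
    Correlation(CorrelationEngine),
}

impl LoadedEngine {
    /// Borrow the detection engine regardless of variant.
    fn detection(&self) -> &Engine {
        match self {
            LoadedEngine::DetectionOnly(engine) => engine,
            LoadedEngine::Correlation(correlation) => &correlation.inner,
        }
    }
}

struct RuntimeEngine {
    bloom_prefilter: bool,
    bloom_max_bytes: Option<usize>,
    use_correlation: bool,
    loaded: Option<LoadedEngine>,
}

impl RuntimeEngine {
    /// Build a fresh engine and re-apply the stored bloom settings to it.
    fn load_rules(&mut self) {
        let loaded = if self.use_correlation {
            let mut engine = CorrelationEngine::default();
            engine.set_bloom_prefilter(self.bloom_prefilter);
            if let Some(max_bytes) = self.bloom_max_bytes {
                engine.set_bloom_max_bytes(max_bytes);
            }
            LoadedEngine::Correlation(engine)
        } else {
            let mut engine = Engine::default();
            engine.set_bloom_prefilter(self.bloom_prefilter);
            if let Some(max_bytes) = self.bloom_max_bytes {
                engine.set_bloom_max_bytes(max_bytes);
            }
            LoadedEngine::DetectionOnly(engine)
        };
        self.loaded = Some(loaded);
    }
}
```

Because the settings live on the wrapper rather than on the engine it replaces, a reload cannot silently drop them.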

New CLI flags (on both rsigma daemon and rsigma eval)

  • --bloom-prefilter: enable opt-in bloom pre-filtering. Off by default because the per-event probe (~1 µs on a typical CommandLine) outweighs the savings on rule sets where most events overlap with at least one needle's trigrams.
  • --bloom-max-bytes <BYTES>: override the 1 MB default. No effect without --bloom-prefilter. Lower the cap on memory-constrained deployments; raise it for very large rule sets where the default starts evicting useful filters.
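The interaction between the two flags can be sketched with a small hand-rolled parser. This is not how the CLI parses its arguments; `BloomOptions`, `parse_bloom_flags`, and `effective_budget` are hypothetical names, and the 1 MB default is assumed to be 1024 * 1024 bytes. The point is only that the byte budget is inert unless the pre-filter is enabled.

```rust
/// Assumed value of the crate default (the PR only says "1 MB").
const DEFAULT_BLOOM_MAX_BYTES: usize = 1024 * 1024;

/// Hypothetical container for the two flag values.
#[derive(Debug, Default, PartialEq)]
struct BloomOptions {
    bloom_prefilter: bool,
    bloom_max_bytes: Option<usize>,
}

/// Pick the two bloom flags out of an argument list, ignoring everything else.
fn parse_bloom_flags(args: &[&str]) -> Result<BloomOptions, String> {
    let mut options = BloomOptions::default();
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        match *arg {
            "--bloom-prefilter" => options.bloom_prefilter = true,
            "--bloom-max-bytes" => {
                let value = iter
                    .next()
                    .ok_or("--bloom-max-bytes requires a value")?;
                let bytes = value
                    .parse::<usize>()
                    .map_err(|err| format!("invalid byte count {value:?}: {err}"))?;
                options.bloom_max_bytes = Some(bytes);
            }
            _ => {} // Unrelated arguments are not this sketch's concern.
        }
    }
    Ok(options)
}

/// Budget that would apply: `None` when the pre-filter is off, since the
/// byte cap has no effect without `--bloom-prefilter`.
fn effective_budget(options: &BloomOptions) -> Option<usize> {
    if options.bloom_prefilter {
        Some(options.bloom_max_bytes.unwrap_or(DEFAULT_BLOOM_MAX_BYTES))
    } else {
        None
    }
}
```

For example, passing `--bloom-max-bytes 131072` alone parses cleanly but yields no effective budget in this sketch, mirroring the "no effect without --bloom-prefilter" rule.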

Why opt-in

The pre-filter sits behind a flag because the optimization only pays off on substring-heavy rule sets paired with mostly non-matching events. The eval_bloom_rejection benchmark in crates/rsigma-eval/benches/eval.rs reports throughput for both the default-off and bloom-on engines, so deployments can measure the win on their own corpus before flipping the flag.

Tests

  • eval_bloom_prefilter_flag_is_accepted: flag passes through and produces the same match output as the default path.
  • eval_bloom_prefilter_with_max_bytes: paired memory budget flag at 128 KB.
  • eval_bloom_prefilter_rejects_non_matching_event: short-circuit path with a digit-only event that shares zero trigrams with the rule's needles.
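The short-circuit exercised by the last test can be pictured with a toy trigram bloom filter. This is a sketch of the general technique under stated assumptions, not the crate's data structure: `TrigramBloom`, its fixed 8 KB size, the two `DefaultHasher`-based probes, and the ASCII lowercasing are all choices made here for illustration. The filter stores every trigram of every needle; an event that shares no trigram with the filter cannot contain any needle of three or more bytes, so rule evaluation can be skipped.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Filter size in bits (8 KB); an arbitrary choice for this sketch.
const BLOOM_BITS: usize = 1 << 16;

/// Toy bloom filter over the byte trigrams of all substring needles.
struct TrigramBloom {
    bits: Vec<u64>,
    /// Needles shorter than 3 bytes have no trigrams, so nothing can be rejected.
    has_short_needle: bool,
}

impl TrigramBloom {
    fn new() -> Self {
        Self { bits: vec![0; BLOOM_BITS / 64], has_short_needle: false }
    }

    /// Two bit positions per trigram, derived from differently seeded hashes.
    fn positions(trigram: &[u8]) -> [usize; 2] {
        let mut out = [0usize; 2];
        for (seed, slot) in out.iter_mut().enumerate() {
            let mut hasher = DefaultHasher::new();
            seed.hash(&mut hasher);
            trigram.hash(&mut hasher);
            *slot = (hasher.finish() as usize) % BLOOM_BITS;
        }
        out
    }

    /// Record every trigram of a needle (lowercased; an assumption of this sketch).
    fn add_needle(&mut self, needle: &str) {
        let bytes = needle.to_ascii_lowercase().into_bytes();
        if bytes.len() < 3 {
            self.has_short_needle = true;
            return;
        }
        for trigram in bytes.windows(3) {
            for pos in Self::positions(trigram) {
                self.bits[pos / 64] |= 1 << (pos % 64);
            }
        }
    }

    /// `false` means no needle can be a substring of `haystack`;
    /// `true` means full evaluation is still required (false positives are possible).
    fn may_match(&self, haystack: &str) -> bool {
        if self.has_short_needle {
            return true;
        }
        let bytes = haystack.to_ascii_lowercase().into_bytes();
        bytes.windows(3).any(|trigram| {
            Self::positions(trigram)
                .iter()
                .all(|&pos| (self.bits[pos / 64] & (1 << (pos % 64))) != 0)
        })
    }
}
```

A digit-only value such as `1234567890` shares no trigram with needles like `powershell`, so the probe is expected to reject it, barring a bloom false positive. A value that does contain a needle always passes, because every one of its needle trigrams was inserted. The probe still costs one pass over the event, which is why the filter only helps when most events are rejected.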

All workspace tests pass; clippy + fmt clean.

Documentation

  • crates/rsigma-eval/README.md gains a "Bloom Pre-Filter (Opt-In)" section with the trade-off explanation and CLI/library equivalents.
  • crates/rsigma-cli/README.md lists the new flags in both the daemon and eval flag tables.

Test plan

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo test --workspace
  • CLI integration tests for the flag (cli_eval suite)

…aemon

`Engine::set_bloom_prefilter` was only reachable programmatically. Wire
the toggle through the daemon and eval subcommands so operators can flip
it at deploy time without touching code, and add a paired
`--bloom-max-bytes` knob for the per-engine memory budget.

Engine API additions:
- `Engine::set_bloom_max_bytes(usize)` overrides the 1 MB default for
  the per-field bloom budget. The setter rebuilds the index immediately
  if rules are already loaded; otherwise the budget applies on the next
  `add_collection` / `add_rule` call.
- `Engine::bloom_max_bytes() -> Option<usize>` returns the override,
  `None` meaning the crate default is in use.
- `CorrelationEngine` gains forwarding setters for both bloom knobs.

Runtime + CLI wiring:
- `RuntimeEngine` carries `bloom_prefilter: bool` and
  `bloom_max_bytes: Option<usize>`. The `load_rules()` path applies
  them on every reload to both the detection-only and
  CorrelationEngine variants, so hot reloads keep the toggle.
- `DaemonConfig` adds matching fields, populated from new CLI flags.
- `cmd_eval` and its detection-only / correlation helpers accept the
  toggle and forward to the inner engines.

CLI flags on `rsigma daemon` and `rsigma eval`:
- `--bloom-prefilter`: enable opt-in bloom pre-filtering. Off by
  default because the per-event probe (~1 µs on a typical CommandLine)
  outweighs the savings on rule sets where most events overlap with at
  least one needle's trigrams.
- `--bloom-max-bytes <BYTES>`: override the 1 MB default. No effect
  unless `--bloom-prefilter` is set. Lower the cap on memory-
  constrained deployments; raise it for very large rule sets where the
  default starts evicting useful filters.

Tests:
- `eval_bloom_prefilter_flag_is_accepted` smoke-tests that the flag
  passes through and produces the same match output as the default path.
- `eval_bloom_prefilter_with_max_bytes` covers the paired memory
  budget flag at 128 KB.
- `eval_bloom_prefilter_rejects_non_matching_event` exercises the
  short-circuit path with a digit-only event that shares zero trigrams
  with the rule's needles.

Documentation:
- `crates/rsigma-eval/README.md` gains a "Bloom Pre-Filter (Opt-In)"
  section with the trade-off explanation and CLI/library equivalents.
- `crates/rsigma-cli/README.md` lists the new flags in both the
  `daemon` and `eval` flag tables.
@mostafa mostafa merged commit 13eed90 into main May 13, 2026
9 checks passed
@mostafa mostafa deleted the feat/bloom-prefilter-cli branch May 13, 2026 18:14
