feat(cli): expose bloom prefilter via --bloom-prefilter on eval and daemon #104
Merged
…aemon

`Engine::set_bloom_prefilter` was only reachable programmatically. Wire the toggle through the daemon and eval subcommands so operators can flip it at deploy time without touching code, and add a paired `--bloom-max-bytes` knob for the per-engine memory budget.

Engine API additions:
- `Engine::set_bloom_max_bytes(usize)` overrides the 1 MB default for the per-field bloom budget. The setter rebuilds the index immediately if rules are already loaded; otherwise the budget applies on the next `add_collection` / `add_rule` call.
- `Engine::bloom_max_bytes() -> Option<usize>` returns the override, `None` meaning the crate default is in use.
- `CorrelationEngine` gains forwarding setters for both bloom knobs.

Runtime + CLI wiring:
- `RuntimeEngine` carries `bloom_prefilter: bool` and `bloom_max_bytes: Option<usize>`. The `load_rules()` path applies them on every reload to both the detection-only and CorrelationEngine variants, so hot reloads keep the toggle.
- `DaemonConfig` adds matching fields, populated from new CLI flags.
- `cmd_eval` and its detection-only / correlation helpers accept the toggle and forward to the inner engines.

CLI flags on `rsigma daemon` and `rsigma eval`:
- `--bloom-prefilter`: enable opt-in bloom pre-filtering. Off by default because the per-event probe (~1 µs on a typical CommandLine) outweighs the savings on rule sets where most events overlap with at least one needle's trigrams.
- `--bloom-max-bytes <BYTES>`: override the 1 MB default. No effect unless `--bloom-prefilter` is set. Lower the cap on memory-constrained deployments; raise it for very large rule sets where the default starts evicting useful filters.

Tests:
- `eval_bloom_prefilter_flag_is_accepted` smoke-tests that the flag passes through and produces the same match output as the default path.
- `eval_bloom_prefilter_with_max_bytes` covers the paired memory budget flag at 128 KB.
- `eval_bloom_prefilter_rejects_non_matching_event` exercises the short-circuit path with a digit-only event that shares zero trigrams with the rule's needles.

Documentation:
- `crates/rsigma-eval/README.md` gains a "Bloom Pre-Filter (Opt-In)" section with the trade-off explanation and CLI/library equivalents.
- `crates/rsigma-cli/README.md` lists the new flags in both the `daemon` and `eval` flag tables.
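The reload behavior described in the commit message can be pictured with a small self-contained sketch. It is not the rsigma code: `ToyEngine`, `ToyRuntime`, `effective_budget`, and the `DEFAULT_BLOOM_MAX_BYTES` value (1 MB taken as 1024 * 1024 bytes) are hypothetical stand-ins, and the sketch assumes that a reload builds a fresh engine, which is why the runtime wrapper has to own both knobs and reapply them.

```rust
/// Assumed value for the "1 MB default"; the real constant lives in the crate.
const DEFAULT_BLOOM_MAX_BYTES: usize = 1024 * 1024;

/// Toy stand-in for an engine. NOT the rsigma `Engine` type.
#[derive(Debug, Default)]
struct ToyEngine {
    bloom_prefilter: bool,
    bloom_max_bytes: Option<usize>,
}

impl ToyEngine {
    fn set_bloom_prefilter(&mut self, on: bool) {
        self.bloom_prefilter = on;
    }

    fn set_bloom_max_bytes(&mut self, bytes: usize) {
        self.bloom_max_bytes = Some(bytes);
    }

    /// Returns the override; `None` means the default budget is in use.
    fn bloom_max_bytes(&self) -> Option<usize> {
        self.bloom_max_bytes
    }

    /// Budget the filter would use, or `None` when pre-filtering is off
    /// (the byte budget has no effect without the toggle).
    fn effective_budget(&self) -> Option<usize> {
        self.bloom_prefilter
            .then(|| self.bloom_max_bytes.unwrap_or(DEFAULT_BLOOM_MAX_BYTES))
    }
}

/// Toy stand-in for the runtime wrapper that owns the settings across reloads.
struct ToyRuntime {
    bloom_prefilter: bool,
    bloom_max_bytes: Option<usize>,
    engine: ToyEngine,
}

impl ToyRuntime {
    fn new(bloom_prefilter: bool, bloom_max_bytes: Option<usize>) -> Self {
        let mut rt = Self {
            bloom_prefilter,
            bloom_max_bytes,
            engine: ToyEngine::default(),
        };
        rt.load_rules();
        rt
    }

    /// Each reload replaces the engine, so both knobs are reapplied here.
    fn load_rules(&mut self) {
        let mut engine = ToyEngine::default();
        engine.set_bloom_prefilter(self.bloom_prefilter);
        if let Some(bytes) = self.bloom_max_bytes {
            engine.set_bloom_max_bytes(bytes);
        }
        self.engine = engine;
    }
}
```

In this sketch, a runtime created with `ToyRuntime::new(true, Some(128 * 1024))` still reports a 131072-byte budget after any number of `load_rules()` calls, while a runtime created with the toggle off reports no effective budget regardless of the byte setting.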
Summary
`Engine::set_bloom_prefilter` was only reachable programmatically after #102 merged. This PR wires the toggle through the daemon and eval subcommands so operators can flip it at deploy time without touching code, and adds a paired `--bloom-max-bytes` knob for the per-engine memory budget.

Engine API additions
- `Engine::set_bloom_max_bytes(usize)` overrides the 1 MB default for the per-field bloom budget. The setter rebuilds the index immediately if rules are already loaded; otherwise the budget applies on the next `add_collection` / `add_rule` call.
- `Engine::bloom_max_bytes() -> Option<usize>` exposes the override (`None` means crate default).
- `CorrelationEngine` gains forwarding setters for both bloom knobs so the daemon's correlation path picks them up too.

Runtime + CLI wiring
- `RuntimeEngine` carries `bloom_prefilter: bool` and `bloom_max_bytes: Option<usize>`. Both setters apply on every `load_rules()` (detection-only and correlation variants), so hot reloads keep the toggle.
- `DaemonConfig` adds matching fields, populated from new CLI flags.
- `cmd_eval` and its detection-only / correlation helpers accept the toggle and forward to the inner engines.

New CLI flags (on both `rsigma daemon` and `rsigma eval`)
- `--bloom-prefilter`: enable opt-in bloom pre-filtering. Off by default because the per-event probe (~1 µs on a typical CommandLine) outweighs the savings on rule sets where most events overlap with at least one needle's trigrams.
- `--bloom-max-bytes <BYTES>`: override the 1 MB default. No effect without `--bloom-prefilter`. Lower the cap on memory-constrained deployments; raise it for very large rule sets where the default starts evicting useful filters.

Why opt-in
Flagged because the optimization only pays off on substring-heavy rule sets paired with mostly-non-matching events. The `eval_bloom_rejection` benchmark in `crates/rsigma-eval/benches/eval.rs` reports throughput with both default-off and bloom-on engines so deployments can size the win on their own corpus before flipping the flag.

Tests
- `eval_bloom_prefilter_flag_is_accepted`: flag passes through and produces the same match output as the default path.
- `eval_bloom_prefilter_with_max_bytes`: paired memory budget flag at 128 KB.
- `eval_bloom_prefilter_rejects_non_matching_event`: short-circuit path with a digit-only event that shares zero trigrams with the rule's needles.

All workspace tests pass; clippy + fmt clean.
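The short-circuit exercised by the digit-only test can be illustrated with a toy trigram filter. This is a minimal sketch under stated assumptions, not rsigma's implementation: `TrigramBloom`, its polynomial hash, the ASCII lowercasing, and the way the byte budget maps to bits are all hypothetical choices made for illustration.

```rust
/// Toy trigram pre-filter; a hypothetical stand-in, not rsigma's implementation.
struct TrigramBloom {
    bits: Vec<u64>,
    num_bits: usize,
    /// Needles shorter than three bytes have no trigrams, so the filter
    /// must let every event through once one is registered.
    always_pass: bool,
}

impl TrigramBloom {
    /// Sizes the bit set from a byte budget (8 bits per byte, at least 64 bits).
    fn new(max_bytes: usize) -> Self {
        let num_bits = (max_bytes * 8).max(64);
        Self {
            bits: vec![0; (num_bits + 63) / 64],
            num_bits,
            always_pass: false,
        }
    }

    /// Maps one trigram to a (word index, bit mask) pair with a simple
    /// polynomial hash. Distinct trigrams can collide, so hits are only "maybe".
    fn slot(&self, tri: &[u8]) -> (usize, u64) {
        let h = tri[0] as usize * 961 + tri[1] as usize * 31 + tri[2] as usize;
        let bit = h % self.num_bits;
        (bit / 64, 1u64 << (bit % 64))
    }

    /// Registers every trigram of a rule needle (ASCII case-insensitive).
    fn insert_needle(&mut self, needle: &str) {
        let lower = needle.to_ascii_lowercase();
        if lower.len() < 3 {
            self.always_pass = true;
            return;
        }
        for tri in lower.as_bytes().windows(3) {
            let (word, mask) = self.slot(tri);
            self.bits[word] |= mask;
        }
    }

    /// `false` means no registered needle can occur in `haystack`, so rule
    /// evaluation may be skipped; `true` means "maybe", and the full match runs.
    fn may_match(&self, haystack: &str) -> bool {
        if self.always_pass {
            return true;
        }
        let lower = haystack.to_ascii_lowercase();
        lower.as_bytes().windows(3).any(|tri| {
            let (word, mask) = self.slot(tri);
            self.bits[word] & mask != 0
        })
    }
}
```

With a 128 KB budget (`TrigramBloom::new(128 * 1024)`) and the needles `powershell` and `Mimikatz` registered, `may_match("cmd.exe /c PowerShell -nop")` returns `true`, while the digit-only event `1234567890` returns `false` because none of its trigrams hash to a set bit. The sketch also shows why the probe is pure overhead when most events share at least one trigram with some needle: every such event pays for the probe and then runs the full match anyway.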
Documentation
- `crates/rsigma-eval/README.md` gains a "Bloom Pre-Filter (Opt-In)" section with the trade-off explanation and CLI/library equivalents.
- `crates/rsigma-cli/README.md` lists the new flags in both the `daemon` and `eval` flag tables.

Test plan
- `cargo fmt --all -- --check`
- `cargo clippy --workspace --all-targets --all-features -- -D warnings`
- `cargo test --workspace` (… `cli_eval` suite)