Propagate custom rule attributes through results#26
Conversation
There was a problem hiding this comment.
Please see #20 (comment) first.
Overall
The split between the existing typed custom_attributes (rsigma.* engine controls, HashMap<String, String>) and the new custom_rule_attributes (arbitrary user metadata, nested values) is the right call -- widening custom_attributes to nested values would have broken engine code that calls .parse() on the strings. Two semantically-distinct dicts is cleaner. The detection-rule key list is complete and accurate.
A few items inline to address before merge, plus two design follow-ups worth highlighting.
Required (see inline comments)
- Bug:
standard_correlation_keysis out of sync withparse_correlation_rule-- silent duplication ofname/tags, silent data loss fortaxonomy/falsepositives/top-levelgenerate. - Visibility:
yaml_to_json_mapshould bepub(crate). - Robustness: NaN/Inf would panic in the
f64arm ofyaml_to_json.
Worth elevating (not blocking -- open as follow-up issues if you agree)
- Sync-drift hazard between
standard_*_keysand the struct fields/parser. The lists live in a different file from the struct definitions and from theget_str(...)calls that consume the keys. Adding a new top-level field means updating both, easy to forget. A cleaner pattern would be to invert the predicate: track which keys the parser actually consumed and collect the rest, mirroring serde's#[serde(flatten)]behavior. Out of scope for this PR but worth queuing. - Per-match clone of the attributes map.
MatchResultandCorrelationResultdeep-clone the whole map on every successful match. With Level-3 batch evaluation (parallel rayon, millions of events/sec) and rules that carry nested attrs, this isn't free. StoringCompiledRule.custom_rule_attributesasArc<HashMap<String, serde_json::Value>>and cloning theArcwould make this a pointer bump with no public-API churn. Suggest a follow-up PR rather than scope-creep here.
Thanks for splitting the new field cleanly from custom_attributes and for the thorough parser tests.
It's certainly possible to merge them. The main reason I didn't is mostly that it looked like a more complicated change and I tried to keep my changes as small as possible. But let me address your review feedback and also try to merge the two properties into a single one. The question still is how conflicts should be handled. What if both the pipeline and the rule attempt to add the same custom attribute? |
The order of precedence would be like this:
Each step can overwrite keys set by an earlier step. That gives us three useful invariants:
We can then warn on overwrite with something like this: if custom_attributes.insert(key.clone(), new_value).is_some() {
log::warn!(
"custom attribute '{key}' overwritten by {source}; previous value discarded"
);
}Possible edge case: if the user uses an unknown key under So: defined order + last-write-wins + log warning. |
Made yaml_to_json_map pub(crate) Improved robustness for handling NaN/Infinity Added missing test case Use an Arc to share custom attributes and avoid expensive clones
|
Merging them certainly would be the cleanest approach and as long as precedence is well-defined, it's certainly a good idea. I also agree that to avoid the risk of drift, having the parse_* function move everything that wasn't consumed into the custom_attributes might be a cleaner approach compared to the approach I have taken. However, it appears that the parse_correlation_rule function in particular isn't standard compliant, as what it consumes differs from the official schemas. |
|
@fwosar Thank you for your contribution. I'll take it from here. |
I decided not to piggy-back off the existing custom_attributes property. Instead, custom attributes set in rules are propagated by the custom_rule_attributes properties. The reason being is that custom_rule_attributes really wants to be a HashMap pointing to Serde values (as the rule attributes can be nested). Also, it avoids any sort of name conflicts.
I tried to stick to the existing code style. That being said, I am not a super experienced Rust developer. So if there are more idiomatic ways to do things, please let me know and I can change it.
Closes #20.