
Propagate custom rule attributes through results #26

Merged
mostafa merged 7 commits into timescale:main from fwosar:propagate-custom-attributes
Apr 17, 2026

Conversation

@fwosar
Contributor

@fwosar fwosar commented Apr 17, 2026

I decided not to piggy-back off the existing custom_attributes property. Instead, custom attributes set in rules are propagated through a separate custom_rule_attributes property. The reason is that custom_rule_attributes really wants to be a HashMap whose values are Serde values (as the rule attributes can be nested). It also avoids any name conflicts.

I tried to stick to the existing code style. That being said, I am not a super experienced Rust developer. So if there are more idiomatic ways to do things, please let me know and I can change it.

Closes #20.

@fwosar fwosar requested a review from mostafa as a code owner April 17, 2026 17:47
Member

@mostafa mostafa left a comment

Please see #20 (comment) first.

Overall

The split between the existing typed custom_attributes (rsigma.* engine controls, HashMap<String, String>) and the new custom_rule_attributes (arbitrary user metadata, nested values) is the right call -- widening custom_attributes to nested values would have broken engine code that calls .parse() on the strings. Two semantically-distinct dicts is cleaner. The detection-rule key list is complete and accurate.

A few items inline to address before merge, plus two design follow-ups worth highlighting.

Required (see inline comments)

  • Bug: standard_correlation_keys is out of sync with parse_correlation_rule -- silent duplication of name/tags, silent data loss for taxonomy/falsepositives/top-level generate.
  • Visibility: yaml_to_json_map should be pub(crate).
  • Robustness: NaN/Inf would panic in the f64 arm of yaml_to_json.
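The NaN/Inf item can be illustrated with a minimal, std-only sketch. JSON has no representation for non-finite floats (serde_json's `Number::from_f64` returns `None` for them), so unwrapping that result would panic. The `JsonValue` enum and `float_to_json` function below are hypothetical stand-ins, and mapping non-finite values to null is only one possible policy, not necessarily what the merged fix does.

```rust
/// Hypothetical stand-in for `serde_json::Value`, reduced to what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum JsonValue {
    Null,
    Number(f64),
}

/// Convert a YAML float to a JSON value without panicking.
/// Non-finite inputs (NaN, +Inf, -Inf) cannot be expressed in JSON,
/// so this sketch maps them to `Null` instead of unwrapping.
fn float_to_json(f: f64) -> JsonValue {
    if f.is_finite() {
        JsonValue::Number(f)
    } else {
        JsonValue::Null
    }
}
```

An alternative policy would be to keep the value as a string such as ".nan"; which is preferable depends on how downstream consumers treat the attribute.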

Worth elevating (not blocking -- open as follow-up issues if you agree)

  • Sync-drift hazard between standard_*_keys and the struct fields/parser. The lists live in a different file from the struct definitions and from the get_str(...) calls that consume the keys. Adding a new top-level field means updating both places, which is easy to forget. A cleaner pattern would be to invert the predicate: track which keys the parser actually consumed and collect the rest, mirroring serde's #[serde(flatten)] behavior. Out of scope for this PR but worth queuing.
  • Per-match clone of the attributes map. MatchResult and CorrelationResult deep-clone the whole map on every successful match. With Level-3 batch evaluation (parallel rayon, millions of events/sec) and rules that carry nested attrs, this isn't free. Storing CompiledRule.custom_rule_attributes as Arc<HashMap<String, serde_json::Value>> and cloning the Arc would make this a pointer bump with no public-API churn. Suggest a follow-up PR rather than scope-creep here.
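The Arc suggestion can be sketched with the standard library alone. The struct shapes below are hypothetical and trimmed down, and `String` values stand in for `serde_json::Value`; the point is only that `Arc::clone` bumps a reference count instead of copying the map.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical shared attribute map; `String` stands in for `serde_json::Value`.
type SharedAttrs = Arc<HashMap<String, String>>;

/// Hypothetical, trimmed-down versions of the rule and result types.
struct CompiledRule {
    custom_rule_attributes: SharedAttrs,
}

struct MatchResult {
    custom_rule_attributes: SharedAttrs,
}

impl CompiledRule {
    /// Build a result for a match. `Arc::clone` only increments a
    /// reference count; the underlying map is not copied.
    fn on_match(&self) -> MatchResult {
        MatchResult {
            custom_rule_attributes: Arc::clone(&self.custom_rule_attributes),
        }
    }
}

/// Helper for the sketch: a rule carrying one attribute.
fn build_rule() -> CompiledRule {
    let mut attrs = HashMap::new();
    attrs.insert("template".to_string(), "default".to_string());
    CompiledRule {
        custom_rule_attributes: Arc::new(attrs),
    }
}
```

Every result produced by `on_match` points at the same allocation as the rule, which `Arc::ptr_eq` can confirm.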

Thanks for splitting the new field cleanly from custom_attributes and for the thorough parser tests.

Comment thread crates/rsigma-eval/src/compiler.rs Outdated
Comment thread crates/rsigma-eval/src/compiler.rs Outdated
Comment thread crates/rsigma-eval/src/compiler.rs
Comment thread crates/rsigma-eval/src/correlation_engine.rs
Comment thread crates/rsigma-eval/src/correlation_engine.rs Outdated
Comment thread crates/rsigma-parser/src/parser.rs
@fwosar
Contributor Author

fwosar commented Apr 17, 2026

> @fwosar I am wondering if it'd be better to merge the two to stay pySigma-aligned and what the blast radius is! Merging them is a breaking change, which I am okay with since the project is new and is getting traction, so it'd be okay to just experiment. WDYT?

It's certainly possible to merge them. I didn't mainly because it looked like a more complicated change and I tried to keep my changes as small as possible. But let me address your review feedback and also try to merge the two properties into a single one.

The question still is how conflicts should be handled. What if both the pipeline and the rule attempt to add the same custom attribute?

@mostafa
Member

mostafa commented Apr 17, 2026

> The question still is how conflicts should be handled. What if both the pipeline and the rule attempt to add the same custom attribute?

The order of precedence would be like this:

  1. Rule YAML top-level custom_attributes: block (correlation rules have it today).
  2. Rule YAML non-standard top-level fields (user metadata like template, severity_score, etc.).
  3. Pipeline SetCustomAttribute transformations, applied in pipeline order.

Each step can overwrite keys set by an earlier step. That gives us three useful invariants:

  • Pipelines beat rules.
  • Within a rule, explicit beats implicit.
  • Simple to explain in one sentence.

We can then warn on overwrite with something like this:

if custom_attributes.insert(key.clone(), new_value).is_some() {
    log::warn!(
        "custom attribute '{key}' overwritten by {source}; previous value discarded"
    );
}

Possible edge case: if the user sets an unknown key under rsigma.*, a namespace that is semi-reserved for engine controls, we can either accept it (less favorable, though conventional) or warn on the unknown key and ignore it by emitting log::warn!("unknown rsigma.* key '{k}'; ignoring").

So: defined order + last-write-wins + log warning.
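The "defined order + last-write-wins + warning" policy can be sketched as a single merge function. This is a std-only illustration under stated assumptions: the function name, the layer representation, and the source labels are hypothetical, and warnings are collected in a `Vec` rather than sent through `log::warn!` so the sketch has no external dependencies.

```rust
use std::collections::HashMap;

/// Merge attribute layers in precedence order: later layers overwrite earlier ones.
/// Each layer is a (source label, key/value pairs) tuple.
/// Returns the merged map plus one warning message per overwritten key.
fn merge_layers(
    layers: &[(&str, Vec<(&str, &str)>)],
) -> (HashMap<String, String>, Vec<String>) {
    let mut merged = HashMap::new();
    let mut warnings = Vec::new();
    for (source, pairs) in layers {
        for (key, value) in pairs {
            // `insert` returns the previous value if the key already existed.
            if merged.insert(key.to_string(), value.to_string()).is_some() {
                warnings.push(format!(
                    "custom attribute '{key}' overwritten by {source}; previous value discarded"
                ));
            }
        }
    }
    (merged, warnings)
}
```

Passing the layers in the order listed above (custom_attributes block, then non-standard top-level fields, then pipeline transformations) makes pipelines win without any special-casing.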

fwosar added 2 commits April 17, 2026 22:39
Made yaml_to_json_map pub(crate)
Improved robustness for handling NaN/Infinity
Added missing test case
Use an Arc to share custom attributes and avoid expensive clones
@fwosar
Contributor Author

fwosar commented Apr 17, 2026

Merging them would certainly be the cleanest approach, and as long as precedence is well-defined, it's a good idea.

I also agree that, to avoid the risk of drift, having the parse_* functions move everything that wasn't consumed into custom_attributes might be cleaner than the approach I have taken. However, it appears that the parse_correlation_rule function in particular isn't standards-compliant, as what it consumes differs from the official schemas.
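A rough sketch of the "collect whatever was not consumed" idea, assuming a flat string map for simplicity. `TrackingReader` and its methods are hypothetical; only the name get_str echoes the helper mentioned in the review, and its real signature may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical parser helper that records which top-level keys it has read.
struct TrackingReader<'a> {
    doc: &'a HashMap<String, String>,
    consumed: HashSet<String>,
}

impl<'a> TrackingReader<'a> {
    fn new(doc: &'a HashMap<String, String>) -> Self {
        Self { doc, consumed: HashSet::new() }
    }

    /// Read a key and mark it as consumed, whether or not it is present.
    fn get_str(&mut self, key: &str) -> Option<&'a str> {
        self.consumed.insert(key.to_string());
        self.doc.get(key).map(String::as_str)
    }

    /// Every key the parser never asked for becomes a custom attribute.
    fn leftovers(&self) -> HashMap<String, String> {
        self.doc
            .iter()
            .filter(|(k, _)| !self.consumed.contains(k.as_str()))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}
```

With this shape there is no separate list of standard keys to keep in sync: adding a new get_str call automatically removes that key from the leftovers.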

@mostafa
Member

mostafa commented Apr 17, 2026

@fwosar Thank you for your contribution. I'll take it from here.

@mostafa mostafa self-assigned this Apr 17, 2026
@mostafa mostafa merged commit c1535d0 into timescale:main Apr 17, 2026
9 checks passed

Development

Successfully merging this pull request may close these issues.

Add support for custom Sigma rule fields

2 participants