Skip to content

feat: dynamic pipeline source resolution engine (Phase 2-3)#87

Merged
mostafa merged 7 commits intomainfrom
feat/dynamic-pipelines-resolve
May 6, 2026
Merged

feat: dynamic pipeline source resolution engine (Phase 2-3)#87
mostafa merged 7 commits intomainfrom
feat/dynamic-pipelines-resolve

Conversation

@mostafa
Copy link
Copy Markdown
Member

@mostafa mostafa commented May 6, 2026

Summary

Implements the full dynamic source resolution infrastructure for Sigma pipelines, enabling pipeline variables to be populated from external sources (files, commands, HTTP endpoints, NATS subjects) at startup and refreshed continuously at runtime.

Phase 2a+2b: Core resolution engine

  • SourceResolver trait with DefaultSourceResolver dispatching to file, command, and HTTP resolvers
  • jq-based extraction via jaq-interpret for post-fetch data shaping
  • TemplateExpander replacing ${source.X} and ${source.X.path} in pipeline vars
  • In-memory + SQLite-backed cache for last-known-good values
  • Error policies: use_cached, use_default, fail

Phase 2c: Refresh scheduling

  • RefreshScheduler with per-source interval timers
  • NATS push subscriptions (subscribe to subject, forward parsed messages)
  • File watchers (notify-based, with debouncing)
  • On-demand trigger channel for API/SIGHUP integration
  • API endpoints: GET /api/v1/sources, POST /api/v1/sources/resolve, POST /api/v1/sources/resolve/{source_id}

Phase 2d: Include expansion + daemon wiring

  • Transformation::Include expansion from resolved source data
  • Security: remote includes (HTTP/NATS) blocked unless explicitly allowed
  • Startup sequencing: required sources block daemon start, optional sources fall back to Null
  • RuntimeEngine carries resolver across hot-reloads via LogProcessor

Phase 3: CLI tooling + observability

  • rsigma resolve -p pipeline.yml for offline source testing
  • Prometheus metrics: rsigma_source_resolves_total, rsigma_source_resolve_errors_total, rsigma_source_resolve_seconds, rsigma_source_cache_hits_total
  • InstrumentedResolver wrapper recording all metrics transparently

Dependencies added

  • reqwest 0.12 (HTTP client), rusqlite 0.39 (cache persistence)
  • jaq-interpret 1.5.0 + jaq-parse 1.0.3 (jq extraction)
  • csv 1 (CSV format parsing), notify 8.2 (file watching)
  • async-trait (CLI daemon feature)

Test plan

  • 21 integration tests in crates/rsigma-runtime/tests/sources_integration.rs covering file/command sources, template expansion, error policies, SQLite cache persistence
  • All 20 daemon integration tests pass (verified route syntax fix for axum 0.8)
  • Full workspace test suite green (1100+ tests)
  • cargo clippy --workspace --all-features -- -D warnings clean
  • cargo fmt --all -- --check clean

mostafa added 7 commits May 6, 2026 21:53
Implement the source resolution infrastructure for dynamic Sigma
pipelines (Phase 2a+2b+2c scaffolding):

Source resolvers (crates/rsigma-runtime/src/sources/):
- SourceResolver trait with DefaultSourceResolver
- File source: read + parse (JSON/YAML/lines/CSV) + jq extract
- Command source: tokio::process::Command + stdout parsing
- HTTP source: reqwest with env-var header expansion, configurable
  method/timeout, format parsing + extract
- jq extraction via jaq-interpret for post-fetch data shaping

Template expansion:
- ${source.X} and ${source.X.path.to.field} replacement in pipeline vars
- Handles scalar, list, and inline template expansion

Caching:
- In-memory + SQLite-backed persistence (last-known-good values)
- Serves stale data on failure when on_error: use_cached

Error policies:
- use_cached: serve from cache on failure
- use_default: use declared default value
- fail: propagate error (blocks startup if required: true)

Refresh scheduler:
- RefreshScheduler with per-source interval timers
- On-demand trigger channel for API/SIGHUP integration
- Watch channel for notifying consumers of updated data

Daemon integration:
- RuntimeEngine.set_source_resolver() + resolve_dynamic_pipelines()
- load_rules() resolves dynamic sources when resolver is set
- LogProcessor carries resolver across reload cycles
- API endpoints: GET /api/v1/sources, POST /api/v1/sources/resolve,
  POST /api/v1/sources/resolve/{source_id}

Dependencies added to rsigma-runtime:
- reqwest 0.12 (HTTP client)
- rusqlite 0.39 (cache persistence)
- jaq-interpret 1.5.0 + jaq-parse 1.0.3 (jq extraction)
- csv 1 (CSV format parsing)
- async-trait, regex

21 integration tests covering file/command sources, template expansion,
error policies, and SQLite cache persistence.
Complete Phase 2c of dynamic pipelines:

- NATS push source: subscribe to subject, parse incoming messages,
  forward as RefreshTrigger::NatsPush to the scheduler
- File watch: per-source notify watcher with debouncing for
  RefreshPolicy::Watch sources
- RefreshScheduler: spawns interval timers, NATS subscriptions, and
  file watchers; coordinates re-resolution on triggers
- futures dependency added to nats feature for StreamExt

Dependencies: notify 8.2 (file watching), futures 0.3 (optional, nats)
Complete Phase 2d of dynamic pipelines:

Include expansion (sources/include.rs):
- Expands Transformation::Include directives by fetching source data
  and parsing it as transformation arrays
- Security: blocks remote includes (HTTP/NATS) unless explicitly
  allowed via allow_remote_include setting
- Uses rsigma-eval's parse_transformation_items (newly public)

Startup sequencing:
- resolve_all() now differentiates required vs optional sources
- Required sources (required=true) with Fail policy propagate errors
  and block daemon startup
- Optional sources that fail log a warning and use Null fallback

Daemon wiring (server.rs):
- Creates DefaultSourceResolver and sets it on RuntimeEngine
- Calls resolve_dynamic_pipelines() at startup (async)
- Spins up RefreshScheduler with trigger sender wired to AppState
- allow_remote_include carried across reload cycles in processor

Exports from rsigma-eval:
- TransformationItem and parse_transformation_items now public
Phase 3 of dynamic pipelines:

CLI `resolve` command:
- `rsigma resolve -p pipeline.yml` resolves all dynamic sources and
  prints their data as JSON
- `--source <id>` filters to a specific source
- `--pretty` for formatted output
- Exits non-zero if any resolution fails

Prometheus metrics (registered, ready for instrumentation):
- rsigma_source_resolves_total{source_id, source_type}
- rsigma_source_resolve_errors_total{source_id, error_kind}
- rsigma_source_resolve_seconds (histogram, 10ms-10s buckets)
- rsigma_source_cache_hits_total

The /api/v1/sources status endpoint was already added in Phase 2c.
Wire the declared source metrics into actual resolution:

- InstrumentedResolver wraps DefaultSourceResolver, recording per-call
  metrics: resolves_total, resolve_errors, resolve_latency, cache_hits
- ResolvedValue gains a `from_cache` flag so the wrapper can detect
  when the inner resolver served stale data on failure
- Daemon uses InstrumentedResolver instead of DefaultSourceResolver
- Error labels use the SourceErrorKind variant name (Fetch/Parse/
  Extract/Timeout) for grouping
- async-trait added to rsigma-cli daemon feature deps
Rust 1.88.0 cannot infer the shared element type when mixing &String
and &str in a slice passed to `with_label_values`. Use `.as_str()`
on the String references to produce a uniform `&[&str]`.
- Replace `printf` with file-based cat/type for lines test
- Replace `echo` with file-based cat/type for JSON tests (Windows
  echo mangles JSON through cmd.exe)
- Replace `false` with `cmd /C exit 1` on Windows for failure test
- All command tests now use #[cfg(unix)]/#[cfg(windows)] branches
@mostafa mostafa force-pushed the feat/dynamic-pipelines-resolve branch from ed5d714 to 3b37563 Compare May 6, 2026 20:38
@mostafa mostafa self-assigned this May 6, 2026
@mostafa mostafa merged commit 2884dfa into main May 6, 2026
12 checks passed
@mostafa mostafa deleted the feat/dynamic-pipelines-resolve branch May 6, 2026 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant