Add documentation site at docs/ using MkDocs Material #129
Merged
Lays down the documentation skeleton at the repo root using the standard MkDocs layout (mkdocs.yml + docs/), themed to match docs.zizmor.sh: default Material palette, minimal feature flags (navigation.expand/sections/footer/tracking + content.action.edit/view + content.code.copy/annotate), link-emoji TOC permalinks, and pymdownx.magiclink for automatic GitHub issue/PR/commit references.
Plugin set is curated from the mkdocs/catalog with reproducible CI in mind: mkdocs-material[imaging], awesome-pages, section-index, glightbox, git-revision-date-localized, git-committers-2, include-markdown, macros, minify, rss, table-reader, llmstxt, redirects. All are pinned by exact version in docs/requirements.txt.
Awesome-pages .pages files in every section folder control the sidebar order. Macros plugin reads docs/_data/vars.yml so version, MSRV, and docker image tag stay in lockstep with the workspace Cargo.toml on release. A small extra.css tightens reference tables.
The landing page (docs/index.md) uses Material's grid cards for six audience-specific entry ramps, a comparison table against pySigma / sigma_engine / sigma-rust, an admonition wall for the featured-in quotes (DEW #149/#154, tl;dr sec #320, BlackNoise), and a five-article table linking the published deep dives.
strict: true is on so the next CI build will catch unrecognised links and omitted files before they ship.
Three pages that take a new operator from install to first detection without bouncing around the rest of the docs:
- installation.md: cargo install with feature matrix (daemon-nats, daemon-otlp, evtx, daachorse-index), hardened Docker run with cosign verification, signed binary archives per target, build-from-source.
- quick-start.md: one rule, one event, one daemon, in five steps. Every snippet is verified against rsigma 0.11.0 with --pretty for readable first output, then the compact NDJSON form for production. Includes a daemon shutdown note pointing at SIGTERM drain behaviour.
- concepts.md: opinionated tour of Sigma rules, selections/modifiers, eval vs daemon, processing pipelines (static, builtin, dynamic), conversion backends, input formats, the noun-led CLI groups, and output payloads. Includes a SigmaHQ reading list pointing back to the canonical authoring docs.
Style: zizmor-inspired plain code blocks for command output, with stderr noise pushed into prose so each scenario reads as one command followed by one code fence.
Two core engine pages that turn the noun-led commands into narrative tutorials:
- evaluating-rules.md: the five input modes for engine eval (inline, @file NDJSON, @file EVTX, stdin, plain text), pipelines and field mapping, jq/jsonpath extraction with array-unwrap, in-memory correlation behaviour with --suppress/--action/--correlation-event-mode, --include-event semantics, --fail-on-detection for CI, and an explicit eval-vs-daemon decision table.
- streaming-detection.md: the daemon's life cycle from input through LogProcessor to fan-out sinks (shown as an ASCII flowchart); input sources (stdin/HTTP/NATS/OTLP), output sinks with --dlq fallback, --buffer-size / --batch-size / --drain-timeout tuning, the three hot-reload triggers (file watcher, SIGHUP, /api/v1/reload) with ArcSwap semantics, SQLite state persistence with smart restore-during-replay, the HTTP API surface, structured JSON logging via tracing-subscriber, graceful shutdown, and a production checklist.
CLI flag tables stay in the CLI Reference; these pages are the tutorial layer on top.
- rule-conversion.md: the PostgreSQL backend (modifier mapping table, five output formats including TimescaleDB continuous aggregates and sliding-window correlation, backend options including JSONB mode, multi-table temporal CTEs, OCSF pipelines, per-rule custom attributes) and the LynxDB SPL2 backend (deferred where-clauses, parenthesisation for non-standard precedence). Cross-links the canonical LynxDB Sigma guide at docs.lynxdb.org for operator-facing material. Closes with a complete rules-to-Grafana workflow. `backend targets` / `backend formats` command outputs match the actual rendering.
- linting-rules.md: covers all 66 lint rules categorised by what they inspect (infrastructure / metadata / detection / correlation / filter / detection-logic), the four severities and --fail-level gate, the 13 safely-fixable rules and the --fix workflow, the three-tier suppression system (CLI --disable/--exclude, .rsigma-lint.yml, inline comments), JSON schema validation, and CI patterns for GitHub Actions and pre-commit hooks.
- processing-pipelines.md: covers static pipelines (the 26 transformation types, three-tier condition system, priority-based chaining, custom attributes for engine and PostgreSQL backends), the two builtin pipelines (ecs_windows, sysmon), and the dynamic pipelines that are unique to RSigma: source types (file/http/command/nats), data formats, three extract languages (jq/JSONPath/CEL), refresh policies, error handling, include directives, the rsigma pipeline resolve test path, hot-reload triggers including the dedicated /api/v1/sources/resolve endpoint, and the security limits enforced on every dynamic source.
- input-formats.md: the seven supported formats (JSON, syslog, logfmt, CEF, EVTX, OTLP, plain text), when to use each, auto-detect behaviour, syslog timezone handling, EVTX @file routing, OTLP LogRecord-to-JSON flattening, and the timestamp-extraction priority list with all accepted formats and the --timestamp-fallback skip option for forensic replay.
…cipes
- nats-streaming.md: end-to-end walk of the daemon's JetStream
integration. Why JetStream (at-least-once + server-confirmed
publishes) vs core NATS, source/sink URLs, the five auth methods
including mutual TLS, at-least-once delivery semantics and what they
do and do not guarantee, the three replay modes and the state-restore
matrix that decides whether SQLite correlation state is kept or
cleared, consumer groups for horizontal scaling, the dead-letter
queue, and a production tuning checklist with a complete reference
invocation.
- otlp-integration.md: enabling daemon-otlp, the HTTP + gRPC endpoints
multiplexed on the same --api-addr, the LogRecord-to-JSON flattening
(including how map bodies become top-level fields so the same Sigma
rule matches OTLP-shipped events), and minimal copy-paste recipes
for Grafana Alloy, Vector, Fluent Bit, and the OpenTelemetry
Collector. Every recipe verified end-to-end against the rsigma 0.11.0
daemon built with daemon-otlp:
- Alloy 1.16.1: needs --stability.level=public-preview for
otelcol.receiver.filelog plus a json_parser operator.
- Vector 0.55.0: its opentelemetry sink is OTel-to-OTel passthrough
only, so the realistic file-to-RSigma path is the http sink
pointed at /api/v1/events with newline_delimited framing.
- Fluent Bit 5.0.5: needs an explicit Parser json + parsers.conf
for the opentelemetry output to produce field-keyed bodies.
- otelcol-contrib 0.152: switched filelog -> file_log and otlphttp
-> otlp_http to silence the v0.152 deprecation warnings.
Closes with TLS/auth via reverse proxy, observability hooks, and how
OTLP composes with other inputs on a single daemon.
`--replay-from-latest` maps to JetStream's DeliverLast policy, which
delivers the last existing message plus all new ones. The previous wording
("skip stream history entirely") was wrong. Also document that the daemon
calls get_or_create_stream / get_or_create_consumer on startup, and replace
the placeholder DLQ entry with the actual JSON shape rsigma emits.
Verified end-to-end against nats:2.11-alpine with auth, replay, consumer
groups, and DLQ.
The previous SQL example for `sliding_window` named the seed CTE `combined_events` and omitted the time-window predicate. Actual output uses a CTE named `source` containing `WHERE time >= NOW() - INTERVAL '<timespan>'`. Also note that base detection rules return `unknown output format: sliding_window` and require `--skip-unsupported` to convert correlation rules alongside them.
For `continuous_aggregate`, the supported path is the per-rule `CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous) AS ... WITH NO DATA` wrapper for base detections. The current correlation output is malformed (nested CREATE MATERIALIZED VIEW inside a CTE); document the supported path and steer users away from the broken one.
…e_placeholders
The previous example wired `${source.ip_blocklist}` into
`add_condition.conditions.DestinationIp`. That does not work:
- `parse_value_mapping` only accepts scalar `String`, `Number`, `Bool`,
`Null`. YAML sequences silently fall through to `SigmaValue::Null`.
- `TemplateExpander::expand` only substitutes `vars:` entries, never
transformation field values.
Replace the example with the actually supported pattern: declare a
`vars:` entry pointing at the dynamic source, add the `value_placeholders`
transformation, and reference the resulting value from rules using the
standard Sigma `%name%` placeholder. Add a note that transformation field
values do not substitute `${source.*}` directly.
Verified end-to-end with an HTTP source, four test events, and the
expected two matches.
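The supported pattern can be pictured with a small Python sketch of `%name%` expansion. This is an illustration only, not rsigma's implementation: `expand_value`, the placeholder regex, the sample addresses, and the choice to raise on an unresolved name are all assumptions.

```python
import re

# A rule value that consists solely of a Sigma-style %name% placeholder.
PLACEHOLDER = re.compile(r"%([A-Za-z0-9_]+)%")


def expand_value(value, variables):
    """Return the concrete values a single rule value expands to.

    `variables` maps a placeholder name to the list of values that the
    pipeline's `vars:` entry resolved to (for example from a dynamic source).
    """
    match = PLACEHOLDER.fullmatch(value)
    if match is None:
        return [value]  # not a placeholder: keep the literal value
    name = match.group(1)
    if name not in variables:
        # Assumption: treat an unresolved placeholder as an error.
        raise KeyError(f"unresolved placeholder: {name}")
    return list(variables[name])


resolved_vars = {"ip_blocklist": ["203.0.113.7", "198.51.100.9"]}  # sample data
print(expand_value("%ip_blocklist%", resolved_vars))  # ['203.0.113.7', '198.51.100.9']
print(expand_value("10.0.0.1", resolved_vars))        # ['10.0.0.1']
```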
EVTX records ship as nested JSON from the evtx crate (Event.System.EventID, Event.System.Channel, Event.EventData.<name>), not the flat field names previously documented. Sigma rules must reference fields by their full dotted path. The builtin `sysmon` and `ecs_windows` pipelines do not flatten this structure; they assume already-flat events. Document the actual record shape and steer users toward dotted paths or a custom `field_name_mapping` pipeline.
Also correct the stderr summary line to "Processed N EVTX records, M matches." to match what the eval loop emits.
The `--syslog-tz` flag rejects `+HHMM`; the implementation requires `+HH:MM` or `-HH:MM`. Update the three example invocations across evaluating-rules, input-formats, and streaming-detection.
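A short Python sketch shows why flat field names miss on the nested record shape described above. The `get_path` helper and the sample field values are hypothetical; only the key layout (Event.System.EventID, Event.System.Channel, Event.EventData.<name>) comes from the commit text.

```python
def get_path(record, dotted):
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    current = record
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Sample record in the nested shape; the values are made up.
record = {
    "Event": {
        "System": {"EventID": 1, "Channel": "Microsoft-Windows-Sysmon/Operational"},
        "EventData": {"Image": "C:\\Windows\\System32\\cmd.exe"},
    }
}

print(get_path(record, "Event.System.EventID"))  # 1
print(get_path(record, "EventID"))               # None
```

A rule written against the flat name `EventID` finds nothing in this record, which is the behaviour the corrected docs steer users away from.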
The CI/CD page documents the four structured exit codes verified
against the running 0.11.0 binary (0 success, 1 findings, 2 rule-error,
3 config-error), when each command actually emits each code, the
failure-controlling flags (`--fail-on-detection`, `--fail-level`,
`--resolve-sources`), and copy-paste pipelines for GitHub Actions,
GitLab CI, pre-commit, and a generic shell runner.
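For a generic runner, the four codes can be modelled as a small enum. This is a hedged Python sketch, not part of rsigma: the numeric values come from the text above, while `ExitCode`, `should_fail_build`, and the `fail_on_findings` switch are hypothetical names.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    FINDINGS = 1
    RULE_ERROR = 2
    CONFIG_ERROR = 3


def should_fail_build(code, fail_on_findings=True):
    """Decide whether a CI job should fail for a given process exit code."""
    exit_code = ExitCode(code)  # raises ValueError for codes outside 0..3
    if exit_code is ExitCode.SUCCESS:
        return False
    if exit_code is ExitCode.FINDINGS:
        return fail_on_findings
    return True  # rule and config errors always fail the job


print(should_fail_build(0))                          # False
print(should_fail_build(1, fail_on_findings=False))  # False
print(should_fail_build(3))                          # True
```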
The four YAML code blocks were parsed with PyYAML to catch
continuation, anchor, and key-ordering bugs before publishing.
Notable correctness checks that informed the final shape:
- GitHub Actions example follows the repo's own development-workflow
rule: top-level `permissions: {}`, per-job least-privilege
`contents: read`, `persist-credentials: false` on every checkout,
every action pinned to a full commit SHA with a version comment
(matching the SHAs the repo uses today), and `cargo install` pinned
via a workflow-level `RSIGMA_VERSION` env. The accompanying zizmor
tip is now consistent with the example instead of contradicting it.
- GitLab CI uses the modern `rules:` form (`only:` is deprecated) and a
Debian image that explicitly installs curl + ca-certificates before
fetching the release archive. The previous YAML had a broken `curl`
continuation that became two separate list items.
- Pre-commit hooks now use `pass_filenames: false` on both `rsigma
rule lint` and `rsigma rule validate`. Verified against the binary
that those commands accept a single `<PATH>` argument and reject
additional positional paths with `error: unexpected argument`.
While verifying, two related claims elsewhere were wrong and got
fixed in the same commit:
- `evaluating-rules.md` claimed exit 2 means "rule error" for `engine
eval`. In practice `engine eval` logs per-rule parse errors as
warnings and only exits 2 when the rules path itself cannot be
read. Use `rule validate` for a strict per-rule gate.
- `installation.md` claimed the Linux archives are musl-linked and
cosign signed. The release workflow actually targets
`x86_64-unknown-linux-gnu` / `aarch64-unknown-linux-gnu`, and
archives use SLSA build provenance attestations via
`actions/attest-build-provenance` rather than cosign keyless
signatures (cosign signs the GHCR Docker image, not the archives).
Verification path now shows `gh attestation verify ...`.
Chicago Manual of Style §6.106 (also Oxford / New Hart's Rules) says
to close the slash up to single words ("and/or", "read/write",
"pass/fail") and space it only when at least one side is multi-word
("World War I / World War II"). The user-guide pages were
inconsistent: some inline references used `stdout/stderr` (correct),
others used `stdout / stderr` (non-standard).
Closed all single-word slash pairs across the guide and getting-started
sections (JSON/NDJSON, stdin/HTTP, NATS/OTLP, NATS/DLQ, stdout/file,
RFC 3164/5424, CEF/ArcSight, --jq/--jsonpath, event_count/value_count,
field_name_prefix/field_name_suffix, is_sigma_rule/is_sigma_correlation_rule,
"true"/"false", max_age/max_msgs, connect/disconnect/reconnect,
HTTP/file/command, Alpine/scratch). Kept spaces only where at least
one side is multi-word: "Cisco / Palo Alto / network appliance",
"IS NOT NULL / IS NULL", "this event should match" / "this event
should not match".
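The spacing rule applied here is mechanical enough to sketch in a few lines of Python. `join_with_slash` is a hypothetical helper written for this note, not a tool used in the repo.

```python
def join_with_slash(left, right):
    """Close the slash up for single words; space it if either side is multi-word."""
    left, right = left.strip(), right.strip()
    multi_word = " " in left or " " in right
    separator = " / " if multi_word else "/"
    return f"{left}{separator}{right}"


print(join_with_slash("stdout", "stderr"))        # stdout/stderr
print(join_with_slash("IS NOT NULL", "IS NULL"))  # IS NOT NULL / IS NULL
```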
Includes a small box-width adjustment to the daemon ASCII diagram in
streaming-detection.md so the boxes still line up after closing the
slashes.
Covers the four practical knobs:
- The always-on matcher optimizer (Aho-Corasick collapse at >=8 contains needles, RegexSet collapse at >=3 regex matchers, CaseInsensitiveGroup wrapper). No user flag; documented so operators know what is already happening at compile time.
- `--bloom-prefilter` and `--bloom-max-bytes` for substring-heavy rule sets paired with mostly-non-matching telemetry. Off by default because the trigram probe is pure overhead when most events overlap a needle.
- `--cross-rule-ac` (feature-gated on `daachorse-index`) for very large rule sets dominated by shared positive substrings. Quantified with the published `eval_cross_rule_ac` numbers (~68x at 1K, ~101x at 10K rules) and qualified as best-case only. Notes that `cargo install rsigma` does not include the feature by default but the release archives and Docker image do, because both are built with --all-features.
- Daemon throughput: `--batch-size` and `--buffer-size` with the tail-latency trade-off; correlation memory pressure controls (`timespan`, `max_correlation_events`, `--correlation-event-mode`).
Verified all benchmark commands list real Criterion groups: `eval_single_event`, `eval_bloom_rejection`, `eval_cross_rule_ac` in rsigma-eval; `runtime_throughput` and `dynamic_pipelines` in rsigma-runtime.
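The optimizer thresholds can be read as a simple decision rule. The Python sketch below only restates the two numbers quoted in this commit; the function name, the strategy labels, and the assumption that the counts are taken per matcher group are hypothetical and may not match how rsigma-eval actually scopes the decision.

```python
# Thresholds quoted in the commit text above.
AC_MIN_CONTAINS = 8     # Aho-Corasick collapse at >= 8 contains needles
REGEXSET_MIN_REGEX = 3  # RegexSet collapse at >= 3 regex matchers


def plan_matchers(contains_needles, regex_matchers):
    """Return hypothetical strategy labels for the given matcher counts."""
    plan = []
    if contains_needles >= AC_MIN_CONTAINS:
        plan.append("aho_corasick")
    elif contains_needles > 0:
        plan.append("individual_contains")
    if regex_matchers >= REGEXSET_MIN_REGEX:
        plan.append("regex_set")
    elif regex_matchers > 0:
        plan.append("individual_regex")
    return plan


print(plan_matchers(8, 2))  # ['aho_corasick', 'individual_regex']
print(plan_matchers(3, 3))  # ['individual_contains', 'regex_set']
```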
Documents the four observability surfaces operators actually wire up:
- `--log-format <json|text>` and how it composes with `RUST_LOG`. Clear
about the asymmetry: the daemon always emits JSON, while the other
subcommands need an explicit `--log-format` to enable a tracing
subscriber at all.
- `RUST_LOG` filter targets, verified against the running 0.11.0 binary
in both INFO and DEBUG modes. Every target listed is one I actually
saw emitted: `rsigma::daemon::{server,reload,health}`,
`rsigma_runtime::{engine,sources,sources::refresh}`,
`rsigma_eval::{engine,correlation_engine}`, `async_nats(::connector)`,
`tower_http::trace::{on_request,on_response}`. No invented targets.
Includes four copy-paste RUST_LOG recipes for the cases the daemon
hits in practice (quiet prod, hot-reload debugging, dynamic source
timeouts, HTTP access log).
- Spans (`load_rules`, `evaluate_batch`, `otlp_logs_request`) with the
exact JSON shape the daemon emits, including the `span` / `spans`
call-stack fields.
- Prometheus metrics: the 27 definitions grouped into engine
throughput, queue and back-pressure, rule and state load, per-rule
labels, dynamic sources, and OTLP. Notes that a startup `/metrics`
scrape exposes ~20 names; the per-rule and OTLP counters only appear
after their first event.
Includes four ready-made Prometheus alerts (back-pressure,
correlation state pressure, DLQ volume, reload failures) and a short
section on `/healthz` and `/readyz` for orchestrator probes.
Cross-links to Streaming Detection, Performance Tuning, NATS Streaming,
Prometheus metrics reference, HTTP API reference, and the upstream
`tracing-subscriber` EnvFilter docs.
Thirteen pages, one per `--help` surface, plus a section index that includes the flat-to-grouped migration table and a copy-paste sed script for repos still on pre-0.11 invocations. Each subcommand page follows the same template (synopsis, description, flag tables grouped by concern, examples, exit codes, related guide / reference links) so operators can predict where to look without hunting through prose:
- engine: eval, daemon
- rule: parse, validate, lint, fields, condition, stdin
- backend: convert, targets, formats
- pipeline: resolve
Every flag and default is extracted from the actual --help output of an rsigma binary built with `--all-features`, including the NATS auth / replay / consumer-group flags that are feature-gated behind `daemon-nats`. Exit codes match the structured scheme verified against the binary in the CI/CD commit.
The CLI index documents the four global exit codes, the global --log-format flag, the high-level command tree, and the per-subcommand migration table referencing issues #125 (hide aliases) and #126 (remove aliases).
Also drops the `attack` placeholder from the CLI surface across the docs (CLI index, command tree, Concepts CLI table, installation verification line). It was a reserved-but-empty group that distracts operators reading the docs today; the upcoming ATT&CK tooling can reintroduce a row in those four places when it actually ships something invokable.
`mkdocs build` writes the rendered site to `/site/` in the repo root. Useful for one-shot CI deploys and local previews, but the directory is generated and should not land in git. Adds `/site/` to .gitignore so the next `mkdocs build` does not pollute `git status`.
…ibutes
Four short reference pages that were already linked from CLI and User Guide pages but didn't exist yet. Each is the canonical table for one operator-facing surface:
- exit-codes.md: the four structured codes (`0 SUCCESS`, `1 FINDINGS`, `2 RULE_ERROR`, `3 CONFIG_ERROR`) and which command emits which code in which situation. Source-of-truth pointer at the `exit_code` module. Per-command behaviour matrix.
- environment-variables.md: every env var rsigma actually reads (RUST_LOG, NO_COLOR, RSIGMA_CONSUMER_GROUP, NATS_CREDS / NATS_TOKEN / NATS_USER / NATS_PASSWORD / NATS_NKEY) plus a "NOT read" list to preempt confusion about OTEL_*, SIGMA_*, and PROMETHEUS_* variables. Precedence example shows that CLI flags override env vars.
- feature-flags.md: the per-crate feature inventory verified against the workspace Cargo.tomls (rsigma-cli, rsigma-eval, rsigma-runtime, rsigma-parser). Documents the build recipes (default, daemon-nats/daemon-otlp/evtx/daachorse-index combinations, --all-features for the release shape), the CI matrix coverage, and the `--help`-based feature-detection workaround for users who need to confirm a specific build's surface area.
- custom-attributes.md: the `rsigma.*` and `postgres.*` namespaces. Documents precedence (rule-level > set_state > -O backend option > backend default), six rsigma.* attributes with their CLI flag equivalents and scopes, and three postgres.* attributes for backend routing. Includes worked YAML examples.
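The custom-attribute precedence chain (rule-level > set_state > -O backend option > backend default) amounts to a first-match lookup. The Python sketch below is an illustration under assumptions: `resolve_attribute`, the layer keys, and the sample table names are hypothetical, and only the ordering comes from the commit text.

```python
# Highest-precedence layer first, following the order stated above.
PRECEDENCE = ("rule", "set_state", "backend_option", "backend_default")


def resolve_attribute(name, layers):
    """Return (value, layer) from the highest-precedence layer that sets `name`."""
    for layer in PRECEDENCE:
        values = layers.get(layer, {})
        if name in values:
            return values[name], layer
    return None, None


layers = {
    "backend_default": {"postgres.table": "events"},  # sample values only
    "backend_option": {"postgres.table": "logs"},
}
print(resolve_attribute("postgres.table", layers))  # ('logs', 'backend_option')

layers["rule"] = {"postgres.table": "sysmon_events"}
print(resolve_attribute("postgres.table", layers))  # ('sysmon_events', 'rule')
```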
Three medium-size reference pages, verified against a running 0.11.0 daemon built with --all-features.
- metrics.md: the 27-metric catalogue split into four groups (16 core always-on, 2 per-rule labelled, 5 dynamic source, 3 OTLP) with type, labels, and verbatim HELP text. Includes alert recipes (back pressure, correlation state pressure, DLQ volume, reload failures, source staleness) and histogram bucket guidance.
- http-api.md: every endpoint exposed on `--api-addr` with request body, response body, and verified curl examples. Covers /healthz, /readyz, /metrics, the /api/v1/* control endpoints, /v1/logs, and the gRPC pointer. Notes that authentication is not yet implemented and points at #128 for the planned TLS termination.
- builtin-pipelines.md: full field-mapping tables for ecs_windows (per-category Sigma -> ECS renames) and sysmon (23 logsource category -> EventID injections). Explains why neither builtin matches raw .evtx records directly and how to chain builtins with file pipelines via priority.
Also restructures the CLI overview's --log-format row from `--log-format <json\|text>` to a separate "Values" column so the markdown table renders cleanly without backslash-escaped pipes.
MkDocs Material's table renderer (pymdownx.tables) emits the literal
backslash from `\|` inside backticks, so `--correlation-event-mode
none\|full\|refs` shows up as `none\|full\|refs` for readers.
Rewrite every affected cell to avoid embedding pipes in inline code:
- evaluating-rules.md: `--action alert\|reset` -> `--action <alert,reset>`
with the values explained in the description, same for
`--correlation-event-mode`.
- performance-tuning.md: `\|contains`, `\|re` -> `contains`, `re`.
Context makes it obvious they are Sigma modifier names.
- linting-rules.md: `\|all`, `\|exists: true` -> `all`, `exists: true`.
- rule-conversion.md: LynxDB syntax table rewrites SPL2 pipe-bearing
cells like "Deferred to `\| where field=~\"regex\"`" into prose
("Deferred to a `where field=~\"regex\"` pipeline stage") and adds
a one-line definition of what "Deferred" means. Sigma modifier
references stop including the leading `|`.
- concepts.md: backends table rewrites the LynxDB syntax cell to
describe the SPL2 shape in prose rather than embedding `\|`.
The standalone pipe character outside inline code (e.g. the LynxDB
example "`FROM main | search ...`" outside a table) renders correctly
and stays unchanged.
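A rough way to find remaining offenders is to scan table rows for a backslash-escaped pipe between backticks. The Python sketch below is a naive check written for this note, not a script from the repo: the regex can produce false positives when escaped pipes sit between two separate code spans.

```python
import re

# Inline code span that contains a backslash-escaped pipe.
ESCAPED_PIPE_IN_CODE = re.compile(r"`[^`]*\\\|[^`]*`")


def code_spans_with_escaped_pipes(line):
    """Return the inline code spans on `line` that embed a literal backslash-pipe."""
    return ESCAPED_PIPE_IN_CODE.findall(line)


before = r"| `--action alert\|reset` | What to do on a match |"
after = "| `--action <alert,reset>` | What to do on a match |"

print(code_spans_with_escaped_pipes(before))  # ['`--action alert\\|reset`']
print(code_spans_with_escaped_pipes(after))   # []
```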
Verified every claim from the recent CLI reference and reference-batch
commits against a 0.11.0 daemon and the source. Several claims were
wrong; this commit fixes them.
Metric labels (metrics.md):
- `rsigma_detection_matches_by_rule_total` labels are `rule_title`,
`level`, not `rule_id`. Added a caveat that `rule_title` is not
unique in a rule set; for collision-free per-rule analytics, join
against the detection NDJSON by `rule_id` outside Prometheus.
- `rsigma_correlation_matches_by_rule_total` adds a third
`correlation_type` label.
- `rsigma_source_resolves_total` labels are `source_id`, `source_type`,
not `source_id`, `result`. The metric counts every attempt
(successful or not), with errors broken out into
`rsigma_source_resolve_errors_total` instead of a `result` label.
- `rsigma_source_resolve_errors_total` labels are `source_id`,
`error_kind` with values `Fetch`, `Parse`, `Extract`, `Timeout`,
`ResourceLimit` (the `SourceErrorKind` variant names).
- `rsigma_source_resolve_seconds` has no labels (global histogram).
- `rsigma_source_cache_hits_total` has no labels (global counter).
- `rsigma_otlp_requests_total` labels are `transport`, `encoding`
(not `transport`, `result`).
- `rsigma_otlp_errors_total` labels are `transport`, `reason` with
values `unsupported_content_type`, `decompression`, `decode`,
`channel_closed`.
Exit codes (exit-codes.md, cli/rule/{parse,stdin,fields}.md,
cli/backend/formats.md, cli/pipeline/resolve.md):
- `rule parse` returns `0` on YAML syntax or missing-required-field
issues (warnings only). It only exits `2` on IO errors.
- `rule stdin` is fully lenient: it always exits `0`, with YAML
warnings on stderr and the partial AST on stdout.
- `rule fields` returns `0` on per-rule parse errors (warnings only).
Exits `2` only on bad rules path and `3` on bad pipeline path.
- `backend formats` returns `0` for unknown backend names; the
error is printed but the process exits cleanly.
- `pipeline resolve` returns `0` even when sources fail; per-source
`status: "error"` is in the JSON output. For a strict CI gate, use
`rule validate --resolve-sources` (which exits `3`).
HTTP API (http-api.md):
- `DELETE /api/v1/sources/cache/{source_id}` response is
`{"status":"invalidated","source_id":"..."}` (not
`"cache_invalidated"`), and the endpoint is a no-op for nonexistent
source IDs (it still returns `200 OK`).
Pulls these inaccuracies out of the docs and adds a "non-obvious
behaviours" subsection to exit-codes.md so the lenient-exit-on-warnings
pattern is documented in one place.
dynamic-sources.md: the canonical spec for every aspect of dynamic
pipeline sources, derived from the source code (constants pinned
against rsigma_runtime::sources and rsigma_eval::pipeline::sources):
- Source declaration: every field on a DynamicSource (id, type,
format, extract, refresh, required, timeout, on_error, default,
max_body_size, max_stdout) with required-or-optional status.
- Source types: file, http, command, nats. Each with its own
type-specific fields, default values, and feature gating.
- Data formats: json, yaml, lines, csv with the libraries used.
- Extract languages: jq (jaq), jsonpath (serde_json_path, RFC 9535),
cel (cel-interpreter), with the explicit object-form syntax and the
plain-string jq shorthand.
- Refresh policies: once, <duration>, watch (file only), push (NATS
only), on_demand with all five hot-reload trigger surfaces.
- Error policies: use_cached, fail, use_default, including the
interaction with the `required` flag.
- Template substitution: documents the actual TemplateExpander
behaviour. Inline templates are supported (caught a doc bug -
earlier draft claimed they weren't) but whole-value substitution
via `${source.X}` in a `vars:` entry is the safe form for array
sources.
- Include directives: MAX_INCLUDE_DEPTH = 1, --allow-remote-include
opt-in.
- Resource limits: MAX_SOURCE_RESPONSE_BYTES 10 MiB,
DEFAULT_COMMAND_TIMEOUT 30 s, 64 KiB stderr cap, MIN_REFRESH_INTERVAL
1 s, MAX_INCLUDE_DEPTH 1.
security.md: catalogues every input bound, parser-robustness choice,
and operational concern, all verified against source:
- Input caps: MAX_LINE_BYTES (1 MiB), MAX_CONDITION_LEN (64 KiB),
MAX_CONDITION_DEPTH (64), MAX_NESTING_DEPTH (64),
MAX_WINDASH_DASHES (8), MAX_CHAIN_DEPTH (10), max_state_entries
(100,000 default).
- Dynamic pipeline limits: cross-referenced from dynamic-sources.md.
- Parser robustness: yaml_serde 0.10 (the maintained serde_yaml fork),
bounded recursive-descent condition parser, evtx streaming reader,
prost+tonic OTLP, plus the fuzz harness inventory.
- SQL injection prevention: actual validate_sql_identifier regex is
`^[A-Za-z_][A-Za-z0-9_$]*$` (allows `$`, which I had wrong in an
earlier draft); single-quote escape via doubling.
- Process and concurrency: SIGTERM/SIGINT drain, SIGHUP reload via
ArcSwap, parking_lot::Mutex on the hot processor path with
std::sync::Mutex elsewhere (caught and corrected the earlier
over-broad parking_lot claim).
- Network exposure: documents that the daemon HTTP/gRPC listeners are
unauthenticated today and points at #128 for planned TLS.
- Supply chain: cargo audit, dependabot, cosign-signed Docker image,
SLSA Build L3 provenance on archives, Grype scan gate.
- Threat model summary in one paragraph.
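The two SQL-safety rules named above (the identifier regex and single-quote doubling) can be sketched in Python. This mirrors the documented pattern only; `is_valid_identifier` and `quote_literal` are hypothetical helpers, not the backend's `validate_sql_identifier` function.

```python
import re

# Pattern as documented: letter or underscore first, then letters, digits, _ or $.
SQL_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def is_valid_identifier(name):
    """True if the whole string matches the documented identifier pattern."""
    return SQL_IDENTIFIER.fullmatch(name) is not None


def quote_literal(value):
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


print(is_valid_identifier("security_events"))        # True
print(is_valid_identifier("pg$stat"))                # True
print(is_valid_identifier('events"; DROP TABLE x'))  # False
print(quote_literal("O'Brien"))                      # 'O''Brien'
```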
Create the three meta pages that the docs nav (.pages) already lists
but were never populated:
- docs/release-notes.md includes the root CHANGELOG.md verbatim.
- docs/contributing.md includes the root CONTRIBUTING.md verbatim,
followed by a short pointer to the development-workflow rule file
in the repo.
- docs/security-policy.md includes the root SECURITY.md verbatim,
followed by a pointer to the runtime Security Hardening reference.
This is the pattern the docs plan specified for "files that already
exist at the repo root and should not be duplicated".
Side fixes uncovered while landing this:
- docs/getting-started/{concepts,installation}.md were pointing at
../developers/contributing.md, a page that does not exist. Rewired
them to ../contributing.md (the new include-markdown page).
- The root CONTRIBUTING.md ended with `[MIT License](LICENSE)`. The
relative path resolves on GitHub's repo view but 404s when included
into the docs site (which does not ship LICENSE). Rewrote to an
absolute github.com URL so it works in both contexts.
Remaining mkdocs warnings about unwritten reference and developer
pages (backends/postgres, backends/lynxdb, lint-rules, library/*,
developers/*, editors/*) are the forward-references to to-do batches
and will resolve as those pages land.
Two backend-specific reference pages, both backed by live runs of
backend convert and verified against the source.
postgres.md:
- Backend options: table, schema, database, timestamp_field,
json_field, case_sensitive_re. Default value and precedence chain
documented for each.
- Full Sigma-modifier-to-PostgreSQL operator table with field
quoting, single-quote escaping, and the SQL identifier validation
regex (`^[A-Za-z_][A-Za-z0-9_$]*$`) cross-linked to Security
Hardening.
- Output formats (default, view, timescaledb, continuous_aggregate,
sliding_window) with verified SQL output for each.
- JSONB mode: top-level (`->>`) and dotted-path (`->...->>`)
extraction.
- Correlation strategy per type (event_count, value_count, value_sum,
value_avg, value_percentile, value_median, temporal, plus the
not-yet-shipped temporal_ordered).
- Custom attributes (postgres.table/schema/database).
- OCSF pipeline pointers and the open-items roadmap.
lynxdb.md:
- How it differs from PostgreSQL: no table/schema, no WHERE,
non-standard boolean precedence, deferred modifiers via `where`
pipeline stages.
- Index selection via `set_state` (default `main`); validated as a
SQL identifier.
- Modifier mapping with native vs deferred categorisation.
- Output formats: default (full query) and minimal (search expression
only for REST API).
- Boolean precedence explanation with the parenthesisation rationale.
- Six worked examples (plain string, integer, custom index, deferred
regex, CIDR with combination, keyword).
- Limitations table including unsupported value modifiers and the
not-yet-shipped temporal_ordered.
Three live-verification corrections caught while writing:
- PostgreSQL CIDR emits `("DestinationIp")::inet <<= '10.0.0.0/8'::cidr`
(the field cast is wrapped in parens). My first draft omitted them.
- PostgreSQL keywords emit `to_tsvector('simple', ROW(*)::text) @@
plainto_tsquery('simple', 'value')`, NOT `to_tsvector('english',
"field") @@ plainto_tsquery('english', 'value')`. The configuration
is 'simple' (no stemming) over every column concatenated, which is
intentionally broader than per-field FTS. Documented this design
choice explicitly.
- LynxDB keyword emits `"keyword"` (double-quoted), not bare token.
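The parenthesised CIDR shape from the first correction can be reproduced with a short Python sketch. It is an illustration, not the backend's code: `cidr_predicate` is a hypothetical name, and the `ipaddress` validation and identifier quoting are assumptions added so the example fails loudly on bad input.

```python
import ipaddress


def cidr_predicate(field, cidr):
    """Render a PostgreSQL containment test with the field cast wrapped in parens."""
    ipaddress.ip_network(cidr, strict=False)  # raises ValueError on a malformed CIDR
    quoted_field = '"' + field.replace('"', '""') + '"'
    quoted_cidr = "'" + cidr.replace("'", "''") + "'"
    return f"({quoted_field})::inet <<= {quoted_cidr}::cidr"


print(cidr_predicate("DestinationIp", "10.0.0.0/8"))
# ("DestinationIp")::inet <<= '10.0.0.0/8'::cidr
```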
Catalogue of every rsigma-parser lint rule. All 66 enum variants
verified against the source; the 13 with safe auto-fixes are listed in
a dedicated table near the top with what each fix actually does.
Per-section tables follow the source file structure:
- Infrastructure (4): yaml_parse_error, not_a_mapping, file_read_error,
schema_violation. Pre-parse failures; cannot be suppressed inline.
- Shared metadata (16): title, id, level, status, date, author,
description, key naming.
- Detection (18): detection block, condition, logsource, tags,
references, modifiers.
- Correlation (13): correlation block structure and validity.
- Filter (7+1 reserved): filter block constraints. Includes the
reserved-but-unemitted empty_filter_rules variant.
- Detection-modifier hygiene (cross-references): subset of detection
rules that flag modifier misuses.
Severity counts verified by parsing every rule emission site
(`error(...)`, `warning(...)`, `info(...)`, `LintWarning { severity:
... }`):
- 33 error
- 29 warning
- 3 info
- 0 hint (the variant is defined but unused by shipped rules)
- 1 reserved (empty_filter_rules: declared, never emitted)
Fix-attachment verified by walking each `w.fix = safe_fix(...)` call
site and each inline `Fix { ... }` struct construction back to its
nearest LintRule emission. Earlier draft over-attributed fixes to seven
rules (title_too_long, invalid_id, scope_too_short, taxonomy_too_long,
unknown_tag_namespace, missing_filter_condition, missing_filter_logsource);
those rows now correctly read "—".
Hybrid approach in response to the zizmor-style audits reference. Full
per-rule pages would balloon the lint catalogue to ~3000 lines for 66
rules, most of which are spec hygiene ("missing X", "invalid Y") that
need no extra prose. Instead, add a "Selected findings, with worked
examples" section for the ten rules that genuinely surprise authors
on first contact.
Each entry shows the triggering YAML, explains what's wrong, and
shows the fix:
- wildcard_only_value: `'*'` value vs `|exists: true`.
- single_value_all_modifier: `|all` is a no-op on a single value.
- all_with_re: `|all` is meaningless with `|re`.
- non_lowercase_key: `Title:` silently does not work because keys
are case-sensitive.
- condition_references_unknown: the `condition:` names a selection
that does not exist.
- deprecated_aggregation_syntax: migrating the pipe-aggregation form
to a separate `correlation:` document.
- duplicate_fields: YAML duplicate-key silent overwrite and how to
express "any of" with a list.
- unknown_tag_namespace: the recognised set is small (attack, cve,
detection, tlp, stp, informational) and typos slip through.
- null_in_value_list: the three things a bare `null` could mean and
how to disambiguate.
- invalid_status and invalid_level: the exact valid value sets.
The catalogue tables at the top stay compact. Operators looking for
"why did this rule fire" land on the worked example via the rule's
heading; everyone else skims the tables.
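To make the unknown_tag_namespace finding concrete, here is a minimal Python sketch of the check. It assumes tags follow the `namespace.name` shape and uses the recognised set listed above; the function name is hypothetical and the real lint lives in the Rust rsigma-parser crate.

```python
# Recognised namespaces as listed above.
RECOGNISED_NAMESPACES = {"attack", "cve", "detection", "tlp", "stp", "informational"}


def unknown_tag_namespaces(tags):
    """Return the tags whose namespace (text before the first dot) is not recognised."""
    flagged = []
    for tag in tags:
        namespace = tag.split(".", 1)[0]
        if namespace not in RECOGNISED_NAMESPACES:
            flagged.append(tag)
    return flagged


# "atack" is a typo for "attack", so only that tag is flagged.
print(unknown_tag_namespaces(["attack.t1059", "atack.t1059", "tlp.amber"]))
```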
The page renders the canonical mermaid diagram at `assets/architecture.mmd` inline (superfences + mermaid is already wired in mkdocs.yml). Sections:
- Crate map: the diagram plus a short legend for the feature-gated paths.
- Crate responsibilities: one row per crate (rsigma-parser, rsigma-eval, rsigma-convert, rsigma-runtime, rsigma-lsp, rsigma-cli) with role, key types, and feature flags.
- The four execution shapes: library, `engine eval`, `engine daemon`, `backend convert`, plus the LSP. Each links into the guide that explains it.
- Data flow: YAML to AST, AST to compiled rules, event evaluation, the streaming pipeline (bounded mpsc + ArcSwap), dynamic source resolution.
- Performance posture and threat model: short paragraphs cross-linking to BENCHMARKS.md, Performance Tuning, and Security Hardening.
…n page
The benchmark numbers are referenced from performance-tuning, architecture, and the matcher-optimizer threshold discussion. Inlining BENCHMARKS.md via mkdocs-include-markdown lets operators look up specific Criterion groups (eval_single_event, eval_throughput, eval_bloom_rejection, eval_cross_rule_ac, runtime_throughput, dynamic_pipelines) without leaving the docs. Also resolves the standing mkdocs warning that developers/benchmarks.md is referenced from .pages but never existed.
Rewires three cross-references away from the GitHub-hosted blob and onto the new in-docs page:
- performance-tuning.md: matcher-optimizer threshold sweep citation and the bench-commands appendix.
- architecture.md: the headline-figures paragraph and the See-also bullet.
Other developer-section pages (testing, fuzzing, adding-backends, adding-input-formats, linter-and-lsp, contributing) are still on the to-do list; their nav slots stay declared in docs/developers/.pages so they slot in cleanly when written.
The benchmarks include lived at `docs/developers/benchmarks.md`, requiring `../../BENCHMARKS.md`. The three sibling root-md mirrors (release-notes, contributing, security-policy) sit at the top of `docs/` and use a single-level `../FILE.md` include. The inconsistency made the include path uglier and hid the page deeper in the nav than its operator-facing audience warranted.
- Move the page to `docs/benchmarks.md` (`git mv` preserves history).
- Drop the leading `../` from the include path to match the other three (`../BENCHMARKS.md`).
- Add a top-level nav entry in `docs/.pages` between Reference and Deployment; remove the matching entry from `docs/developers/.pages`.
- Update the four cross-references in `index.md`, `guide/performance-tuning.md`, and `reference/architecture.md` from `../developers/benchmarks.md` to `../benchmarks.md` (or `benchmarks.md` from the docs root).
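The include paths quoted above follow from where the page sits relative to the repo root. A short Python sketch of that relationship, assuming (as the paths in this message imply) that the include is resolved relative to the directory of the including page; `include_path` is a hypothetical helper, not part of the plugin:

```python
import posixpath


def include_path(page: str, target: str) -> str:
    """Relative path from the directory of `page` to `target` (both repo-relative)."""
    return posixpath.relpath(target, start=posixpath.dirname(page))


# Old location: two directories below the repo root.
print(include_path("docs/developers/benchmarks.md", "BENCHMARKS.md"))
# New location: one directory below, matching the other root-md mirrors.
print(include_path("docs/benchmarks.md", "BENCHMARKS.md"))
```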
…n cross-refs
Docker page covers the ghcr.io/timescale/rsigma image (multi-arch
linux/amd64+linux/arm64, FROM scratch, USER 65534), tag-and-digest
pinning, cosign keyless signature verification, gh attestation
verification, the standard runtime-hardening flag set, a
self-contained docker-compose.yml, the state-db writable-volume
pattern, and the build-from-source path for non-standard feature
combinations.
Verified against a locally built image (Dockerfile in repo root) on
Docker 29.4.3 / Apple Silicon:
- `docker run --rm rsigma --version` -> `rsigma 0.11.0`.
- `docker run --rm rsigma --help` shows the grouped command tree
(engine / rule / backend / pipeline / attack) plus the deprecated
flat aliases.
- `rule validate /rules/` against a bind-mounted rule.
- Same `rule validate` under
`--read-only --cap-drop=ALL --security-opt=no-new-privileges:true
--tmpfs /tmp` succeeds.
- `engine daemon -r /rules/ --input http --api-addr 0.0.0.0:9090`
under the same hardening returns 200 on /healthz and /readyz.
- `--state-db /state/correlation.db` writes the SQLite file via a
writable bind-mount. Added the note that on Linux hosts the host
directory must be writable by uid 65534 (chown or use a
Docker-managed volume).
- The state-db example now explicitly sets `--input http` so the
daemon stays alive (the default `--input stdin` exits as soon as
the container's stdin closes; spotted during the verification
pass).
- The docker-compose example brings up the daemon, /readyz returns
ready, and the `["CMD", "/rsigma", "--version"]` healthcheck
transitions to "healthy" (verified via
`docker compose ps --format 'table {{.Name}}\t{{.State}}\t{{.Status}}'`).
- `cosign verify --certificate-identity-regexp 'github.com/timescale/rsigma'
--certificate-oidc-issuer https://token.actions.githubusercontent.com
ghcr.io/timescale/rsigma:0.11.0` -> successful verification of the
v0.11.0 image's keyless cosign signature and SLSA provenance.
- `gh attestation verify --owner timescale oci://ghcr.io/timescale/rsigma:0.11.0`
-> verified SLSA provenance attestation with
`verifiedIdentity.subjectAlternativeName.regexp ==
^https://github.com/timescale/` and the v0.11.0 release-trigger
metadata.
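The hardened invocations in the verification list share one flag set. A minimal Python sketch that assembles such a command line; the flags and the `rsigma` image name come from the list above, while the helper name and the host bind-mount path are hypothetical:

```python
# Hardening flags exactly as exercised in the verification list above.
HARDENING_FLAGS = [
    "--read-only",
    "--cap-drop=ALL",
    "--security-opt=no-new-privileges:true",
    "--tmpfs", "/tmp",
]


def hardened_run(image, *command, extra=()):
    """Build a `docker run` argv with the hardening flags placed before the image."""
    return ["docker", "run", "--rm", *HARDENING_FLAGS, *extra, image, *command]


# /srv/rules is a hypothetical host directory holding Sigma rules.
argv = hardened_run(
    "rsigma", "rule", "validate", "/rules/",
    extra=["-v", "/srv/rules:/rules:ro"],
)
print(" ".join(argv))
```

Returning a list rather than a shell string keeps the arguments ready for `subprocess.run(argv)` without quoting concerns.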
Kubernetes guide: the draft is held in `docs-drafts/kubernetes.md`
(gitignored) until the canonical Helm chart lands and the manifests
can be exercised on a real cluster. mkdocs.yml `nav.Reference.Deployment`
and docs/deployment/.pages drop the kubernetes.md entry so the build
no longer warns about the missing page.
Side cleanup: two doc spots linked to https://github.com/timescale/rsigma/issues
as if a real issue backed them, when they actually pointed at
roadmap-only entries. `reference/backends/postgres.md` referenced
"roadmap item #3" and `reference/feature-flags.md` referenced "the
roadmap" for a feature-introspection flag. Both are rewritten as plain
prose ("Tracked on the project roadmap...") without misleading issue
links.
Adds an overview page and four per-crate pages (parser, eval, convert, runtime) that orient embedders without duplicating the docs.rs reference. Every code snippet uses the actual public API names verified against the sources, so a `cargo check` on a freshly extracted snippet would compile against the current workspace. The pages cross-link into the existing guide and reference pages and point readers at the per-crate READMEs on GitHub for the exhaustive trait surface.
Adds six developer pages: orientation, testing, fuzzing, adding-backends, adding-input-formats, and linter-and-lsp. The .pages nav drops the architecture and contributing entries that pointed at non-existent duplicates; the canonical pages already live at /reference/architecture and /contributing. The testing page describes the five-tier CI shape against the actual job names and commands in ci.yml. The fuzzing page documents all 15 cargo-fuzz harnesses with verified max_len values. The two "adding a *" walkthroughs use the real Backend trait and InputFormat shapes; the lint page describes the real `# rsigma-disable` syntax, LintConfig fields, and apply_suppressions signature. The library/parser.md page is updated in the same commit to use the Display-based rule id (w.rule) instead of a fictional .id() method.
detection.studio is a browser-based Sigma rule playground that wired in real-time rule evaluation by compiling RSigma to WebAssembly. The new section sits between Featured in and Read the deep dives, uses the same cards grid as the page header, and is structured so additional integrations can be added as one more card.
The contributor checklists at the end of the adding-backends,
adding-input-formats, and linter-and-lsp pages were already written with
the - [ ] task-list syntax, but the extension was not enabled so they
rendered as plain bullets. Adding pymdownx.tasklist with
custom_checkbox: true makes them render as Material's styled checkboxes.
Also drops the developers/{architecture,contributing,benchmarks}.md
entries from the llmstxt section list, since those pages live at
/reference/architecture, /contributing, and /benchmarks now.
Three pages covering integrations that had stale outbound links from the user guide, deployment guide, and architecture reference:
- editors/vscode.md walks through the VS Code/Cursor extension build flow against the actual editors/vscode/package.json, the settings it exposes, and the .rsigma-lint.yml + # rsigma-disable conventions.
- editors/neovim.md gives copy-paste configs for Neovim native LSP, nvim-lspconfig, Helix, Zed, Emacs (eglot), and Sublime LSP, all driving the same rsigma-lsp binary.
- ecosystem/helr.md positions Helr as the polling collector that feeds rsigma engine daemon, with a complete docker-compose stack (NATS + Helr + rsigma) and a small Okta -> ECS pipeline example.
Two jobs: build runs on every PR with mkdocs build --strict (matching
the same dependency pinning as docs/requirements.txt), and deploy runs
only on main and only after the build job uploaded its artifact.
The workflow follows the repo CI conventions: top-level permissions:
{} with least-privilege per-job grants (contents: read for build,
pages: write + id-token: write for deploy), every action pinned by
full commit SHA with a version comment, persist-credentials: false on
checkout, concurrency group with cancel-in-progress, and a
workflow_dispatch trigger for manual runs. zizmor --pedantic passes
with zero findings.
The path filter covers docs/, mkdocs.yml, docs/requirements.txt, and
the five root-level files that the include-markdown plugin embeds
(README, CHANGELOG, CONTRIBUTING, SECURITY, BENCHMARKS), so a change
to any of those triggers a rebuild.
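The effect of the path filter can be sketched as a simple predicate over changed files. This Python version is an approximation under stated assumptions: the `.md` extensions on README, CHANGELOG, CONTRIBUTING, and SECURITY are assumed (only BENCHMARKS.md is named with its extension elsewhere on this page), and the matching is plain prefix and equality rather than the glob syntax the workflow itself uses.

```python
# Everything under docs/ is watched, which already covers docs/requirements.txt.
WATCHED_PREFIXES = ("docs/",)

# Root-level files; the .md extensions other than BENCHMARKS.md are assumptions.
WATCHED_FILES = {
    "mkdocs.yml",
    "README.md",
    "CHANGELOG.md",
    "CONTRIBUTING.md",
    "SECURITY.md",
    "BENCHMARKS.md",
}


def triggers_docs_build(changed_files):
    """True if any changed path falls under the docs path filter."""
    return any(
        path in WATCHED_FILES or path.startswith(WATCHED_PREFIXES)
        for path in changed_files
    )


print(triggers_docs_build(["src/main.rs", "BENCHMARKS.md"]))
print(triggers_docs_build(["src/main.rs"]))
```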
docs/.pages listed a Blog: blog.md item that pointed at a file that never existed; mkdocs serve was 404-ing on /blog.md. We do not plan to host a blog in the site (engineering posts live on mostafa.dev, already linked from the Read the deep dives section on the home page), so drop the nav entry entirely.
The plugin queries GitHub's REST API once per documentation page to render a committer-avatars row at the bottom of every page. In CI it exhausts the unauthenticated 60 req/hour quota almost immediately, aborting the strict build with two 403 errors per run. Authentication via GITHUB_TOKEN would lift the quota, but the avatar row is a nice-to-have rather than a load-bearing feature; the contributor list is already available via `git log` and the GitHub UI. Dropping the plugin keeps the build hermetic with no network dependency, and matches the simpler footer of the upstream mkdocs-material themed projects we modelled this site after.
SecurityEnthusiast pushed a commit to SecurityEnthusiast/rsigma that referenced this pull request on May 20, 2026
The "operability, performance, and documentation" release.
* Workspace bumped 0.11.0 -> 0.12.0; all 10 inter-crate dep pins refreshed; Cargo.lock regenerated under --locked.
* CHANGELOG.md [Unreleased] section flipped to [0.12.0] - 2026-05-19; comparison link updated to v0.11.0...v0.12.0; tag reference added to the bottom-of-file link block.
* CHANGELOG also gained a Documentation site (PR timescale#129) section under the existing observability / eval-perf / CLI-groups / test-reliability / dependencies headings, and the TL;DR theme moved from "operations and load performance" to "operability, performance, and documentation" to reflect the new docs site as a top-line deliverable.
Covers all 13 PRs merged since v0.11.0: timescale#107 (observability), timescale#111/timescale#113/timescale#114/timescale#120 (dependency batches), timescale#115/timescale#123 (test reliability), timescale#119/timescale#121/timescale#122/timescale#123 (eval rule loading perf), timescale#124 (CLI command groups), timescale#127 (CLI docs followup), timescale#129 (documentation site).
mostafa added a commit that referenced this pull request on May 20, 2026
The repo has had a published mkdocs site at docs/ since the v0.12.0 release (PR #129), but CONTRIBUTING.md never called out that user-facing docs need to ship alongside the code that changes them. Add a Documentation section that lists the two surfaces (crate READMEs and the mkdocs site under docs/) and the page-to-change matrix per kind of change (CLI flag, daemon config key, library API, metric, etc.). Also points contributors at `mkdocs build --strict` for local verification and at the docs.yml workflow that enforces the same on PRs, so the loop is closed before review.
Summary
Builds out the full operator and contributor documentation site under docs/, deployable to GitHub Pages from `main`, structured after zizmor.sh and lynxdb.org. 36 thematic commits, ~9.7k line additions across 82 files. The dev server has been running throughout the work; every code snippet was verified against a live `rsigma` binary or against the workspace source, not transcribed from memory.
What landed
- … `--all-features`).
- … (`engine`, `rule`, `backend`, `pipeline`), with the migration table from the deprecated flat commands.
- … (`rsigma-parser`, `rsigma-eval`, `rsigma-convert`, `rsigma-runtime`) using only the actual public API surface.
- … `InputFormat` enum, and lint module shapes.
- … (`editors/vscode/`) and Neovim/Helix/Zed/Emacs/Sublime (all driving `rsigma-lsp` directly).
- … (`.github/workflows/docs.yml`): build on every PR (strict mode), deploy only from `main`, every action SHA-pinned with version comment, `zizmor --pedantic` reports zero findings.
- … `Built with RSigma` section featuring detection.studio (Sigma rule playground that compiled rsigma to WebAssembly).
Verification
- `mkdocs build --strict` passes locally with zero warnings or errors.
- `zizmor --pedantic .github/workflows/docs.yml` passes with zero findings.
- … `rsigma:local` image from `main`.
Test plan
- `Docs` workflow runs on this PR (build job only) and produces no errors.
- … `GitHub Actions` as the source in repo settings (one-time setup; see the `deploy` job's `environment: github-pages`).