Add documentation site at `docs/` using MkDocs Material by mostafa · Pull Request #129 · timescale/rsigma

mostafa · 2026-05-19T13:32:33Z

Summary

Builds out the full operator and contributor documentation site under docs/, deployable to GitHub Pages from main, structured after zizmor.sh and lynxdb.org. 36 thematic commits, ~9.7k line additions across 82 files. The dev server has been running throughout the work; every code snippet was verified against a live rsigma binary or against the workspace source, not transcribed from memory.

What landed

Getting Started, User Guide, CLI Reference, Library, Developers, Reference, Deployment, Editors, Ecosystem sections — every page covered.
66-rule lint catalogue with verified severities, fixability, and worked examples for the trickier rules.
27 Prometheus metrics catalogue with verified labels (cross-checked against a live daemon with --all-features).
CLI reference for every grouped subcommand (engine, rule, backend, pipeline), with the migration table from the deprecated flat commands.
Per-crate library overviews (rsigma-parser, rsigma-eval, rsigma-convert, rsigma-runtime) using only the actual public API surface.
Contributor walkthroughs for adding a new backend, a new input format, and a new lint rule, derived from the real Backend trait, InputFormat enum, and lint module shapes.
Editor pages for VS Code/Cursor (the wrapping extension under editors/vscode/) and Neovim/Helix/Zed/Emacs/Sublime (all driving rsigma-lsp directly).
Helr ecosystem page with a complete docker-compose stack pairing Helr with the rsigma daemon over NATS.
GitHub Pages deploy workflow (.github/workflows/docs.yml): build on every PR (strict mode), deploy only from main, every action SHA-pinned with version comment, zizmor --pedantic reports zero findings.
Home page includes a Built with RSigma section featuring detection.studio (a Sigma rule playground that compiles rsigma to WebAssembly).

Verification

mkdocs build --strict passes locally with zero warnings or errors.
zizmor --pedantic .github/workflows/docs.yml passes with zero findings.
Every documented CLI flag, exit code, metric label, HTTP endpoint, environment variable, and feature flag was checked against the binary or source.
The Docker page commands were tested against a locally built rsigma:local image from main.
Dynamic-pipeline source resolution, NATS replay semantics, EVTX input, OTLP ingest, and backend output formats (Postgres view, timescaledb, continuous_aggregate, sliding_window, JSONB; LynxDB default and minimal) were each exercised end-to-end before being documented.

Test plan

Reviewers spot-check pages they own. Suggested entry points: Getting Started, Library overview, Lint Rules reference, Docker deployment, Helr ecosystem page.
Verify the new Docs workflow runs on this PR (build job only) and produces no errors.
After merge, confirm the deploy job publishes the site to https://timescale.github.io/rsigma/ and the artifact contains the full nav.
Enable GitHub Pages with GitHub Actions as the source in repo settings (one-time setup; see the deploy job's environment: github-pages).

Lays down the documentation skeleton at the repo root using the standard MkDocs layout (mkdocs.yml + docs/), themed to match docs.zizmor.sh: default Material palette, minimal feature flags (navigation.expand/sections/footer/tracking + content.action.edit/view + content.code.copy/annotate), link-emoji TOC permalinks, and pymdownx.magiclink for automatic GitHub issue/PR/commit references.
Plugin set is curated from the mkdocs/catalog with reproducible CI in mind: mkdocs-material[imaging], awesome-pages, section-index, glightbox, git-revision-date-localized, git-committers-2, include-markdown, macros, minify, rss, table-reader, llmstxt, redirects. All pinned by exact version in docs/requirements.txt.
Awesome-pages .pages files in every section folder control the sidebar order. Macros plugin reads docs/_data/vars.yml so version, MSRV, and docker image tag stay in lockstep with the workspace Cargo.toml on release. A small extra.css tightens reference tables.
The landing page (docs/index.md) uses Material's grid cards for six audience-specific entry ramps, a comparison table against pySigma / sigma_engine / sigma-rust, an admonition wall for the featured-in quotes (DEW #149/#154, tl;dr sec #320, BlackNoise), and a five-article table linking the published deep dives.
strict: true is on so the next CI build will catch unrecognised links and omitted files before they ship.

Three pages that take a new operator from install to first detection without bouncing around the rest of the docs:
- installation.md: cargo install with feature matrix (daemon-nats, daemon-otlp, evtx, daachorse-index), hardened Docker run with cosign verification, signed binary archives per target, build-from-source.
- quick-start.md: one rule, one event, one daemon, in five steps. Every snippet is verified against rsigma 0.11.0 with --pretty for readable first output, then the compact NDJSON form for production. Includes a daemon shutdown note pointing at SIGTERM drain behaviour.
- concepts.md: opinionated tour of Sigma rules, selections/modifiers, eval vs daemon, processing pipelines (static, builtin, dynamic), conversion backends, input formats, the noun-led CLI groups, and output payloads. Includes a SigmaHQ reading list pointing back to the canonical authoring docs.
Style: zizmor-inspired plain code blocks for command output, with stderr noise pushed into prose so each scenario reads as one command followed by one code fence.

Two core engine pages that turn the noun-led commands into narrative tutorials:
- evaluating-rules.md: the five input modes for engine eval (inline, @file NDJSON, @file EVTX, stdin, plain text), pipelines and field mapping, jq/jsonpath extraction with array-unwrap, in-memory correlation behaviour with --suppress/--action/--correlation-event-mode, --include-event semantics, --fail-on-detection for CI, and an explicit eval-vs-daemon decision table.
- streaming-detection.md: the daemon's life cycle from input through LogProcessor to fan-out sinks (shown as an ASCII flowchart); input sources (stdin/HTTP/NATS/OTLP), output sinks with --dlq fallback, --buffer-size / --batch-size / --drain-timeout tuning, the three hot-reload triggers (file watcher, SIGHUP, /api/v1/reload) with ArcSwap semantics, SQLite state persistence with smart restore-during-replay, the HTTP API surface, structured JSON logging via tracing-subscriber, graceful shutdown, and a production checklist.
CLI flag tables stay in the CLI Reference; these pages are the tutorial layer on top.

- rule-conversion.md: the PostgreSQL backend (modifier mapping table, five output formats including TimescaleDB continuous aggregates and sliding-window correlation, backend options including JSONB mode, multi-table temporal CTEs, OCSF pipelines, per-rule custom attributes) and the LynxDB SPL2 backend (deferred where-clauses, parenthesisation for non-standard precedence). Cross-links the canonical LynxDB Sigma guide at docs.lynxdb.org for operator-facing material. Closes with a complete rules-to-Grafana workflow. backend targets / backend formats command outputs match the actual rendering.
- linting-rules.md: covers all 66 lint rules categorised by what they inspect (infrastructure / metadata / detection / correlation / filter / detection-logic), the four severities and --fail-level gate, the 13 safely-fixable rules and the --fix workflow, the three-tier suppression system (CLI --disable/--exclude, .rsigma-lint.yml, inline comments), JSON schema validation, and CI patterns for GitHub Actions and pre-commit hooks.

- processing-pipelines.md: covers static pipelines (the 26 transformation types, three-tier condition system, priority-based chaining, custom attributes for engine and PostgreSQL backends), the two builtin pipelines (ecs_windows, sysmon), and the dynamic pipelines that are unique to RSigma: source types (file/http/command/nats), data formats, three extract languages (jq/JSONPath/CEL), refresh policies, error handling, include directives, the rsigma pipeline resolve test path, hot-reload triggers including the dedicated /api/v1/sources/resolve endpoint, and the security limits enforced on every dynamic source.
- input-formats.md: the seven supported formats (JSON, syslog, logfmt, CEF, EVTX, OTLP, plain text), when to use each, auto-detect behaviour, syslog timezone handling, EVTX @file routing, OTLP LogRecord-to-JSON flattening, and the timestamp-extraction priority list with all accepted formats and the --timestamp-fallback skip option for forensic replay.

…cipes
- nats-streaming.md: end-to-end walk of the daemon's JetStream integration. Why JetStream (at-least-once + server-confirmed publishes) vs core NATS, source/sink URLs, the five auth methods including mutual TLS, at-least-once delivery semantics and what they do and do not guarantee, the three replay modes and the state-restore matrix that decides whether SQLite correlation state is kept or cleared, consumer groups for horizontal scaling, the dead-letter queue, and a production tuning checklist with a complete reference invocation.
- otlp-integration.md: enabling daemon-otlp, the HTTP + gRPC endpoints multiplexed on the same --api-addr, the LogRecord-to-JSON flattening (including how map bodies become top-level fields so the same Sigma rule matches OTLP-shipped events), and minimal copy-paste recipes for Grafana Alloy, Vector, Fluent Bit, and the OpenTelemetry Collector. Every recipe verified end-to-end against the rsigma 0.11.0 daemon built with daemon-otlp:
  - Alloy 1.16.1: needs --stability.level=public-preview for otelcol.receiver.filelog plus a json_parser operator.
  - Vector 0.55.0: its opentelemetry sink is OTel-to-OTel passthrough only, so the realistic file-to-RSigma path is the http sink pointed at /api/v1/events with newline_delimited framing.
  - Fluent Bit 5.0.5: needs an explicit Parser json + parsers.conf for the opentelemetry output to produce field-keyed bodies.
  - otelcol-contrib 0.152: switched filelog -> file_log and otlphttp -> otlp_http to silence the v0.152 deprecation warnings.
  Closes with TLS/auth via reverse proxy, observability hooks, and how OTLP composes with other inputs on a single daemon.

`--replay-from-latest` maps to JetStream's DeliverLast policy, which delivers the last existing message plus all new ones. The previous wording ("skip stream history entirely") was wrong. Also document that the daemon calls get_or_create_stream / get_or_create_consumer on startup, and replace the placeholder DLQ entry with the actual JSON shape rsigma emits. Verified end-to-end against nats:2.11-alpine with auth, replay, consumer groups, and DLQ.
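The corrected replay wording can be illustrated with a toy model. This is only a sketch of the delivery order described above, not the daemon's or JetStream's code; `deliver_last` and `skip_history` are hypothetical names, and plain lists stand in for the stream.

```python
from typing import List


def deliver_last(existing: List[str], incoming: List[str]) -> List[str]:
    """Model the corrected wording: the last existing message, then every new one."""
    return list(existing[-1:]) + list(incoming)


def skip_history(existing: List[str], incoming: List[str]) -> List[str]:
    """Model the old, wrong wording: stream history is skipped entirely."""
    return list(incoming)


stored = ["evt-1", "evt-2", "evt-3"]   # messages already in the stream
arriving = ["evt-4", "evt-5"]          # messages published after the consumer starts

print(deliver_last(stored, arriving))  # evt-3 is replayed before the new messages
print(skip_history(stored, arriving))  # only the new messages
```

The two functions differ by exactly one message, which is why the old wording could cause a surprise duplicate on restart.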

The previous SQL example for `sliding_window` named the seed CTE `combined_events` and omitted the time-window predicate. Actual output uses a CTE named `source` containing `WHERE time >= NOW() - INTERVAL '<timespan>'`. Also note that base detection rules return `unknown output format: sliding_window` and require `--skip-unsupported` to convert correlation rules alongside them. For `continuous_aggregate`, the supported path is the per-rule `CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous) AS ... WITH NO DATA` wrapper for base detections. The current correlation output is malformed (nested CREATE MATERIALIZED VIEW inside a CTE); document the supported path and steer users away from the broken one.

…e_placeholders
The previous example wired `${source.ip_blocklist}` into `add_condition.conditions.DestinationIp`. That does not work:
- `parse_value_mapping` only accepts scalar `String`, `Number`, `Bool`, `Null`. YAML sequences silently fall through to `SigmaValue::Null`.
- `TemplateExpander::expand` only substitutes `vars:` entries, never transformation field values.
Replace the example with the actually supported pattern: declare a `vars:` entry pointing at the dynamic source, add the `value_placeholders` transformation, and reference the resulting value from rules using the standard Sigma `%name%` placeholder. Add a note that transformation field values do not substitute `${source.*}` directly. Verified end-to-end with an HTTP source, four test events, and the expected two matches.

EVTX records ship as nested JSON from the evtx crate (Event.System.EventID, Event.System.Channel, Event.EventData.<name>), not the flat field names previously documented. Sigma rules must reference fields by their full dotted path. The builtin `sysmon` and `ecs_windows` pipelines do not flatten this structure; they assume already-flat events. Document the actual record shape and steer users toward dotted paths or a custom `field_name_mapping` pipeline. Also correct the stderr summary line to "Processed N EVTX records, M matches." to match what the eval loop emits. The `--syslog-tz` flag rejects `+HHMM`; the implementation requires `+HH:MM` or `-HH:MM`. Update the three example invocations across evaluating-rules, input-formats, and streaming-detection.
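To make the dotted-path point concrete, here is a small Python sketch. It is an illustration under stated assumptions: the sample record is invented to mirror the shape described above, `get_dotted` and `looks_like_accepted_offset` are hypothetical helpers, and the offset pattern encodes only what this message states (`+HH:MM` / `-HH:MM` accepted, `+HHMM` rejected), not the full set of values the flag may take.

```python
import re
from typing import Any


def get_dotted(record: dict, path: str) -> Any:
    """Walk a nested dict along a dotted path; return None if a segment is missing."""
    current: Any = record
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


# Invented sample record, shaped like the nested layout described above.
record = {
    "Event": {
        "System": {"EventID": 1, "Channel": "Microsoft-Windows-Sysmon/Operational"},
        "EventData": {"Image": "C:\\Windows\\System32\\cmd.exe"},
    }
}

# Illustrative pattern only: sign, two digits, colon, two digits.
TZ_OFFSET = re.compile(r"[+-]\d{2}:\d{2}")


def looks_like_accepted_offset(value: str) -> bool:
    """True for +HH:MM / -HH:MM, False for the rejected +HHMM form."""
    return TZ_OFFSET.fullmatch(value) is not None


print(get_dotted(record, "Event.System.EventID"))  # nested lookup succeeds
print(get_dotted(record, "EventID"))               # flat name finds nothing
print(looks_like_accepted_offset("+02:00"), looks_like_accepted_offset("+0200"))
```

The flat lookup returning nothing is the failure mode a rule author hits when a rule references `EventID` against a raw EVTX record.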

The CI/CD page documents the four structured exit codes verified against the running 0.11.0 binary (0 success, 1 findings, 2 rule-error, 3 config-error), when each command actually emits each code, the failure-controlling flags (`--fail-on-detection`, `--fail-level`, `--resolve-sources`), and copy-paste pipelines for GitHub Actions, GitLab CI, pre-commit, and a generic shell runner. The four YAML code blocks were parsed with PyYAML to catch continuation, anchor, and key-ordering bugs before publishing.
Notable correctness checks that informed the final shape:
- GitHub Actions example follows the repo's own development-workflow rule: top-level `permissions: {}`, per-job least-privilege `contents: read`, `persist-credentials: false` on every checkout, every action pinned to a full commit SHA with a version comment (matching the SHAs the repo uses today), and `cargo install` pinned via a workflow-level `RSIGMA_VERSION` env. The accompanying zizmor tip is now consistent with the example instead of contradicting it.
- GitLab CI uses the modern `rules:` form (`only:` is deprecated) and a Debian image that explicitly installs curl + ca-certificates before fetching the release archive. The previous YAML had a broken `curl` continuation that became two separate list items.
- Pre-commit hooks now use `pass_filenames: false` on both `rsigma rule lint` and `rsigma rule validate`. Verified against the binary that those commands accept a single `<PATH>` argument and reject additional positional paths with `error: unexpected argument`.
While verifying, two related claims elsewhere were wrong and got fixed in the same commit:
- `evaluating-rules.md` claimed exit 2 means "rule error" for `engine eval`. In practice `engine eval` logs per-rule parse errors as warnings and only exits 2 when the rules path itself cannot be read. Use `rule validate` for a strict per-rule gate.
- `installation.md` claimed the Linux archives are musl-linked and cosign signed. The release workflow actually targets `x86_64-unknown-linux-gnu` / `aarch64-unknown-linux-gnu`, and archives use SLSA build provenance attestations via `actions/attest-build-provenance` rather than cosign keyless signatures (cosign signs the GHCR Docker image, not the archives). Verification path now shows `gh attestation verify ...`.
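For a generic runner, the four-code scheme can be wrapped in a few lines of Python. This is a sketch, not part of the docs site: the enum member names follow the codes listed above, while `gate` and its `allow_findings` parameter are hypothetical helpers for illustration.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """The four structured exit codes described above."""
    SUCCESS = 0
    FINDINGS = 1
    RULE_ERROR = 2
    CONFIG_ERROR = 3


def gate(returncode: int, allow_findings: bool = False) -> bool:
    """Return True when a CI step should pass for the given process exit code."""
    try:
        code = ExitCode(returncode)
    except ValueError:
        # Anything outside the documented scheme is treated as a failure.
        return False
    if code is ExitCode.SUCCESS:
        return True
    # Findings can be tolerated (e.g. report-only jobs); errors never are.
    return allow_findings and code is ExitCode.FINDINGS


for rc in (0, 1, 2, 3):
    print(ExitCode(rc).name, gate(rc), gate(rc, allow_findings=True))
```

In a real job the `returncode` would come from `subprocess.run(...)`; keep in mind the caveat above that some commands exit 0 on per-rule parse warnings, so the gate is only as strict as the command behind it.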

Chicago Manual of Style §6.106 (also Oxford / New Hart's Rules) says to close the slash up to single words ("and/or", "read/write", "pass/fail") and space it only when at least one side is multi-word ("World War I / World War II"). The user-guide pages were inconsistent: some inline references used `stdout/stderr` (correct), others used `stdout / stderr` (non-standard). Closed all single-word slash pairs across the guide and getting-started sections (JSON/NDJSON, stdin/HTTP, NATS/OTLP, NATS/DLQ, stdout/file, RFC 3164/5424, CEF/ArcSight, --jq/--jsonpath, event_count/value_count, field_name_prefix/field_name_suffix, is_sigma_rule/is_sigma_correlation_rule, "true"/"false", max_age/max_msgs, connect/disconnect/reconnect, HTTP/file/command, Alpine/scratch). Kept spaces only where at least one side is multi-word: "Cisco / Palo Alto / network appliance", "IS NOT NULL / IS NULL", "this event should match" / "this event should not match". Includes a small box-width adjustment to the daemon ASCII diagram in streaming-detection.md so the boxes still line up after closing the slashes.
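The rule applied here is mechanical enough to state as code. A minimal sketch, assuming "multi-word" simply means "contains a space"; `join_with_slash` is a hypothetical helper and was not used to make the edits.

```python
def join_with_slash(*parts: str) -> str:
    """Close the slash up for single-word sides; space it if any side is multi-word."""
    cleaned = [p.strip() for p in parts]
    multiword = any(" " in p for p in cleaned)
    separator = " / " if multiword else "/"
    return separator.join(cleaned)


print(join_with_slash("stdout", "stderr"))
print(join_with_slash("IS NOT NULL", "IS NULL"))
print(join_with_slash("Cisco", "Palo Alto", "network appliance"))
```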

Covers the four practical knobs:
- The always-on matcher optimizer (Aho-Corasick collapse at >=8 contains needles, RegexSet collapse at >=3 regex matchers, CaseInsensitiveGroup wrapper). No user flag; documented so operators know what is already happening at compile time.
- `--bloom-prefilter` and `--bloom-max-bytes` for substring-heavy rule sets paired with mostly-non-matching telemetry. Off by default because the trigram probe is pure overhead when most events overlap a needle.
- `--cross-rule-ac` (feature-gated on `daachorse-index`) for very large rule sets dominated by shared positive substrings. Quantified with the published `eval_cross_rule_ac` numbers (~68x at 1K, ~101x at 10K rules) and qualified as best-case only. Notes that `cargo install rsigma` does not include the feature by default but the release archives and Docker image do, because both are built with --all-features.
- Daemon throughput: `--batch-size` and `--buffer-size` with the tail-latency trade-off; correlation memory pressure controls (`timespan`, `max_correlation_events`, `--correlation-event-mode`).
Verified all benchmark commands list real Criterion groups: `eval_single_event`, `eval_bloom_rejection`, `eval_cross_rule_ac` in rsigma-eval; `runtime_throughput` and `dynamic_pipelines` in rsigma-runtime.
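The optimizer thresholds read more easily as a decision function. The sketch below encodes only the two cut-offs quoted above (8 contains needles, 3 regex matchers); the function, constant names, and returned labels are hypothetical and do not mirror the rsigma-eval internals.

```python
AHO_CORASICK_MIN_NEEDLES = 8   # threshold quoted above for contains needles
REGEX_SET_MIN_PATTERNS = 3     # threshold quoted above for regex matchers


def choose_matcher(kind: str, count: int) -> str:
    """Pick an illustrative matching strategy for `count` values of one modifier kind."""
    if kind == "contains" and count >= AHO_CORASICK_MIN_NEEDLES:
        return "aho_corasick"
    if kind == "re" and count >= REGEX_SET_MIN_PATTERNS:
        return "regex_set"
    # Below the thresholds each value keeps its own matcher.
    return "individual"


for kind, count in (("contains", 7), ("contains", 8), ("re", 2), ("re", 3)):
    print(kind, count, choose_matcher(kind, count))
```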

Documents the four observability surfaces operators actually wire up:
- `--log-format <json|text>` and how it composes with `RUST_LOG`. Clear about the asymmetry: the daemon always emits JSON, the other subcommands need an explicit `--log-format` to enable a tracing subscriber at all.
- `RUST_LOG` filter targets, verified against the running 0.11.0 binary in both INFO and DEBUG modes. Every target listed is one I actually saw emitted: `rsigma::daemon::{server,reload,health}`, `rsigma_runtime::{engine,sources,sources::refresh}`, `rsigma_eval::{engine,correlation_engine}`, `async_nats(::connector)`, `tower_http::trace::{on_request,on_response}`. No invented targets. Includes four copy-paste RUST_LOG recipes for the cases the daemon hits in practice (quiet prod, hot-reload debugging, dynamic source timeouts, HTTP access log).
- Spans (`load_rules`, `evaluate_batch`, `otlp_logs_request`) with the exact JSON shape the daemon emits, including the `span` / `spans` call-stack fields.
- Prometheus metrics: the 27 definitions grouped into engine throughput, queue and back-pressure, rule and state load, per-rule labels, dynamic sources, and OTLP. Notes that a startup `/metrics` scrape exposes ~20 names; the per-rule and OTLP counters only appear after their first event. Includes four ready-made Prometheus alerts (back-pressure, correlation state pressure, DLQ volume, reload failures) and a short section on `/healthz` and `/readyz` for orchestrator probes.
Cross-links to Streaming Detection, Performance Tuning, NATS Streaming, Prometheus metrics reference, HTTP API reference, and the upstream `tracing-subscriber` EnvFilter docs.

Thirteen pages, one per `--help` surface, plus a section index that includes the flat-to-grouped migration table and a copy-paste sed script for repos still on pre-0.11 invocations. Each subcommand page follows the same template (synopsis, description, flag tables grouped by concern, examples, exit codes, related guide / reference links) so operators can predict where to look without hunting through prose:
- engine: eval, daemon
- rule: parse, validate, lint, fields, condition, stdin
- backend: convert, targets, formats
- pipeline: resolve
Every flag and default is extracted from the actual --help output of an rsigma binary built with `--all-features`, including the NATS auth / replay / consumer-group flags that are feature-gated behind `daemon-nats`. Exit codes match the structured scheme verified against the binary in the CI/CD commit. The CLI index documents the four global exit codes, the global --log-format flag, the high-level command tree, and the per-subcommand migration table referencing issues #125 (hide aliases) and #126 (remove aliases).
Also drops the `attack` placeholder from the CLI surface across the docs (CLI index, command tree, Concepts CLI table, installation verification line). It was a reserved-but-empty group that distracts operators reading the docs today; the upcoming ATT&CK tooling can reintroduce a row in those four places when it actually ships something invokable.

`mkdocs build` writes the rendered site to `/site/` in the repo root. Useful for one-shot CI deploys and local previews, but the directory is generated and should not land in git. Adds `/site/` to .gitignore so the next `mkdocs build` does not pollute `git status`.

…ibutes
Four short reference pages that were already linked from CLI and User Guide pages but didn't exist yet. Each is the canonical table for one operator-facing surface:
- exit-codes.md: the four structured codes (`0 SUCCESS`, `1 FINDINGS`, `2 RULE_ERROR`, `3 CONFIG_ERROR`) and which command emits which code in which situation. Source-of-truth pointer at the `exit_code` module. Per-command behaviour matrix.
- environment-variables.md: every env var rsigma actually reads (RUST_LOG, NO_COLOR, RSIGMA_CONSUMER_GROUP, NATS_CREDS / NATS_TOKEN / NATS_USER / NATS_PASSWORD / NATS_NKEY) plus a "NOT read" list to preempt confusion about OTEL_*, SIGMA_*, and PROMETHEUS_* variables. Precedence example shows that CLI flags override env vars.
- feature-flags.md: the per-crate feature inventory verified against the workspace Cargo.tomls (rsigma-cli, rsigma-eval, rsigma-runtime, rsigma-parser). Documents the build recipes (default, daemon-nats/daemon-otlp/evtx/daachorse-index combinations, --all-features for the release shape), the CI matrix coverage, and the `--help`-based feature-detection workaround for users who need to confirm a specific build's surface area.
- custom-attributes.md: the `rsigma.*` and `postgres.*` namespaces. Documents precedence (rule-level > set_state > -O backend option > backend default), six rsigma.* attributes with their CLI flag equivalents and scopes, and three postgres.* attributes for backend routing. Includes worked YAML examples.
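The custom-attribute precedence chain (rule-level > set_state > -O backend option > backend default) amounts to "first layer that sets a value wins". A minimal sketch, assuming each layer is modelled as an optional string; the layer keys, `resolve_attribute`, and the sample table names are hypothetical and not rsigma's API.

```python
from typing import Mapping, Optional

# Highest priority first, following the precedence chain described above.
PRECEDENCE = ("rule", "set_state", "backend_option", "backend_default")


def resolve_attribute(layers: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the value from the highest-priority layer that sets one."""
    for layer in PRECEDENCE:
        value = layers.get(layer)
        if value is not None:
            return value
    return None


# Hypothetical values for a table-name attribute.
layers = {
    "rule": None,                  # the rule does not set the attribute
    "set_state": "auth_events",    # a pipeline set_state transformation does
    "backend_option": "events_cli",
    "backend_default": "events",
}
print(resolve_attribute(layers))   # set_state outranks the -O option and the default
```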

Three medium-size reference pages, verified against a running 0.11.0 daemon built with --all-features.
- metrics.md: the 27-metric catalogue split into four groups (16 core always-on, 2 per-rule labelled, 5 dynamic source, 3 OTLP) with type, labels, and verbatim HELP text. Includes alert recipes (back pressure, correlation state pressure, DLQ volume, reload failures, source staleness) and histogram bucket guidance.
- http-api.md: every endpoint exposed on `--api-addr` with request body, response body, and verified curl examples. Covers /healthz, /readyz, /metrics, the /api/v1/* control endpoints, /v1/logs, and the gRPC pointer. Notes that authentication is not yet implemented and points at #128 for the planned TLS termination.
- builtin-pipelines.md: full field-mapping tables for ecs_windows (per-category Sigma -> ECS renames) and sysmon (23 logsource category -> EventID injections). Explains why neither builtin matches raw .evtx records directly and how to chain builtins with file pipelines via priority.
Also restructures the CLI overview's --log-format row from `--log-format <json\|text>` to a separate "Values" column so the markdown table renders cleanly without backslash-escaped pipes.

MkDocs Material's table renderer (pymdownx.tables) emits the literal backslash from `\|` inside backticks, so `--correlation-event-mode none\|full\|refs` shows up as `none\|full\|refs` for readers. Rewrite every affected cell to avoid embedding pipes in inline code:
- evaluating-rules.md: `--action alert\|reset` -> `--action <alert,reset>` with the values explained in the description, same for `--correlation-event-mode`.
- performance-tuning.md: `\|contains`, `\|re` -> `contains`, `re`. Context makes it obvious they are Sigma modifier names.
- linting-rules.md: `\|all`, `\|exists: true` -> `all`, `exists: true`.
- rule-conversion.md: LynxDB syntax table rewrites SPL2 pipe-bearing cells like "Deferred to `\| where field=~\"regex\"`" into prose ("Deferred to a `where field=~\"regex\"` pipeline stage") and adds a one-line definition of what "Deferred" means. Sigma modifier references stop including the leading `|`.
- concepts.md: backends table rewrites the LynxDB syntax cell to describe the SPL2 shape in prose rather than embedding `\|`.
The standalone pipe character outside inline code (e.g. the LynxDB example "`FROM main | search ...`" outside a table) renders correctly and stays unchanged.
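A quick way to find the remaining offenders is to scan table rows for inline-code spans that still contain an escaped pipe. The sketch below is a hypothetical helper, not something shipped in this PR; it assumes single-backtick code spans and table rows that start with `|`.

```python
import re
from typing import List

# Single-backtick inline code span (no nested backticks).
INLINE_CODE = re.compile(r"`[^`]*`")


def escaped_pipe_spans(line: str) -> List[str]:
    """Return inline-code spans on a Markdown table row that contain a literal \\|."""
    if not line.lstrip().startswith("|"):
        return []  # not a table row; pipes outside tables render fine
    return [m.group(0) for m in INLINE_CODE.finditer(line) if "\\|" in m.group(0)]


row = "| `--action alert\\|reset` | What to do when a correlation fires |"
prose = "Run `FROM main | search ...` to query the index."

print(escaped_pipe_spans(row))    # the offending cell
print(escaped_pipe_spans(prose))  # nothing to fix outside a table
```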

Verified every claim from the recent CLI reference and reference-batch commits against a 0.11.0 daemon and the source. Several claims were wrong; this commit fixes them.
Metric labels (metrics.md):
- `rsigma_detection_matches_by_rule_total` labels are `rule_title`, `level`, not `rule_id`. Added a caveat that `rule_title` is not unique in a rule set; for collision-free per-rule analytics, join against the detection NDJSON by `rule_id` outside Prometheus.
- `rsigma_correlation_matches_by_rule_total` adds a third `correlation_type` label.
- `rsigma_source_resolves_total` labels are `source_id`, `source_type`, not `source_id`, `result`. The metric counts every attempt (successful or not), with errors broken out into `rsigma_source_resolve_errors_total` instead of a `result` label.
- `rsigma_source_resolve_errors_total` labels are `source_id`, `error_kind` with values `Fetch`, `Parse`, `Extract`, `Timeout`, `ResourceLimit` (the `SourceErrorKind` variant names).
- `rsigma_source_resolve_seconds` has no labels (global histogram).
- `rsigma_source_cache_hits_total` has no labels (global counter).
- `rsigma_otlp_requests_total` labels are `transport`, `encoding` (not `transport`, `result`).
- `rsigma_otlp_errors_total` labels are `transport`, `reason` with values `unsupported_content_type`, `decompression`, `decode`, `channel_closed`.
Exit codes (exit-codes.md, cli/rule/{parse,stdin,fields}.md, cli/backend/formats.md, cli/pipeline/resolve.md):
- `rule parse` returns `0` on YAML syntax or missing-required-field issues (warnings only). It only exits `2` on IO errors.
- `rule stdin` is fully lenient: it always exits `0`, with YAML warnings on stderr and the partial AST on stdout.
- `rule fields` returns `0` on per-rule parse errors (warnings only). Exits `2` only on bad rules path and `3` on bad pipeline path.
- `backend formats` returns `0` for unknown backend names; the error is printed but the process exits cleanly.
- `pipeline resolve` returns `0` even when sources fail; per-source `status: "error"` is in the JSON output. For a strict CI gate, use `rule validate --resolve-sources` (which exits `3`).
HTTP API (http-api.md):
- `DELETE /api/v1/sources/cache/{source_id}` response is `{"status":"invalidated","source_id":"..."}` (not `"cache_invalidated"`), and the endpoint is a no-op for nonexistent source IDs (it still returns `200 OK`).
Pulls these inaccuracies out of the docs and adds a "non-obvious behaviours" subsection to exit-codes.md so the lenient-exit-on-warnings pattern is documented in one place.

dynamic-sources.md: the canonical spec for every aspect of dynamic pipeline sources, derived from the source code (constants pinned against rsigma_runtime::sources and rsigma_eval::pipeline::sources):
- Source declaration: every field on a DynamicSource (id, type, format, extract, refresh, required, timeout, on_error, default, max_body_size, max_stdout) with required-or-optional status.
- Source types: file, http, command, nats. Each with its own type-specific fields, default values, and feature gating.
- Data formats: json, yaml, lines, csv with the libraries used.
- Extract languages: jq (jaq), jsonpath (serde_json_path, RFC 9535), cel (cel-interpreter), with the explicit object-form syntax and the plain-string jq shorthand.
- Refresh policies: once, <duration>, watch (file only), push (NATS only), on_demand with all five hot-reload trigger surfaces.
- Error policies: use_cached, fail, use_default, including the interaction with the `required` flag.
- Template substitution: documents the actual TemplateExpander behaviour. Inline templates are supported (caught a doc bug - earlier draft claimed they weren't) but whole-value substitution via `${source.X}` in a `vars:` entry is the safe form for array sources.
- Include directives: MAX_INCLUDE_DEPTH = 1, --allow-remote-include opt-in.
- Resource limits: MAX_SOURCE_RESPONSE_BYTES 10 MiB, DEFAULT_COMMAND_TIMEOUT 30 s, 64 KiB stderr cap, MIN_REFRESH_INTERVAL 1 s, MAX_INCLUDE_DEPTH 1.
security.md: catalogues every input bound, parser-robustness choice, and operational concern, all verified against source:
- Input caps: MAX_LINE_BYTES (1 MiB), MAX_CONDITION_LEN (64 KiB), MAX_CONDITION_DEPTH (64), MAX_NESTING_DEPTH (64), MAX_WINDASH_DASHES (8), MAX_CHAIN_DEPTH (10), max_state_entries (100,000 default).
- Dynamic pipeline limits: cross-referenced from dynamic-sources.md.
- Parser robustness: yaml_serde 0.10 (the maintained serde_yaml fork), bounded recursive-descent condition parser, evtx streaming reader, prost+tonic OTLP, plus the fuzz harness inventory.
- SQL injection prevention: actual validate_sql_identifier regex is `^[A-Za-z_][A-Za-z0-9_$]*$` (allows `$`, which I had wrong in an earlier draft); single-quote escape via doubling.
- Process and concurrency: SIGTERM/SIGINT drain, SIGHUP reload via ArcSwap, parking_lot::Mutex on the hot processor path with std::sync::Mutex elsewhere (caught and corrected the earlier over-broad parking_lot claim).
- Network exposure: documents that the daemon HTTP/gRPC listeners are unauthenticated today and points at #128 for planned TLS.
- Supply chain: cargo audit, dependabot, cosign-signed Docker image, SLSA Build L3 provenance on archives, Grype scan gate.
- Threat model summary in one paragraph.
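The two SQL-injection guards named above are easy to try out in Python. This is a sketch under stated assumptions: the regex is the one quoted in this message, but `is_valid_identifier` and `quote_literal` are hypothetical stand-ins, not ports of the Rust functions.

```python
import re

# Regex quoted above; note that `$` inside the character class is a literal dollar sign.
SQL_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def is_valid_identifier(name: str) -> bool:
    """True when `name` is acceptable as an unescaped SQL identifier."""
    # fullmatch avoids Python's quirk where a trailing `$` also matches before a final newline.
    return SQL_IDENTIFIER.fullmatch(name) is not None


def quote_literal(value: str) -> str:
    """Wrap a string as a SQL literal, escaping single quotes by doubling them."""
    return "'" + value.replace("'", "''") + "'"


print(is_valid_identifier("events$2024"))       # dollar sign is allowed after the first char
print(is_valid_identifier("events; DROP TABLE"))
print(quote_literal("O'Brien"))
```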

Create the three meta pages that the docs nav (.pages) already lists but were never populated:
- docs/release-notes.md includes the root CHANGELOG.md verbatim.
- docs/contributing.md includes the root CONTRIBUTING.md verbatim, followed by a short pointer to the development-workflow rule file in the repo.
- docs/security-policy.md includes the root SECURITY.md verbatim, followed by a pointer to the runtime Security Hardening reference.
This is the pattern the docs plan specified for "files that already exist at the repo root and should not be duplicated".
Side fixes uncovered while landing this:
- docs/getting-started/{concepts,installation}.md were pointing at ../developers/contributing.md, a page that does not exist. Rewired them to ../contributing.md (the new include-markdown page).
- The root CONTRIBUTING.md ended with `[MIT License](LICENSE)`. The relative path resolves on GitHub's repo view but 404s when included into the docs site (which does not ship LICENSE). Rewrote to an absolute github.com URL so it works in both contexts.
Remaining mkdocs warnings about unwritten reference and developer pages (backends/postgres, backends/lynxdb, lint-rules, library/*, developers/*, editors/*) are the forward-references to to-do batches and will resolve as those pages land.

Two backend-specific reference pages, both backed by live runs of backend convert and verified against the source.
postgres.md:
- Backend options: table, schema, database, timestamp_field, json_field, case_sensitive_re. Default value and precedence chain documented for each.
- Full Sigma-modifier-to-PostgreSQL operator table with field quoting, single-quote escaping, and the SQL identifier validation regex (`^[A-Za-z_][A-Za-z0-9_$]*$`) cross-linked to Security Hardening.
- Output formats (default, view, timescaledb, continuous_aggregate, sliding_window) with verified SQL output for each.
- JSONB mode: top-level (`->>`) and dotted-path (`->...->>`) extraction.
- Correlation strategy per type (event_count, value_count, value_sum, value_avg, value_percentile, value_median, temporal, plus the not-yet-shipped temporal_ordered).
- Custom attributes (postgres.table/schema/database).
- OCSF pipeline pointers and the open-items roadmap.
lynxdb.md:
- How it differs from PostgreSQL: no table/schema, no WHERE, non-standard boolean precedence, deferred modifiers via `where` pipeline stages.
- Index selection via `set_state` (default `main`); validated as a SQL identifier.
- Modifier mapping with native vs deferred categorisation.
- Output formats: default (full query) and minimal (search expression only for REST API).
- Boolean precedence explanation with the parenthesisation rationale.
- Six worked examples (plain string, integer, custom index, deferred regex, CIDR with combination, keyword).
- Limitations table including unsupported value modifiers and the not-yet-shipped temporal_ordered.
Three live-verification corrections caught while writing:
- PostgreSQL CIDR emits `("DestinationIp")::inet <<= '10.0.0.0/8'::cidr` (the field cast is wrapped in parens). My first draft omitted them.
- PostgreSQL keywords emit `to_tsvector('simple', ROW(*)::text) @@ plainto_tsquery('simple', 'value')`, NOT `to_tsvector('english', "field") @@ plainto_tsquery('english', 'value')`. The configuration is 'simple' (no stemming) over every column concatenated, which is intentionally broader than per-field FTS. Documented this design choice explicitly.
- LynxDB keyword emits `"keyword"` (double-quoted), not bare token.
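As a rough illustration of the identifier check, the single-quote escaping, and the CIDR output shape mentioned above, here is a small Python sketch. The regex and the emitted expression are taken from this commit message; the function names, the `ValueError`, and the decision to run the identifier check on the field name are assumptions of the sketch and do not mirror the Rust code in rsigma-convert.

```python
import re

# Pattern quoted in the commit message. fullmatch() anchors both ends, so the
# explicit ^...$ is unnecessary and a trailing newline cannot slip through.
_SQL_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` matches the identifier pattern in full."""
    return _SQL_IDENTIFIER.fullmatch(name) is not None


def quote_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def cidr_predicate(field: str, cidr: str) -> str:
    """Build the CIDR containment shape shown above (hypothetical helper)."""
    if not is_valid_identifier(field):
        raise ValueError(f"invalid SQL identifier: {field!r}")
    # The field cast is wrapped in parentheses, matching the verified output.
    return f'("{field}")::inet <<= {quote_literal(cidr)}::cidr'


print(cidr_predicate("DestinationIp", "10.0.0.0/8"))
# ("DestinationIp")::inet <<= '10.0.0.0/8'::cidr
```

Rejecting anything outside the pattern, instead of trying to escape it, keeps user-supplied table and schema names out of the quoting logic entirely.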

Catalogue of every rsigma-parser lint rule. All 66 enum variants verified against the source; the 13 with safe auto-fixes are listed in a dedicated table near the top with what each fix actually does.
Per-section tables follow the source file structure:
- Infrastructure (4): yaml_parse_error, not_a_mapping, file_read_error, schema_violation. Pre-parse failures; cannot be suppressed inline.
- Shared metadata (16): title, id, level, status, date, author, description, key naming.
- Detection (18): detection block, condition, logsource, tags, references, modifiers.
- Correlation (13): correlation block structure and validity.
- Filter (7+1 reserved): filter block constraints. Includes the reserved-but-unemitted empty_filter_rules variant.
- Detection-modifier hygiene (cross-references): subset of detection rules that flag modifier misuses.
Severity counts verified by parsing every rule emission site (`error(...)`, `warning(...)`, `info(...)`, `LintWarning { severity: ... }`):
- 33 error
- 29 warning
- 3 info
- 0 hint (the variant is defined but unused by shipped rules)
- 1 reserved (empty_filter_rules: declared, never emitted)
Fix-attachment verified by walking each `w.fix = safe_fix(...)` call site and each inline `Fix { ... }` struct construction back to its nearest LintRule emission. Earlier draft over-attributed fixes to seven rules (title_too_long, invalid_id, scope_too_short, taxonomy_too_long, unknown_tag_namespace, missing_filter_condition, missing_filter_logsource); those rows now correctly read "—".
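A quick arithmetic check of the severity tallies above, written as a Python sketch. The numbers are copied from this commit message; the dictionary layout is purely illustrative.

```python
# Severity tallies as reported in the commit message.
severity_counts = {
    "error": 33,
    "warning": 29,
    "info": 3,
    "hint": 0,       # variant defined but unused by shipped rules
    "reserved": 1,   # empty_filter_rules: declared, never emitted
}

total = sum(severity_counts.values())
print(total)  # 66

# The tallies should account for every lint rule enum variant.
assert total == 66
```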

Hybrid approach in response to the zizmor-style audits reference. Full per-rule pages would balloon the lint catalogue to ~3000 lines for 66 rules, most of which are spec hygiene ("missing X", "invalid Y") that need no extra prose. Instead, add a "Selected findings, with worked examples" section for the ten rules that genuinely surprise authors on first contact. Each entry shows the triggering YAML, explains what's wrong, and shows the fix:
- wildcard_only_value: `'*'` value vs `|exists: true`.
- single_value_all_modifier: `|all` is a no-op on a single value.
- all_with_re: `|all` is meaningless with `|re`.
- non_lowercase_key: `Title:` silently does not work because keys are case-sensitive.
- condition_references_unknown: the `condition:` names a selection that does not exist.
- deprecated_aggregation_syntax: migrating the pipe-aggregation form to a separate `correlation:` document.
- duplicate_fields: YAML duplicate-key silent overwrite and how to express "any of" with a list.
- unknown_tag_namespace: the recognised set is small (attack, cve, detection, tlp, stp, informational) and typos slip through.
- null_in_value_list: the three things a bare `null` could mean and how to disambiguate.
- invalid_status and invalid_level: the exact valid value sets.
The catalogue tables at the top stay compact. Operators looking for "why did this rule fire" land on the worked example via the rule's heading; everyone else skims the tables.
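To make the single_value_all_modifier finding concrete, here is a toy Python model of a `|contains` match. It assumes the usual Sigma semantics (a value list is OR-ed by default and AND-ed under `|all`); the function name and sample command line are invented for illustration and are not rsigma code.

```python
from typing import Iterable


def contains_match(field_value: str, patterns: Iterable[str],
                   require_all: bool = False) -> bool:
    """Toy `|contains`: OR across patterns by default, AND when `|all` is set."""
    hits = [p in field_value for p in patterns]
    return all(hits) if require_all else any(hits)


cmdline = "powershell -enc AAAA -nop"

# Two values: `|all` changes the outcome.
print(contains_match(cmdline, ["-enc", "-w hidden"]))                    # True
print(contains_match(cmdline, ["-enc", "-w hidden"], require_all=True))  # False

# One value: `|all` cannot change the result, which is what the lint flags.
single_default = contains_match(cmdline, ["-enc"])
single_all = contains_match(cmdline, ["-enc"], require_all=True)
print(single_default == single_all)  # True
```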

The page renders the canonical mermaid diagram at `assets/architecture.mmd` inline (superfences + mermaid is already wired in mkdocs.yml).
Sections:
- Crate map: the diagram plus a short legend for the feature-gated paths.
- Crate responsibilities: one row per crate (rsigma-parser, rsigma-eval, rsigma-convert, rsigma-runtime, rsigma-lsp, rsigma-cli) with role, key types, and feature flags.
- The four execution shapes: library, `engine eval`, `engine daemon`, `backend convert`, plus the LSP. Each links into the guide that explains it.
- Data flow: YAML to AST, AST to compiled rules, event evaluation, the streaming pipeline (bounded mpsc + ArcSwap), dynamic source resolution.
- Performance posture and threat model: short paragraphs cross-linking to BENCHMARKS.md, Performance Tuning, and Security Hardening.

…n page
The benchmark numbers are referenced from performance-tuning, architecture, and the matcher-optimizer threshold discussion. Inlining BENCHMARKS.md via mkdocs-include-markdown lets operators look up specific Criterion groups (eval_single_event, eval_throughput, eval_bloom_rejection, eval_cross_rule_ac, runtime_throughput, dynamic_pipelines) without leaving the docs. Also resolves the standing mkdocs warning that developers/benchmarks.md is referenced from .pages but never existed.
Rewires three cross-references away from the GitHub-hosted blob and onto the new in-docs page:
- performance-tuning.md: matcher-optimizer threshold sweep citation and the bench-commands appendix.
- architecture.md: the headline-figures paragraph and the See-also bullet.
Other developer-section pages (testing, fuzzing, adding-backends, adding-input-formats, linter-and-lsp, contributing) are still on the to-do list; their nav slots stay declared in docs/developers/.pages so they slot in cleanly when written.

The benchmarks include lived at `docs/developers/benchmarks.md`, requiring `../../BENCHMARKS.md`. The three sibling root-md mirrors (release-notes, contributing, security-policy) sit at the top of `docs/` and use a single-level `../FILE.md` include. The inconsistency made the include path uglier and hid the page deeper in the nav than its operator-facing audience warranted.
- Move the page to `docs/benchmarks.md` (`git mv` preserves history).
- Drop the leading `../` from the include path to match the other three (`../BENCHMARKS.md`).
- Add a top-level nav entry in `docs/.pages` between Reference and Deployment; remove the matching entry from `docs/developers/.pages`.
- Update the four cross-references in `index.md`, `guide/performance-tuning.md`, and `reference/architecture.md` from `../developers/benchmarks.md` to `../benchmarks.md` (or `benchmarks.md` from the docs root).
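The two include paths above follow from where the including page sits in the tree. A small Python sketch of that calculation, assuming (as the paths in this commit imply) that the include is resolved relative to the directory of the including page; `include_path` is a hypothetical helper, not part of mkdocs-include-markdown.

```python
import posixpath


def include_path(page: str, target: str) -> str:
    """Relative path from the directory holding `page` to `target`.

    Both arguments are repo-relative POSIX paths.
    """
    return posixpath.relpath(target, start=posixpath.dirname(page))


# Old location: two levels below the repo root.
print(include_path("docs/developers/benchmarks.md", "BENCHMARKS.md"))
# ../../BENCHMARKS.md

# New location: one level below the repo root, like the other three mirrors.
print(include_path("docs/benchmarks.md", "BENCHMARKS.md"))
# ../BENCHMARKS.md
```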

…n cross-refs
Docker page covers the ghcr.io/timescale/rsigma image (multi-arch linux/amd64+linux/arm64, FROM scratch, USER 65534), tag-and-digest pinning, cosign keyless signature verification, gh attestation verification, the standard runtime-hardening flag set, a self-contained docker-compose.yml, the state-db writable-volume pattern, and the build-from-source path for non-standard feature combinations.
Verified against a locally built image (Dockerfile in repo root) on Docker 29.4.3 / Apple Silicon:
- `docker run --rm rsigma --version` -> `rsigma 0.11.0`.
- `docker run --rm rsigma --help` shows the grouped command tree (engine / rule / backend / pipeline / attack) plus the deprecated flat aliases.
- `rule validate /rules/` against a bind-mounted rule.
- Same `rule validate` under `--read-only --cap-drop=ALL --security-opt=no-new-privileges:true --tmpfs /tmp` succeeds.
- `engine daemon -r /rules/ --input http --api-addr 0.0.0.0:9090` under the same hardening returns 200 on /healthz and /readyz.
- `--state-db /state/correlation.db` writes the SQLite file via a writable bind-mount. Added the note that on Linux hosts the host directory must be writable by uid 65534 (chown or use a Docker-managed volume).
- The state-db example now explicitly sets `--input http` so the daemon stays alive (the default `--input stdin` exits as soon as the container's stdin closes; spotted during the verification pass).
- The docker-compose example brings up the daemon, /readyz returns ready, and the `["CMD", "/rsigma", "--version"]` healthcheck transitions to "healthy" (verified via `docker compose ps --format 'table {{.Name}}\t{{.State}}\t{{.Status}}'`).
- `cosign verify --certificate-identity-regexp 'github.com/timescale/rsigma' --certificate-oidc-issuer https://token.actions.githubusercontent.com ghcr.io/timescale/rsigma:0.11.0` -> successful verification of the v0.11.0 image's keyless cosign signature and SLSA provenance.
- `gh attestation verify --owner timescale oci://ghcr.io/timescale/rsigma:0.11.0` -> verified SLSA provenance attestation with `verifiedIdentity.subjectAlternativeName.regexp == ^https://github.com/timescale/` and the v0.11.0 release-trigger metadata.
Kubernetes guide: the draft is held in `docs-drafts/kubernetes.md` (gitignored) until the canonical Helm chart lands and the manifests can be exercised on a real cluster. mkdocs.yml `nav.Reference.Deployment` and docs/deployment/.pages drop the kubernetes.md entry so the build no longer warns about the missing page.
Side cleanup: two doc spots linked to https://github.com/timescale/rsigma/issues as if they backed a real issue when they were actually pointing at roadmap-only entries. `reference/backends/postgres.md` referenced "roadmap item #3" and `reference/feature-flags.md` referenced "the roadmap" for a feature-introspection flag. Both rewritten as plain prose ("Tracked on the project roadmap...") without misleading issue links.
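The hardened daemon invocation verified above can be assembled programmatically. The Python sketch below only builds the argument list and does not run Docker; the hardening flags, image name, and daemon arguments come from this commit message, while the helper name, the host rules directory, the read-only bind mount, and the published port are assumptions for illustration.

```python
from typing import List

# Runtime-hardening flag set quoted in the verification notes.
HARDENING_FLAGS = [
    "--read-only",
    "--cap-drop=ALL",
    "--security-opt=no-new-privileges:true",
    "--tmpfs", "/tmp",
]


def daemon_argv(image: str, rules_dir: str, api_port: int = 9090) -> List[str]:
    """Build a `docker run` argv for the HTTP-input daemon (hypothetical helper)."""
    return [
        "docker", "run", "--rm",
        *HARDENING_FLAGS,
        "-v", f"{rules_dir}:/rules:ro",   # assumed read-only bind mount
        "-p", f"{api_port}:{api_port}",   # assumed published port
        image,
        "engine", "daemon", "-r", "/rules/",
        "--input", "http",                # keeps the daemon alive without stdin
        "--api-addr", f"0.0.0.0:{api_port}",
    ]


argv = daemon_argv("ghcr.io/timescale/rsigma:0.11.0", "/srv/sigma-rules")
print(" ".join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would start the container; building it as a list rather than a shell string avoids quoting problems with the flag values.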

Adds an overview page and four per-crate pages (parser, eval, convert, runtime) that orient embedders without duplicating the docs.rs reference. Every code snippet uses the actual public API names verified against the sources, so a `cargo check` on a freshly extracted snippet would compile against the current workspace. The pages cross-link into the existing guide and reference pages and point readers at the per-crate READMEs on GitHub for the exhaustive trait surface.

Adds six developer pages: orientation, testing, fuzzing, adding-backends, adding-input-formats, and linter-and-lsp. The .pages nav drops the architecture and contributing entries that pointed at non-existent duplicates; the canonical pages already live at /reference/architecture and /contributing. The testing page describes the five-tier CI shape against the actual job names and commands in ci.yml. The fuzzing page documents all 15 cargo-fuzz harnesses with verified max_len values. The two "adding a *" walkthroughs use the real Backend trait and InputFormat shapes; the lint page describes the real `# rsigma-disable` syntax, LintConfig fields, and apply_suppressions signature. The library/parser.md page is updated in the same commit to use the Display-based rule id (w.rule) instead of a fictional .id() method.

detection.studio is a browser-based Sigma rule playground that wired in real-time rule evaluation by compiling RSigma to WebAssembly. The new section sits between Featured in and Read the deep dives, uses the same cards grid as the page header, and is structured so additional integrations can be added as one more card.

The contributor checklists at the end of the adding-backends, adding-input-formats, and linter-and-lsp pages were already written with the - [ ] task-list syntax, but the extension was not enabled so they rendered as plain bullets. Adding pymdownx.tasklist with custom_checkbox: true makes them render as Material's styled checkboxes. Also drops the developers/{architecture,contributing,benchmarks}.md entries from the llmstxt section list, since those pages live at /reference/architecture, /contributing, and /benchmarks now.

Three pages covering integrations that had stale outbound links from the user guide, deployment guide, and architecture reference:
- editors/vscode.md walks through the VS Code/Cursor extension build flow against the actual editors/vscode/package.json, the settings it exposes, and the .rsigma-lint.yml + # rsigma-disable conventions.
- editors/neovim.md gives copy-paste configs for Neovim native LSP, nvim-lspconfig, Helix, Zed, Emacs (eglot), and Sublime LSP, all driving the same rsigma-lsp binary.
- ecosystem/helr.md positions Helr as the polling collector that feeds rsigma engine daemon, with a complete docker-compose stack (NATS + Helr + rsigma) and a small Okta -> ECS pipeline example.

Two jobs: build runs on every PR with mkdocs build --strict (matching the same dependency pinning as docs/requirements.txt), and deploy runs only on main and only after the build job has uploaded its artifact.
The workflow follows the repo CI conventions: top-level permissions: {} with least-privilege per-job grants (contents: read for build, pages: write + id-token: write for deploy), every action pinned by full commit SHA with a version comment, persist-credentials: false on checkout, concurrency group with cancel-in-progress, and a workflow_dispatch trigger for manual runs. zizmor --pedantic passes with zero findings.
The path filter covers docs/, mkdocs.yml, docs/requirements.txt, and the five root-level files that the include-markdown plugin embeds (README, CHANGELOG, CONTRIBUTING, SECURITY, BENCHMARKS), so a change to any of those triggers a rebuild.
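The effect of the path filter can be modelled in a few lines of Python. This is a simplified sketch that uses exact-name and prefix checks instead of GitHub's glob matching; the watched paths come from the description above, and the `.md` file name for README is an assumption.

```python
from typing import Iterable

# docs/requirements.txt is already covered by the docs/ prefix.
WATCHED_PREFIXES = ("docs/",)
WATCHED_FILES = {
    "mkdocs.yml",
    "README.md",  # assumed file name
    "CHANGELOG.md",
    "CONTRIBUTING.md",
    "SECURITY.md",
    "BENCHMARKS.md",
}


def triggers_docs_build(changed_paths: Iterable[str]) -> bool:
    """Return True if any changed path should rebuild the docs site."""
    return any(
        path in WATCHED_FILES or path.startswith(WATCHED_PREFIXES)
        for path in changed_paths
    )


print(triggers_docs_build(["crates/rsigma-eval/src/lib.rs"]))      # False
print(triggers_docs_build(["CHANGELOG.md"]))                       # True
print(triggers_docs_build(["src/main.rs", "docs/guide/index.md"]))  # True
```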

docs/.pages listed a Blog: blog.md item that pointed at a file that never existed; mkdocs serve was 404-ing on /blog.md. We do not plan to host a blog in the site (engineering posts live on mostafa.dev, already linked from the Read the deep dives section on the home page), so drop the nav entry entirely.

The plugin queries GitHub's REST API once per documentation page to render a committer-avatars row at the bottom of every page. In CI it exhausts the unauthenticated 60 req/hour quota almost immediately, aborting the strict build with two 403 errors per run. Authentication via GITHUB_TOKEN would lift the quota, but the avatar row is a nice-to-have rather than a load-bearing feature; the contributor list is already available via `git log` and the GitHub UI. Dropping the plugin keeps the build hermetic with no network dependency, and matches the simpler footer of the upstream mkdocs-material themed projects we modelled this site after.

The "operability, performance, and documentation" release.
* Workspace bumped 0.11.0 -> 0.12.0; all 10 inter-crate dep pins refreshed; Cargo.lock regenerated under --locked.
* CHANGELOG.md [Unreleased] section flipped to [0.12.0] - 2026-05-19; comparison link updated to v0.11.0...v0.12.0; tag reference added to the bottom-of-file link block.
* CHANGELOG also gained a Documentation site (PR timescale#129) section under the existing observability / eval-perf / CLI-groups / test-reliability / dependencies headings, and the TL;DR theme moved from "operations and load performance" to "operability, performance, and documentation" to reflect the new docs site as a top-line deliverable.
Covers all 13 PRs merged since v0.11.0: timescale#107 (observability), timescale#111/timescale#113/timescale#114/timescale#120 (dependency batches), timescale#115/timescale#123 (test reliability), timescale#119/timescale#121/timescale#122/timescale#123 (eval rule loading perf), timescale#124 (CLI command groups), timescale#127 (CLI docs followup), timescale#129 (documentation site).

The repo has had a published mkdocs site at docs/ since the v0.12.0 release (PR #129), but CONTRIBUTING.md never called out that user-facing docs need to ship alongside the code that changes them. Add a Documentation section that lists the two surfaces (crate READMEs and the mkdocs site under docs/) and the page-to-change matrix per kind of change (CLI flag, daemon config key, library API, metric, etc.). Also points contributors at `mkdocs build --strict` for local verification and at the docs.yml workflow that enforces the same on PRs, so the loop is closed before review.

mostafa added 30 commits May 18, 2026 16:11

mostafa added 7 commits May 19, 2026 15:16

mostafa changed the title ~~Add MkDocs Material documentation site at docs/~~ Add documentation site at docs/ using MkDocs Material May 19, 2026

mostafa merged commit 502c61a into main May 19, 2026
14 checks passed

mostafa deleted the feat/docs-site branch May 19, 2026 13:59

mostafa mentioned this pull request May 19, 2026

chore(release): v0.12.0 #130

Merged

5 tasks

mostafa mentioned this pull request May 20, 2026

docs(contributing): document docs/ mkdocs site as a release deliverable #133

Closed

mostafa merged 37 commits into main from feat/docs-site
