feat(report): SARIF 2.1.0 emitter for CI / code-scanning integration #626

Merged
bearsyankees merged 2 commits into usestrix:main from seanturner83:seedcx-sarif on Jul 3, 2026

@seanturner83

Contributor

Closes #624.

What Problem This Solves

Strix emits CSV, Markdown, and JSON, but no SARIF — so findings can't feed
GitHub code-scanning, an ASPM, or any SARIF-consuming CI gate without a custom
converter. This adds a first-class SARIF 2.1.0 emitter.

The Change

A stdlib-only emitter (strix/report/sarif.py), written to findings.sarif
from ReportState._save_artifacts alongside the existing artifacts.

Design invariants (learned running this in production):

  • Stable partialFingerprints.primaryLocationLineHash per finding, so a
    re-scan that re-words a title doesn't churn code-scanning alert IDs.
  • Class/category hashing so the same vuln class maps to a stable ruleId
    across scans rather than drifting.
  • Findings with no code location anchor to SECURITY.md with a synthetic
    marker instead of being silently dropped.
  • Always emit (even with zero findings) so a clean re-scan overwrites a
    stale findings.sarif and code-scanning auto-resolves fixed alerts.
  • tool.driver.version reports the strix package version.
  • Fully isolated in its own try/except: a SARIF build error must never
    break the CSV/MD/run-record path.
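
The first two invariants can be sketched as below. This is a minimal illustration, not the code in strix/report/sarif.py: the helper names, the `strix/` rule prefix, the choice of hash inputs, and the truncation lengths are all assumptions. The point it shows is that neither value takes the finding title as input, so re-wording a title cannot change them.

```python
import hashlib


def _sha256_hex(*parts: str) -> str:
    """Hash normalized parts; the separator keeps ("ab", "c") distinct from ("a", "bc")."""
    joined = "\x1f".join(part.strip().lower() for part in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def rule_id(cwe: str, category: str) -> str:
    """Hypothetical ruleId derived only from the vulnerability class."""
    return f"strix/{_sha256_hex(cwe, category)[:12]}"


def line_hash(cwe: str, path: str, start_line: int) -> str:
    """Hypothetical value for partialFingerprints.primaryLocationLineHash."""
    return _sha256_hex(cwe, path, str(start_line))[:16]


def partial_fingerprints(cwe: str, path: str, start_line: int) -> dict[str, str]:
    """Shape of the partialFingerprints object attached to one SARIF result."""
    return {"primaryLocationLineHash": line_hash(cwe, path, start_line)}
```

Two findings at different lines of the same file get different fingerprints, while the same finding reported with a different title keeps its fingerprint and its ruleId.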

No new dependencies; no behavior change to existing artifacts.
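
A rough sketch of the always-emit and isolation invariants, using only the standard library. `build_sarif` and `save_sarif` are hypothetical names, not the functions in this PR, and the `$schema` URL is one commonly used location for the 2.1.0 schema rather than necessarily the one the emitter writes.

```python
import json
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)

# Assumption: a commonly used URL for the SARIF 2.1.0 JSON schema.
SARIF_SCHEMA = "https://json.schemastore.org/sarif-2.1.0.json"


def build_sarif(results: list[dict[str, Any]], tool_version: str) -> dict[str, Any]:
    """Build a SARIF 2.1.0 document; an empty results list is still a valid run."""
    return {
        "$schema": SARIF_SCHEMA,
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": "strix", "version": tool_version}},
                "results": results,
            }
        ],
    }


def save_sarif(run_dir: Path, results: list[dict[str, Any]], tool_version: str) -> bool:
    """Write findings.sarif; log and swallow any failure so other artifacts survive."""
    try:
        document = build_sarif(results, tool_version)
        payload = json.dumps(document, indent=2)
        (run_dir / "findings.sarif").write_text(payload, encoding="utf-8")
    except Exception:
        logger.exception("SARIF emission failed; other artifacts are unaffected")
        return False
    return True
```

Calling `save_sarif(run_dir, [], version)` on a clean scan still writes a document with an empty `results` array, which is what lets a consumer treat previously reported alerts as fixed.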

Testing

  • Added tests/test_sarif.py (7 cases): basic 2.1.0 shape + real code location,
    always-emit on zero findings, tool_version reporting, locationless finding
    anchored (not dropped), fingerprint stability across a title rewording, and
    distinct findings getting distinct fingerprints. pytest tests/test_sarif.py
    → all pass.
  • Validated the emitted document against the official OASIS SARIF 2.1.0 JSON
    schema (basic, zero-findings, locationless, and mixed cases) — all valid.
  • Exercised end-to-end via a real scan: agent-discovered SQLi/command-injection/
    weak-hash findings → valid findings.sarif with real code locations and
    distinct per-CWE fingerprints.
  • ruff check, ruff format --check, pyupgrade --py312-plus, bandit -c pyproject.toml all clean.
  • (mypy not asserted: the repo's pinned pre-commit mypy hook errors out on the
    bundled openai SDK on main as well, independent of this change. Verified
    clean under mypy 1.17 with the repo's own config.)

@greptile-apps

greptile-apps Bot commented Jul 2, 2026

Contributor

Greptile Summary

This PR adds SARIF output for Strix reports. The main changes are:

  • New SARIF 2.1.0 report builder and writer.
  • Integration with report artifact saving.
  • Synthetic locations for findings without code anchors.
  • Stable fingerprints and class hashes for code-scanning consumers.
  • Repository provenance in SARIF runs when repo context is available.
  • Tests for SARIF shape, fingerprints, PoC handling, atomic writes, and repo context.
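
The atomic write mentioned in the summary is commonly done by writing to a temporary file in the same directory and then renaming it over the target. The sketch below shows that general pattern with the standard library; `write_json_atomic` is a hypothetical name and this is not necessarily how strix/report/sarif.py implements it.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_json_atomic(path: Path, document: dict[str, Any]) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    # The temp file lives in the target directory so the final rename stays on
    # one filesystem, which is what makes os.replace atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(document, handle, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the partial temp file, then let the caller see the error.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

If serialization fails midway, the existing findings.sarif is left untouched and no temporary file remains in the run directory.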

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.

Important Files Changed

| Filename | Overview |
| --- | --- |
| strix/report/sarif.py | Adds the SARIF emitter, location handling, fingerprinting, PoC metadata handling, fixes support, provenance fields, and atomic file writing. |
| strix/report/state.py | Adds SARIF artifact writing with isolated error handling and best-effort repository/version metadata. |
| tests/test_sarif.py | Adds coverage for SARIF output shape, empty runs, fingerprints, PoC metadata, fixes, logical locations, provenance, and atomic replacement. |
| tests/test_state_repo_context.py | Adds coverage for repository name parsing and repository context derivation from cloned repos. |

Reviews (3): Last reviewed commit: "fix(report): complete SARIF code scannin..."

Strix emits CSV + markdown + JSON but no SARIF, so findings can't feed
GitHub code-scanning, an ASPM, or any SARIF-consuming CI gate. Add a
stdlib-only emitter (strix/report/sarif.py) and always write findings.sarif
from ReportState._save_artifacts, beside the existing artifacts.

Design invariants (learned from running this in production):
- Stable partialFingerprints.primaryLocationLineHash per finding, so a
  re-scan that re-words a title doesn't churn code-scanning alert IDs.
- Class/category hashing so the same vuln class maps to a stable ruleId
  across scans rather than drifting.
- Findings with no code location anchor to SECURITY.md with a synthetic
  location marker instead of being silently dropped.
- Always emit (even with zero findings) so a clean re-scan overwrites a
  stale findings.sarif and code-scanning auto-resolves fixed alerts.
- tool.driver.version reports the strix package version.
- Fully isolated in its own try/except: a SARIF build error must never
  break the CSV/MD/run-record path.

Verified end-to-end on v1.0.4 against a SQLi/cmd-inj/weak-hash fixture:
3 findings -> valid SARIF 2.1.0, 3 results, real code locations, distinct
per-finding fingerprints.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@rajpratham1 rajpratham1 left a comment

Thanks for the comprehensive SARIF implementation. The overall design is solid, especially keeping SARIF generation isolated so failures don't affect existing report outputs, along with the extensive test coverage. My main concern is scope: this PR introduces nearly 1,000 lines including the emitter, integration, fingerprinting, synthetic-location handling, and supporting utilities, which makes review and long-term maintenance difficult. If practical, I'd recommend splitting future work into smaller logical PRs (emitter, integration, fingerprint improvements). Also, sarif.py has grown quite large and could benefit from extracting helpers (locations, fingerprints, rule generation) into separate modules. Otherwise the implementation looks thoughtfully designed.

@bearsyankees

Collaborator

@greptile

@bearsyankees

Collaborator

@greptile

@bearsyankees bearsyankees merged commit 302efed into usestrix:main Jul 3, 2026
1 check passed
@bearsyankees

Collaborator

LGTM, thanks @seanturner83. Removed the zh reference.

Development

Successfully merging this pull request may close these issues.

Emit SARIF 2.1.0 for CI / code-scanning integration

3 participants