feat(report): SARIF 2.1.0 emitter for CI / code-scanning integration#626
Conversation
Greptile SummaryThis PR adds SARIF output for Strix reports. The main changes are:
Confidence Score: 5/5This looks safe to merge.
Important Files Changed
Reviews (3): Last reviewed commit: "fix(report): complete SARIF code scannin..." | Re-trigger Greptile |
Strix emits CSV + markdown + JSON but no SARIF, so findings can't feed GitHub code-scanning, an ASPM, or any SARIF-consuming CI gate. Add a stdlib-only emitter (strix/report/sarif.py) and always write findings.sarif from ReportState._save_artifacts, beside the existing artifacts. Design invariants (learned from running this in production): - Stable partialFingerprints.primaryLocationLineHash per finding, so a re-scan that re-words a title doesn't churn code-scanning alert IDs. - Class/category hashing so the same vuln class maps to a stable ruleId across scans rather than drifting. - Findings with no code location anchor to SECURITY.md with a synthetic location marker instead of being silently dropped. - Always emit (even with zero findings) so a clean re-scan overwrites a stale findings.sarif and code-scanning auto-resolves fixed alerts. - tool.driver.version reports the strix package version. - Fully isolated in its own try/except: a SARIF build error must never break the CSV/MD/run-record path. Verified end-to-end on v1.0.4 against a SQLi/cmd-inj/weak-hash fixture: 3 findings -> valid SARIF 2.1.0, 3 results, real code locations, distinct per-finding fingerprints. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0bf2245 to
c04f2ab
Compare
rajpratham1
left a comment
There was a problem hiding this comment.
Thanks for the comprehensive SARIF implementation. The overall design is solid, especially keeping SARIF generation isolated so failures don't affect existing report outputs, along with the extensive test coverage. My main concern is scope: this PR introduces nearly 1,000 lines including the emitter, integration, fingerprinting, synthetic-location handling, and supporting utilities, which makes review and long-term maintenance difficult. If practical, I'd recommend splitting future work into smaller logical PRs (emitter, integration, fingerprint improvements). Also, sarif.py has grown quite large and could benefit from extracting helpers (locations, fingerprints, rule generation) into separate modules. Otherwise the implementation looks thoughtfully designed.
|
LGTM thanks @seanturner83 removed the zh reference |
Closes #624.
What Problem This Solves
Strix emits CSV, Markdown, and JSON, but no SARIF — so findings can't feed
GitHub code-scanning, an ASPM, or any SARIF-consuming CI gate without a custom
converter. This adds a first-class SARIF 2.1.0 emitter.
The Change
A stdlib-only emitter (
strix/report/sarif.py), written tofindings.sariffrom
ReportState._save_artifactsalongside the existing artifacts.Design invariants (learned running this in production):
partialFingerprints.primaryLocationLineHashper finding, so are-scan that re-words a title doesn't churn code-scanning alert IDs.
ruleIdacross scans rather than drifting.
SECURITY.mdwith a syntheticmarker instead of being silently dropped.
stale
findings.sarifand code-scanning auto-resolves fixed alerts.tool.driver.versionreports the strix package version.try/except: a SARIF build error must neverbreak the CSV/MD/run-record path.
No new dependencies; no behavior change to existing artifacts.
Testing
tests/test_sarif.py(7 cases): basic 2.1.0 shape + real code location,always-emit on zero findings,
tool_versionreporting, locationless findinganchored (not dropped), fingerprint stability across a title rewording, and
distinct findings getting distinct fingerprints.
pytest tests/test_sarif.py→ all pass.
schema (basic, zero-findings, locationless, and mixed cases) — all valid.
weak-hash findings → valid
findings.sarifwith real code locations anddistinct per-CWE fingerprints.
ruff check,ruff format --check,pyupgrade --py312-plus,bandit -c pyproject.tomlall clean.bundled
openaiSDK onmainas well, independent of this change. Verifiedclean under mypy 1.17 with the repo's own config.)