Skip to content

fix: pass each semgrep_rules entry as its own --config flag#98

Merged
tehw0lf merged 2 commits into
mainfrom
fix/semgrep-multi-config-args
Jul 1, 2026
Merged

fix: pass each semgrep_rules entry as its own --config flag#98
tehw0lf merged 2 commits into
mainfrom
fix/semgrep-multi-config-args

Conversation

@tehw0lf

@tehw0lf tehw0lf commented Jul 1, 2026

Copy link
Copy Markdown
Owner

Summary

  • semgrep ci only accepts one --config value per flag; security-scan-source.yml was passing the whole semgrep_rules string (e.g. auto p/security-audit p/owasp-top-ten p/ci p/nodejs p/typescript) as a single --config= argument, so everything after the first word was parsed as unknown positional arguments and semgrep ci exited 2 without writing semgrep.sarif.
  • The downstream "check semgrep results" step then treated the missing/empty SARIF as "0 findings" and exited 0, so the check has been showing green without Semgrep actually scanning anything, in every repo/run using this workflow with a multi-ruleset semgrep_rules value, since at least 2026-06-19 (confirmed in tehw0lf/tehwol.fi's daily cron runs going back to that date).
  • Fix: split semgrep_rules on whitespace and pass --config <rule> once per entry (the form semgrep ci actually supports for multiple rulesets), and make the findings-check step fail closed if the scan step itself failed, instead of trusting SARIF presence/emptiness alone.

Test plan

  • YAML parses (python3 -c "import yaml; yaml.safe_load(...)")
  • Verified the new arg-splitting bash logic locally produces --config auto --config p/security-audit --config p/owasp-top-ten --config p/ci --config p/nodejs --config p/typescript from the existing semgrep_rules default used by callers
  • Confirmed root cause by pulling the actual failing logs from tehw0lf/tehwol.fi's Security Monitoring workflow runs (both cron and a triggering PR run) — identical semgrep ci: too many arguments error and Invalid SARIF follow-on error in every run since 2026-06-19
  • Will re-verify against a live caller once merged (e.g. re-run WoWQuote2-Manager's PR-triggered security scan, which currently exhibits this exact bug)

Summary by CodeRabbit

  • Bug Fixes
    • Improved the security scan process to handle multiple rules more reliably.
    • Added a clearer failure path when the scan does not complete successfully, helping surface issues sooner.

semgrep ci only accepts a single --config value; passing a
space-separated ruleset string (e.g. "auto p/security-audit
p/owasp-top-ten p/ci p/nodejs p/typescript") as one --config= argument
made the remaining words parse as unknown positional arguments,
so semgrep ci exited 2 and never wrote semgrep.sarif.

The downstream findings check then silently treated the missing/empty
SARIF as "0 findings" and reported success, so every caller of this
reusable workflow has been getting a green Semgrep check without an
actual scan running since at least 2026-06-19.

Fix: split semgrep_rules on whitespace and repeat --config per rule
(the flag semgrep ci actually supports for multiple rulesets), and
make the findings check fail closed if the scan step itself failed
instead of only trusting the SARIF file's presence.
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The Semgrep security scan workflow was updated to build a config_args array from iterating over inputs.semgrep_rules, passing multiple --config arguments to semgrep ci. The results-checking step now fails early with an error if the semgrep step outcome is not success.

Changes

Semgrep Scan Workflow Update

Layer / File(s) Summary
Multi-rule config and early failure guard
.github/workflows/security-scan-source.yml
Semgrep scan now loops over inputs.semgrep_rules to build multiple --config arguments for semgrep ci, and the results checker exits with an error before SARIF checks if the semgrep step did not succeed.

Estimated code review effort: 1 (Trivial) | ~5 minutes

Estimated code review effort: 1 (Trivial) | ~5 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main workflow fix: passing each semgrep_rules entry as its own --config flag.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/semgrep-multi-config-args

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/workflows/security-scan-source.yml (1)

116-116: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low value

Minor: same template-in-shell pattern, lower risk.

steps.semgrep.outcome is a fixed GitHub-controlled enum, so this expansion is not attacker-influenced, but moving it to env: matches the fix used elsewhere and avoids further zizmor noise.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/security-scan-source.yml at line 116, The semgrep outcome
check in the workflow still uses a direct GitHub expression inside the shell
condition; move the `steps.semgrep.outcome` value into `env:` for the job or
step and reference that environment variable in the `if` test instead. Update
the existing shell guard around `steps.semgrep.outcome` so it follows the same
pattern used elsewhere in the workflow and removes the template-in-shell
warning.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/security-scan-source.yml:
- Around line 104-109: The shell loop in the Semgrep step is expanding
inputs.semgrep_rules directly inside the run script, creating a
template-to-shell injection risk. Move semgrep_rules into an env value for the
step and have the script read that variable instead, keeping the configuration
assembly in the same semgrep ci invocation logic but removing direct GitHub
expression interpolation from the shell body.
- Around line 112-120: The new semgrep guard in the semgrep-check step is
treating normal non-zero exits from semgrep ci as a tool failure, which skips
the existing findings summary logic. Update the check around
steps.semgrep.outcome in the security-scan-source workflow so it only fails on
true execution problems such as missing or invalid semgrep.sarif, and let the
total/suppressed/unsuppressed counting path run for real findings. Keep the
semgrep-check step and the semgrep.sarif validation logic aligned so the
detailed breakdown still reports unsuppressed findings correctly.

---

Nitpick comments:
In @.github/workflows/security-scan-source.yml:
- Line 116: The semgrep outcome check in the workflow still uses a direct GitHub
expression inside the shell condition; move the `steps.semgrep.outcome` value
into `env:` for the job or step and reference that environment variable in the
`if` test instead. Update the existing shell guard around
`steps.semgrep.outcome` so it follows the same pattern used elsewhere in the
workflow and removes the template-in-shell warning.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 94b658b3-8983-4fe9-af4e-3d069ffb8c69

📥 Commits

Reviewing files that changed from the base of the PR and between 685fea9 and 5df5c29.

📒 Files selected for processing (1)
  • .github/workflows/security-scan-source.yml

Comment on lines +104 to +109
run: |
config_args=()
for rule in ${{ inputs.semgrep_rules }}; do
config_args+=(--config "$rule")
done
semgrep ci "${config_args[@]}" --sarif > semgrep.sarif

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Workflow input spliced directly into shell (template injection).

inputs.semgrep_rules is expanded directly inside the run: script body. If this reusable workflow can be triggered with an untrusted/attacker-influenced value for semgrep_rules (e.g., from a fork PR context), this allows arbitrary shell command injection into a step with security-events: write / actions: read permissions. zizmor also flags this at line 106.

🔒 Proposed fix using env to break the template-to-shell boundary
       - name: semgrep security scan
         id: semgrep
+        env:
+          SEMGREP_RULES_INPUT: ${{ inputs.semgrep_rules }}
         run: |
           config_args=()
-          for rule in ${{ inputs.semgrep_rules }}; do
+          for rule in $SEMGREP_RULES_INPUT; do
             config_args+=(--config "$rule")
           done
           semgrep ci "${config_args[@]}" --sarif > semgrep.sarif
         continue-on-error: true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
run: |
config_args=()
for rule in ${{ inputs.semgrep_rules }}; do
config_args+=(--config "$rule")
done
semgrep ci "${config_args[@]}" --sarif > semgrep.sarif
env:
SEMGREP_RULES_INPUT: ${{ inputs.semgrep_rules }}
run: |
config_args=()
for rule in $SEMGREP_RULES_INPUT; do
config_args+=(--config "$rule")
done
semgrep ci "${config_args[@]}" --sarif > semgrep.sarif
🧰 Tools
🪛 zizmor (1.26.1)

[error] 106-106: code injection via template expansion (template-injection): may expand into attacker-controllable code

(template-injection)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/security-scan-source.yml around lines 104 - 109, The shell
loop in the Semgrep step is expanding inputs.semgrep_rules directly inside the
run script, creating a template-to-shell injection risk. Move semgrep_rules into
an env value for the step and have the script read that variable instead,
keeping the configuration assembly in the same semgrep ci invocation logic but
removing direct GitHub expression interpolation from the shell body.

Source: Linters/SAST tools

Comment on lines +112 to +120
- name: check semgrep results for unsuppressed findings
id: semgrep-check
if: always()
run: |
if [ "${{ steps.semgrep.outcome }}" != "success" ]; then
echo "❌ Semgrep scan step failed, treating as a failed check"
exit 1
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Early-exit guard suppresses the detailed findings breakdown for the common "real findings" path.

semgrep ci exits with code 1 whenever it detects blocking findings — this is the normal/expected outcome whenever the scan actually finds something, not just when the tool crashes (exit code 2). Since steps.semgrep.outcome reflects the raw exit status even with continue-on-error: true, this new guard will short-circuit with a generic "scan step failed" message for real unsuppressed findings too, before the script ever reaches the total/suppressed/unsuppressed counting and messaging at lines 128-145. That logic was specifically designed to distinguish "all findings suppressed, pass" from "unsuppressed findings, fail" — a distinction this guard now bypasses whenever semgrep ci exits non-zero for legitimate findings.

Consider gating the early-exit on an actual tool-failure signal (e.g., missing/empty/invalid SARIF file) rather than the raw step outcome, since the existing [ ! -f semgrep.sarif ] check at Line 122 already covers the "scan never produced output" failure mode described in the PR objective.

🐛 Suggested approach: validate SARIF content instead of raw outcome
       - name: check semgrep results for unsuppressed findings
         id: semgrep-check
         if: always()
         run: |
-          if [ "${{ steps.semgrep.outcome }}" != "success" ]; then
-            echo "❌ Semgrep scan step failed, treating as a failed check"
-            exit 1
-          fi
-
           # Check if semgrep.sarif exists and has findings
-          if [ ! -f semgrep.sarif ]; then
-            echo "No SARIF file generated, failing..."
+          if [ ! -s semgrep.sarif ] || ! jq -e '.runs' semgrep.sarif >/dev/null 2>&1; then
+            echo "❌ Semgrep scan step outcome: ${{ steps.semgrep.outcome }}; no valid SARIF produced, failing..."
             exit 1
           fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: check semgrep results for unsuppressed findings
id: semgrep-check
if: always()
run: |
if [ "${{ steps.semgrep.outcome }}" != "success" ]; then
echo "❌ Semgrep scan step failed, treating as a failed check"
exit 1
fi
- name: check semgrep results for unsuppressed findings
id: semgrep-check
if: always()
run: |
# Check if semgrep.sarif exists and has findings
if [ ! -s semgrep.sarif ] || ! jq -e '.runs' semgrep.sarif >/dev/null 2>&1; then
echo "❌ Semgrep scan step outcome: ${{ steps.semgrep.outcome }}; no valid SARIF produced, failing..."
exit 1
fi
🧰 Tools
🪛 zizmor (1.26.1)

[info] 116-116: code injection via template expansion (template-injection): may expand into attacker-controllable code

(template-injection)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/security-scan-source.yml around lines 112 - 120, The new
semgrep guard in the semgrep-check step is treating normal non-zero exits from
semgrep ci as a tool failure, which skips the existing findings summary logic.
Update the check around steps.semgrep.outcome in the security-scan-source
workflow so it only fails on true execution problems such as missing or invalid
semgrep.sarif, and let the total/suppressed/unsuppressed counting path run for
real findings. Keep the semgrep-check step and the semgrep.sarif validation
logic aligned so the detailed breakdown still reports unsuppressed findings
correctly.

…flate findings with tool failure

- Move inputs.semgrep_rules into env (SEMGREP_RULES) instead of interpolating
  the GitHub expression directly into the shell script, avoiding the
  template-injection pattern flagged by zizmor/CodeRabbit.
- Replace the semgrep step-outcome guard with a direct SARIF validity check
  (file non-empty and parses as JSON via `jq empty`). semgrep ci exits
  non-zero both on tool/config errors and whenever it has findings, so
  checking `steps.semgrep.outcome != success` would have also treated a
  successful scan with real findings as a failed scan and skipped the
  findings breakdown entirely.
@tehw0lf tehw0lf merged commit ddb54a4 into main Jul 1, 2026
2 checks passed
@tehw0lf tehw0lf deleted the fix/semgrep-multi-config-args branch July 1, 2026 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant