ci: run plugin Python test suites (catches the PR #125 regression) by tob-joe · Pull Request #147 · trailofbits/skills

tob-joe · 2026-04-14T23:55:09Z

Summary

Adds a python-tests job to lint.yml, parallel to the existing bats job. Discovers test_*.py / *_test.py under plugins/ and runs each as a script. Currently exercises:

plugins/constant-time-analysis/ct_analyzer/tests/test_analyzer.py (unittest, 60 tests)
plugins/let-fate-decide/skills/let-fate-decide/scripts/test_draw_cards.py (stdlib self-runner, 27 tests)

Both test files already existed; no CI job invoked them.

Motivation

PR #125 ("replace os.urandom with secrets.randbelow") removed import os but left os.path calls inside draw(). Ruff F821 catches this today, but the check went green at merge time (the PR's head commit was pushed 12 days before merge and wasn't re-run against drifted main). The existing test_draw_* / test_cli_* tests in test_draw_cards.py would also have caught it — draw() raises NameError on every call — but no CI job ran them. PR #144 fixes the one-line import.

A Python-tests CI job would have caught the regression independently of ruff.

Expected CI behavior on this PR

The python-tests job will FAIL on this PR until #144 merges. That failure is the point — it demonstrates the regression. Once #144 lands on main, rebase this PR and the job goes green. Suggested merge order:

Merge let-fate-decide: add missing import #144 (one-line import os fix)
Rebase & merge this PR

Or merge both together; either way the end state is green.

Design notes

Executes tests as scripts (python3 path/to/test.py) rather than invoking pytest. Matches how the existing tests are authored (each has an if __name__ == '__main__': main block or uses unittest's auto-discovery), avoids pytest rootdir/conftest surprises, and keeps the job dependency-free.
find discovery means new plugin test files are picked up automatically — no per-plugin config.
Job uses ::group:: folding so each test file's output is collapsed in the Actions UI.

Test plan

Locally, discovery finds both existing test files
Both pass on the let-fate-decide: add missing import #144 fix branch (let-fate-decide-fix)
test_draw_cards.py fails on current main (the let-fate-decide: replace os.urandom with secrets.randbelow #125 bug), test_analyzer.py passes
CI fails on this PR (expected) until let-fate-decide: add missing import #144 merges

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

_{Tip: disable this comment in your organization's Code Review settings.}

Discovers test_*.py / *_test.py files under plugins/ and executes each one as a script. Matches the style of the existing bats job. Works today for: - plugins/constant-time-analysis/ct_analyzer/tests/test_analyzer.py - plugins/let-fate-decide/.../scripts/test_draw_cards.py Both test files already exist in the repo but no CI job invoked them. As a result, PR #125 (which broke let-fate-decide by removing `import os` while leaving `os.path` calls in draw()) merged with green CI even though the existing `test_draw_*` / `test_cli_*` tests would have caught it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…nalysis TestCrossArchitecture.test_cross_compile_arm64 invokes clang with --target=aarch64-unknown-linux-gnu, which needs the aarch64 libc headers. Without them clang fails with: fatal error: 'bits/libc-header-start.h' file not found Install gcc-aarch64-linux-gnu + libc6-dev-arm64-cross so clang can find the cross headers. Also install clang explicitly since the runner may not have it preinstalled. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolves code review findings on PR #147 (comment accuracy only — no behavior change): - Rewrite the toolchain dependencies comment. The previous wording ("aarch64 cross gcc pulls in libc6-dev-arm64-cross") implied a transitive dependency, but `--no-install-recommends` suppresses Recommends, so libc6-dev-arm64-cross is installed only because it is listed explicitly. New comment names what each package supplies. - Document why `set -uo pipefail` deliberately omits -e (the loop collects per-file failures and exits with a combined code). Reviewers (codex + gemini + pr-review-toolkit agents) flagged 13 findings total; 11 were dismissed (false positives, design choices matching the bats job, or speculative). Quality pipeline (actionlint, zizmor, shellcheck, pre-commit, plugin validators, both Python test suites) all pass locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dguido · 2026-04-16T22:14:20Z

Review Summary

Three review passes ran in parallel against the head commit (f37d54e): the pr-review-toolkit skill (code-reviewer, silent-failure-hunter, comment-analyzer), Codex (gpt-5.3-codex at xhigh reasoning), and Gemini (default model after gemini-3-pro-preview quota was exhausted). 13 findings total — 2 fixed, 11 dismissed.

Findings

#	Severity	Source(s)	Finding	Resolution
1	P2	comment-analyzer	Toolchain comment misstates package dep direction. With `--no-install-recommends`, gcc does NOT "pull in" libc6-dev-arm64-cross; libc is installed only because it is listed explicitly.	Fixed — rewrote the comment to accurately describe what each package supplies (libc6-dev-arm64-cross → headers/crt; gcc-aarch64-linux-gnu → cross linker via Depends on binutils).
2	P2	silent-failure-hunter, comment-analyzer	`set -uo pipefail` (no `-e`) is intentional but undocumented; future contributor may "fix" it to `-euo pipefail` and break the per-test failure aggregation.	Fixed — added a one-line comment above `set -uo pipefail` explaining why `-e` is omitted.
3	P3	gemini, code-reviewer	Use `uv run` instead of `python3` so PEP 723 inline metadata is honored and tests with deps work seamlessly.	Dismissed — author explicitly designed for stdlib-only ("keeps the job dependency-free", per PR description). Both current test files import only stdlib + local modules. Switching to `uv run` would override an explicit design choice for a forward-looking concern that doesn't apply today.
4	P3	silent-failure-hunter	`mapfile -d '' files < <(find ...)` silently swallows `find` exit code; missing `plugins/` dir would yield empty array.	Dismissed — `plugins/` is guaranteed to exist after a clean `actions/checkout`. The defensive wrap (`find ... \|\| exit 1`) is over-engineering for a near-zero risk.
5	P3	silent-failure-hunter, comment-analyzer	"No test files found" treated as success could hide accidental test removal.	Dismissed — the sibling `bats` job uses `xargs --no-run-if-empty` (same silent-success pattern). Changing only python-tests would create asymmetry. If we want fail-on-empty, both jobs should change together (out of scope).
6	P3	silent-failure-hunter	`python3 test_draw_cards.py` allegedly runs zero assertions because the file lacks a `unittest.main()` block.	Dismissed (false positive) — verified locally: `test_draw_cards.py` has a custom `__main__` block (lines 226-241) that auto-discovers and runs all `test_*` functions. Local run reports `27 passed, 0 failed`. CI run (25s) confirms tests execute.
7	P4	silent-failure-hunter	Per-file failure detail is collapsed into a single exit bit.	Dismissed — `::group::` folding in the GHA UI auto-expands the failing group, so the failing file is already obvious in the logs.
8	P4	silent-failure-hunter, gemini	`apt-get update` partial-failure modes; `apt-get install` could install stale cached version silently.	Dismissed — speculative; GHA runner package mirrors are stable and this risk is theoretical.
9	P4	silent-failure-hunter	No `timeout-minutes` on the test step.	Dismissed — the existing `bats` job has the same omission, so this is not a regression introduced by this PR. Worth fixing both together in a follow-up.
10	P4	code-reviewer	`find` may descend into `__pycache__` / `.venv` / `node_modules` if any contributor commits or generates them.	Dismissed — these directories are gitignored and won't appear on a fresh CI checkout (the agent acknowledged this).
11	P4	code-reviewer	aarch64 cross toolchain (~200MB) installed on every run; could be conditional via `paths:` filter or workflow split.	Dismissed — adds workflow complexity for a marginal time saving. The 25s CI runtime is acceptable.
12	P4	code-reviewer	`exit "$failed"` could be a literal `exit 1` / `exit 0`.	Dismissed — pure stylistic; current code is correct.
13	P4	gemini	If test count grows, switch to `pytest`.	Dismissed — premature for two test files; the PR description explicitly justifies the script-based approach.

Codex returned 0 findings.

Verification

actionlint: clean
zizmor: clean (no findings)
shellcheck (on the inline script): clean
pre-commit (on the modified file): all hooks pass (check-yaml, end-of-file-fixer, trim trailing whitespace)
Plugin validators (validate_codex_skills.py, validate_plugin_metadata.py): clean (72 plugin skills against 73 Codex entries; 38 plugins in sync)
Python tests locally:
- plugins/let-fate-decide/skills/let-fate-decide/scripts/test_draw_cards.py: 27 passed, 0 failed
- plugins/constant-time-analysis/ct_analyzer/tests/test_analyzer.py: 60 passed, 1 skipped
YAML syntax: valid

Commit

90e8b21 — ci: clarify python-tests workflow comments

tob-joe requested a review from dguido as a code owner April 14, 2026 23:55

claude Bot reviewed Apr 14, 2026

View reviewed changes

tob-joe and others added 2 commits April 14, 2026 20:30

tob-joe force-pushed the add-python-tests-ci branch from 59a4b5e to f37d54e Compare April 15, 2026 00:30

dguido merged commit e8cc5ba into main Apr 16, 2026
7 checks passed

dguido deleted the add-python-tests-ci branch April 16, 2026 22:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: run plugin Python test suites (catches the PR #125 regression)#147

ci: run plugin Python test suites (catches the PR #125 regression)#147
dguido merged 3 commits intomainfrom
add-python-tests-ci

tob-joe commented Apr 14, 2026

Uh oh!

claude Bot left a comment

Uh oh!

dguido commented Apr 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tob-joe commented Apr 14, 2026

Summary

Motivation

Expected CI behavior on this PR

Design notes

Test plan

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

dguido commented Apr 16, 2026

Review Summary

Findings

Verification

Commit

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants