feat(scripts): Docker-only end-to-end /e2e command for example-libpng by ret2libc · Pull Request #552 · trailofbits/buttercup

ret2libc · 2026-05-15T13:15:33Z

What this adds

A Docker-only end-to-end smoke test of the full Buttercup pipeline against
example-libpng, with no Kubernetes/minikube. It mirrors the milestones in
.github/workflows/system-integration.yml but tails docker compose logs.

- scripts/e2e.sh: brings the dev/docker-compose/ stack up, submits the
  canned libpng trigger_task, and waits on the pipeline milestones
  (fuzzer build → POV submitted → POV accepted → seed-gen → patch
  generated/approved/passed → bundle submitted; optional SARIF).
- make e2e (and make e2e E2E_ARGS=...).
- .claude/commands/e2e.md: /e2e slash command wrapper.

Flags: --budget (LiteLLM per-user max budget, default $3),
--task-duration, --image-tag / BUTTERCUP_IMAGE_TAG, --no-pull,
--keep-up, --skip-wait, --sarif, per-phase timeout overrides.

Image source

By default the stack runs the prebuilt GHCR images via the
compose.prebuilt.yaml overlay (nothing built locally). --no-pull skips the
docker compose pull and uses already-present images (e.g. locally built and
tagged ghcr.io/trailofbits/buttercup/*:<tag>).

.env handling

e2e.sh regenerates dev/docker-compose/.env on each run. It resolves each
value as environment → existing .env → placeholder, so manually-set
values (e.g. LANGFUSE_*) are preserved across runs instead of being
clobbered with empty or placeholder values.
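
The resolution order above can be sketched roughly as follows. This is a minimal illustration, not the code from scripts/e2e.sh: the `resolve` helper, the `ENV_FILE` variable and the body of `prev_env` are assumptions made for the example (only the name `prev_env` appears in the commit history).

```shell
#!/usr/bin/env bash
# Sketch only: ENV_FILE and resolve() are hypothetical; prev_env() is named in
# the commit log but its real implementation may differ.
ENV_FILE="${ENV_FILE:-dev/docker-compose/.env}"

# Print the value stored for KEY in the existing .env, or nothing if absent.
prev_env() {
  local key="$1"
  [ -f "$ENV_FILE" ] || return 0
  # Use the last assignment for the key and strip the "KEY=" prefix.
  grep -E "^${key}=" "$ENV_FILE" | tail -n 1 | cut -d= -f2- || true
}

# Resolve KEY as: exported environment > existing .env > placeholder.
resolve() {
  local key="$1" placeholder="$2" value
  value="$(printenv "$key" || true)"
  [ -n "$value" ] || value="$(prev_env "$key")"
  [ -n "$value" ] || value="$placeholder"
  printf '%s\n' "$value"
}
```

With this shape, `resolve LANGFUSE_HOST ""` would return a host that was hand-edited into .env on an earlier run, unless the variable is exported in the current shell.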

Dependency / merge ordering

The prebuilt path invokes
docker compose -f compose.yaml -f compose.prebuilt.yaml. The
compose.prebuilt.yaml overlay is not in this PR; it lives on the
separate compose-prebuilt branch/PR. This PR should land after or together
with that one. If it is merged on its own, the overlay file must already be
present in dev/docker-compose/.

Scope

e2e tooling only — .claude/commands/e2e.md, Makefile, scripts/e2e.sh.
Independent of the three pipeline fixes surfaced while building this
(buttercup-ui internal port, litellm budget enforcement, patcher task
storage), which are their own separate PRs.

Validation

This tooling was used to drive the pipeline end-to-end during development:
fuzzer build → POV submitted → POV accepted, through seed-gen and patch
generation, with budget tracking and Langfuse tracing.

🤖 Generated with Claude Code

ret2libc · 2026-05-15T14:45:50Z

Addressed CI lint/static failures in 5210208 (shellcheck SC2015 in scripts/e2e.sh).

Adds scripts/e2e.sh, `make e2e`, and a .claude/commands/e2e.md slash command that bring the Buttercup stack up via dev/docker-compose (no Kubernetes), submit the example-libpng task, and monitor the scheduler / seed-gen / patcher logs through the milestones tracked by .github/workflows/system-integration.yml (fuzzer build, POV submit / pass, seed-gen, patch generate / approve / pass, bundle submit, and optionally SARIF). Defaults LITELLM_MAX_BUDGET to $3 so accidental runs are cheap; tears the stack down on exit unless --keep-up is set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The e2e driver now brings the stack up through the compose.prebuilt.yaml overlay and `docker compose pull` (tag configurable via --image-tag / BUTTERCUP_IMAGE_TAG, default "main") instead of `docker compose build`, so a run no longer depends on a working local image build (e.g. the cscope submodule / oss-fuzz base-runner build chain).

- dc() applies `-f compose.yaml -f compose.prebuilt.yaml` and exports BUTTERCUP_IMAGE_TAG for every compose subcommand (pull/up/logs/down).
- --no-build kept as a deprecated alias for the new --no-pull.
- Teardown hint and e2e.md updated for the overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
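
A wrapper of the kind described in this commit might look like the sketch below. The `DOCKER` override is a hypothetical addition so the function can be exercised without a Docker daemon; the real dc() in scripts/e2e.sh may set a working directory or extra options not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: DOCKER is a hypothetical override used for dry runs.
DOCKER="${DOCKER:-docker}"

# Default the image tag to "main" and export it so compose can interpolate it.
export BUTTERCUP_IMAGE_TAG="${BUTTERCUP_IMAGE_TAG:-main}"

# Run any compose subcommand (pull/up/logs/down) with both files applied.
dc() {
  "$DOCKER" compose -f compose.yaml -f compose.prebuilt.yaml "$@"
}
```

Setting `DOCKER=echo` before calling `dc pull` prints the command line that would be run, which is a cheap way to check the overlay arguments.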

e2e.sh regenerates dev/docker-compose/.env from scratch every run, sourcing values only from environment variables. Variables not exported (notably LANGFUSE_HOST/PUBLIC_KEY/SECRET_KEY) were defaulted to empty and written back, clobbering values a user had set directly in .env.

Add prev_env() and a 3-tier resolution: environment > existing .env > placeholder. Manually-set .env values (Langfuse creds, provider keys, litellm key) now survive subsequent runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the `wait_for ... && record ok || record TIMEOUT` and `curl ... && record ok || record fail` constructs with explicit if-then-else blocks. shellcheck flagged these as SC2015 (A && B || C is not if-then-else), causing the "Lint shell scripts" step in the Static Checks workflow to fail. Behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
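
For readers unfamiliar with SC2015, the rewrite can be illustrated as below. `record`, `wait_for` and `check` are simplified stand-ins written for this example, not the functions from scripts/e2e.sh.

```shell
#!/usr/bin/env bash
# Sketch only: record(), wait_for() and check() are illustrative stand-ins.
RESULTS=""
record() { RESULTS="${RESULTS}$1=$2 "; }
wait_for() { return "$1"; }   # succeeds or fails on demand

# Flagged form (SC2015): in `A && B || C`, C also runs when A succeeds but B
# returns non-zero, so it is not a true if-then-else.
#   wait_for 0 && record fuzzer-build ok || record fuzzer-build TIMEOUT

# Explicit form: exactly one branch runs, whatever record() returns.
check() {
  local name="$1" rc="$2"
  if wait_for "$rc"; then
    record "$name" ok
  else
    record "$name" TIMEOUT
  fi
}

check fuzzer-build 0
check bundle 1
printf '%s\n' "$RESULTS"
```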

With `set -o pipefail`, `dc logs ... | grep -m1` makes the upstream `docker compose logs` die with SIGPIPE (rc 141) once grep matches the first line; pipefail then fails the whole pipeline, so milestones whose log line appears early in a high-volume stream (e.g. seed-gen's 'Copied N files to corpus') are never registered and wait_for spins until timeout even though the milestone occurred. Capture grep output with '|| true' and test for non-empty instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
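
The failure mode and the fix can be reproduced without Docker. In this sketch `noisy_logs` is a hypothetical stand-in for `dc logs`, and `seen` is an illustrative name rather than a function from the script.

```shell
#!/usr/bin/env bash
set -o pipefail

# Stand-in for `docker compose logs`: the marker appears early, followed by a
# long stream of unrelated lines.
noisy_logs() {
  echo "Copied 12 files to corpus"
  local i=0
  while [ "$i" -lt 20000 ]; do
    echo "filler log line $i"
    i=$((i + 1))
  done
}

# Fragile form: grep -m1 exits after the first match, the producer can then be
# killed by SIGPIPE, and pipefail reports the pipeline as failed:
#   noisy_logs | grep -m1 -q "Copied"

# Robust form: ignore the pipeline status and test the captured text instead.
seen() {
  local hit
  hit="$(noisy_logs 2>/dev/null | grep -m1 -- "$1" || true)"
  [ -n "$hit" ]
}
```

`seen "Copied"` succeeds even though the producer is cut short, while `seen "no-such-marker"` still fails once the stream is exhausted.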

Drop --no-build, --keep-up, --skip-wait, --sarif, --task-json and the per-phase --*-timeout flags. The stack now always tears down on exit; milestone timeouts are internal constants.

Addresses PR #552 review:

- provider-key check moved below the .env fallback so keys saved to .env on a prior run are accepted (tip is now accurate)
- --task-json removed (was silently falling back to the libpng default)
- trigger_task response uses mktemp + on_exit cleanup instead of a predictable /tmp/e2e_task_resp.$$ leaked on SIGINT/SIGTERM
- --no-build phantom "deprecated alias" removed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
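
The temp-file change follows a common pattern, sketched here under assumptions: `RESP_FILE` is a hypothetical variable name, and the real on_exit in scripts/e2e.sh also tears the stack down, which is omitted.

```shell
#!/usr/bin/env bash
# Sketch only: RESP_FILE is a hypothetical name; stack teardown is omitted.
RESP_FILE="$(mktemp)"   # unpredictable path, unlike /tmp/e2e_task_resp.$$

on_exit() {
  rm -f "$RESP_FILE"
}

# Run the cleanup on normal exit, and turn SIGINT/SIGTERM into an exit so the
# EXIT trap also fires when the run is interrupted.
trap on_exit EXIT
trap 'exit 130' INT
trap 'exit 143' TERM

# The response body would be written here, e.g. by curl -o "$RESP_FILE".
printf '{}\n' > "$RESP_FILE"
```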

The local litellm master key is an internal detail of the docker-compose stack, not something the user should set. Remove it from the usage text and the env/.env resolution; e2e.sh now just writes the local default (sk-1234) into the generated .env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

e2e.sh regenerates dev/docker-compose/.env every run and was always writing LANGFUSE_HOST=/PUBLIC_KEY=/SECRET_KEY= even when unset. Since .env is loaded last in compose's env_file list, an empty value silently disabled Langfuse telemetry. Now resolved env -> existing .env, and the LANGFUSE_* lines are only written when non-empty, so values the user set in .env survive across runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
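
Writing a line only when its value is non-empty can be done with a small helper. `write_if_set` is a hypothetical name used for illustration; the script may inline this logic differently.

```shell
#!/usr/bin/env bash
# Sketch only: write_if_set() is a hypothetical helper.
# Append KEY=VALUE to FILE, skipping the line entirely when VALUE is empty so
# that an empty assignment cannot override an earlier env_file entry.
write_if_set() {
  local file="$1" key="$2" value="$3"
  [ -n "$value" ] || return 0
  printf '%s=%s\n' "$key" "$value" >> "$file"
}
```

For example, `write_if_set "$env_file" LANGFUSE_HOST "$host"` leaves the file untouched when `$host` is empty.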

The pov-submit and bundle-submit waiters used "POV submission response: pov_id=" and "Bundle submission response: bundle_id=", which never match any rendered log line: the only "... submission response:" logs are logger.debug calls whose payload is an API object repr (no literal pov_id=/bundle_id=), while pov_id=/bundle_id= appear only in the separate structured summary line (logger.info) with a different prefix.

Result: both milestones always timed out, so every run (including fully successful ones) wasted MILESTONE_TIMEOUT+BUNDLE_TIMEOUT and exited non-zero. Repoint both to the structured summary tokens (pov_id= / bundle_id=) and sync the marker list in .claude/commands/e2e.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ults

Three defects found while verifying the pipeline end-to-end:

1. Approval one-shot race: capture_line 'competition_patch_id=' ran once right after the patch-generated milestone, but the scheduler logs that id only minutes later (after it builds+verifies+submits the patch). The capture always lost the race, so approval was always skipped and the local stack never reached Patch passed / bundle. Replace with a wait_capture() poll loop (mirrors wait_for) so approval actually fires.
2. Default --task-duration 1800 is self-defeating: build->POV->seed-gen->patch exceeds 30 min on normal hardware, so the task expires mid-patch ("task expired/cancelled? Will discard") and never reaches patch/bundle. Default to 7200 so the task outlives the pipeline.
3. Default --budget 3 cannot reach patch/bundle: a full run through patch generation costs ~$10; $3 is exhausted around POV. Default to 10.

e2e.md updated to match (defaults, the cheap --budget 3 caveat, and the poll-then-approve description).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
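
The poll-then-capture idea from the first defect could look roughly like this. Only the name wait_capture() comes from the commit message; its signature, the `log_source` stand-in for `dc logs`, and the `LOG_FILE` variable are assumptions for the sketch.

```shell
#!/usr/bin/env bash
# Sketch only: log_source() and LOG_FILE stand in for `dc logs`; the real
# wait_capture() signature is not shown in the PR text and is assumed here.
LOG_FILE="${LOG_FILE:-/dev/null}"
log_source() { cat "$LOG_FILE"; }

# Poll the logs until a line containing TOKEN appears, then print the text
# that follows TOKEN up to the next whitespace. Returns 1 on timeout.
# TOKEN is used inside a sed pattern, so it should contain no regex specials.
wait_capture() {
  local token="$1" timeout="$2" interval="${3:-5}" waited=0 line
  while [ "$waited" -lt "$timeout" ]; do
    line="$(log_source | grep -m1 -- "$token" || true)"
    if [ -n "$line" ]; then
      printf '%s\n' "$line" | sed -n "s/.*${token}\([^[:space:]]*\).*/\1/p"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

Unlike a one-shot capture, the loop keeps checking until the id is logged, so an approval step that depends on `competition_patch_id=` is not skipped just because the line arrives minutes later.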

ret2libc requested a review from hbrodin as a code owner May 15, 2026 13:15

hbrodin reviewed May 19, 2026

Comment thread scripts/e2e.sh Outdated

hbrodin reviewed May 19, 2026

Comment thread scripts/e2e.sh Outdated

hbrodin reviewed May 19, 2026

Comment thread scripts/e2e.sh Outdated

hbrodin reviewed May 19, 2026

Comment thread scripts/e2e.sh Outdated

ret2libc and others added 9 commits May 19, 2026 08:25

ret2libc force-pushed the e2e-commands branch from 66563c7 to dc77e02 on May 19, 2026 08:57

ret2libc wants to merge 10 commits into main from e2e-commands


2 participants
