v0.8.0 made stack install server-mediated but the endpoint still
blocked end-to-end on `StackManager.install`, which meant long preps
risked Cloudflare-Access cutting the tunnel and the operator had no
way to see progress until the call returned. Reframed as: a stack
install is a single-task workflow. Submission returns a run_id
immediately; the rest of the system (`scripthut run view / logs /
watch`, the runs page, cancellation, retries) handles it like every
other run. No new observability surface, no new timeout concerns.
Schema-level reuse only — no new endpoints, no `RunStatus` mutations,
no parallel scheduler integration. The install just *is* a run.
RunManager:
- New `_synthesize_stack_install_command(stack, hash, rebuild)`
emits the bash that `StackManager.install` would have run:
`.ready` sentinel check + early exit, cleanup on rebuild or
half-built dir, `set -euo pipefail` + mkdir + export STACK_DIR +
prep + touch ready. Idempotency lives in the script — already-
ready installs exit 0 immediately and still show up in `run list`
as a successful run ("I tried to install julia, it was already
there"). `~` is substituted to `$HOME` at synthesis time so
tilde expansion works regardless of how bash quotes the path.
Empty-prep is a legitimate degenerate case (init-only stacks)
and produces a minimal mkdir + touch.
- New `create_run_from_stack(stack, backend, *, rebuild=False,
source_name=None)` synthesizes the command, builds a
TaskDefinition carrying the stack's `cpus`/`memory`/`time_limit`/
`partition` so heavy installs actually get the allocation they
need, and submits via the existing `_build_run` path —
`_stack/<name>` workflow_name (or `_stack/<source>/<name>` when
the install is for a repo-defined stack) so the runs page shows
provenance.
API:
- `POST /api/v1/stacks/{name}/install` now returns the `_run_summary`
shape (`id`, `workflow_name`, `backend_name`, `status`, …) instead
of a `StackStatus`. Non-blocking submission; the install runs at
its own pace on the backend. 422 on backend-unavailable
(`ValueError` from create_run_from_stack), 5xx on unexpected
failures during submission, 404 on unknown stack — same shape as
the other run-submission endpoints.
- `state.notify_poll()` invoked after submission so dashboard
pollers refresh promptly.
- `check` and `delete` endpoints unchanged (single fast SSH ops,
no benefit from queueing).
CLI:
- `stack install --backend <b>` in server mode submits and prints
the run_id + a hint pointing at `scripthut run watch <id>
--exit-status` and `scripthut run logs <id> install-XXXXXX -f`.
Exits 0 on successful submission, not on successful install —
matching the existing `task run` / `workflow run` convention.
- New `--watch` flag: re-enters `_cmd_run_watch` after submission so
one command does "install and wait" for interactive use, exiting
non-zero on install failure.
- New `--json` flag: prints the submitted-run summary so scripts can
`RUN_ID=$(scripthut stack install ... --json | jq -r .id)`.
- New `--interval` for `--watch`'s polling cadence.
- Remote timeout dropped from 30 min to 60 s — submission is fast
now, so a long timeout would only hide network problems.
- Local mode unchanged: still uses `StackManager.install` directly
via `_run_per_backend`, blocking. Asymmetric but acceptable —
local mode is the legacy/direct path; the workflow-run reframe
was driven entirely by the remote-mode pain.
Tests:
- New `tests/test_stack_as_workflow.py` (10) — synthesized bash
contains sentinel check, rebuild flag, prep body, set -euo
pipefail, STACK_DIR export, touch in the right order; tilde-to-
$HOME substitution; absolute cache_dir passes through; rebuild
flag changes the emitted REBUILD value; empty-prep degenerate
case; create_run_from_stack carries the stack's resources;
workflow_name includes source for provenance; unknown / unavailable
backend raise ValueError; command threaded into the TaskDefinition.
- `tests/test_api_v1.py` — 5 existing install tests rewritten to
expect the run-summary shape and assert
`RunManager.create_run_from_stack` was called with the right
kwargs. A new test confirms `source_name` is forwarded for the
workflow_name label.
- `tests/test_cli.py` — 3 new install routing tests: submit prints
run_id + hint and exits 0; `--watch` re-enters `_cmd_run_watch`
with `exit_status=True`; `--json` prints the run summary verbatim
for jq composition.
- Agent prompt's Stacks section rewritten: local mode = blocking,
server mode = submit-as-workflow with run_id + run watch /
run logs / run view as the monitoring path. The long-installs
caveat is gone (not a concern anymore).
531/531 in the broad sweep + 156 in the backend-specific suites pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>