Skip to content

v0.9.0

@tlamadon tlamadon tagged this 04 Jun 01:04
v0.8.0 made stack install server-mediated but the endpoint still
blocked end-to-end on `StackManager.install`, which meant long preps
risked Cloudflare-Access cutting the tunnel and the operator had no
way to see progress until the call returned. Reframed as: a stack
install is a single-task workflow. Submission returns a run_id
immediately; the rest of the system (`scripthut run view / logs /
watch`, the runs page, cancellation, retries) handles it like every
other run. No new observability surface, no new timeout concerns.

Schema-level reuse only — no new endpoints, no `RunStatus` mutations,
no parallel scheduler integration. The install just *is* a run.

RunManager:
- New `_synthesize_stack_install_command(stack, hash, rebuild)`
  emits the bash that `StackManager.install` would have run:
  `.ready` sentinel check + early exit, cleanup on rebuild or
  half-built dir, `set -euo pipefail` + mkdir + export STACK_DIR +
  prep + touch ready. Idempotency lives in the script — already-
  ready installs exit 0 immediately and still show up in `run list`
  as a successful run ("I tried to install julia, it was already
  there"). `~` is substituted to `$HOME` at synthesis time so
  tilde expansion works regardless of how bash quotes the path.
  Empty-prep is a legitimate degenerate case (init-only stacks)
  and produces a minimal mkdir + touch.
- New `create_run_from_stack(stack, backend, *, rebuild=False,
  source_name=None)` synthesizes the command, builds a
  TaskDefinition carrying the stack's `cpus`/`memory`/`time_limit`/
  `partition` so heavy installs actually get the allocation they
  need, and submits via the existing `_build_run` path —
  `_stack/<name>` workflow_name (or `_stack/<source>/<name>` when
  the install is for a repo-defined stack) so the runs page shows
  provenance.

API:
- `POST /api/v1/stacks/{name}/install` now returns the `_run_summary`
  shape (`id`, `workflow_name`, `backend_name`, `status`, …) instead
  of a `StackStatus`. Non-blocking submission; the install runs at
  its own pace on the backend. 422 on backend-unavailable
  (`ValueError` from create_run_from_stack), 5xx on unexpected
  failures during submission, 404 on unknown stack — same shape as
  the other run-submission endpoints.
- `state.notify_poll()` invoked after submission so dashboard
  pollers refresh promptly.
- `check` and `delete` endpoints unchanged (single fast SSH ops,
  no benefit from queueing).

CLI:
- `stack install --backend <b>` in server mode submits and prints
  the run_id + a hint pointing at `scripthut run watch <id>
  --exit-status` and `scripthut run logs <id> install-XXXXXX -f`.
  Exits 0 on successful submission, not on successful install —
  matching the existing `task run` / `workflow run` convention.
- New `--watch` flag: re-enters `_cmd_run_watch` after submission so
  one command does "install and wait" for interactive use, exiting
  non-zero on install failure.
- New `--json` flag: prints the submitted-run summary so scripts can
  `RUN_ID=$(scripthut stack install ... --json | jq -r .id)`.
- New `--interval` for `--watch`'s polling cadence.
- Remote timeout dropped from 30 min to 60 s — submission is fast
  now, so a long timeout would only hide network problems.
- Local mode unchanged: still uses `StackManager.install` directly
  via `_run_per_backend`, blocking. Asymmetric but acceptable —
  local mode is the legacy/direct path; the workflow-run reframe
  was driven entirely by the remote-mode pain.

Tests:
- New `tests/test_stack_as_workflow.py` (10) — synthesized bash
  contains sentinel check, rebuild flag, prep body, set -euo
  pipefail, STACK_DIR export, touch in the right order; tilde-to-
  $HOME substitution; absolute cache_dir passes through; rebuild
  flag changes the emitted REBUILD value; empty-prep degenerate
  case; create_run_from_stack carries the stack's resources;
  workflow_name includes source for provenance; unknown / unavailable
  backend raise ValueError; command threaded into the TaskDefinition.
- `tests/test_api_v1.py` — 5 existing install tests rewritten to
  expect the run-summary shape and assert
  `RunManager.create_run_from_stack` was called with the right
  kwargs. A new test confirms `source_name` is forwarded for the
  workflow_name label.
- `tests/test_cli.py` — 3 new install routing tests: submit prints
  run_id + hint and exits 0; `--watch` re-enters `_cmd_run_watch`
  with `exit_status=True`; `--json` prints the run summary verbatim
  for jq composition.
- Agent prompt's Stacks section rewritten: local mode = blocking,
  server mode = submit-as-workflow with run_id + run watch /
  run logs / run view as the monitoring path. The long-installs
  caveat is gone (not a concern anymore).

531/531 in the broad sweep + 156 in the backend-specific suites pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Assets 2
Loading