Skip to content

v0.11.1

@tlamadon tlamadon tagged this 04 Jun 19:18
v0.11.0 shipped the env-var contract ($SCRIPTHUT_OUTPUT_DIR /
$SCRIPTHUT_TASK_SUMMARY / $SCRIPTHUT_RUN_SUMMARY) but the agent
prompt didn't mention it. Agents reading the briefing had no way to
know the feature exists, so the UI panels they could produce just by
appending a few lines to their task commands stayed hidden.

New "Emitting structured outputs (plots, summaries, tables)" section
covers:

- All three env vars with what each one renders to (Outputs subtab
  vs. Run Summary card).
- A worked bash example the agent can paste/adapt — calls savefig
  into $SCRIPTHUT_OUTPUT_DIR, writes a markdown table to
  $SCRIPTHUT_TASK_SUMMARY (with the image referenced via relative
  ![](loss.png) so the agent sees the rewrite contract working),
  and a one-line per-task TL;DR to $SCRIPTHUT_RUN_SUMMARY.
- The behaviors that bite hardest if the agent doesn't know them:
  - SSH-only scope (Slurm + PBS) — don't promise UI panels on
    Batch / EC2, where the exports are silently ignored.
  - Post-completion collection, not streaming — Outputs appears
    when the item leaves SETTLING → COMPLETED, not while running.
  - Relative <img src> is rewritten to the file endpoint, so
    ![](plot.png) authored next to where plot.png was saved Just
    Works.
  - Sanitization via bleach — safe for any markdown the agent
    authors; agents don't need to escape anything themselves.
  - Size limits: 5 MB per file, 200 files per task, 1 MB markdown-
    render budget. A training-loop-per-iteration plot pattern blows
    through these immediately; suggest a summary image instead.
  - mkdir -p happens for you; no preflight needed.
  - Empty tasks are first-class — no panel, no errors. Don't add
    no-op `touch` calls.
- When-to-use guidance for the two surfaces: per-task vs. run-wide,
  with the explicit "don't duplicate" rule (write in run-summary,
  link from task-summary).

Pinned by ``TestTaskOutputsGuidance`` in test_agent_prompt.py (5
tests): section header present, all three env vars named, per-task
vs. run-wide distinction taught, the critical caveats (SSH-only,
post-completion, 5 MB / 200-file caps) surfaced, and the bash
example contains the write pattern (redirects, not just the path
names). Drift in any of these now fails CI.

585/585 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Assets 2
Loading