fix(submit): honor #SBATCH/#PJM directives in user-supplied scripts #11

Merged
ultimatile merged 3 commits into main from fix/6-script-directives on May 8, 2026

@ultimatile
Owner

Closes #6.

Summary

#SBATCH (Slurm) / #PJM (PJM) directives written in scripts passed via hpc submit -s script.sh were silently ignored. The job-script template placed user content below the cd workdir and env-setup lines, past the point where sbatch / pjsub stop scanning prologue directives, so the schedulers treated the user's directives as ordinary comments.

Root cause

src/hpc/job.py's JOB_TEMPLATE rendered the user-supplied script content ({{ cmd }}) at the bottom, after the executable cd job_workdir line. Slurm and PJM both stop scanning #SBATCH / #PJM directives at the first non-comment, non-blank, non-directive line of the script, so the user's directives were never seen.

Fix

Hoist column-zero ^#SBATCH\b / ^#PJM\b lines out of the user script's prologue and emit them in the rendered job-script prologue. The extractor mirrors the schedulers' own prologue-scan rule:

  • a leading #! shebang is dropped (the template emits its own)
  • blank lines and non-directive comment lines are kept in the body but do not terminate the scan
  • the matching directive lines are extracted and removed from the body
  • the first executable line ends the scan; everything from there onward is preserved verbatim (so directives inside heredocs or after the user's first command are not falsely hoisted)
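The scan rule listed above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the function name, the `scheduler` parameter, and the `DIRECTIVE_PATTERNS` table are hypothetical, and the sketch assumes that only column-zero `#` lines and truly empty lines let the scan continue.

```python
import re

# Hypothetical pattern table; the real helper's signature is not shown in this PR.
DIRECTIVE_PATTERNS = {
    "slurm": re.compile(r"^#SBATCH\b"),
    "pjm": re.compile(r"^#PJM\b"),
}


def extract_prologue_directives(script: str, scheduler: str) -> tuple[list[str], str]:
    """Split a user script into (hoisted directives, remaining body)."""
    pattern = DIRECTIVE_PATTERNS[scheduler]
    lines = script.splitlines(keepends=True)
    directives: list[str] = []
    body: list[str] = []
    i = 0

    # Drop a leading shebang; the template is assumed to emit its own.
    if lines and lines[0].startswith("#!"):
        i = 1

    while i < len(lines):
        line = lines[i]
        if pattern.match(line):
            # Matching column-zero directive: hoist it out of the body.
            directives.append(line.rstrip("\r\n"))
        elif line.startswith("#") or line in ("\n", "\r\n"):
            # Column-zero comment or empty line: keep it, continue scanning.
            body.append(line)
        else:
            # First executable (or indented) line ends the scan.
            break
        i += 1

    # Everything from the terminating line onward is preserved verbatim.
    body.extend(lines[i:])
    return directives, "".join(body)
```

Under this sketch a directive that appears after the first command, or inside a heredoc that follows it, stays in the body because the scan has already stopped by then.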

User-supplied directives are emitted after the config-derived [slurm.options] / pjm.options entries so that they win on conflict, assuming the schedulers apply last-occurrence-wins semantics (not yet verified on real clusters; see Out of scope). The template's hardcoded --output= / --error= lines stay last, so hpc's run-tracking output paths are not user-overrideable.
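The emit order described above could look something like this for the Slurm case. It is a sketch under assumptions: `render_prologue` and its parameters are hypothetical names, the `#!/bin/bash` shebang is a guess at what the template emits, and the real `JOB_TEMPLATE` is not reproduced here.

```python
def render_prologue(
    config_directives: list[str],
    user_directives: list[str],
    output_path: str,
    error_path: str,
) -> str:
    """Assemble a Slurm job-script prologue in precedence order."""
    lines = ["#!/bin/bash"]  # assumed shebang; the template emits its own
    # 1. Config-derived options come first (lowest precedence).
    lines += config_directives
    # 2. User directives hoisted from the script override config on conflict,
    #    assuming the scheduler honors the last occurrence.
    lines += user_directives
    # 3. Bookkeeping output paths stay last so users cannot override them.
    lines.append(f"#SBATCH --output={output_path}")
    lines.append(f"#SBATCH --error={error_path}")
    return "\n".join(lines) + "\n"
```

If a scheduler turned out to be first-occurrence-wins, steps 1 to 3 would need to be reversed for that scheduler, as noted under Out of scope.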

Test plan

  • _extract_prologue_directives helper unit tests (13 cases): empty input, shebang only / shebang + body, no shebang, Slurm and PJM extraction, prefix isolation between schedulers, post-executable directive not hoisted, heredoc directive-look-alike not hoisted, blank lines between directives preserved, non-directive comments kept in body, indented directive not hoisted, missing trailing newline preserved
  • integration tests on _render_job_script (5 cases): Slurm hoist with bookkeeping ordering, PJM multi-directive hoist with order preservation, conflict ordering for both schedulers, no-directive no-op
  • legacy submit_job path also hoists user directives (consistency between the two submission code paths)
  • full suite: 138 passed (baseline 119 + 19 new)
  • pyright src/hpc/: 0 errors, 0 warnings; ruff check, ruff format --check: clean

Out of scope

  • The PJM --output= / --error= lines emitted by the template currently use Slurm syntax (PJM expects -o / -e). Pre-existing bug, separate issue.
  • Replacing cd job_workdir with Slurm --chdir / PJM -d would be an architectural cleanup but does not by itself fix the prologue-scan-termination problem (env-setup lines remain). Future work.
  • Empirical verification of "last-occurrence wins" on real Slurm + Fugaku PJM clusters is deferred to post-deploy. If a scheduler turns out to be first-wins, the emit order needs to be flipped for that scheduler.

Copilot AI left a comment

Pull request overview

Fixes hpc submit -s/--script so scheduler directives (#SBATCH / #PJM) written in the user-provided script are not accidentally placed after executable lines in the rendered wrapper script (where Slurm/PJM stop scanning directives).

Changes:

  • Add _extract_prologue_directives() to hoist column-zero scheduler directives from the user script’s prologue into the rendered job-script prologue.
  • Update job-script rendering for both tracked runs (submit_run / _render_job_script) and the legacy submit_job path to include hoisted user directives.
  • Add unit + integration tests covering directive extraction and ordering/precedence behavior; document the behavior in README and the Claude skill.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/hpc/job.py | Implements directive hoisting and renders user directives in the job prologue for both submission paths. |
| tests/test_job.py | Adds unit tests for the extractor and integration tests asserting correct prologue ordering/precedence. |
| README.md | Documents that script-top directives are honored and explains precedence semantics. |
| .claude/skills/hpc/SKILL.md | Updates the “writing scripts” guidance to mention directive hoisting and precedence. |

Comment thread README.md Outdated
Comment thread .claude/skills/hpc/SKILL.md Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/hpc/job.py

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/hpc/job.py

ultimatile added 3 commits May 8, 2026 18:53

The job-script template placed user content below `cd workdir` and
env-setup lines, where `sbatch` and `pjsub` have already stopped
scanning prologue directives. Directives written at the top of a
script passed via `hpc submit -s script.sh` were therefore silently
dropped.

Hoist column-zero `#SBATCH` (Slurm) / `#PJM` (PJM) lines from the
user script's prologue into the rendered job-script prologue,
mirroring the schedulers' own prologue-scan rule (shebang + blank +
comment + matching directive lines, stopping at the first executable
line). Directive-look-alike content past the first executable line
or inside heredocs is left in the body untouched.

User-supplied directives are emitted after config-derived ones so
they win on conflict via last-occurrence-wins; hpc's bookkeeping
`--output` / `--error` lines stay last so run-tracking output paths
are not overrideable.

PJM `options` is parsed as a list of lists, so it cannot live under a
TOML table header `[pjm.options]`; that syntax is for tables. Refer
to it as the `pjm.options` array (dotted-path) instead.

The previous extractor treated indented `# comment` lines and
whitespace-only lines as bash-style comments/blanks and continued
the prologue scan past them. Slurm and PJM are stricter: only
column-zero `#` lines are comments, and only truly-blank lines
(`\n` / `\r\n`) are blank. With the looser rule, hpc could hoist a
directive that the scheduler would have ignored if the script were
submitted directly via `sbatch script.sh` / `pjsub script.sh`.

Treat indented comments and whitespace-only lines as scan-
terminating, matching the schedulers' own column-sensitive rule.
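
The difference between the looser and the stricter rule in the last commit can be illustrated with two small predicates. Both function names are hypothetical and neither is taken from `src/hpc/job.py`; they only restate the rule the commit message describes.

```python
def continues_scan_loose(line: str) -> bool:
    """Bash-style rule: any comment or whitespace-only line continues the scan."""
    stripped = line.strip()
    return stripped == "" or stripped.startswith("#")


def continues_scan_strict(line: str) -> bool:
    """Column-sensitive rule: only column-zero '#' lines and truly blank lines continue."""
    return line.startswith("#") or line in ("\n", "\r\n")


# Lines on which the two rules disagree: the loose rule scans past them,
# the strict rule treats them as the end of the prologue.
for sample in ("    # indented comment\n", "   \n"):
    assert continues_scan_loose(sample) and not continues_scan_strict(sample)
```

With the strict predicate, a `#SBATCH` line that follows an indented comment is left in the body, which matches what the commit says the scheduler would do if the script were submitted directly.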

Linked issue: #SBATCH directives in submitted script are silently ignored