feat(status): show sacct accounting fields for completed Slurm jobs #14

Merged: ultimatile merged 2 commits into main from feat/5-sacct-status on May 8, 2026

@ultimatile (Owner)

Summary

  • hpc status <id> on a terminal Slurm job now prints ExitCode, Elapsed, MaxRSS, and ReqMem alongside the raw sacct State string, fetched via a single sacct -j <id> --format=JobID,State,ExitCode,Elapsed,MaxRSS,ReqMem --noheader -P call.
  • The polling path (status_cmd / parse_status / wait_for_job) is left untouched. The new detail path is consumed only by cli.status.
  • PJM jobs, and any newly submitted Slurm job that sacct has not yet recorded, fall back to the prior single-line display.
  • MaxRSS is selected as the unit-aware maximum across all non-empty step rows, so jobs that dispatch via srun (where .batch only reflects the launcher) report the workload's real RSS rather than the launcher's small footprint.
  • The raw sacct State string is preserved verbatim in the output (e.g. OUT_OF_MEMORY, CANCELLED+, CANCELLED by 12345); the JobStatus enum is not extended.
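
As a rough illustration of the single sacct call described above, the sketch below builds the command line and splits the pipe-delimited `-P` output into rows. The names `build_detail_cmd`, `split_rows`, and `job_row` and the sample output are hypothetical and are not taken from the PR; only the sacct flags and field list come from the summary.

```python
from __future__ import annotations

# Field order matches the --format list quoted in the summary.
SACCT_FIELDS = ("JobID", "State", "ExitCode", "Elapsed", "MaxRSS", "ReqMem")


def build_detail_cmd(job_id: str) -> list[str]:
    """Return the argv for a single sacct query (hypothetical helper)."""
    return [
        "sacct",
        "-j",
        job_id,
        f"--format={','.join(SACCT_FIELDS)}",
        "--noheader",
        "-P",
    ]


def split_rows(output: str) -> list[dict[str, str]]:
    """Split '|'-delimited sacct -P output into one dict per row."""
    rows: list[dict[str, str]] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        values = line.split("|")
        if len(values) != len(SACCT_FIELDS):
            continue  # skip rows that do not match the requested format
        rows.append(dict(zip(SACCT_FIELDS, values)))
    return rows


def job_row(rows: list[dict[str, str]], job_id: str) -> dict[str, str] | None:
    """Return the job-level row (JobID without a step suffix), if present."""
    for row in rows:
        if row["JobID"] == job_id:
            return row
    return None


# Made-up sample output for illustration only.
SAMPLE = (
    "12345|OUT_OF_MEMORY|0:125|00:01:02||4G\n"
    "12345.batch|OUT_OF_MEMORY|0:125|00:01:02|1024K|\n"
    "12345.0|OUT_OF_MEMORY|0:125|00:01:00|3900M|\n"
)
```

An empty result from `split_rows` (for example, when sacct has not yet recorded the job) would map naturally onto the fallback to the single-line display.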

Test plan

  • uv run pytest — 138 passed (baseline 119 + 19 new).
  • uv run ruff check — All checks passed.
  • uv run ruff format --check — clean.
  • uv run pyright src/hpc/ — 0 errors, 0 warnings.

Closes #5

Copilot AI left a comment

Pull request overview

Adds a Slurm-specific “detail” path for hpc status <jobid> that surfaces useful sacct accounting fields for completed jobs, while keeping the existing polling/status logic unchanged for other call sites.

Changes:

  • Introduced JobDetail plus Slurm sacct detail command + parser (including unit-aware MaxRSS selection across steps).
  • Added JobManager.get_job_detail() and updated cli.status to render accounting fields for terminal Slurm states (with fallback when detail is unavailable / unsupported).
  • Added comprehensive unit tests for scheduler parsing, JobManager behavior, and CLI output formatting/fallbacks.
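
The fallback behaviour mentioned in the changes above could look roughly like the following sketch. `Detail` is a hypothetical stand-in for the PR's `JobDetail`, and `render_status` and its output layout are assumptions made for illustration; the actual `cli.status` formatting is not shown in this conversation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Detail:
    """Hypothetical stand-in for JobDetail; field names are assumed."""

    state: str
    exit_code: str
    elapsed: str
    max_rss: str
    req_mem: str


def render_status(job_id: str, state: str, detail: Detail | None) -> str:
    """Render the detailed view when available, else the single-line view."""
    if detail is None:
        # No accounting detail (e.g. PJM, or sacct has no record yet).
        return f"{job_id}: {state}"
    lines = [
        # The raw sacct State string is shown verbatim.
        f"{job_id}: {detail.state}",
        f"  ExitCode: {detail.exit_code}",
        f"  Elapsed: {detail.elapsed}",
        f"  MaxRSS: {detail.max_rss or '-'}",
        f"  ReqMem: {detail.req_mem}",
    ]
    return "\n".join(lines)
```

Passing `None` models both cases in which detail is unavailable, so the caller needs only one branch for the fallback.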

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `tests/test_scheduler.py` | Adds unit tests for Slurm detail_cmd and parse_detail, plus PJM “detail not supported” coverage. |
| `tests/test_job.py` | Adds tests for JobManager.get_job_detail() behavior for Slurm vs PJM. |
| `tests/test_cli.py` | Adds CLI-level tests verifying detailed output for terminal jobs and fallback behavior. |
| `src/hpc/scheduler.py` | Adds JobDetail, Slurm sacct detail command, and parsing logic (including unit-aware MaxRSS max across steps). |
| `src/hpc/job.py` | Adds get_job_detail() API to retrieve scheduler accounting details. |
| `src/hpc/cli.py` | Updates status to display sacct details for terminal Slurm states and preserve raw state strings. |

ultimatile added 2 commits May 8, 2026 18:45
`hpc status <id>` on a terminal Slurm job now also prints ExitCode,
Elapsed, MaxRSS, and ReqMem alongside the raw State, fetched via a
single `sacct -P` call. The polling path used by `wait_for_job` is
left untouched. PJM falls back to the prior single-line display.

Closes #5

The previous logic preferred the `.batch` row's MaxRSS whenever it was
non-empty. For workloads dispatched via `srun` (typical for MPI / GPU
jobs) `.batch` only reports the launcher's RSS, so this would
under-report the actual peak memory living on numbered step rows.
Use a unit-aware max across all non-empty step rows instead.
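
A minimal sketch of a unit-aware maximum over MaxRSS values, assuming sacct-style suffixes (K, M, G, T) with 1024-based multipliers. The helper names `rss_to_bytes` and `pick_max_rss` are hypothetical and the PR's real parser may differ.

```python
from __future__ import annotations

# Assumed 1024-based multipliers for sacct-style size suffixes.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def rss_to_bytes(value: str) -> int | None:
    """Convert a MaxRSS string such as '3900M' to bytes; None if unparsable."""
    text = value.strip().upper()
    if not text:
        return None
    suffix = text[-1] if text[-1] in "KMGT" else ""
    number = text[:-1] if suffix else text
    try:
        return int(float(number) * _UNITS[suffix])
    except ValueError:
        return None


def pick_max_rss(values: list[str]) -> str:
    """Return the raw string with the largest byte value, or '' if none parse."""
    best_raw, best_bytes = "", -1
    for raw in values:
        size = rss_to_bytes(raw)
        if size is not None and size > best_bytes:
            best_raw, best_bytes = raw, size
    return best_raw


# The .batch row (launcher) is small; the numbered step holds the real peak.
step_values = ["", "1024K", "3900M"]
peak = pick_max_rss(step_values)
```

Comparing converted byte counts rather than raw strings matters because a plain string maximum would rank `900M` above `2G`.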
@ultimatile force-pushed the feat/5-sacct-status branch from 706a2b8 to 336c354 on May 8, 2026 09:46
@ultimatile merged commit ac0cb48 into main on May 8, 2026
@ultimatile deleted the feat/5-sacct-status branch on May 8, 2026 10:06

Development

Successfully merging this pull request may close these issues.

Integrate sacct info for completed jobs into hpc status

2 participants