wait command returns prematurely for array jobs (parse_status only inspects first sacct line)

## Summary

`hpc wait <array_job_id>` returns long before the array job actually finishes. For a Slurm array job, `parse_status` reads `sacct -j <id> --format=State --noheader` and returns the status from **only the first output line**, ignoring every other array task.

## Affected code

`src/hpc/scheduler.py` (Slurm branch):

```python
def status_cmd(self, job_id: str) -> list[str]:
    return ["sacct", "-j", job_id, "--format=State", "--noheader"]

def parse_status(self, output: str) -> JobStatus:
    lines = output.strip().splitlines()
    status_str = lines[0].strip().rstrip("+") if lines else ""
    return _STATUS_MAP.get(status_str, JobStatus.FAILED)
```

`wait_for_job` in `src/hpc/job.py` checks `status in terminal_states` and returns as soon as `parse_status` reports a terminal state.

## Repro

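For an array job in a mixed state (some tasks COMPLETED, others still RUNNING/PENDING), `sacct` prints one row per task. The output below was taken with `-X` to hide job steps; `status_cmd` itself does not pass that flag: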

```
$ sacct -j 7701965 --format=State --noheader -X
 COMPLETED 
 COMPLETED 
 COMPLETED 
 ...
 RUNNING
 RUNNING
 PENDING
```

`parse_status` reads only the first line (`COMPLETED`), so `wait_for_job` returns `JobStatus.COMPLETED` while ~150 array tasks are still pending or running.
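The behaviour can be reproduced without a cluster by feeding a shortened version of that output to a copy of the current parsing logic. This is a minimal sketch: `JobStatus` and `_STATUS_MAP` below are stand-ins written for this example (the real definitions in `src/hpc/scheduler.py` may contain more states), and `parse_status_first_line` is a hypothetical name for the copied logic.

```python
from enum import Enum


class JobStatus(Enum):
    # Stand-in for the project's enum; the member set is assumed.
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Stand-in for the project's mapping from sacct state strings to JobStatus.
_STATUS_MAP = {s.value: s for s in JobStatus}


def parse_status_first_line(output: str) -> JobStatus:
    """Copy of the current logic: only the first sacct row is inspected."""
    lines = output.strip().splitlines()
    status_str = lines[0].strip().rstrip("+") if lines else ""
    return _STATUS_MAP.get(status_str, JobStatus.FAILED)


# Shortened sample: two finished tasks, one running, one pending.
SAMPLE = " COMPLETED \n COMPLETED \n RUNNING \n PENDING \n"

result = parse_status_first_line(SAMPLE)
print(result)
```

The call reports a terminal state even though two of the four sample rows are non-terminal, which is the condition that lets `wait_for_job` return early.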

## Expected

`hpc wait` should block until **all** array tasks reach a terminal state.

## Suggested fix

Aggregate the statuses over all sacct rows in `Slurm.parse_status` (and analogously for `PJM.parse_status`). Sketch:

```python
def parse_status(self, output: str) -> JobStatus:
    lines = [ln.strip().rstrip("+") for ln in output.strip().splitlines() if ln.strip()]
    if not lines:
        return JobStatus.FAILED
    statuses = [_STATUS_MAP.get(s, JobStatus.FAILED) for s in lines]
    if any(s == JobStatus.PENDING for s in statuses):
        return JobStatus.PENDING
    if any(s == JobStatus.RUNNING for s in statuses):
        return JobStatus.RUNNING
    for terminal in (JobStatus.FAILED, JobStatus.CANCELLED, JobStatus.TIMEOUT):
        if any(s == terminal for s in statuses):
            return terminal
    return JobStatus.COMPLETED
```

Notes:
- Adding `-X` (`--allocations`) to `status_cmd` helps too: it suppresses job steps, so the rows correspond to array tasks rather than to their `.batch`/`.extern` steps, and the aggregation runs over a clean set.
- For non-array jobs, `sacct -X` returns a single row, so the aggregated logic gives the same result as today.
- The PJM branch has the same shape (`lines[1].strip()`) and likely the same issue for PJM step jobs; worth a parallel fix.
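
For reviewers who want to try the aggregation outside the code base, here is a self-contained variant of the sketch above. It assumes the same precedence (PENDING, then RUNNING, then FAILED, CANCELLED, TIMEOUT, otherwise COMPLETED). `JobStatus`, `_STATUS_MAP`, `_PRECEDENCE`, and `aggregate_status` are stand-ins defined for this example, not the project's actual definitions.

```python
from enum import Enum


class JobStatus(Enum):
    # Stand-in for the project's enum; the member set is assumed.
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    TIMEOUT = "TIMEOUT"


# Stand-in for the project's mapping from sacct state strings to JobStatus.
_STATUS_MAP = {s.value: s for s in JobStatus}

# Checked in order: a non-terminal task outranks any terminal one, so the
# job only counts as finished once every row is terminal.
_PRECEDENCE = (
    JobStatus.PENDING,
    JobStatus.RUNNING,
    JobStatus.FAILED,
    JobStatus.CANCELLED,
    JobStatus.TIMEOUT,
)


def aggregate_status(output: str) -> JobStatus:
    """Collapse all sacct rows into a single job-level status."""
    # Drop blank rows and the trailing "+" that sacct adds to truncated states.
    rows = [ln.strip().rstrip("+") for ln in output.splitlines() if ln.strip()]
    if not rows:
        return JobStatus.FAILED
    # Unknown state strings are treated as FAILED, as in the existing code.
    seen = {_STATUS_MAP.get(row, JobStatus.FAILED) for row in rows}
    for status in _PRECEDENCE:
        if status in seen:
            return status
    return JobStatus.COMPLETED


mixed = " COMPLETED \n COMPLETED \n RUNNING \n PENDING \n"
print(aggregate_status(mixed))
```

With the mixed sample the function reports a non-terminal state, so a caller that waits for a terminal state keeps polling. Using a set plus an ordered precedence tuple keeps the priority rules in one place, which may make it easier to reuse for the PJM branch.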

## Workaround

Until this is fixed, fall back to a manual loop:

```bash
hpc exec "until [ \$(squeue -j <id> -h 2>/dev/null | wc -l) -eq 0 ]; do sleep 60; done"
```

## Environment

- hpc 0.4.0
- Scheduler: Slurm
- Array job submitted with `[slurm.options].array = "1-N%K"`
