
feat(jobs): per-cmd live counters + kill button on Jobs page #71

Merged
yukimemi merged 1 commit into main from feat/jobs-live-aggregate-and-kill on May 20, 2026

@yukimemi
Owner

Summary

PR γ of the v0.30 kill-UX trio. PR α and PR β are still TODO (the agent events.started wire and the Activity Running tab). γ stands alone, so it ships first. Per the design discussion, the right home for per-cmd aggregate counters is the Jobs catalog page, not Activity (which is per-PC).

What operators see

The Jobs page table gains a new live column. For each cmd it shows:

  • running: N (violet) — at least one PC has reported back, more results still in flight
  • pending: N (secondary) — fan-out published, no result back yet
  • a muted dash when both are zero (idle)

And a new per-row Kill button (danger, Skull icon):

  • Greyed out when live.running === 0 so operators don't fire no-ops
  • Confirm dialog spells out the kill ≠ revoke distinction: "this does NOT block the next schedule tick — click revoke alongside if you want to stop new fires too"
  • Backed by the existing POST /api/jobs/{cmd_id}/kill route (v0.29 — was a silent no-op before that)

Backend

GET /api/jobs response gains live: { running, pending } per row. Aggregated via a single GROUP BY job_id query against the v0.29 executions table. serde(flatten) keeps existing Manifest fields at JSON root so the SPA's existing job.id / job.version / job.execute reads still work — live is purely additive.

Resilient: live count query failure falls back to zeros (warn! logged) so a SQLite hiccup doesn't take the whole Jobs page down.
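
The additive response shape and the zero fallback can be sketched from the SPA's point of view. This is only an illustration under assumptions, not the backend code (which is Rust): `Manifest` is cut down to the three fields named above with guessed types, and `JobRow`, `withLive`, and the `cleanup` job are hypothetical names.

```typescript
// Hypothetical, reduced view of the manifest: only the fields named in this PR.
// The real Manifest has more fields, and the types here are assumptions.
interface Manifest {
  id: string;
  version: string;
  execute: unknown;
}

interface LiveCounts {
  running: number;
  pending: number;
}

// Manifest fields stay at the JSON root; `live` is the only new key.
type JobRow = Manifest & { live: LiveCounts };

const ZERO: LiveCounts = { running: 0, pending: 0 };

// Attach counts to each manifest. A job with no entry in `counts`, or a failed
// count query (modelled here as `null`), falls back to zeros.
function withLive(
  manifests: Manifest[],
  counts: Map<string, LiveCounts> | null,
): JobRow[] {
  return manifests.map((m) => ({ ...m, live: counts?.get(m.id) ?? { ...ZERO } }));
}

const rows = withLive(
  [
    { id: "inventory-hw", version: "1.0.0", execute: {} },
    { id: "cleanup", version: "2.1.0", execute: {} }, // hypothetical idle job
  ],
  new Map([["inventory-hw", { running: 2, pending: 1 }]]),
);
```

Existing reads such as `rows[0].id` keep working because nothing moves under a nested key, and a `null` count map still yields a full list of rows with zeroed counters.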

Test plan

  • cargo test --workspace — 181 passing (3 new for fetch_live_counts)
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo fmt --all -- --check clean
  • tsc --noEmit SPA clean
  • Manual: fire kanade exec inventory-hw against multi-PC, observe Jobs page chip transitions pending → running → idle
  • Manual: with running execs, click Kill — verify children terminate and chip clears

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 20, 2026


📥 Commits

Reviewing files that changed from the base of the PR and between 1cac552 and 1a4487b.

📒 Files selected for processing (2)
  • crates/kanade-backend/src/api/jobs.rs
  • crates/kanade-backend/web/src/pages/Jobs.tsx


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces in-flight execution counters to the Jobs page, allowing operators to monitor "running" and "pending" tasks. It also adds a "kill" action to terminate all active executions for a specific job. Review feedback recommends optimizing the backend aggregation query by filtering for relevant statuses and updating the frontend logic to ensure the "kill" button and its associated UI messages correctly account for both "running" and "pending" states.

Comment on lines +191 to +197
let rows = sqlx::query(
"SELECT job_id,
SUM(CASE WHEN status = 'running' THEN 1 ELSE 0 END) AS running,
SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) AS pending
FROM executions
GROUP BY job_id",
)

medium

The query in fetch_live_counts currently scans the entire executions table to perform the aggregation. As the history of executions grows, this will become a performance bottleneck for the Jobs page. Since we only care about 'running' and 'pending' states, adding a WHERE clause will significantly improve performance, especially if an index exists on the status column.

Suggested change
- let rows = sqlx::query(
-     "SELECT job_id,
-             SUM(CASE WHEN status = 'running' THEN 1 ELSE 0 END) AS running,
-             SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) AS pending
-      FROM executions
-      GROUP BY job_id",
- )
+ let rows = sqlx::query(
+     "SELECT job_id,
+             SUM(CASE WHEN status = 'running' THEN 1 ELSE 0 END) AS running,
+             SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) AS pending
+      FROM executions
+      WHERE status IN ('running', 'pending')
+      GROUP BY job_id",
+ )

<Button
variant="danger"
size="sm"
disabled={pendingKill.has(j.id) || j.live.running === 0}

medium

The kill button is currently disabled if live.running is 0, even if there are pending executions. Since the backend kill handler (and the operator's intent) includes pending executions (those that have been published but haven't reported back yet), the button should be enabled if either count is greater than zero. This allows operators to abort a job immediately after firing it before any results land.

Suggested change
- disabled={pendingKill.has(j.id) || j.live.running === 0}
+ disabled={pendingKill.has(j.id) || (j.live.running + j.live.pending) === 0}

Comment on lines +251 to +252
`Kill all running runs of ${j.id}?\n\n` +
`${j.live.running} run${j.live.running === 1 ? '' : 's'} currently in flight. ` +

medium

The confirmation message only mentions the running count, which is misleading because the kill action targets all in-flight executions, including those in the pending state. It should reflect the total number of runs that will be affected.

Suggested change
- `Kill all running runs of ${j.id}?\n\n` +
- `${j.live.running} run${j.live.running === 1 ? '' : 's'} currently in flight. ` +
+ `Kill all running runs of ${j.id}?\n\n` +
+ `${j.live.running + j.live.pending} run${(j.live.running + j.live.pending) === 1 ? '' : 's'} currently in flight. ` +

Comment on lines +260 to +264
title={
j.live.running === 0
? 'Nothing running for this job right now'
: `Terminate ${j.live.running} in-flight run${j.live.running === 1 ? '' : 's'}`
}

medium

The button's tooltip should also account for both running and pending executions to accurately describe what the action will terminate.

Suggested change
- title={
-   j.live.running === 0
-     ? 'Nothing running for this job right now'
-     : `Terminate ${j.live.running} in-flight run${j.live.running === 1 ? '' : 's'}`
- }
+ title={
+   (j.live.running + j.live.pending) === 0
+     ? 'Nothing running for this job right now'
+     : `Terminate ${j.live.running + j.live.pending} in-flight run${(j.live.running + j.live.pending) === 1 ? '' : 's'}`
+ }

PR γ of the v0.30 kill-UX trio (γ first since it stands alone — see
issue discussion). Surfaces "is anything running for this job right
now" on the Jobs catalog page so the operator has a decision input
for kill / revoke without drilling into Activity.

* Backend `/api/jobs` response gains a `live: { running, pending }`
  object per row, aggregated in one GROUP BY query against the v0.29
  `executions` table. `serde(flatten)` keeps the existing Manifest
  fields at JSON root so the SPA's existing `job.id` / `job.version`
  / `job.execute` reads keep working — `live` is purely additive.
* Jobs page table gains a `live` column with violet `running: N` and
  secondary `pending: N` chips. Both zero renders a muted dash so
  idle rows stay visually quiet.
* Per-row Kill button (Skull icon, danger variant) disabled when
  `live.running === 0` so the operator doesn't fire a no-op. Confirm
  dialog calls out that kill does NOT block the next schedule tick
  (revoke's job) — separation of concerns operators kept asking
  about.
* Backed by the existing v0.29 `POST /api/jobs/{cmd_id}/kill` route
  (which itself was a silent no-op pre-v0.29 — it published
  `kill.{cmd_id}` to a subject no agent subscribes to; v0.29 made it
  SELECT running execs and publish `kill.{exec_id}` per deployment).

Tests:
* `fetch_live_counts_groups_by_job_id` — verifies the GROUP BY
  partition across multiple jobs with mixed status.
* `fetch_live_counts_excludes_completed` — 'completed' execs don't
  count toward live; only 'running' + 'pending' are operationally
  in-flight.
* `fetch_live_counts_empty_when_no_executions` — empty map fallback.

Workspace tests: 181 passing (3 new).
yukimemi force-pushed the feat/jobs-live-aggregate-and-kill branch from db23503 to 1a4487b on May 20, 2026 11:36
@yukimemi
Owner Author

Round-1 Gemini fixes applied (force-push → 1a4487b):

medium #1 (jobs.rs:197): added WHERE status IN ('running', 'pending') to the aggregation query — filters before the GROUP BY so SQLite skips the completed-execs tail instead of summing CASE-zeros across it. The caller's unwrap_or_default() fallback already handled the now-omitted empty groups.
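
To make the effect of the added WHERE clause concrete, here is a small in-memory model of the aggregation in TypeScript. It is a sketch, not the Rust/sqlx code: the `Execution` shape, the `fetchLiveCounts` and `liveFor` helpers, and the job ids `a` and `b` are assumptions for illustration. Only the three status values come from this PR.

```typescript
type Status = "pending" | "running" | "completed";

// Assumed minimal row shape; the real executions table has more columns.
interface Execution {
  job_id: string;
  status: Status;
}

interface LiveCounts {
  running: number;
  pending: number;
}

// Models: SELECT job_id, SUM(...) AS running, SUM(...) AS pending
//         FROM executions WHERE status IN ('running','pending') GROUP BY job_id
function fetchLiveCounts(execs: Execution[]): Map<string, LiveCounts> {
  const counts = new Map<string, LiveCounts>();
  for (const e of execs) {
    // WHERE: drop completed rows before any grouping happens.
    if (e.status !== "running" && e.status !== "pending") continue;
    const c = counts.get(e.job_id) ?? { running: 0, pending: 0 };
    if (e.status === "running") c.running += 1;
    else c.pending += 1;
    counts.set(e.job_id, c);
  }
  return counts;
}

const counts = fetchLiveCounts([
  { job_id: "a", status: "running" },
  { job_id: "a", status: "pending" },
  { job_id: "a", status: "completed" },
  { job_id: "b", status: "completed" },
]);

// Caller-side fallback, playing the role of unwrap_or_default().
const liveFor = (id: string): LiveCounts =>
  counts.get(id) ?? { running: 0, pending: 0 };
```

Job `b` has only completed executions, so it produces no group at all after filtering. The caller-side default is what turns that missing group back into zeros.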

medium #2, #3, #4 (Jobs.tsx kill button): the SPA had running === 0 everywhere, but the backend /api/jobs/{id}/kill handler SELECTs status IN ('pending', 'running') — so a just-fired but not-yet-reported-back exec IS killable. Unified all three sites (disable predicate, confirm dialog count + breakdown, tooltip) to running + pending. Confirm message now shows the breakdown explicitly: "X runs currently in flight (running: A, pending: B)".
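
The unified predicate can be sketched as a few pure helpers. The helper names (`inFlight`, `killDisabled`, `killTooltip`, `killConfirmLead`) are hypothetical, and the exact wording of the merged dialog is not shown in this thread, so the strings below are assembled from the review snippets and the breakdown format quoted above.

```typescript
interface LiveCounts {
  running: number;
  pending: number;
}

// Single source of truth for "is there anything to kill".
const inFlight = (live: LiveCounts): number => live.running + live.pending;
const plural = (n: number): string => (n === 1 ? "" : "s");

// Disable predicate: a kill request already in progress, or nothing in flight.
function killDisabled(live: LiveCounts, killPending: boolean): boolean {
  return killPending || inFlight(live) === 0;
}

// Tooltip text for the Kill button.
function killTooltip(live: LiveCounts): string {
  const n = inFlight(live);
  return n === 0
    ? "Nothing running for this job right now"
    : `Terminate ${n} in-flight run${plural(n)}`;
}

// First lines of the confirm dialog, with the explicit breakdown.
function killConfirmLead(jobId: string, live: LiveCounts): string {
  const n = inFlight(live);
  return (
    `Kill all running runs of ${jobId}?\n\n` +
    `${n} run${plural(n)} currently in flight ` +
    `(running: ${live.running}, pending: ${live.pending}).`
  );
}
```

Routing all three sites through one `inFlight` helper is one way to keep them from drifting apart again. A just-fired job with `{ running: 0, pending: 1 }` then enables the button, and the tooltip and dialog report one run.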

cargo test --workspace: 181 passing. clippy + fmt + tsc clean.

yukimemi merged commit 024c471 into main on May 20, 2026
13 checks passed
yukimemi deleted the feat/jobs-live-aggregate-and-kill branch on May 20, 2026 11:58
yukimemi added a commit that referenced this pull request May 20, 2026
Workspace version bump 0.29.0 → 0.30.0 + 3 inter-crate kanade-shared
refs.

Ships the kill-UX two-PR sequence:

  * #71 (γ): Jobs page per-cmd live counters + kill button
  * #73 (α'): unified execution_results lifecycle + events.started
    + Activity Running filter

The original 3-part design (γ + α + β) collapsed to 2 PRs once the
two-table approach for in-flight tracking was swapped for one
unified `execution_results` with NULL finished_at. SPA changes
shrank to "add a status filter option" rather than a whole new tab.

See README.md "v0.30.0" entry for the operator-facing summary.
yukimemi added a commit that referenced this pull request May 20, 2026
…uild (#80)

v0.30.0 release.yml ran cross-compile + `tsc -b && vite build` against
the just-tagged v0.30.0 commit and aborted at:

  src/pages/Jobs.tsx(219,25): error TS2322:
    Type "secondary" is not assignable to type
    "default" | "danger" | "success" | "violet" | "amber"

The Jobs page pending chip used `<Badge variant="secondary">`, but the
`Badge` component's variant union doesn't include `secondary` — that
is `Button`'s set. The typo had been carried since the v0.30.0 Jobs
chip was first added in #71 (γ), undetected because per-PR CI only
runs `cargo` and never built the SPA. `release.yml`'s tag-driven
SPA build was the first time `tsc -b` ran against this code on CI.

Fix: `variant="secondary"` → `variant="amber"` (one-line change in
Jobs.tsx). Amber semantically reads as "waiting in queue", which is
what `pending` means, and is visually distinct from running's
`violet`.
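
As an illustration of why `tsc` rejects the old value, the chip selection can be written against the variant union from the TS2322 message above. This is a sketch under assumptions: the real `Badge` props are not shown in this thread, `Chip` and `liveChips` are hypothetical names, and hiding a chip whose count is zero is a guess about the rendering.

```typescript
// Variant union copied from the TS2322 error message.
type BadgeVariant = "default" | "danger" | "success" | "violet" | "amber";

interface LiveCounts {
  running: number;
  pending: number;
}

// Hypothetical description of one chip in the live column.
interface Chip {
  label: string;
  variant: BadgeVariant;
}

// An empty result means the row is idle and would render the muted dash.
function liveChips(live: LiveCounts): Chip[] {
  const chips: Chip[] = [];
  if (live.running > 0) {
    chips.push({ label: `running: ${live.running}`, variant: "violet" });
  }
  if (live.pending > 0) {
    // Writing variant: "secondary" here fails type checking, because
    // "secondary" is not a member of BadgeVariant.
    chips.push({ label: `pending: ${live.pending}`, variant: "amber" });
  }
  return chips;
}
```

The check only helps if something actually runs the TypeScript compiler on the SPA, which is the gap the release build exposed.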

The v0.30.0 git tag remains pointed at the broken commit but no
GitHub Release was ever published from it (release.yml failed before
publish). v0.30.1 ships the same kill-UX features against a working
SPA build.

Follow-up tracked in README backlog: surface SPA build into per-PR
CI so this class of regression fails at PR time, not release time.

Workspace version 0.30.0 → 0.30.1 + 3 inter-crate kanade-shared refs.