Skip to content

fix(api): defer docker startup tasks until ready#1952

Merged
elibosley merged 4 commits intomainfrom
codex/defer-docker-startup-tasks
Mar 24, 2026
Merged

fix(api): defer docker startup tasks until ready#1952
elibosley merged 4 commits intomainfrom
codex/defer-docker-startup-tasks

Conversation

@elibosley
Copy link
Member

@elibosley elibosley commented Mar 23, 2026

Summary

  • move Docker startup work out of the Nest bootstrap critical path and schedule it only after the API is listening
  • replace the @Timeout decorators with a dedicated startup task helper that logs failures without blocking readiness
  • trim the temporary bootstrap progress logs while keeping the ready-signal and deferred Docker startup behavior intact
  • add focused coverage for delayed execution, rejection logging, and missing-provider behavior

Testing

  • pnpm --dir api test src/unraid-api/graph/resolvers/docker/docker-startup-tasks.spec.ts
  • pnpm --dir api type-check

Summary by CodeRabbit

  • Chores

    • Increased application startup timeout budget (15s → 30s) to improve initialization robustness.
    • Added safeguards to prevent duplicate "process ready" signals and centralized readiness signaling.
  • New Features

    • Added a background startup task scheduler and listener to run template scanning and container digest refresh after the app is ready; failures are logged.
  • Tests

    • Added tests for startup task scheduling, execution timing, and failure logging.

- Purpose: keep PM2 startup on the critical path limited to Nest bootstrap and socket binding.
- Before: Docker template scans and digest refreshes were scheduled with Nest timeout decorators during application startup, which could consume readiness budget before PM2 saw the process as ready.
- Problem: background Docker work could make startup timing noisier and harder to diagnose during clean restarts.
- Now: the API sends the PM2 ready signal once the server is listening, then schedules Docker startup work from a dedicated helper outside the bootstrap path.
- Coverage: adds a focused Vitest spec for delayed execution, rejection logging, and missing-provider behavior.
- Purpose: trim temporary bootstrap progress logging from the Nest startup path.
- Before: the deferred Docker startup change added several step-by-step bootstrap info logs while validating startup sequencing.
- Problem: those extra lines add noise to normal startup output without changing behavior now that the readiness path is confirmed.
- Now: the API keeps the durable startup and ready-signal behavior while removing the temporary progress logs.
- Scope: this follow-up only changes logging in api/src/unraid-api/main.ts and leaves local dev config changes out of the commit.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 23, 2026

Walkthrough

Adds a centralized Docker startup task scheduler and listener, removes NestJS @Timeout decorators from Docker services, extends PM2 startup budget to 30s and adds a PM2-ready dedupe guard, emits an app-ready event after server listen, and adds tests verifying scheduling and logging behavior.

Changes

Cohort / File(s) Summary
Startup Timing Configuration
api/src/index.ts
Increased TOTAL_STARTUP_BUDGET_MS from 15,000ms to 30,000ms and updated adjacent comment.
Docker Startup Task Scheduler & Tests
api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.ts, api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.spec.ts
Added scheduleDockerStartupTasks (schedules bootstrapScan and refreshContainerDigestsAfterStartup after a delay) and DockerStartupTasksListener; test file verifies delayed execution, error logging, and no-op behavior when dependencies are missing.
Removed NestJS Timeout Decorators
api/src/unraid-api/graph/resolvers/docker/container-status.job.ts, api/src/unraid-api/graph/resolvers/docker/docker-template-scanner.service.ts
Removed @Timeout(5_000) decorators from methods; retained method logic and retry behavior but moved scheduling responsibility to the centralized scheduler.
Bootstrap & App Lifecycle Changes
api/src/unraid-api/main.ts, api/src/unraid-api/app/app-lifecycle.events.ts
Added APP_READY_EVENT constant and AppReadyEvent interface; introduced signalProcessReady (PM2-ready dedupe via Symbol.for(...)), switched some imports to dynamic Promise.all runtime imports, and emit APP_READY_EVENT after server listen.
Module Registration
api/src/unraid-api/graph/resolvers/docker/docker.module.ts
Registered DockerStartupTasksListener in the Docker module providers array.

Sequence Diagram

sequenceDiagram
    participant Bootstrap as Bootstrap Process
    participant App as NestJS App
    participant EventBus as EventEmitter2
    participant Listener as DockerStartupTasksListener
    participant Scheduler as scheduleDockerStartupTasks
    participant Scanner as DockerTemplateScannerService
    participant Job as ContainerStatusJob

    Bootstrap->>App: start & listen
    App-->>Bootstrap: server listening
    Bootstrap->>EventBus: emit APP_READY_EVENT (app.ready)
    EventBus->>Listener: handleAppReady
    Listener->>Scheduler: schedule tasks (delay)
    Note over Scheduler: setTimeout(delay)
    par Delayed parallel execution
        Scheduler->>Scanner: invoke bootstrapScan()
        Scanner-->>Scheduler: completes / throws (caught & logged)
        Scheduler->>Job: invoke refreshContainerDigestsAfterStartup()
        Job-->>Scheduler: completes / throws (caught & logged)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nudged the startup, set timers to wait,
Kept PM2 quiet — no double-ready state.
Tasks snooze a while, then wake in a pair,
Logs catch the hiccups; the bunny's aware. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: deferring Docker startup tasks until the application is ready, which aligns with the PR's core objective of moving Docker work out of the bootstrap critical path.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/defer-docker-startup-tasks

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 51.11111% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.74%. Comparing base (8f4ea98) to head (4f87811).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
api/src/unraid-api/main.ts 0.00% 41 Missing ⚠️
...api/graph/resolvers/docker/docker-startup-tasks.ts 95.45% 2 Missing ⚠️
api/src/index.ts 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1952   +/-   ##
=======================================
  Coverage   51.74%   51.74%           
=======================================
  Files        1026     1028    +2     
  Lines       70725    70792   +67     
  Branches     7890     7884    -6     
=======================================
+ Hits        36595    36630   +35     
- Misses      34007    34039   +32     
  Partials      123      123           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Purpose: move deferred Docker startup work behind a Nest domain event instead of triggering it directly from main.ts.
- Before: main.ts signaled readiness and then manually imported the Docker startup scheduler.
- Problem: the startup flow worked, but the orchestration lived outside the Docker module and felt less idiomatic to Nest.
- Now: main.ts emits an app-ready event after listen(), and a Docker listener provider reacts with @onevent to schedule the delayed scan and digest refresh.
- How: added an app lifecycle event constant, registered a Docker startup listener provider, kept the scheduling helper testable, and updated the spec to cover both the helper and the event-driven listener path.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.ts (1)

8-15: Make NestAppGetter.get() generic so Line 24 can drop the cast.

Returning unknown here is what forces as T in getOptionalProvider. A generic get<T>(typeOrToken: Type<T>) => T keeps the provider contract intact and avoids weakening the helper right at the DI boundary.

♻️ Proposed refactor
 interface NestAppGetter {
-    get(
-        typeOrToken: unknown,
+    get<T>(
+        typeOrToken: Type<T>,
         options?: {
             strict?: boolean;
         }
-    ): unknown;
+    ): T;
 }
@@
 const getOptionalProvider = <T>(app: NestAppGetter, token: Type<T>): T | null => {
     try {
-        return app.get(token, { strict: false }) as T;
+        return app.get(token, { strict: false });
     } catch {
         return null;
     }
 };

As per coding guidelines, "Avoid using casting whenever possible, prefer proper typing from the start".

Also applies to: 22-25

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.ts` around
lines 8 - 15, The NestAppGetter.get signature returns unknown which forces a
cast in getOptionalProvider; change NestAppGetter to be generic (e.g.,
get<T>(typeOrToken: Type<T>, options?: { strict?: boolean }): T | undefined) so
callers like getOptionalProvider can drop the `as T` cast and preserve DI typing
integrity; update any call sites of NestAppGetter.get (notably the
getOptionalProvider helper) to use the new generic form and adjust
return/nullable types accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.spec.ts`:
- Line 1: Add an afterEach hook in each top-level describe block that calls
vi.useRealTimers() to guarantee timers are restored even if a test fails; locate
the two describe blocks in docker-startup-tasks.spec.ts (the blocks containing
tests that call vi.useFakeTimers()) and insert afterEach(() => {
vi.useRealTimers(); }) inside each block (alongside or after existing tests that
currently call vi.useRealTimers() manually) so fake timers are always cleaned
up.

In `@api/src/unraid-api/main.ts`:
- Around line 27-39: The ready-sent guard is set unconditionally even if
process.send('ready') fails or returns false; update the code around
process.send to wrap the call in a try/catch and check its boolean return value,
only setting readySignalState[READY_SIGNAL_SENT_SYMBOL] and calling
apiLogger.info when process.send returns a truthy value, and on failure (false
return or caught error) log the error via apiLogger.warn or apiLogger.error
(including the caught exception) and return false so the send can be retried
later; reference process.send, readySignalState[READY_SIGNAL_SENT_SYMBOL],
READY_SIGNAL_SENT_SYMBOL, and apiLogger.info/apiLogger.warn in your fix.

---

Nitpick comments:
In `@api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.ts`:
- Around line 8-15: The NestAppGetter.get signature returns unknown which forces
a cast in getOptionalProvider; change NestAppGetter to be generic (e.g.,
get<T>(typeOrToken: Type<T>, options?: { strict?: boolean }): T | undefined) so
callers like getOptionalProvider can drop the `as T` cast and preserve DI typing
integrity; update any call sites of NestAppGetter.get (notably the
getOptionalProvider helper) to use the new generic form and adjust
return/nullable types accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8544651e-7138-49c4-94d7-8233151a8c60

📥 Commits

Reviewing files that changed from the base of the PR and between 5be53a4 and 09937c2.

📒 Files selected for processing (6)
  • api/src/index.ts
  • api/src/unraid-api/graph/resolvers/docker/container-status.job.ts
  • api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.spec.ts
  • api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.ts
  • api/src/unraid-api/graph/resolvers/docker/docker-template-scanner.service.ts
  • api/src/unraid-api/main.ts
💤 Files with no reviewable changes (1)
  • api/src/unraid-api/graph/resolvers/docker/docker-template-scanner.service.ts

@@ -0,0 +1,97 @@
import { describe, expect, it, vi } from 'vitest';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.spec.ts

Repository: unraid/api

Length of output: 5004


Add afterEach hooks to restore real timers in both describe blocks.

Test isolation risk: if any test fails before vi.useRealTimers() executes, fake timers remain active for subsequent tests. Both describe blocks (lines 13–87 and 89–122) have tests using vi.useFakeTimers() without centralized teardown. Use afterEach to ensure reliable cleanup.

  • First describe block: Tests at lines 14–42 and 44–75 restore at lines 41 and 74
  • Second describe block: Test at lines 90–122 restores at line 121
Proposed fix
-import { describe, expect, it, vi } from 'vitest';
+import { afterEach, describe, expect, it, vi } from 'vitest';
 
 describe('scheduleDockerStartupTasks', () => {
+    afterEach(() => {
+        vi.useRealTimers();
+    });
+
     it('schedules docker startup work after the provided delay', async () => {
         vi.useFakeTimers();
         // ... test body ...
-        vi.useRealTimers();
     });
 
     it('warns when a background docker startup task rejects', async () => {
         vi.useFakeTimers();
         // ... test body ...
-        vi.useRealTimers();
     });
 });
 
 describe('DockerStartupTasksListener', () => {
+    afterEach(() => {
+        vi.useRealTimers();
+    });
+
     it('schedules docker startup work when the app ready event is emitted', async () => {
         vi.useFakeTimers();
         // ... test body ...
-        vi.useRealTimers();
     });
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.spec.ts` at
line 1, Add an afterEach hook in each top-level describe block that calls
vi.useRealTimers() to guarantee timers are restored even if a test fails; locate
the two describe blocks in docker-startup-tasks.spec.ts (the blocks containing
tests that call vi.useFakeTimers()) and insert afterEach(() => {
vi.useRealTimers(); }) inside each block (alongside or after existing tests that
currently call vi.useRealTimers() manually) so fake timers are always cleaned
up.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
api/src/unraid-api/main.ts (1)

39-41: ⚠️ Potential issue | 🟠 Major

Only mark PM2 ready as sent after confirming IPC send success.

On Line 39, process.send('ready') can fail (return false) or throw; Line 40 currently sets the guard unconditionally, which can suppress retries and leave PM2 waiting.

🛡️ Suggested fix
-    process.send('ready');
-    readySignalState[READY_SIGNAL_SENT_SYMBOL] = true;
-    apiLogger.info('Sent PM2 ready signal while %s', reason);
-    return true;
+    try {
+        const sent = process.send('ready');
+        if (sent === false) {
+            apiLogger.warn('PM2 ready signal was not accepted while %s', reason);
+            return false;
+        }
+        readySignalState[READY_SIGNAL_SENT_SYMBOL] = true;
+        apiLogger.info('Sent PM2 ready signal while %s', reason);
+        return true;
+    } catch (error: unknown) {
+        apiLogger.warn(error, 'Failed to send PM2 ready signal while %s', reason);
+        return false;
+    }
Node.js process.send() behavior: does it return false when IPC is unavailable/backpressured, and can it throw synchronously? Please answer with the official Node.js docs link.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/src/unraid-api/main.ts` around lines 39 - 41, Wrap the
process.send('ready') call in a try/catch and check its boolean return; only set
readySignalState[READY_SIGNAL_SENT_SYMBOL] = true after process.send('ready')
returns truthy, and on exception or falsy return log via apiLogger.error
(including the thrown error or a descriptive message) and avoid setting the
READY_SIGNAL_SENT_SYMBOL so retries can occur; update the apiLogger.info('Sent
PM2 ready signal while %s', reason) to run only after confirming send success.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@api/src/unraid-api/main.ts`:
- Around line 39-41: Wrap the process.send('ready') call in a try/catch and
check its boolean return; only set readySignalState[READY_SIGNAL_SENT_SYMBOL] =
true after process.send('ready') returns truthy, and on exception or falsy
return log via apiLogger.error (including the thrown error or a descriptive
message) and avoid setting the READY_SIGNAL_SENT_SYMBOL so retries can occur;
update the apiLogger.info('Sent PM2 ready signal while %s', reason) to run only
after confirming send success.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 51e2b9eb-9f82-48ed-bc59-676ada8ea99a

📥 Commits

Reviewing files that changed from the base of the PR and between 09937c2 and 90f3772.

📒 Files selected for processing (5)
  • api/src/unraid-api/app/app-lifecycle.events.ts
  • api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.spec.ts
  • api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.ts
  • api/src/unraid-api/graph/resolvers/docker/docker.module.ts
  • api/src/unraid-api/main.ts
✅ Files skipped from review due to trivial changes (1)
  • api/src/unraid-api/graph/resolvers/docker/docker.module.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.spec.ts
  • api/src/unraid-api/graph/resolvers/docker/docker-startup-tasks.ts

- Purpose: make PM2 ready signaling resilient when the IPC channel rejects or throws during process.send.
- Before: the code marked READY_SIGNAL_SENT_SYMBOL as sent and logged success even if process.send('ready') failed.
- Problem: a false return or thrown exception could permanently suppress future ready-signal retries and misreport startup state.
- Now: the code wraps process.send in try/catch, only sets the sent guard after a truthy return, and logs warn/error on failure paths.
- How: signalProcessReady checks the boolean return value, preserves retryability by returning false on failure, and keeps apiLogger.info only on confirmed success.
@github-actions
Copy link
Contributor

This plugin has been deployed to Cloudflare R2 and is available for testing.
Download it at this URL:

https://preview.dl.unraid.net/unraid-api/tag/PR1952/dynamix.unraid.net.plg

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
api/src/unraid-api/main.ts (1)

195-198: Wrap APP_READY_EVENT emission to prevent bootstrap failure from non-critical deferred work.

Line 198 runs after listen() and PM2 signal completion. If app.get(EventEmitter2) fails to resolve or a listener throws synchronously, bootstrapNestServer() rejects from work that should be best-effort. Catch and log failures instead.

Suggested fix
-    app.get(EventEmitter2).emit(APP_READY_EVENT, appReadyEvent);
+    try {
+        app.get(EventEmitter2).emit(APP_READY_EVENT, appReadyEvent);
+    } catch (error: unknown) {
+        apiLogger.error(error, 'Failed to emit %s', APP_READY_EVENT);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/src/unraid-api/main.ts` around lines 195 - 198, The APP_READY_EVENT
emission in bootstrapNestServer() is doing non-critical work synchronously and
may reject bootstrap if app.get(EventEmitter2) fails or a listener throws; wrap
the retrieval and emit in a try/catch and log failures instead of letting them
bubble. Concretely, around the code that constructs AppReadyEvent and calls
app.get(EventEmitter2).emit(APP_READY_EVENT, appReadyEvent) catch errors from
app.get(EventEmitter2) and from emit, use the existing logger (or
app.get(Logger) / console) to warn or debug the error, and ensure the function
continues without throwing so bootstrap completes even if deferred listeners
fail.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@api/src/unraid-api/main.ts`:
- Around line 195-198: The APP_READY_EVENT emission in bootstrapNestServer() is
doing non-critical work synchronously and may reject bootstrap if
app.get(EventEmitter2) fails or a listener throws; wrap the retrieval and emit
in a try/catch and log failures instead of letting them bubble. Concretely,
around the code that constructs AppReadyEvent and calls
app.get(EventEmitter2).emit(APP_READY_EVENT, appReadyEvent) catch errors from
app.get(EventEmitter2) and from emit, use the existing logger (or
app.get(Logger) / console) to warn or debug the error, and ensure the function
continues without throwing so bootstrap completes even if deferred listeners
fail.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dd80aff9-6c26-46aa-b6ab-884d22785aa3

📥 Commits

Reviewing files that changed from the base of the PR and between 90f3772 and 4f87811.

📒 Files selected for processing (1)
  • api/src/unraid-api/main.ts

@elibosley elibosley merged commit 0e004a7 into main Mar 24, 2026
12 of 13 checks passed
@elibosley elibosley deleted the codex/defer-docker-startup-tasks branch March 24, 2026 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant