-
Notifications
You must be signed in to change notification settings - Fork 13
fix: improve API startup reliability with timeout budget tracking #1824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughA startup budgeting and timeout management system is introduced with two new utility classes: TimeoutBudget (for tracking cumulative time budgets across operations) and withTimeout (for promise timeout wrapping). The main API startup flow is refactored to use these utilities for controlled, time-bounded initialization with structured error handling. Version bumped to 4.27.2. Changes
Sequence DiagramsequenceDiagram
participant Startup as Startup Flow
participant Budget as TimeoutBudget
participant ConfigDir as Config Directory
participant State as Emhttp State
participant Key as Registration Key
participant Dynamix as Dynamix Config
participant Manager as State Manager
participant Watch as Key Watch
participant Bootstrap as NestJS Bootstrap
participant Server as Server
Startup->>Budget: new TimeoutBudget(TOTAL_STARTUP_BUDGET_MS)
rect rgb(200, 240, 200)
Note over Startup,Watch: Pre-Bootstrap Operations (with fallbacks)
Startup->>ConfigDir: mkdir with logging
ConfigDir-->>Startup: success or error
Startup->>State: load emhttp state
State-->>Startup: state loaded
Startup->>Key: load registration key
Key-->>Startup: key loaded
Startup->>Dynamix: load dynamix config
Dynamix-->>Startup: config loaded
Startup->>Manager: initialize state manager
Manager-->>Startup: manager ready
Startup->>Watch: start key watch
Watch-->>Startup: watching
end
rect rgb(220, 220, 240)
Note over Startup,Server: Bootstrap with Budget Constraint
Startup->>Budget: getTimeout(MAX_OPERATION_TIMEOUT_MS, BOOTSTRAP_RESERVED_MS)
Budget-->>Startup: remaining timeout
Startup->>Bootstrap: withTimeout(nestFastify, timeout, "NestJS Bootstrap")
Bootstrap->>Server: initialize server
Server-->>Bootstrap: app created
Bootstrap-->>Startup: app resolved within budget
end
rect rgb(240, 220, 220)
Note over Startup,Server: Graceful Shutdown
Startup->>Server: asyncExitHook listener
Server-->>Startup: exit with logging
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1824 +/- ##
==========================================
- Coverage 52.04% 52.04% -0.01%
==========================================
Files 874 876 +2
Lines 50372 50509 +137
Branches 5017 5023 +6
==========================================
+ Hits 26214 26285 +71
- Misses 24083 24149 +66
Partials 75 75 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- Move TimeoutBudget class to src/core/utils/misc/timeout-budget.ts - Move withTimeout function to src/core/utils/misc/with-timeout.ts - Add comprehensive unit tests for both utilities (31 tests) - Update index.ts to import from new locations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
This plugin has been deployed to Cloudflare R2 and is available for testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
api/src/core/utils/misc/with-timeout.ts (1)
16-24: Timer not cleared after promise settles.The
setTimeoutcontinues running even after the original promise resolves. For typical startup operations with short timeouts this is acceptable, but be aware that for long timeouts or high-frequency usage, the pending timer could accumulate.If you want to clean up the timer:
export const withTimeout = <T>( promise: Promise<T>, timeoutMs: number, operationName: string ): Promise<T> => { + let timeoutId: ReturnType<typeof setTimeout>; + const timeoutPromise = new Promise<never>((_, reject) => { + timeoutId = setTimeout( + () => reject(new Error(`${operationName} timed out after ${timeoutMs}ms`)), + timeoutMs + ); + }); return Promise.race([ - promise, - new Promise<never>((_, reject) => - setTimeout( - () => reject(new Error(`${operationName} timed out after ${timeoutMs}ms`)), - timeoutMs - ) - ), - ]); + promise.finally(() => clearTimeout(timeoutId)), + timeoutPromise, + ]); };api/src/core/utils/misc/__test__/with-timeout.test.ts (1)
18-28: Testing exact error message strings.Per coding guidelines, prefer
.rejects.toThrow()without arguments unless the message format is specifically what you're testing. The test at lines 23-28 explicitly tests that the operation name appears in the error message, which is valid. However, line 20 duplicates this intent—consider simplifying it.it('rejects when promise takes longer than timeout', async () => { const promise = new Promise<string>((resolve) => setTimeout(() => resolve('late'), 500)); - await expect(withTimeout(promise, 50, 'slowOp')).rejects.toThrow('slowOp timed out after 50ms'); + await expect(withTimeout(promise, 50, 'slowOp')).rejects.toThrow(); });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
api/dev/configs/api.json(1 hunks)api/src/core/utils/misc/__test__/timeout-budget.test.ts(1 hunks)api/src/core/utils/misc/__test__/with-timeout.test.ts(1 hunks)api/src/core/utils/misc/timeout-budget.ts(1 hunks)api/src/core/utils/misc/with-timeout.ts(1 hunks)api/src/index.ts(6 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
api/**/*
📄 CodeRabbit inference engine (CLAUDE.md)
Prefer adding new files to the NestJS repo located at
api/src/unraid-api/instead of the legacy code
Files:
api/dev/configs/api.jsonapi/src/index.tsapi/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.tsapi/src/core/utils/misc/with-timeout.tsapi/src/core/utils/misc/timeout-budget.ts
**/*
📄 CodeRabbit inference engine (.cursor/rules/default.mdc)
Never add comments unless they are needed for clarity of function
Files:
api/dev/configs/api.jsonapi/src/index.tsapi/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.tsapi/src/core/utils/misc/with-timeout.tsapi/src/core/utils/misc/timeout-budget.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx,js,jsx}: Always use TypeScript imports with.jsextensions for ESM compatibility
Never add comments unless they are needed for clarity of function
Never add comments for obvious things, and avoid commenting when starting and ending code blocks
Files:
api/src/index.tsapi/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.tsapi/src/core/utils/misc/with-timeout.tsapi/src/core/utils/misc/timeout-budget.ts
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Never use theanytype. Always prefer proper typing
Avoid using casting whenever possible, prefer proper typing from the start
Files:
api/src/index.tsapi/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.tsapi/src/core/utils/misc/with-timeout.tsapi/src/core/utils/misc/timeout-budget.ts
api/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
cache-manager v7 expects TTL values in milliseconds, not seconds (e.g., 600000 for 10 minutes, not 600)
Files:
api/src/index.tsapi/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.tsapi/src/core/utils/misc/with-timeout.tsapi/src/core/utils/misc/timeout-budget.ts
**/*.test.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.test.{ts,tsx,js,jsx}: Use VITEST for test suite, not jest
Use.rejects.toThrow()without arguments to test that functions throw errors, not exact error message strings
Files:
api/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.ts
api/**/*.test.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/api-rules.mdc)
api/**/*.test.{ts,tsx}: Use Vitest for the test suite, not Jest
Prefer to not mock simple dependencies
For error testing, use.rejects.toThrow()without arguments; do not test exact error message strings unless the message format is specifically what you're testing
Files:
api/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.ts
**/*.test.ts
📄 CodeRabbit inference engine (.cursor/rules/web-testing-rules.mdc)
**/*.test.ts: Use.rejects.toThrow()without arguments to test that functions throw errors. Don't test exact error message strings unless the message format is specifically what you're testing
Test what the code does, not implementation details like exact error message wording
Mock external services and API calls
Usevi.mock()for module-level mocks
Specify return values for component methods withvi.spyOn()
Reset mocks between tests withvi.clearAllMocks()
Always await async operations before making assertions
Files:
api/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.ts
🧠 Learnings (14)
📚 Learning: 2025-04-21T18:44:39.643Z
Learnt from: elibosley
Repo: unraid/api PR: 1352
File: packages/unraid-api-plugin-connect/src/config.persistence.ts:0-0
Timestamp: 2025-04-21T18:44:39.643Z
Learning: When working with file operations in Node.js, prefer using the native Promise-based APIs from the fs/promises module instead of callback-based APIs from fs or manually wrapping callbacks in Promises.
Applied to files:
api/src/index.ts
📚 Learning: 2025-03-13T16:17:55.820Z
Learnt from: elibosley
Repo: unraid/api PR: 1211
File: api/src/unraid-api/graph/connect/connect-settings.service.ts:48-53
Timestamp: 2025-03-13T16:17:55.820Z
Learning: In NestJS services, dynamic imports should be wrapped in try/catch blocks with proper error logging and fallback handling to prevent unhandled rejections from crashing the application.
Applied to files:
api/src/index.ts
📚 Learning: 2025-11-24T17:51:46.348Z
Learnt from: CR
Repo: unraid/api PR: 0
File: .cursor/rules/api-rules.mdc:0-0
Timestamp: 2025-11-24T17:51:46.348Z
Learning: Applies to api/**/*.test.{ts,tsx} : Use Vitest for the test suite, not Jest
Applied to files:
api/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-11-24T17:52:26.908Z
Learnt from: CR
Repo: unraid/api PR: 0
File: .cursor/rules/web-testing-rules.mdc:0-0
Timestamp: 2025-11-24T17:52:26.908Z
Learning: Applies to **/__test__/store/**/*.ts : Test async operations completely in Pinia store tests
Applied to files:
api/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-11-24T17:52:26.907Z
Learnt from: CR
Repo: unraid/api PR: 0
File: .cursor/rules/web-testing-rules.mdc:0-0
Timestamp: 2025-11-24T17:52:26.907Z
Learning: Applies to **/__test__/components/**/*.ts : Test component behavior and output, not implementation details
Applied to files:
api/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-11-24T17:52:26.907Z
Learnt from: CR
Repo: unraid/api PR: 0
File: .cursor/rules/web-testing-rules.mdc:0-0
Timestamp: 2025-11-24T17:52:26.907Z
Learning: This is a Nuxt.js app but tests are run with vitest outside of the Nuxt environment
Applied to files:
api/src/core/utils/misc/__test__/timeout-budget.test.tsapi/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-11-24T17:52:26.908Z
Learnt from: CR
Repo: unraid/api PR: 0
File: .cursor/rules/web-testing-rules.mdc:0-0
Timestamp: 2025-11-24T17:52:26.908Z
Learning: Applies to **/*.test.ts : Always await async operations before making assertions
Applied to files:
api/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-11-24T17:52:26.907Z
Learnt from: CR
Repo: unraid/api PR: 0
File: .cursor/rules/web-testing-rules.mdc:0-0
Timestamp: 2025-11-24T17:52:26.907Z
Learning: Applies to **/__test__/components/**/*.ts : Use `flushPromises()` for complex promise chains in Vue component tests
Applied to files:
api/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-11-24T17:52:26.908Z
Learnt from: CR
Repo: unraid/api PR: 0
File: .cursor/rules/web-testing-rules.mdc:0-0
Timestamp: 2025-11-24T17:52:26.908Z
Learning: Applies to **/__test__/store/**/*.ts : Verify proper error handling in Pinia store tests
Applied to files:
api/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-11-24T17:51:37.915Z
Learnt from: CR
Repo: unraid/api PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T17:51:37.915Z
Learning: Applies to **/*.test.{ts,tsx,js,jsx} : Use VITEST for test suite, not jest
Applied to files:
api/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-11-24T17:52:26.908Z
Learnt from: CR
Repo: unraid/api PR: 0
File: .cursor/rules/web-testing-rules.mdc:0-0
Timestamp: 2025-11-24T17:52:26.908Z
Learning: Applies to **/__test__/store/**/*.ts : Mock external dependencies appropriately in Pinia store tests
Applied to files:
api/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-11-24T17:52:26.907Z
Learnt from: CR
Repo: unraid/api PR: 0
File: .cursor/rules/web-testing-rules.mdc:0-0
Timestamp: 2025-11-24T17:52:26.907Z
Learning: Applies to **/__test__/components/**/*.ts : Use `await nextTick()` for DOM updates in Vue component tests
Applied to files:
api/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-11-24T17:52:26.907Z
Learnt from: CR
Repo: unraid/api PR: 0
File: .cursor/rules/web-testing-rules.mdc:0-0
Timestamp: 2025-11-24T17:52:26.907Z
Learning: Applies to **/*.test.ts : Mock external services and API calls
Applied to files:
api/src/core/utils/misc/__test__/with-timeout.test.ts
📚 Learning: 2025-09-14T14:07:17.439Z
Learnt from: elibosley
Repo: unraid/api PR: 1702
File: web/__test__/components/Wrapper/mount-engine.test.ts:135-138
Timestamp: 2025-09-14T14:07:17.439Z
Learning: vi.waitFor() is a valid Vitest utility function that waits for a callback to execute successfully, continuing to wait until it succeeds or times out. It's useful for waiting for asynchronous operations like DOM updates after component mounting.
Applied to files:
api/src/core/utils/misc/__test__/with-timeout.test.ts
🧬 Code graph analysis (1)
api/src/core/utils/misc/__test__/with-timeout.test.ts (1)
api/src/core/utils/misc/with-timeout.ts (1)
withTimeout(11-25)
🔇 Additional comments (11)
api/dev/configs/api.json (1)
2-2: LGTM!Version bump to 4.27.2 aligns with the introduction of startup budget tracking utilities.
api/src/core/utils/misc/__test__/with-timeout.test.ts (2)
30-33: LGTM!Testing that original promise rejections propagate correctly is important for ensuring the wrapper doesn't mask underlying errors.
52-64: LGTM!Good type preservation test ensuring the generic type parameter works correctly.
api/src/core/utils/misc/timeout-budget.ts (1)
23-70: LGTM!The
TimeoutBudgetclass provides a clean abstraction for managing cumulative time budgets. The 100ms minimum floor ingetTimeoutis a reasonable safeguard against instant failures, and the reserve mechanism allows proper allocation for critical operations like bootstrap.api/src/core/utils/misc/__test__/timeout-budget.test.ts (2)
5-12: LGTM!Proper setup/teardown of fake timers ensures deterministic testing of time-dependent behavior.
165-229: LGTM!Excellent integration tests that simulate realistic startup sequences with the same budget configuration used in production (13s total, 8s bootstrap reserve, 2s max operation). The worst-case scenario test validates that bootstrap time is preserved even when all operations timeout.
api/src/index.ts (5)
33-39: LGTM!Well-chosen budget configuration: 13s total ensures timeout triggers before PM2's 15s kill window, 8s reserve for NestJS bootstrap is reasonable, and 2s per operation prevents any single pre-bootstrap step from consuming the budget.
145-160: LGTM!Good use of
budget.remaining()to give bootstrap all remaining time, with a warning if less than 1 second remains. The dynamic import of@app/unraid-api/main.jskeeps the startup path flexible.
187-189: LGTM!The
throw new Error('Unreachable')aftergracefulExit(1)is a standard pattern to satisfy TypeScript's control flow analysis, sincegracefulExitcallsprocess.exitbut isn't typed asnever.
131-138: VerifysetupRegistrationKeyWatchis synchronous.This function is not wrapped with a timeout. If it performs async file system operations or blocking I/O, it could contribute to startup hangs.
122-129: VerifyStateManager.getInstance()is synchronous.Unlike
loadStateFilesandloadRegistrationKey, this call is not wrapped with a timeout. IfStateManager.getInstance()performs async initialization or file I/O internally, it could still hang during startup. Confirm this is intentional.
Summary
Background
A user reported being unable to start their array on v7.2-beta.1 due to the API failing to start. The root cause was a leftover
dynamix.my.serversfolder from a previously uninstalled Connect plugin. The API would hang during startup with no error messages, and PM2 would eventually kill it after 15 seconds with no diagnostic information.Original syslog:
Solution
Startup Budget Tracking
Instead of fixed timeouts per operation (which could exceed PM2's 15-second limit in aggregate), we now track a total startup budget:
The
StartupBudgetclass dynamically calculates timeouts based on remaining time, ensuring we never exceed PM2's limit and always provide clear timeout messages.Graceful Degradation
Non-critical operations now fail gracefully with warnings instead of crashing:
loadStateFiles()- continues with default stateloadRegistrationKey()- continues without registration keyloadDynamixConfig()- continues with default configStateManager- continues without file watchingsetupRegistrationKeyWatch()- continues without key watchingCritical operations still fail fast:
Improved Logging
Each startup phase now logs its completion, making it easy to identify where hangs occur:
Test plan
Summary by CodeRabbit
Chores
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.