
@nicktrn
Collaborator

@nicktrn nicktrn commented Sep 16, 2025

Replays tend to go through

@changeset-bot

changeset-bot bot commented Sep 16, 2025

⚠️ No Changeset found

Latest commit: b6cba90

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@coderabbitai
Contributor

coderabbitai bot commented Sep 16, 2025

Walkthrough

The change updates the retry logic in packages/core/src/v3/errors.ts by modifying shouldLookupRetrySettings. It now returns true for INTERNAL_ERROR when the error code is TASK_PROCESS_SIGSEGV, in addition to the existing cases TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE and TASK_PROCESS_SIGTERM. No other logic, function signatures, or exports are altered.
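A minimal sketch of the changed branch of shouldLookupRetrySettings, as the walkthrough describes it. It assumes a simplified, hypothetical `SimplifiedInternalError` type carrying only the fields this branch reads, and it models only the INTERNAL_ERROR case; the real function in packages/core/src/v3/errors.ts works on the full TaskRunError union.

```typescript
// Hypothetical, simplified error shape: only the fields this sketch inspects.
// The real TaskRunError union in errors.ts has more variants and fields.
type SimplifiedInternalError = { type: "INTERNAL_ERROR"; code: string };

// Sketch of the INTERNAL_ERROR branch after this PR: these process-level
// failures consult the task's retry settings instead of failing outright.
function shouldLookupRetrySettings(error: SimplifiedInternalError): boolean {
  switch (error.code) {
    case "TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE":
    case "TASK_PROCESS_SIGTERM":
    case "TASK_PROCESS_SIGSEGV": // added by this PR
      return true;
    default:
      return false;
  }
}
```

In this sketch, `{ type: "INTERNAL_ERROR", code: "TASK_PROCESS_SIGSEGV" }` now returns true, while a code such as TASK_PROCESS_OOM_KILLED still falls through to false.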

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description Check (⚠️ Warning)
    Explanation: The PR description only contains the single line "Replays tend to go through" and does not follow the repository template; it is missing the "Closes #" line, the checklist, detailed Testing steps, a Changelog entry, and any screenshots or logs needed for review, so reviewers cannot verify scope, intent, or testing.
    Resolution: Please update the PR description to match the repository template: add "Closes #" if applicable, complete the checklist, and provide detailed testing steps and observed results demonstrating that replays now succeed. Include a concise changelog entry, any relevant logs or screenshots, and a brief rationale referencing the changed file(s) (e.g., why SIGSEGV should be retried); then request a re-review.
  • Docstring Coverage (⚠️ Warning)
    Explanation: Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.
    Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title Check (✅ Passed)
    Explanation: The title "fix(run-engine): retry SIGSEGV errors" succinctly and accurately summarizes the primary change in the changeset — making SIGSEGV (TASK_PROCESS_SIGSEGV) retryable in the run engine's error handling; it follows the repo's conventional prefix and is specific enough for a teammate scanning history to understand the main intent.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/core/src/v3/errors.ts (1)

285-333: Blocking: SIGSEGV is still marked non‑retryable in shouldRetryError, undermining the PR.

TASK_PROCESS_SIGSEGV is listed in the “return false” block, so we won't actually retry even if we look up retry settings. Move it to the retryable list.

Apply this diff:

 case "INTERNAL_ERROR": {
   switch (error.code) {
     case "COULD_NOT_FIND_EXECUTOR":
     case "COULD_NOT_FIND_TASK":
     case "COULD_NOT_IMPORT_TASK":
     case "CONFIGURED_INCORRECTLY":
     case "TASK_ALREADY_RUNNING":
     case "TASK_PROCESS_SIGKILL_TIMEOUT":
-    case "TASK_PROCESS_SIGSEGV":
     case "TASK_PROCESS_OOM_KILLED":
     case "TASK_PROCESS_MAYBE_OOM_KILLED":
     case "TASK_RUN_CANCELLED":
     case "MAX_DURATION_EXCEEDED":
     case "DISK_SPACE_EXCEEDED":
     case "OUTDATED_SDK_VERSION":
     case "TASK_RUN_HEARTBEAT_TIMEOUT":
     case "TASK_DID_CONCURRENT_WAIT":
     case "RECURSIVE_WAIT_DEADLOCK":
     // run engine errors
     case "TASK_DEQUEUED_INVALID_STATE":
     case "TASK_DEQUEUED_QUEUE_NOT_FOUND":
     case "TASK_HAS_N0_EXECUTION_SNAPSHOT":
     case "TASK_RUN_DEQUEUED_MAX_RETRIES":
       return false;

     //new heartbeat error
     //todo
     case "TASK_RUN_STALLED_EXECUTING":
     case "TASK_RUN_STALLED_EXECUTING_WITH_WAITPOINTS":
     case "GRACEFUL_EXIT_TIMEOUT":
     case "HANDLE_ERROR_ERROR":
     case "TASK_INPUT_ERROR":
     case "TASK_OUTPUT_ERROR":
     case "TASK_MIDDLEWARE_ERROR":
     case "POD_EVICTED":
     case "POD_UNKNOWN_ERROR":
     case "TASK_EXECUTION_ABORTED":
     case "TASK_EXECUTION_FAILED":
     case "TASK_RUN_CRASHED":
     case "TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE":
     case "TASK_PROCESS_SIGTERM":
+    case "TASK_PROCESS_SIGSEGV":
       return true;
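To see why the lookup change alone is not enough, here is a pared-down model of the two checks. It assumes, as the comment above states, that a retry happens only when retry settings are looked up and shouldRetryError also returns true. The sets hold only a few illustrative codes, and `willRetry` and `nonRetryableCodes` are hypothetical helpers, not functions in errors.ts.

```typescript
// Codes for which shouldLookupRetrySettings returns true after this PR.
const LOOKUP_CODES = new Set<string>([
  "TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE",
  "TASK_PROCESS_SIGTERM",
  "TASK_PROCESS_SIGSEGV",
]);

// A few codes from shouldRetryError's "return false" block (illustrative subset).
// includeSigsegv toggles whether SIGSEGV is still in that block.
function nonRetryableCodes(includeSigsegv: boolean): Set<string> {
  const codes = new Set<string>([
    "TASK_PROCESS_OOM_KILLED",
    "TASK_RUN_CANCELLED",
    "MAX_DURATION_EXCEEDED",
  ]);
  if (includeSigsegv) codes.add("TASK_PROCESS_SIGSEGV");
  return codes;
}

// Hypothetical combination of the two gates: both must allow the retry.
function willRetry(code: string, nonRetryable: Set<string>): boolean {
  return LOOKUP_CODES.has(code) && !nonRetryable.has(code);
}

// Before the suggested diff: SIGSEGV settings are looked up, then vetoed.
console.log(willRetry("TASK_PROCESS_SIGSEGV", nonRetryableCodes(true))); // false
// After moving SIGSEGV to the retryable list, both gates agree.
console.log(willRetry("TASK_PROCESS_SIGSEGV", nonRetryableCodes(false))); // true
```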
🧹 Nitpick comments (2)
packages/core/src/v3/errors.ts (2)

849-856: Augment exception events for SIGSEGV too.

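exceptionEventEnhancer already prettifies the exception type for TASK_PROCESS_OOM_KILLED, TASK_PROCESS_MAYBE_OOM_KILLED, and TASK_PROCESS_SIGTERM, but not for TASK_PROCESS_SIGSEGV. Add it so SIGSEGV failures get the same user-facing treatment.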

     case TaskRunErrorCodes.TASK_PROCESS_MAYBE_OOM_KILLED:
     case TaskRunErrorCodes.TASK_PROCESS_OOM_KILLED:
-    case TaskRunErrorCodes.TASK_PROCESS_SIGTERM: {
+    case TaskRunErrorCodes.TASK_PROCESS_SIGTERM:
+    case TaskRunErrorCodes.TASK_PROCESS_SIGSEGV: {
       return {
         ...exception,
         ...getPrettyExceptionEvent(exception.type),
       };
     }
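A runnable sketch of the suggested case grouping. The `ExceptionEvent` type and the `getPrettyExceptionEvent` stand-in are hypothetical: the titles below are placeholders, not the copy used in errors.ts, and plain string codes stand in for the TaskRunErrorCodes constants.

```typescript
// Hypothetical, minimal exception-event shape.
type ExceptionEvent = { type: string; message?: string };

// Placeholder stand-in for getPrettyExceptionEvent: maps a raw error code
// to a friendlier type. Titles are illustrative only.
function getPrettyExceptionEvent(code: string): Partial<ExceptionEvent> {
  const titles: Record<string, string> = {
    TASK_PROCESS_MAYBE_OOM_KILLED: "Possibly out of memory",
    TASK_PROCESS_OOM_KILLED: "Out of memory",
    TASK_PROCESS_SIGTERM: "Terminated (SIGTERM)",
    TASK_PROCESS_SIGSEGV: "Segmentation fault (SIGSEGV)",
  };
  return { type: titles[code] ?? code };
}

// Mirrors the suggested grouping: SIGSEGV joins the prettified cases.
function exceptionEventEnhancer(exception: ExceptionEvent): ExceptionEvent {
  switch (exception.type) {
    case "TASK_PROCESS_MAYBE_OOM_KILLED":
    case "TASK_PROCESS_OOM_KILLED":
    case "TASK_PROCESS_SIGTERM":
    case "TASK_PROCESS_SIGSEGV": {
      // Later spread wins, so the pretty type replaces the raw code
      // while any existing message is kept.
      return { ...exception, ...getPrettyExceptionEvent(exception.type) };
    }
    default:
      return exception;
  }
}
```

With this stand-in, `exceptionEventEnhancer({ type: "TASK_PROCESS_SIGSEGV", message: "raw" })` keeps the message and swaps the type for the placeholder title; unrelated types pass through untouched.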

349-357: Add tests to lock behavior: SIGSEGV retries and settings lookup.

Please add unit tests asserting:

  • shouldRetryError({ type: "INTERNAL_ERROR", code: "TASK_PROCESS_SIGSEGV", ... }) === true
  • shouldLookupRetrySettings(...) === true for SIGSEGV

I can scaffold these if helpful.
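A sketch of the requested tests as a table-driven check. The two functions here are hypothetical stubs that model only the three process codes in the table, so the sketch runs on its own; in the repo you would import shouldRetryError and shouldLookupRetrySettings from errors.ts and use the test runner's assertions instead.

```typescript
// Hypothetical stubs: they only know about the codes used in the table below.
type InternalError = { type: "INTERNAL_ERROR"; code: string };

const RETRYABLE_PROCESS_CODES = new Set<string>([
  "TASK_PROCESS_EXITED_WITH_NON_ZERO_CODE",
  "TASK_PROCESS_SIGTERM",
  "TASK_PROCESS_SIGSEGV",
]);

function shouldRetryErrorStub(error: InternalError): boolean {
  return RETRYABLE_PROCESS_CODES.has(error.code);
}

function shouldLookupRetrySettingsStub(error: InternalError): boolean {
  return RETRYABLE_PROCESS_CODES.has(error.code);
}

// Expected outcomes, including a negative control (OOM).
const cases: Array<{ code: string; retry: boolean; lookup: boolean }> = [
  { code: "TASK_PROCESS_SIGSEGV", retry: true, lookup: true },
  { code: "TASK_PROCESS_SIGTERM", retry: true, lookup: true },
  { code: "TASK_PROCESS_OOM_KILLED", retry: false, lookup: false },
];

// Collect every mismatch instead of stopping at the first one.
function runCases(): string[] {
  const failures: string[] = [];
  for (const { code, retry, lookup } of cases) {
    const error: InternalError = { type: "INTERNAL_ERROR", code };
    if (shouldRetryErrorStub(error) !== retry) failures.push(`${code}: shouldRetryError`);
    if (shouldLookupRetrySettingsStub(error) !== lookup) failures.push(`${code}: shouldLookupRetrySettings`);
  }
  return failures;
}

console.log(runCases()); // []
```

Keeping the expectations in one table makes it cheap to add a row whenever a code moves between the retryable and non-retryable lists.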

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9d1aad and b6cba90.

📒 Files selected for processing (1)
  • packages/core/src/v3/errors.ts (1 hunk)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js-specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports; use function declarations

Files:

  • packages/core/src/v3/errors.ts
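The conventions above can be illustrated with a short, hypothetical module. The names (SignalCode, SignalInfo, toSignalInfo, utf8ByteLength) are invented for this example and do not come from the repository; TextEncoder is used as one more isomorphic web API alongside fetch and ReadableStream.

```typescript
// A string-literal union instead of an enum.
type SignalCode = "TASK_PROCESS_SIGTERM" | "TASK_PROCESS_SIGSEGV";

// A type alias instead of an interface.
type SignalInfo = { code: SignalCode; signal: "SIGTERM" | "SIGSEGV" };

// Named exports via function declarations; no default export.
export function toSignalInfo(code: SignalCode): SignalInfo {
  return { code, signal: code === "TASK_PROCESS_SIGTERM" ? "SIGTERM" : "SIGSEGV" };
}

// Isomorphic API: TextEncoder is available in browsers and Node.js,
// unlike the Node.js-only Buffer.
export function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).byteLength;
}
```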
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

  • packages/core/src/v3/errors.ts
🧠 Learnings (2)
📓 Common learnings
Learnt from: nicktrn
PR: triggerdotdev/trigger.dev#1418
File: packages/core/src/v3/errors.ts:364-371
Timestamp: 2024-10-18T15:41:52.352Z
Learning: In `packages/core/src/v3/errors.ts`, within the `taskRunErrorEnhancer` function, `error.message` is always defined, so it's safe to directly call `error.message.includes("SIGTERM")` without additional checks.
📚 Learning: 2024-10-18T15:41:52.352Z
Learnt from: nicktrn
PR: triggerdotdev/trigger.dev#1418
File: packages/core/src/v3/errors.ts:364-371
Timestamp: 2024-10-18T15:41:52.352Z
Learning: In `packages/core/src/v3/errors.ts`, within the `taskRunErrorEnhancer` function, `error.message` is always defined, so it's safe to directly call `error.message.includes("SIGTERM")` without additional checks.

Applied to files:

  • packages/core/src/v3/errors.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
packages/core/src/v3/errors.ts (2)

352-356: LGTM: Include SIGSEGV in retry-settings lookup.

Adding TASK_PROCESS_SIGSEGV to shouldLookupRetrySettings aligns with the PR intent.


306-311: Confirm and fix 'TASK_HAS_N0_EXECUTION_SNAPSHOT' (zero vs letter O)

Repo consistently uses the zero form at:

  • packages/core/src/v3/schemas/common.ts:181
  • packages/core/src/v3/errors.ts:308
  • internal-packages/run-engine/src/engine/errors.ts:55

If the intended code is TASK_HAS_NO_EXECUTION_SNAPSHOT, update the canonical enum in schemas/common.ts and all usages — this alters the public schema and may be a breaking change.
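To confirm the spelling mechanically rather than by eye, a small check like the following could be run over the codes from schemas/common.ts. The helper name and regex are assumptions for illustration, and the list below is a hand-copied sample, not the full enum.

```typescript
// Hypothetical helper: flags codes where a digit 0 follows a capital letter
// and precedes a letter or underscore (e.g. "N0_", which reads like "NO_").
function findDigitZeroLookalikes(codes: readonly string[]): string[] {
  return codes.filter((code) => /[A-Z]0[A-Z_]/.test(code));
}

const sampleCodes = [
  "TASK_DEQUEUED_QUEUE_NOT_FOUND",
  "TASK_HAS_N0_EXECUTION_SNAPSHOT",
  "TASK_RUN_DEQUEUED_MAX_RETRIES",
];

// Only the N0 spelling is flagged; NOT_FOUND uses the letter O.
console.log(findDigitZeroLookalikes(sampleCodes));
```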

@nicktrn nicktrn merged commit a55294b into main Sep 16, 2025
31 checks passed
@nicktrn nicktrn deleted the fix/retry-sigsegv branch September 16, 2025 12:47
