fix: handle recovery in queue processing #57

Merged
ren0503 merged 1 commit into master from fix/ren/56-recovery-job-in-queue-process
Oct 20, 2025

Conversation


@ren0503 ren0503 commented Oct 20, 2025

No description provided.

@ren0503 ren0503 added this to the Queue v2.1.1 milestone Oct 20, 2025
@ren0503 ren0503 linked an issue Oct 20, 2025 that may be closed by this pull request

coderabbitai bot commented Oct 20, 2025

Summary by CodeRabbit

  • Chores

    • Updated Go toolchain from v1.23.0 to v1.24.0 and bumped key dependencies for improved performance and stability.
  • Bug Fixes

    • Enhanced error handling in job execution with panic recovery and proper logging to prevent unexpected crashes.
  • Tests

    • Updated crash handling tests for improved error scenario coverage.

Walkthrough

Renamed the job error handler to exported HandlerError, added a defer-recover that logs panics in job execution goroutines, and bumped Go/tooling and several dependencies; tests updated to exercise an early panic path.

Changes

Cohort / File(s) Summary
Dependency Updates
go.mod
Bumped Go version (1.23.0 → 1.24.0) and toolchain (go1.24.1); upgraded github.com/go-redsync/redsync/v4 v4.13.0→v4.14.0, github.com/redis/go-redis/v9 v9.10.0→v9.14.1, github.com/tinh-tinh/tinhtinh/v2 v2.1.3→v2.3.4; removed indirect golang.org/x/sync v0.7.0.
Job Error Handling
job.go
Renamed handlerError → exported HandlerError; updated call sites to use Job.HandlerError(...); error storage key changed to use queue name directly.
Panic Recovery & Logging
queue.go
Added fmt import; wrapped job execution goroutines with defer+recover to convert panics into formatted logs via fmt.Sprintf and q.formatLog; refactored log dispatch to switch on log type.
Test Coverage
queue_test.go
Updated Test_Crash: enqueued a second job (Id == "2") and added an early panic path before Process() to validate recovery behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Queue
    participant Goroutine
    participant Job
    participant Logger

    Queue->>Goroutine: spawn job execution
    activate Goroutine
    Note right of Goroutine: defer { recover -> format log } (NEW)
    Goroutine->>Job: invoke Process()
    alt Process returns error
        Job->>Job: HandlerError(reason)
        Job-->>Goroutine: status handled
    else Process panics
        Goroutine->>Logger: recover panic -> fmt.Sprintf -> q.formatLog (NEW)
        Logger-->>Goroutine: fatal/log emitted
    end
    deactivate Goroutine

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through code where panics played,

Caught by a defer and softly laid.
Logs now bloom where crashes stood,
Gentle fixes, tidy and good. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Description Check (❓ Inconclusive): No pull request description was provided, so there is nothing to evaluate against the changeset. While the title indicates the PR handles recovery in queue processing, a description covering the motivation for the changes, the dependency updates, the error handler renaming in job.go, and the panic recovery mechanism in queue.go would help reviewers.
✅ Passed checks (1 passed)
  • Title Check (✅ Passed): The pull request title "fix: handle recovery in queue processing" directly aligns with the primary change in the changeset: queue.go adds a defer-recover block to the Run method to handle panics during job execution and convert them into formatted logs. While the changeset also includes dependency updates in go.mod and a method rename in job.go, the title appropriately captures the main purpose of the PR, as indicated by the branch name "fix/ren/56-recovery-job-in-queue-process" and the Codecov comments highlighting queue.go coverage.

@ren0503 ren0503 force-pushed the fix/ren/56-recovery-job-in-queue-process branch from 1c917fc to 278848f on October 20, 2025 at 13:54
@codecov-commenter

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.

Files with missing lines:
  • queue.go: 77.77% patch coverage, 2 lines missing ⚠️



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
queue.go (1)

256-266: Data race on finishedJob slice in Retry().

Multiple goroutines append to finishedJob without synchronization. This is a race and can corrupt the slice.

-       var finishedJob []string
+       var finishedJob []string
+       var mu sync.Mutex
        for i := range numJobs {
            job := numJobs[i]
            wg.Add(1)
            go func(job *Job) {
                defer wg.Done()
                q.jobFnc(job)
                if job.IsFinished() {
-                   finishedJob = append(finishedJob, job.Id)
+                   mu.Lock()
+                   finishedJob = append(finishedJob, job.Id)
+                   mu.Unlock()
                }
            }(job)
        }
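
A minimal, self-contained sketch of the suggested fix (names are illustrative, not the library's actual API): guard the shared slice with a `sync.Mutex` so concurrent appends from the worker goroutines cannot race.

```go
package main

import (
	"fmt"
	"sync"
)

// collectFinished sketches the mutex-guarded append suggested above:
// each goroutine locks before appending to the shared slice, so no
// appends are lost or corrupted under concurrency.
func collectFinished(ids []string) []string {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		finished []string
	)
	for _, id := range ids {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			mu.Lock()
			finished = append(finished, id)
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return finished
}

func main() {
	done := collectFinished([]string{"1", "2", "3"})
	fmt.Println(len(done)) // all three appends survive
}
```

Running the unguarded version under `go test -race` would flag the concurrent append; with the mutex the race detector is satisfied.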
🧹 Nitpick comments (4)
queue.go (1)

181-183: Avoid deferring cancel() inside the processing loop.

Deferring cancel in the loop delays timer cleanup until Run returns, leaking timers per batch. Call cancel() after the select instead.

-       ctx, cancel := context.WithTimeout(context.Background(), q.config.Timeout)
-       defer cancel()
+       ctx, cancel := context.WithTimeout(context.Background(), q.config.Timeout)
...
        select {
        case <-done:
            q.formatLog(LoggerInfo, "All jobs done\n")
        case <-ctx.Done():
            q.MarkJobFailedTimeout(numJobs)
        }
+       // Clean up the timer promptly per-iteration
+       cancel()
queue_test.go (1)

181-183: Test_Crash may still fail if recovered panics are logged with Fatal.

If the production code logs recovered panics with LoggerFatal, the process exits (os.Exit) and tests abort. After applying the non-fatal logging fix in queue.go, this should be stable. If you need a stopgap, set Logger to LoggerInfo or LoggerDisabled here.

 userQueue := queue.New("crash", &queue.Options{
@@
-    RetryFailures: 3,
+    RetryFailures: 3,
+    // Optional: keep CI resilient even if logging behavior changes
+    // Logger:        queue.LoggerInfo,
 })

Also applies to: 194-197

job.go (2)

73-79: Optional: capture stack trace for diagnostics on panic.

You already convert the panic into a failure via HandlerError. Capturing debug.Stack() here improves postmortem debugging.

 import (
     "context"
     "fmt"
+    "runtime/debug"
     "time"
 )
@@
     defer func() {
         if r := recover(); r != nil {
-            failedReason := fmt.Sprintf("%v", r)
-            job.HandlerError(failedReason)
+            job.Stacktrace = append(job.Stacktrace, string(debug.Stack()))
+            job.HandlerError(fmt.Sprintf("%v", r))
         }
     }()

89-105: Use the queue’s context when storing errors.

HandlerError should use job.queue.ctx (or a passed-in ctx) instead of context.Background(), to respect caller cancellation and timeouts.

-   _, err := client.HSet(context.Background(), job.queue.Name, job.Id, job.FailedReason).Result()
+   _, err := client.HSet(job.queue.ctx, job.queue.Name, job.Id, job.FailedReason).Result()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff38f41 and 1c917fc.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • go.mod (1 hunks)
  • job.go (2 hunks)
  • queue.go (2 hunks)
  • queue_test.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
queue.go (1)
constant.go (1)
  • LoggerFatal (8-8)
queue_test.go (1)
queue.go (1)
  • AddJobOptions (110-114)
🔇 Additional comments (1)
go.mod (1)

3-5: Verify minimum Go 1.24 across CI/dev; forward compatibility with 1.25+ confirmed.

The go 1.24.0 directive is correct, but the concern about "builds will fail" without go1.24.x is imprecise. Builds succeed on go 1.25+ (verified: system running go 1.25.1 resolves all modules and compiles successfully). The actual requirement is go ≥ 1.24.0; versions older than 1.24.0 will cause build failures.

Edge cases to verify in your CI fleet:

  • If any runner is on macOS 11: Go 1.25+ drops macOS 11 support; stay on 1.24.x or upgrade runners to macOS 12+.
  • If tests are floating-point sensitive (AMD64): Go 1.25+ uses FMA instructions, which may alter precision; confirm test stability.
  • If using cgo with C bindings: Go 1.25 enables ASAN leak detection by default; may flag previously silent leaks (disable with ASAN_OPTIONS if needed).

Otherwise, ensuring runners have go1.24.0+ (including 1.25+) is sufficient.

Comment on lines +192 to +197
defer func() {
if r := recover(); r != nil {
failedReason := fmt.Sprintf("%v", r)
q.formatLog(LoggerFatal, "Error when processing job: %v\n", failedReason)
}
}()

⚠️ Potential issue | 🔴 Critical

Recovered panic should not terminate the process; also mark the job failed/delayed.

  • log.Fatalf in a recover path exits the process, defeating recovery and likely breaking Test_Crash.
  • The job isn’t marked failed/delayed here, so an early panic (before job.Process) leaves it stuck in Wait.

Fix by logging non-fatally and recording the failure via HandlerError:

-               defer func() {
-                   if r := recover(); r != nil {
-                       failedReason := fmt.Sprintf("%v", r)
-                       q.formatLog(LoggerFatal, "Error when processing job: %v\n", failedReason)
-                   }
-               }()
+               defer func() {
+                   if r := recover(); r != nil {
+                       // Ensure the job transitions out of Wait/Active
+                       job.HandlerError(fmt.Sprint(r))
+                       // Do not exit the process; just log the incident
+                       q.formatLog(LoggerInfo, "Recovered panic while processing job %s: %v\n", job.Id, r)
+                   }
+               }()
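
The behavior difference can be sketched with a minimal stand-in for the library's Job type (field and method names here are illustrative, not the real API): on panic, the job is marked failed and the process keeps running, so an early panic no longer leaves the job stuck in Wait.

```go
package main

import "fmt"

// job is a minimal stand-in for the library's Job type.
type job struct {
	id     string
	status string
	reason string
}

// markFailed mirrors what HandlerError would do: record the reason and
// move the job out of the Wait/Active states.
func (j *job) markFailed(reason string) {
	j.status = "failed"
	j.reason = reason
}

// runSafely sketches the suggested fix: recover the panic, record the
// failure on the job, and log non-fatally so the process survives.
func runSafely(j *job, process func()) {
	defer func() {
		if r := recover(); r != nil {
			j.markFailed(fmt.Sprint(r))
			fmt.Printf("Recovered panic while processing job %s: %v\n", j.id, r)
		}
	}()
	j.status = "active"
	process()
	j.status = "completed"
}

func main() {
	j := &job{id: "2", status: "wait"}
	runSafely(j, func() { panic("early panic") })
	fmt.Println(j.status, j.reason)
}
```

By contrast, a fatal log call in the recover path would exit the process before the status transition could matter, which is the failure mode the comment flags.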

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
job.go (1)

73-78: Consider capturing the stack trace for panics.

The defer-recover block correctly handles panics, but it only captures the panic value without the stack trace. The Job struct has a Stacktrace field (line 27) that could be populated here for better debugging.

Apply this diff to capture the stack trace:

+import (
+	"runtime/debug"
+)
+
 defer func() {
 	if r := recover(); r != nil {
 		failedReason := fmt.Sprintf("%v", r)
+		job.Stacktrace = append(job.Stacktrace, string(debug.Stack()))
 		job.HandlerError(failedReason)
 	}
 }()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c917fc and 278848f.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • go.mod (1 hunks)
  • job.go (2 hunks)
  • queue.go (3 hunks)
  • queue_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • queue.go
  • queue_test.go
  • go.mod
🔇 Additional comments (3)
job.go (3)

69-87: LGTM: Consistent error handling flow.

The Process method now handles both panic recovery (line 76) and normal errors (line 85) through the same HandlerError method, providing a consistent error handling path.


89: Verify that exporting HandlerError is intentional and necessary.

Based on a search of the codebase, HandlerError is not called from any external packages—only internally within job.go. According to Go best practices, unexported methods are a perfectly good design choice when they're not needed to satisfy an interface, keeping them from becoming part of the externally visible API.

Consider whether this method should remain exported or be reverted to an unexported method, as minimizing the exported API surface keeps the package's usage clear and intentions explicit.


94-94: Breaking change confirmed: Redis error storage key was simplified, but collision concern is unsupported.

The change from <queue>store to <queue> is intentional (commit 278848f). However, the review's concern about collisions is unfounded—no other data in the codebase uses queue.Name as a Redis key. The application itself won't break since error data is never read back.

Verify: Is this simplification intentional? If so, document it as a breaking change for any external tools reading the old <queue>store key and provide migration guidance for existing deployments.

@ren0503 ren0503 merged commit a76f848 into master Oct 20, 2025
2 checks passed


Development

Successfully merging this pull request may close these issues.

Processing In Queue Not Recovery When Panic

2 participants