Cooldown Worker Only on Failed Processes #1566

dbpolito · 2025-06-13T18:29:23Z

Problem

Currently, Horizon adds a 1-minute cooldown if a worker dies in less than 1 second. While this is a sensible default to prevent rapid restart loops, it causes problems when a worker exits quickly for a legitimate reason.

For example, I have a job that runs in under 1 second but exceeds the memory limit of the worker (e.g., >128MB). This causes the worker to be gracefully stopped (exit code 0) and replaced. However, because the cooldown logic treats all quick exits the same, it enforces a 1-minute delay before restarting the worker.

As a result, even with autoscaling set to 10 workers, I’m only processing about 10 jobs per minute: each worker handles one job, exits, and then waits out the cooldown before it is restarted.
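To make the throughput ceiling concrete, here is a rough back-of-the-envelope model in Python. It is only an illustration, not Horizon code: the function name and the 0.5-second job duration are assumptions, and it ignores process start-up time.

```python
def jobs_per_minute(workers: int, job_seconds: float, cooldown_seconds: float) -> float:
    """Approximate steady-state throughput when every worker handles
    one job, exits, and then waits out the cooldown before restarting."""
    # One full cycle per worker: run the job, then sit idle for the cooldown.
    cycle_seconds = job_seconds + cooldown_seconds
    return workers * 60.0 / cycle_seconds


# Hypothetical numbers: 10 workers, a 0.5 s job, and the 60 s cooldown.
with_cooldown = jobs_per_minute(10, 0.5, 60.0)
without_cooldown = jobs_per_minute(10, 0.5, 0.0)
```

Under these assumptions the cooldown dominates the cycle, so throughput stays just under 10 jobs per minute regardless of how fast the job itself is.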

Solution

This PR updates the cooldown logic to only apply when a worker fails (i.e., crashes or exits with a non-zero code). If the worker exits gracefully (exit code 0), no cooldown is applied.
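The rule can be sketched as a small decision function. This is a simplified Python model, not the actual Horizon implementation (which is PHP); the names `cooldown_after_exit`, `only_on_failure`, and the two constants are hypothetical and chosen only to mirror the description above.

```python
from typing import Optional

COOLDOWN_SECONDS = 60      # assumed: the 1-minute cooldown described above
QUICK_EXIT_SECONDS = 1.0   # assumed: the "dies in less than 1 second" threshold


def cooldown_after_exit(
    exit_code: Optional[int],
    runtime_seconds: float,
    only_on_failure: bool = True,
) -> int:
    """Return how many seconds to wait before restarting a worker."""
    # Workers that ran long enough are never subject to the cooldown.
    if runtime_seconds >= QUICK_EXIT_SECONDS:
        return 0

    # Treat a missing exit code (e.g. killed by a signal) as a failure.
    failed = exit_code != 0

    # Proposed behavior: a quick but graceful exit (code 0) restarts immediately.
    if only_on_failure and not failed:
        return 0

    return COOLDOWN_SECONDS
```

With `only_on_failure=False` the function models the previous behavior, where every quick exit is delayed; with the default it models the proposed behavior, where `cooldown_after_exit(0, 0.4)` yields no delay while a quick non-zero exit still waits the full minute.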

This change ensures that autoscaling and memory limits can work together effectively without unintentionally throttling throughput.

