@m90 m90 commented Mar 25, 2024

Ticket https://phabricator.wikimedia.org/T342866

It seems that when Horizon supervises jobs, job-level timeouts are not taken into account; only the timeout from the Horizon config is applied. Jobs can still time out on their own.

Currently, this means that long-running jobs like PollForMediaWikiJobs time out after 60s in production.

As we probably still want job-level granularity for this value, this PR makes the following changes:

  • Use a timeout of 1h at the Horizon level. This catches jobs that are left hanging when they really shouldn't be.
  • Jobs themselves now define a default timeout of 60s (as before); jobs that need to run longer must explicitly declare a $timeout property.
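
For illustration, the two levels described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the `$supervisor` array stands in for a supervisor entry in the Horizon config, `ExampleShortJob` is a hypothetical job, and the Laravel interfaces are left out so the snippet runs on its own.

```php
<?php

// Sketch only. $supervisor and ExampleShortJob are hypothetical stand-ins;
// in a Laravel app the ceiling would live in a supervisor entry of
// config/horizon.php and the jobs would implement ShouldQueue.

// Horizon-level ceiling: catches jobs that are left hanging.
$supervisor = [
    'timeout' => 3600, // 1h, in seconds
];

abstract class Job
{
    /** Default per-job timeout in seconds (assumed to live on the base class). */
    public $timeout = 60;
}

class PollForMediaWikiJobsJob extends Job
{
    /** Long-running job: declares its own, longer timeout. */
    public $timeout = 1800;
}

class ExampleShortJob extends Job
{
    // Inherits the 60s default.
}

foreach ([new PollForMediaWikiJobsJob(), new ExampleShortJob()] as $job) {
    // A job-level timeout only makes sense if it stays within the ceiling.
    assert($job->timeout <= $supervisor['timeout']);
    printf("%s: %ds\n", get_class($job), $job->timeout);
}
```

The idea is that the Horizon-level value acts purely as a safety net, while each job class stays responsible for its own realistic limit.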

class PollForMediaWikiJobsJob extends Job implements ShouldQueue, ShouldBeUnique
{
-    public $timeout = 3600;
+    public $timeout = 1800;
Currently this takes around 3 to 7 minutes in production (depending on current load), so I think it's fine to lower this while we're at it.

@m90 m90 force-pushed the fr/horizon-timeout branch from 719d92e to 3af457a Compare March 25, 2024 12:37
@m90 m90 force-pushed the fr/horizon-timeout branch from 3af457a to d33783d Compare March 25, 2024 12:37
@deer-wmde deer-wmde left a comment

Looks sensible to me

@m90 m90 merged commit b787cb9 into main Mar 25, 2024
@m90 m90 deleted the fr/horizon-timeout branch March 25, 2024 15:59