
[FIX] wait before checking scheduler is running on standby node #40

Merged: 3 commits merged into master on Aug 23, 2022


prakshalj0512 (Member) commented Aug 22, 2022

  • Add a wait before checking that the scheduler is running on the standby node

Issue:

  • If the active node goes down completely, the standby node correctly identifies the situation and attempts to start the scheduler on the standby node.
  • However, the controller checks whether the scheduler is running before the scheduler has had enough time to start on the standby node.

Resolution:

  • Moved the time.sleep call into the startup_scheduler method so the wait is applied every time the scheduler is started.
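A minimal sketch of the resolved structure, for illustration only. The class name SchedulerStarter, the injected callables, the startup_wait_seconds parameter and the SimulatedClock helper are hypothetical and are not the project's actual code; the only part taken from the PR is the log message. The point is that the pause lives inside startup_scheduler, so any caller that checks the scheduler right after starting it already gets the wait.

```python
import logging
import time
from typing import Callable, Dict


class SchedulerStarter:
    """Hypothetical sketch: the pause lives inside startup_scheduler."""

    def __init__(self, run_start_command: Callable[[str], None],
                 check_running: Callable[[str], bool],
                 startup_wait_seconds: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.logger = logging.getLogger(__name__)
        self.run_start_command = run_start_command
        self.check_running = check_running
        self.startup_wait_seconds = startup_wait_seconds
        self.sleep = sleep  # injectable so the example can use a fake clock

    def startup_scheduler(self, host: str) -> None:
        self.run_start_command(host)
        # Wait here, so every caller (active restart or standby failover)
        # gets the pause before it checks whether the scheduler is up.
        self.sleep(self.startup_wait_seconds)
        self.logger.info("Finished starting Scheduler on host '" + str(host) + "'")

    def is_scheduler_running(self, host: str) -> bool:
        return self.check_running(host)


class SimulatedClock:
    """Fake clock: sleep() advances time instantly instead of blocking."""

    def __init__(self) -> None:
        self.now = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = SimulatedClock()
started_at: Dict[str, float] = {}


def start(host: str) -> None:
    started_at[host] = clock.now


def running(host: str) -> bool:
    # Hypothetical: the scheduler needs 3 simulated seconds to come up.
    return host in started_at and clock.now - started_at[host] >= 3.0


starter = SchedulerStarter(start, running, startup_wait_seconds=5.0, sleep=clock.sleep)
starter.startup_scheduler("standby-1")
print(starter.is_scheduler_running("standby-1"))  # True
```

Injecting the sleep function keeps the sketch fast and deterministic; per the PR description, the real controller simply calls time.sleep.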

prakshalj0512 requested a review from rssanders3 and then removed the request, August 23, 2022 03:21
rssanders3 (Contributor) left a comment


By my review of the logic, the sleep is still executed in the original version of the code. The only difference appears to be that the self.logger.info("Finished starting Scheduler on host '" + str(host) + "'") line is now executed only after the pause. I don't see how this gives the scheduler any more time to start up than it already has.

prakshalj0512 (Member, Author) commented Aug 23, 2022

Root cause: https://github.com/teamclairvoyant/airflow-scheduler-failover-controller/blob/master/scheduler_failover_controller/failover/failover_controller.py#L103-L108
In the standby logic, the controller calls the startup_scheduler method and then immediately checks whether the scheduler is up and running. Moving the sleep into startup_scheduler ensures that the wait happens both when the scheduler is started on the active node (line 98) and when it is started on the standby node.
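The difference between the two call sites can be simulated with a fake clock. This is a hypothetical sketch, not the controller's code: it assumes, as the reply above suggests, that before the fix the only pause sat on the active-node restart path, and the names SimulatedCluster, startup_before, startup_after, STARTUP_SECONDS and WAIT_SECONDS are made up for illustration. It shows why the reviewer saw no change on the active path while the standby check goes from failing to passing.

```python
from typing import Callable, Dict

STARTUP_SECONDS = 3.0  # hypothetical: time the scheduler needs to come up
WAIT_SECONDS = 5.0     # hypothetical: the configured pause


class SimulatedCluster:
    """Fake hosts and a fake clock, so the example never really sleeps."""

    def __init__(self) -> None:
        self.now = 0.0
        self.started_at: Dict[str, float] = {}

    def sleep(self, seconds: float) -> None:
        self.now += seconds

    def start(self, host: str) -> None:
        self.started_at[host] = self.now

    def is_running(self, host: str) -> bool:
        t = self.started_at.get(host)
        return t is not None and self.now - t >= STARTUP_SECONDS


def startup_before(c: SimulatedCluster, host: str) -> None:
    c.start(host)  # before the fix: no pause inside the method


def startup_after(c: SimulatedCluster, host: str) -> None:
    c.start(host)
    c.sleep(WAIT_SECONDS)  # after the fix: every caller gets the pause


def active_restart_before(c: SimulatedCluster, host: str) -> bool:
    startup_before(c, host)
    c.sleep(WAIT_SECONDS)  # before the fix: only this call site paused
    return c.is_running(host)


def standby_failover(c: SimulatedCluster, host: str,
                     startup: Callable[[SimulatedCluster, str], None]) -> bool:
    startup(c, host)
    return c.is_running(host)  # checked immediately after startup returns


print(active_restart_before(SimulatedCluster(), "active"))             # True
print(standby_failover(SimulatedCluster(), "standby", startup_before))  # False
print(standby_failover(SimulatedCluster(), "standby", startup_after))   # True
```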

rssanders3 (Contributor)

Ok gotcha, makes sense. LGTM

prakshalj0512 merged commit 665db0b into master on Aug 23, 2022