The beaker reboot needs to cross minute boundaries based on the method used #1674

Closed
trevor-vaughan opened this issue Aug 11, 2020 · 6 comments · Fixed by #1675 or #1677
@trevor-vaughan
Contributor

Currently, the reboot command for Unix uses `who -b` to detect a reboot. Because `who -b` reports the boot time only to the minute, there is a race condition: if the reboot doesn't cross a minute boundary, the boot time looks unchanged and a successful reboot goes undetected.

The `last -F reboot` command is not available on all systems, but where it works it provides second-level granularity. Where it doesn't, we can fall back to `who -b` and sleep past the minute boundary before proceeding.

I'm working on a patch for this now.
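A minimal Python sketch of the detection order described above, not the actual Beaker patch (Beaker itself is Ruby). The function name `parse_boot_time` is hypothetical, and both regular expressions are assumptions: one for a typical Linux `last -F reboot` line, one for a typical `who -b` line, and other platforms or locales may format either differently. It prefers the second-granularity timestamp and reports whether seconds were available:

```python
import re
from datetime import datetime

# Assumed shape of a Linux `last -F reboot` line, e.g.
# "reboot   system boot  5.4.0-42-generic Tue Aug 11 10:23:45 2020   still running"
LAST_F_RE = re.compile(
    r"[A-Z][a-z]{2} ([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2}) (\d{4})"
)
# Assumed shape of a `who -b` line, e.g. "         system boot  2020-08-11 10:23"
WHO_B_RE = re.compile(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2})")


def parse_boot_time(last_output, who_output):
    """Return (boot_time, has_seconds), preferring the `last -F` timestamp."""
    match = LAST_F_RE.search(last_output or "")
    if match:
        month, day, clock, year = match.groups()
        stamp = f"{month} {int(day)} {clock} {year}"
        return datetime.strptime(stamp, "%b %d %H:%M:%S %Y"), True

    # Fallback: minute granularity only, so the caller must also
    # wait past the minute boundary before rebooting.
    match = WHO_B_RE.search(who_output or "")
    if match:
        return datetime.strptime(" ".join(match.groups()), "%Y-%m-%d %H:%M"), False

    raise ValueError("could not parse a boot time from either command")
```

When `has_seconds` is false, the caller knows it is on the `who -b` path and has to sleep past the minute boundary.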

@igalic
Contributor

igalic commented Aug 11, 2020

On FreeBSD, OpenBSD, NetBSD, and Solaris (illumos), -F is not required, by which I mean it'll probably throw an error.

@trevor-vaughan
Contributor Author

@igalic Can you provide the output from BSD?

@igalic
Contributor

igalic commented Aug 11, 2020

In about three hours ;)

@trevor-vaughan
Contributor Author

@igalic I just checked via Vagrant (because I remembered that's a thing). The -F flag doesn't work there, and the output doesn't give the required granularity anyway, so the fallback method will work properly.

trevor-vaughan added a commit to trevor-vaughan/beaker that referenced this issue Aug 11, 2020
Tries a more granular method for getting the seconds on a reboot.

Failing that, falls back to 'who -b' and sleeps for the remaining
seconds up to the next minute plus one to ensure that the reboot time
gets updated sufficiently.

Fixes voxpupuli#1674
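The fallback wait in this commit message ("the remaining seconds up to the next minute plus one") can be sketched as below. This is an illustration in Python rather than the commit's Ruby, and `seconds_past_minute_boundary` is a hypothetical name:

```python
from datetime import datetime


def seconds_past_minute_boundary(now):
    """Seconds to sleep so the clock lands one second into the next minute."""
    remaining = 60 - now.second  # seconds left in the current minute
    return remaining + 1         # plus one, to be safely past the boundary


if __name__ == "__main__":
    print(seconds_past_minute_boundary(datetime.now()))
```

Sleeping this long before issuing the reboot means the new boot time reported by `who -b` falls in a later minute than the old one, even if the reboot itself is very fast.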
@trevor-vaughan trevor-vaughan self-assigned this Aug 11, 2020
@trevor-vaughan
Contributor Author

I'm reopening this issue because I just hit a weird edge case where the execution of a Beaker::Command on a remote host is returning nil.

I honestly don't see how this is possible in the code base, but it's definitely happening.

I've moved to catching all standard exceptions (yeah, I know) in the main retry block, since reboots can cause really weird side effects overall.

Will upload a new patch when this is no longer occurring.
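For illustration, a retry loop that tolerates both failure modes mentioned in this comment (arbitrary exceptions and a command that returns nothing) might look like the sketch below. It is written in Python and is not Beaker's code; `retry_until_result` and its parameters are hypothetical, and Python's `None` stands in for Ruby's `nil`:

```python
import time


def retry_until_result(action, attempts=10, delay=1.0, sleeper=time.sleep):
    """Call `action` until it returns a non-None value or attempts run out."""
    last_error = None
    for attempt in range(attempts):
        try:
            result = action()
            if result is not None:
                return result
            # A missing result is treated like any other transient failure.
            last_error = RuntimeError("command returned no result")
        except Exception as error:  # deliberately broad: reboots fail in odd ways
            last_error = error
        if attempt < attempts - 1:
            sleeper(delay)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

The broad `except` is the trade-off acknowledged above: it can mask real bugs, so the original error is chained onto the final failure to keep it visible.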

@trevor-vaughan
Contributor Author

The existing patch is OK for most cases but it turns out that VMs occasionally reboot and have a start time that is prior to the shutdown time.

I'm working on another update to address this (absolutely horrible) case.

trevor-vaughan added a commit to trevor-vaughan/beaker that referenced this issue Aug 25, 2020
Updates the UNIX reboot method to handle systems that go back in time
after reboot.

This can happen often on virtual machines that are running on a heavily
loaded system.

The error handling had to be loosened to allow the loop to handle
whatever bizarre circumstances get thrown at it by the underlying
system.

Fixes voxpupuli#1674
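One way to cope with a boot time that moves backwards is to treat any change in the reported boot time as evidence of a reboot, instead of requiring the new value to be later. The thread does not say whether the patch does exactly this, so the Python sketch below is an assumption, and `reboot_detected` is a hypothetical name:

```python
from datetime import datetime


def reboot_detected(boot_time_before, boot_time_after):
    """True when the host reports a boot time different from the original.

    Uses inequality rather than "later than" so that a VM whose clock
    went backwards across the reboot is still recognised as rebooted.
    """
    if boot_time_after is None:  # host not reachable or output unreadable yet
        return False
    return boot_time_after != boot_time_before
```

With minute-granularity timestamps this check still depends on the reboot crossing a minute boundary, so it complements the sleep in the fallback path and does not replace it.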