Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sleep before retrying SSH#establish_connection. #399

Merged
merged 1 commit into from
May 1, 2014
Merged

Conversation

fnichol
Copy link
Contributor

@fnichol fnichol commented Mar 28, 2014

This addresses a race condition where the remote instance might be
restarting its sshd process and is therefore unavailable for a second or
two.

This addresses a race condition where the remote instance might be
restarting its sshd process and is therefore unavailable for a second or
two.
@@ -124,6 +124,7 @@ def establish_connection
rescue *rescue_exceptions => e
if (retries -= 1) > 0
logger.info("[SSH] connection failed, retrying (#{e.inspect})")
sleep 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 but should this actually be a configurable? It feels dirty to hardcode it when we have such a nice config 😄

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That race condition is quite funny when your running low on memory or CPU.

sleep 1 gets in; then ssh closes and then you can log in actually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retries is currently hard coded, perhaps we should get this in to address the immediate issue, and create an issue to make both configurable. Of course there will be cases where we cannot save the run/connection.

@fnichol
Copy link
Contributor Author

fnichol commented Mar 29, 2014

@sethvargo, @damm: agreed, I'd like to make this tunable at least to Driver authors and possibly to end-users. This was a bit of a placeholder PR as @portertech and I were testing init support in the Docker driver today and found that sshd could restart just after the #wait_for_sshd method returns. Ah race conditions!

A tiny bit of cleanup, then should be ready to merge.

@damm
Copy link

damm commented Mar 29, 2014

@fnichol worst part is ssh restarting is expected behavior on Ubuntu. :(

@portertech
Copy link
Contributor

@miah miah mentioned this pull request Apr 28, 2014
@sethvargo
Copy link
Contributor

I believe this fixes #418. 👍

sethvargo added a commit that referenced this pull request May 1, 2014
Sleep before retrying SSH#establish_connection.
@sethvargo sethvargo merged commit 3d3d024 into master May 1, 2014
@sethvargo sethvargo deleted the ssh-retry-sleep branch May 1, 2014 13:43
@miah
Copy link

miah commented May 5, 2014

I wish this fixed #418. I updated my bundle to git master and I'm still hanging on this. :(

miah@~/projects/gitlab/chef_redis$: time bundle exec kitchen test .*-ubuntu-1404.*
-----> Starting Kitchen (v1.2.2.dev)
-----> Cleaning up any prior instances of <sentinel-ubuntu-1404-11124-redis>
-----> Destroying <sentinel-ubuntu-1404-11124-redis>...
       ==> default: Stopping the VMware VM...
       ==> default: Deleting the VM...
       Vagrant instance <sentinel-ubuntu-1404-11124-redis> destroyed.
       Finished destroying <sentinel-ubuntu-1404-11124-redis> (0m8.03s).
-----> Testing <sentinel-ubuntu-1404-11124-redis>
-----> Creating <sentinel-ubuntu-1404-11124-redis>...
       Bringing machine 'default' up with 'vmware_fusion' provider...
       ==> default: Cloning VMware VM: 'opscode_ubuntu-14.04'. This can take some time...
       ==> default: Verifying vmnet devices are healthy...
       Skipping Berkshelf with --no-provision
       ==> default: Preparing network adapters...
       ==> default: Starting the VMware VM...
       ==> default: Waiting for the VM to finish booting...
^C
real    1m0.435s
user    0m5.528s
sys     0m1.269s

fnichol added a commit that referenced this pull request Jul 21, 2014
@test-kitchen test-kitchen locked and limited conversation to collaborators Nov 16, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants