Retry Kitchen::Provisioner#run_command after allowed exit codes #1055

Merged: 10 commits, Jun 17, 2016

@smurawski smurawski commented Jun 13, 2016

  • Allow user to define exit codes that trigger a retry of Provisioner#run_command
  • Allow user to define wait time between attempts
  • Allow user to define maximum number of retry attempts

Resolves #1016

@smurawski smurawski changed the title Retry Kitchen::Provisioner#run_command after allowed exit codes WIP - Retry Kitchen::Provisioner#run_command after allowed exit codes Jun 13, 2016
@smurawski smurawski self-assigned this Jun 13, 2016
@smurawski
Contributor Author

I've tested this against Windows, CentOS, and Ubuntu guests. The Windows guests worked the most consistently. With the CentOS and Ubuntu guests, it was more of a race to see whether Chef could finish before the system shut down. (See chef/chef#5026)

@smurawski smurawski changed the title WIP - Retry Kitchen::Provisioner#run_command after allowed exit codes Retry Kitchen::Provisioner#run_command after allowed exit codes Jun 15, 2016
@smurawski
Contributor Author

With this PR, all provisioners support three new configuration settings:

  • retry_on_exit_code - an array of exit codes that indicate kitchen should retry the converge command. Defaults to an empty array.
  • max_retries - the number of times to retry the converge before passing along the failed status. Defaults to 1.
  • wait_for_retry - the number of seconds to wait between converge attempts. Defaults to 30.
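
To make the interaction of the three settings concrete, here is a rough, self-contained sketch of the retry behavior described above. It is not the code from this PR: `CommandFailed`, `run_with_retries`, and the `sleeper` parameter are hypothetical names made up for illustration, the exit code 35 is an arbitrary example value, and the sketch assumes that max_retries counts retries after the first failed attempt (the merged code may count attempts differently).

```ruby
# Hypothetical error type that carries the exit code of a failed command.
class CommandFailed < StandardError
  attr_reader :exit_code

  def initialize(exit_code)
    super("command exited with #{exit_code}")
    @exit_code = exit_code
  end
end

# Runs the given block and retries it when it fails with one of the
# exit codes listed in retry_on_exit_code.
# Assumption: max_retries is the number of retries after the first attempt.
# The sleeper lambda is only there so the wait can be stubbed out.
def run_with_retries(retry_on_exit_code: [], max_retries: 1, wait_for_retry: 30,
                     sleeper: ->(seconds) { sleep(seconds) })
  retries = 0
  begin
    yield
  rescue CommandFailed => e
    # Pass the failure along if the code is not retryable or retries are used up.
    raise unless retry_on_exit_code.include?(e.exit_code) && retries < max_retries

    retries += 1
    sleeper.call(wait_for_retry)
    retry
  end
end

# Usage: a fake converge that fails twice with exit code 35, then succeeds.
attempts = 0
result = run_with_retries(retry_on_exit_code: [35], max_retries: 2, wait_for_retry: 0) do
  attempts += 1
  raise CommandFailed.new(35) if attempts < 3

  "converged after #{attempts} attempts"
end
puts result
```

With the default empty retry_on_exit_code array, the sketch never retries, which matches the "defaults to an empty array" behavior above: a failed converge is reported as a failure right away.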

@smurawski
Contributor Author

smurawski commented Jun 15, 2016

The AppVeyor build failure seems specific to the build box. There are more "symlink is unimplemented" errors, which seem to be common among AppVeyor builds recently.

Rebased on master after merging #1057 to fix the AppVeyor tests.

@smurawski smurawski mentioned this pull request Jun 16, 2016
@cheeseplus

+1

@@ -170,17 +170,25 @@ def run_action(action, instances, *args)
        concurrency.times do
          threads << Thread.new do
            while instance = queue.pop
              puts "running #{instance.name}"
Contributor

Is this debug output? If not, we should probably use the logger methods instead.

Member

I had the same question and then noticed puts getting used for these in several places in TK. A bit weird, but separate from this PR.

Contributor Author

Oops, that should be pulled. I was troubleshooting the difference between actionfailed and instancefailed. Thanks.

@adamleff
Contributor

Some minor things, all non-blockers. 👍

@mwrock
Member

mwrock commented Jun 16, 2016

👍

@lamont-granquist
Contributor

👍

@carpnick

Thanks for doing this @smurawski. This one was a long time coming to TK. Thanks for putting in the effort on chef and TK to get this done.

@test-kitchen test-kitchen locked and limited conversation to collaborators Nov 16, 2017