[Platform] Release node should proceed in case of SSH failure after retries #4171

Closed
ramkumarvs opened this issue Apr 8, 2020 · 0 comments
Assignees
Labels
area/platform Yugabyte Platform priority/high High Priority
Projects

Comments

@ramkumarvs
Contributor

Currently, our release node operation fails if the node is not reachable via SSH. We should add retry logic to the SSH step and, if it still fails after the retries, go ahead with the rest of the release process (blacklisting, etc.).
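A minimal sketch of the requested behavior, assuming a hypothetical `probe` callable that stands in for the SSH check. The function names and step labels below are illustrative only and are not taken from the Platform code:

```python
import time
from typing import Callable, List


def ssh_reachable_with_retries(probe: Callable[[], bool], attempts: int = 3,
                               delay_s: float = 1.0) -> bool:
    """Return True as soon as probe() succeeds, False once all attempts fail."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused / timeout counts as a failed attempt
        if attempt < attempts - 1:
            time.sleep(delay_s)  # wait before the next attempt
    return False


def release_node(probe: Callable[[], bool], attempts: int = 3,
                 delay_s: float = 1.0) -> List[str]:
    """Return the (hypothetical) steps performed while releasing a node."""
    steps = []
    if ssh_reachable_with_retries(probe, attempts, delay_s):
        steps.append("cleanup_over_ssh")
    else:
        # SSH never came up: skip the on-node work instead of failing the task.
        steps.append("skip_ssh_cleanup")
    # These steps do not need the node itself, so they always run.
    steps.append("blacklist_node")
    steps.append("mark_released")
    return steps
```

The point of the design is that only the on-node work depends on SSH; everything that can be done from the platform side still runs when the node is down.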

@ramkumarvs ramkumarvs added priority/high High Priority area/platform Yugabyte Platform labels Apr 8, 2020
WesleyW added a commit that referenced this issue Apr 15, 2020
Summary:
When removing a node, if the node is unreachable, the task never stops looping. Now the
WaitForDataMove task is not created unless the tserver is available.

When releasing a node, if the node is unreachable, the task fails. The call is now a forceDelete,
so it ignores errors.
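As a rough illustration of the two changes described above (not the actual commit, which lives in the Platform code base), the sketch below only plans a `WaitForDataMove` subtask when the tserver is reachable, and swallows delete errors when a force flag is set. Apart from `WaitForDataMove`, every name here is hypothetical:

```python
from typing import Callable, List


def plan_remove_subtasks(tserver_available: bool) -> List[str]:
    """Build the subtask list for removing a node (names are illustrative)."""
    subtasks = ["MarkNodeRemoving"]  # hypothetical placeholder subtask
    if tserver_available:
        # Waiting for data to move only makes sense if the tserver can respond;
        # otherwise the wait would never complete.
        subtasks.append("WaitForDataMove")
    subtasks.append("MarkNodeRemoved")  # hypothetical placeholder subtask
    return subtasks


def release_instance(delete: Callable[[], None], force_delete: bool) -> bool:
    """Run delete(); with force_delete=True, errors are ignored.

    Returns True if delete() succeeded, False if an error was ignored.
    """
    try:
        delete()
        return True
    except Exception:
        if not force_delete:
            raise  # strict mode: surface the failure to the caller
        return False
```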

Test Plan:
Create 2 AWS instances.
Create onprem provider.
Create universe with both instances.
Stop one instance.
Remove and release the stopped node.

Reviewers: arnav, bogdan, rao, ram

Reviewed By: ram

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D8293
@WesleyW WesleyW closed this as completed Apr 15, 2020
@rkarthik007 rkarthik007 added this to Done in Platform Jun 1, 2020