Clearly explain that draining a swarm node does not wait for replicas to be started on an active node before stopping tasks on the node being drained
#9917
Open
airmnichols opened this issue on Nov 20, 2019 · 6 comments
"Sometimes, such as planned maintenance times, you need to set a node to DRAIN availability. DRAIN availability prevents a node from receiving new tasks from the swarm manager. It also means the manager stops tasks running on the node and launches replica tasks on a node with ACTIVE availability."
This is misleading: a drain operation has no logic to keep the configured number of replicas running while the drain is in progress.
This should be clearly explained.
Suppose you have a swarm with two worker nodes and have just performed maintenance on worker node 1. Draining node 1 for that maintenance moved all replicas to worker node 2, where they remain.
If you then drain worker node 2 for patching, the application goes down, because swarm does not, for example, stop replica 1 on node 2 and start it on node 1 before moving on to do the same for replica 2.
The current design causes downtime for applications.
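The difference between the two orderings can be illustrated with a small, hypothetical Python model. It is not Docker's scheduler: it assumes the worst case described above, where every task on the drained node stops before any replacement is running, and it simply records how many replicas are running after each stop or start event. The function names are made up for this sketch.

```python
# Hypothetical model of the two-worker scenario above; not Docker's scheduler.
# Each function returns the number of running replicas after every event.


def drain_timeline(replicas: int) -> list[int]:
    """Worst case of DRAIN as described in the issue: every task on the
    drained node stops before any replacement on the ACTIVE node runs."""
    running = replicas
    timeline = [running]
    for _ in range(replicas):  # stop all tasks on the drained node
        running -= 1
        timeline.append(running)
    for _ in range(replicas):  # then replacements come up elsewhere
        running += 1
        timeline.append(running)
    return timeline


def one_at_a_time_timeline(replicas: int, start_first: bool = False) -> list[int]:
    """The ordering the issue asks for: move one replica at a time.
    With start_first=True the replacement starts before the old task stops."""
    running = replicas
    timeline = [running]
    for _ in range(replicas):
        if start_first:
            running += 1  # replacement starts on the ACTIVE node
            timeline.append(running)
            running -= 1  # then the old task on the drained node stops
        else:
            running -= 1  # old task on the drained node stops
            timeline.append(running)
            running += 1  # then its replacement starts
        timeline.append(running)
    return timeline


if __name__ == "__main__":
    print(min(drain_timeline(2)))  # 0: full outage
    print(min(one_at_a_time_timeline(2)))  # 1: reduced capacity only
    print(min(one_at_a_time_timeline(2, start_first=True)))  # 2: no dip
```

With two replicas, the drain ordering drops to zero running replicas, while moving one replica at a time never does.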
Support advised that this is expected behavior. Their suggested workaround is to scale every running service to more replicas, forcing the extra replicas to start on another worker node, before issuing the drain command for a node.
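The support workaround can be sketched as a sequence of Docker CLI commands, built here by a hypothetical Python helper. `docker service scale` and `docker node update --availability drain` are real commands; the node name `worker2`, the service name `web` and the helper functions are illustrative assumptions. The sketch relies on the scheduler placing the extra replicas on the other node, which it does not verify, and on `--detach=false` making the CLI wait for each scale operation to converge before the next command runs.

```python
import shlex
import subprocess


def build_drain_plan(node: str, services: dict[str, int],
                     extra: int = 1) -> list[list[str]]:
    """Hypothetical helper: temporarily add `extra` replicas to each
    replicated service, drain `node`, then restore the original counts.
    `services` maps service name -> current replica count."""
    plan: list[list[str]] = []
    # 1. Scale up first, waiting for the service to converge.
    for name, replicas in services.items():
        plan.append(["docker", "service", "scale", "--detach=false",
                     f"{name}={replicas + extra}"])
    # 2. Drain the node only after the extra replicas are running.
    plan.append(["docker", "node", "update", "--availability", "drain", node])
    # 3. Restore the original replica counts.
    for name, replicas in services.items():
        plan.append(["docker", "service", "scale", "--detach=false",
                     f"{name}={replicas}"])
    return plan


def run_plan(plan: list[list[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only: print the commands instead of executing them.
    for cmd in build_drain_plan("worker2", {"web": 2}, extra=2):
        print(shlex.join(cmd))
```

In the two-node scenario above, all replicas sit on the node being drained, so choosing `extra` equal to the current replica count is what keeps full capacity; `extra=1` would at best keep one replica running during the drain.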
The text was updated successfully, but these errors were encountered:
There hasn't been any activity on this issue for a long time.
If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment.
If not, this issue will be closed in 14 days. This helps our maintainers focus on the active issues.
Prevent issues from auto-closing with a /lifecycle frozen comment.
@docker-robot It's not our fault that the maintainers are busy. That doesn't make the issue invalid. I'd like every damn bot (and their masters) to know this. I understand that these bots help triage issues, acting like a garbage collector, but a human should decide whether something is garbage, not a timeout.
This is really confusing and reduces flexibility and reliability. Now I have to configure a node label manually instead of relying on the built-in availability feature. I hope this can be improved.
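The commenter does not say which label or flags they use, so the following is only one possible version of a label-based approach, again built by a hypothetical Python helper. It marks the node with a label (the name `maintenance` is an assumption) and adds a placement constraint that excludes labeled nodes. Because the constraint is applied through `docker service update`, tasks on the excluded node are moved by a rolling update, and `--update-order start-first` asks Swarm to start each replacement before stopping the old task, which is the ordering a plain drain does not provide.

```python
import shlex


def build_label_plan(node: str, services: list[str],
                     label: str = "maintenance") -> list[list[str]]:
    """Hypothetical alternative to DRAIN: label `node`, then add a placement
    constraint to each service so its tasks leave that node through a
    start-first rolling update."""
    # Mark the node that is about to be patched.
    plan = [["docker", "node", "update", "--label-add", f"{label}=true", node]]
    for name in services:
        plan.append([
            "docker", "service", "update", "--detach=false",
            # Start the replacement task before stopping the old one.
            "--update-order", "start-first",
            # Nodes without the label still satisfy this constraint.
            "--constraint-add", f"node.labels.{label}!=true",
            name,
        ])
    return plan


if __name__ == "__main__":
    # Dry run only: print the commands instead of executing them.
    for cmd in build_label_plan("worker2", ["web"]):
        print(shlex.join(cmd))
```

After maintenance, `docker node update --label-rm maintenance worker2` makes the node eligible for new tasks again.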
File: engine/swarm/swarm-tutorial/drain-node.md