Clearly explain that draining a swarm node does not wait for replicas to be started on an active node before stopping tasks on a node being drained #9917

airmnichols · 2019-11-20T19:28:30Z

File: engine/swarm/swarm-tutorial/drain-node.md

States:

"Sometimes, such as planned maintenance times, you need to set a node to DRAIN availability. DRAIN availability prevents a node from receiving new tasks from the swarm manager. It also means the manager stops tasks running on the node and launches replica tasks on a node with ACTIVE availability."

This is misleading: a drain operation has no logic to maintain the configured number of replicas while the drain is in progress.

This should be clearly explained.

Suppose you have a swarm with two worker nodes and have just performed maintenance on worker node 1. All replicas are now running on worker node 2.

If you then drain worker node 2 for patching, it causes downtime, because swarm does not, for example, stop replica 1 on node 2 and start replica 1 on node 1 before moving on to do the same for replica 2.
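
To make the difference concrete, here is a toy sketch of the two orderings described above. It is not Swarm's scheduler, only a model of the behavior reported in this issue; the function names and the step-by-step timeline are assumptions made up for illustration.

```python
# Toy model only: NOT Swarm's scheduler. It just counts how many replicas
# are running after each step under the two orderings discussed in this issue.

def drain_stop_all_first(replicas):
    """Reported behavior: every task on the drained node stops,
    then replacements start on the active node."""
    timeline = [replicas]      # before the drain
    timeline.append(0)         # all tasks on the drained node are shut down
    timeline.append(replicas)  # replacements come up on the active node
    return timeline


def drain_one_at_a_time(replicas):
    """Hypothetical ordering: stop one replica, start its replacement,
    then move on to the next replica."""
    running = replicas
    timeline = [running]
    for _ in range(replicas):
        running -= 1           # stop one replica on the node being drained
        timeline.append(running)
        running += 1           # start its replacement on the active node
        timeline.append(running)
    return timeline


if __name__ == "__main__":
    print("stop all first:", drain_stop_all_first(2))
    print("one at a time: ", drain_one_at_a_time(2))
```

In this model, the first ordering passes through a step with zero running replicas, while the second never drops below one for a service with two replicas.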

The current design causes downtime for applications.
Support advised that this is expected behavior, and that a workaround is to reconfigure all running services to have more replicas, forcing them to start on another worker node, before issuing a drain command for a node.
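
A minimal sketch of that workaround, assuming you already know each service's current replica count. The helper `drain_plan`, the node name `worker2`, the service name `web`, and the choice of one extra replica are all hypothetical. The script only builds and prints the commands (`docker service scale` and `docker node update --availability drain`) and does not run them. Whether the extra replicas actually land on the other worker depends on the scheduler and on any placement constraints, so check with `docker service ps` before draining.

```python
import shlex


def drain_plan(node, services, extra=1):
    """Build the commands for the scale-up-then-drain workaround.

    `services` maps a service name to its current replica count.
    `extra` is how many replicas to add before draining (an assumption;
    the issue does not say how many are needed).
    """
    commands = []
    for name, replicas in services.items():
        # Scale up first so new tasks can start on another active node.
        commands.append(["docker", "service", "scale", f"{name}={replicas + extra}"])
    # Only then set the node to DRAIN availability.
    commands.append(["docker", "node", "update", "--availability", "drain", node])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in drain_plan("worker2", {"web": 2}):
        print(shlex.join(command))
```

Once the drained node's tasks have been rescheduled, you would presumably scale each service back to its original replica count.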

daliborfilus · 2022-03-30T09:45:13Z

Yes! Bitten by this just now.

airmnichols · 2022-03-30T11:22:21Z

> Yes! Bitten by this just now.

Kubernetes with pod disruption budgets is the way honestly.
After moving from swarm to k8s things have been so much more reliable.

docker-robott · 2022-11-24T01:00:36Z

There hasn't been any activity on this issue for a long time.
If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment.
If not, this issue will be closed in 14 days. This helps our maintainers focus on the active issues.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

daliborfilus · 2022-11-25T22:20:58Z

@docker-robot It's not our fault that the maintainers are busy. That doesn't make the issue invalid. I'd like every damn bot (and their masters) to know this. I understand that having these bots helps triage important issues like a garbage collector, but a human should decide if it's garbage or not. Not a "timeout".

everyx · 2022-11-29T01:32:51Z

This is really confusing and reduces flexibility and reliability. Now I need to manually configure a label instead of relying on this built-in availability feature. I hope this can be improved.

everyx · 2022-11-29T01:46:25Z

related moby/moby#34139

traci-morrison added the area/engine label Dec 4, 2019

docker-robott added the lifecycle/stale label Nov 24, 2022

docker-robott removed the lifecycle/stale label Nov 25, 2022

mat007 added the lifecycle/frozen label Nov 26, 2022
