
Clearly explain that draining a swarm node does not wait for replicas to be started on an active node before stopping tasks on a node being drained #9917

Open
airmnichols opened this issue Nov 20, 2019 · 6 comments
Labels
area/engine (Issue affects Docker engine/daemon), lifecycle/frozen

Comments

@airmnichols

File: engine/swarm/swarm-tutorial/drain-node.md

States:

"Sometimes, such as planned maintenance times, you need to set a node to DRAIN availability. DRAIN availability prevents a node from receiving new tasks from the swarm manager. It also means the manager stops tasks running on the node and launches replica tasks on a node with ACTIVE availability."

This is misleading: a drain operation has no logic to keep the configured number of replicas running while the node is being drained.

This should be clearly explained.

If you have a two-worker-node swarm and have just performed maintenance on worker node 1, all replicas end up running on worker node 2.

If you then drain worker node 2 for patching, it causes downtime, because swarm does not, for example, stop replica 1 on node 2 and start it on node 1 before moving on to do the same for replica 2.
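
A minimal sketch of the failure mode, assuming a hypothetical service named `web` with two replicas and workers named `worker1` and `worker2` (names are illustrative, not from the original report):

```console
# All replicas of "web" currently run on worker2 (worker1 was just patched).
$ docker service ps web

# Draining worker2 shuts down its tasks immediately; replacements are only
# scheduled on worker1 afterwards, so the service briefly has zero running
# replicas. There is no stop-one/start-one rolling hand-off.
$ docker node update --availability drain worker2
```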

The current design causes downtime for applications.
Support advised that this is expected behavior, and that a workaround is to reconfigure all running services with more replicas, forcing them to start on another worker node, before issuing a drain command for the node.
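
A sketch of that support-suggested workaround, under the same hypothetical names (`web`, `worker1`, `worker2`):

```console
# 1. Scale the service up so new tasks are forced onto the other worker:
$ docker service scale web=4

# 2. Once those tasks are running on worker1, drain worker2 for patching:
$ docker node update --availability drain worker2

# 3. After maintenance, bring the node back and scale down again:
$ docker node update --availability active worker2
$ docker service scale web=2
```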

@traci-morrison added the area/engine label on Dec 4, 2019
@daliborfilus

Yes! Bitten by this just now.

@airmnichols
Author

> Yes! Bitten by this just now.

Kubernetes with pod disruption budgets is the way honestly.
After moving from swarm to k8s things have been so much more reliable.

@docker-robott
Collaborator

There hasn't been any activity on this issue for a long time.
If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment.
If not, this issue will be closed in 14 days. This helps our maintainers focus on the active issues.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

@daliborfilus

daliborfilus commented Nov 25, 2022

@docker-robot It's not our fault that the maintainers are busy. That doesn't make the issue invalid. I'd like every damn bot (and their masters) to know this. I understand that having these bots helps triage important issues, like a garbage collector, but a human should decide whether it's garbage or not. Not a "timeout".

@everyx

everyx commented Nov 29, 2022

This is really confusing and reduces flexibility and reliability. Now I need to manually configure a label instead of relying on this built-in availability feature. I hope this can be improved.

@everyx

everyx commented Nov 29, 2022

related moby/moby#34139
