-
Notifications
You must be signed in to change notification settings - Fork 944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify warnings about unmovable pods #389
Conversation
pkg/controller/node.go
Outdated
@@ -142,14 +149,17 @@ func (c *Controller) moveMasterPodsOffNode(node *v1.Node) { | |||
cl.Unlock() | |||
} | |||
|
|||
totalPods := len(masterPods) | |||
totalPods := len(nodePods) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's wrong - we want to report not all the pods, but only master pods that we wanted to migrate, since:
%d out of %d master pods have been moved out from the node %q"
Which means that totalPods
should be equal to the number of all master pods, that we want to move, and probably we need another counter at line 133. Although it's already a bit too complicated for such task, so I would probably even create a filter util function to filter pods with different conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want to report not all the pods, but only master pods that we wanted to migrate
that was exactly the issue with the original code - it reported only the master pods that the operator wanted to migrated and omitted those that originated from no-longer-existing clusters. It took us some time to figure out that there are other leftover (master) pods on the node that prevent the node from being shut down.
we need another counter at line 133
to count leftover pods separately ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then the message could not move %d master pods from the node %q
is wrong, since we count all the pods that we can't move, not only masters - right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked, both the method's name moveMasterPodsOffNode
and the code suggest we only move master pods. This makes sense to me since replicas can be transparently re-created by deployments, but re-creating a master requires a failover first that has to be explicitly triggered
UPD I realized from where the confusion stems and will update the pr
@@ -55,15 +58,16 @@ func (c *Controller) nodeUpdate(prev, cur interface{}) { | |||
return | |||
} | |||
|
|||
if util.MapContains(nodeCur.Labels, map[string]string{"master": "true"}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check is covered by nodeIsReady(nodeCur)
This reverts commit 4fa09e0.
Ease debugging the situation when a PG master pod cannot be moved from a node that became unschedulable: