
Unify warnings about unmovable pods #389

Merged

sdudoladov merged 6 commits into master from fix-reporting-masters-under-migration on Dec 21, 2018

Conversation

sdudoladov (Member)

Ease debugging of the situation when a PG master pod cannot be moved off a node that has become unschedulable:

  1. Report all such pods in one place instead of scattering warnings throughout the operator log (see the sketch below)
  2. Report all the master pods on a node, even those that are leftovers from PG clusters that no longer exist
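A minimal sketch of the intended reporting shape; the names (collectUnmovableMasters, masterPodNames, moved) are made up for illustration and this is not the operator's actual code:

```go
package controller

import "log"

// collectUnmovableMasters gathers every master pod still sitting on the node,
// including leftovers from deleted clusters, and emits one summary warning
// instead of scattering a warning per pod through the log.
func collectUnmovableMasters(nodeName string, masterPodNames []string, moved map[string]bool) {
	var unmovable []string
	for _, name := range masterPodNames {
		if !moved[name] {
			unmovable = append(unmovable, name)
		}
	}
	if len(unmovable) > 0 {
		log.Printf("could not move %d master pods %v off the node %q",
			len(unmovable), unmovable, nodeName)
	}
}
```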


coveralls commented Sep 17, 2018

Coverage Status

Coverage remained the same at 23.766% when pulling 60a2c2e on fix-reporting-masters-under-migration into ff5c63d on master.

@@ -142,14 +149,17 @@ func (c *Controller) moveMasterPodsOffNode(node *v1.Node) {
 		cl.Unlock()
 	}

-	totalPods := len(masterPods)
+	totalPods := len(nodePods)
Contributor
I think this is wrong: we want to report not all the pods, but only the master pods that we want to migrate, since the message is:

"%d out of %d master pods have been moved out from the node %q"

which means that totalPods should equal the number of master pods we want to move, and we probably need another counter at line 133. This is already getting a bit too complicated for such a task, though, so I would probably create a filter util function that filters pods by different conditions.
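For what it's worth, such a filter helper could be as small as the sketch below; the name filterPods and the predicate signature are hypothetical:

```go
package controller

import (
	v1 "k8s.io/api/core/v1"
)

// filterPods returns only the pods that satisfy the given predicate, so the
// caller can build the "masters to migrate" and "leftover masters" subsets
// without maintaining extra counters by hand.
func filterPods(pods []v1.Pod, keep func(*v1.Pod) bool) []v1.Pod {
	filtered := make([]v1.Pod, 0, len(pods))
	for i := range pods {
		if keep(&pods[i]) {
			filtered = append(filtered, pods[i])
		}
	}
	return filtered
}
```

A caller could then select masters with something like filterPods(nodePods, func(p *v1.Pod) bool { return p.Labels["spilo-role"] == "master" }), assuming the role label that marks master pods.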

Member Author

we want to report not all the pods, but only master pods that we wanted to migrate

That was exactly the issue with the original code: it reported only the master pods that the operator wanted to migrate and omitted those that originated from no-longer-existing clusters. It took us some time to figure out that there were other leftover (master) pods on the node preventing the node from being shut down.

we need another counter at line 133

To count leftover pods separately?

Contributor

But then the message "could not move %d master pods from the node %q" is wrong, since we count all the pods that we can't move, not only masters, right?

Member Author

I double-checked: both the method's name moveMasterPodsOffNode and the code suggest we only move master pods. This makes sense to me since replicas can be transparently re-created by deployments, but re-creating a master requires a failover first that has to be explicitly triggered.

UPD: I realized where the confusion stems from and will update the PR.
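For reference, a rough sketch of a message that keeps both numbers accurate while still surfacing the leftovers; the function and parameter names are hypothetical, not the actual change:

```go
package controller

import "fmt"

// migrationSummary reports how many operator-managed master pods were moved
// and mentions leftover masters from deleted clusters separately, so the
// "%d out of %d" figures stay correct.
func migrationSummary(nodeName string, managedMasters, movedMasters, leftoverMasters int) string {
	msg := fmt.Sprintf("%d out of %d master pods have been moved out from the node %q",
		movedMasters, managedMasters, nodeName)
	if leftoverMasters > 0 {
		msg += fmt.Sprintf("; %d leftover master pods from deleted clusters remain", leftoverMasters)
	}
	return msg
}
```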

Jan-M previously approved these changes on Dec 21, 2018
@@ -55,15 +58,16 @@ func (c *Controller) nodeUpdate(prev, cur interface{}) {
return
}

if util.MapContains(nodeCur.Labels, map[string]string{"master": "true"}) {
Member Author

this check is covered by nodeIsReady(nodeCur)
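Presumably nodeIsReady folds that label check in, roughly like the guess below (not the actual implementation; util.MapContains is the repo's existing helper):

```go
// Guessed shape of nodeIsReady: a node counts as ready if it is still
// schedulable or if it carries the "master": "true" readiness label, which
// subsumes the explicit label check discussed above.
func nodeIsReady(node *v1.Node) bool {
	return !node.Spec.Unschedulable ||
		util.MapContains(node.Labels, map[string]string{"master": "true"})
}
```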

sdudoladov merged commit 4fa09e0 into master on Dec 21, 2018
sdudoladov pushed a commit that referenced this pull request Dec 21, 2018
sdudoladov added a commit that referenced this pull request Dec 21, 2018
This reverts commit 4fa09e0.

Reason: the reverted commit bloats the logs