Skip to content

Conversation

@alexeyklyukin
Copy link
Contributor

Assign the list of clusters in the controller with the up-to-date list
of Postgres manifests on Kubernetes during the startup.

Node migration routines launched asynchronously to the cluster
processing rely on an up-to-date list of clusters in the controller to
detect clusters affected by the migration of the node and lock them
when doing migration of master pods. Without the initial list the
operator was subject to race conditions like the one described at
#363

Restructure the code to decouple list cluster function required by the
postgresql informer from the one that emits cluster sync events. No
extra work is introduced, since cluster sync already runs in a separate
goroutine (clusterResync).

Introduce explicit initial cluster sync at the end of
acquireInitialListOfClusters instead of relying on an implicit one
coming from list function of the PostgreSQL informer.

Some minor refactoring.

Assign the list of clusters in the controller with the up-to-date list
of Postgres manifests on Kubernetes during the startup.

Node migration routines launched asynchronously to the cluster
processing rely on an up-to-date list of clusters in the controller to
detect clusters affected by the migration of the node and lock them
when doing migration of master pods. Without the initial list the
operator was subject to race conditions like the one described at
#363

Restructure the code to decouple list cluster function required by the
postgresql informer from the one that emits cluster sync events. No
extra work is introduced, since cluster sync already runs in a separate
goroutine (clusterResync).

Introduce explicit initial cluster sync at the end of
acquireInitialListOfClusters instead of relying on an implicit one
coming from list function of the PostgreSQL informer.

Some minor refactoring.
@coveralls
Copy link

Coverage Status

Coverage remained the same at 4.606% when pulling 5da4951 on wip/restartable_node_migration into 1405058 on master.

func (c *Controller) Run(stopCh <-chan struct{}, wg *sync.WaitGroup) {
c.initController()

// start workers reading from the events queue to prevent the initial sync from blocking on it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that basically means that workers must always start before acquireInitialListOfClusters executes since it queues the 1st Sync event, right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and if there are no consumers of that queue acquireInitialListOfClusters function will block forever...

@alexeyklyukin alexeyklyukin merged commit 199aa65 into master Aug 8, 2018
@alexeyklyukin alexeyklyukin deleted the wip/restartable_node_migration branch August 8, 2018 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants