removing inner goroutine in cluster.Switchover #1876

FxKu · 2022-04-29T19:30:43Z

The Switchover function has an inner goroutine which is not needed. I guess it was once introduced with the idea of having two goroutines that wait for each other when sending and consuming events on the podEvent channels. But even with the WaitGroup added later, the chance of a race between closing the channel (consumer side) and sending events was still high: the operator panicked in most of my tests. And since we wait for the goroutine to finish anyway, the need for concurrency in this part of the code looks questionable.

With the goroutine removed from Switchover, the chance of hitting the race condition is already a lot lower. To eliminate it fully, the PR moves c.podSubscribersMu.RUnlock() after the event handling. That way, the event is always sent on the channel before unregisterPodSubscriber, called from Switchover, can close it. Since a send on an unbuffered channel blocks when there is no receiver (Switchover might already be done), the send has to sit in a select block with a do-nothing default case. See this go example.
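The locking and non-blocking send described above can be sketched roughly as follows. This is a simplified model, not the operator's code: `registry`, `send`, `unregister` and the `PodEvent` fields are hypothetical stand-ins for the cluster struct, processPodEvent and unregisterPodSubscriber.

```go
package main

import (
	"fmt"
	"sync"
)

// PodEvent is a hypothetical stand-in for the operator's pod event type.
type PodEvent struct {
	PodName string
}

// registry is a simplified, hypothetical model of the subscriber map
// and the RWMutex that guards it.
type registry struct {
	mu          sync.RWMutex
	subscribers map[string]chan PodEvent
}

// send tries to deliver an event without blocking. The read lock is held
// until the send attempt is over, so unregister (which needs the write
// lock) cannot close the channel in between.
func (r *registry) send(ev PodEvent) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()

	ch, ok := r.subscribers[ev.PodName]
	if !ok {
		return false
	}
	select {
	case ch <- ev:
		return true
	default:
		// No receiver is ready; drop the event instead of blocking
		// while holding the lock.
		return false
	}
}

// unregister removes the channel from the map and closes it under the
// write lock, so it never overlaps with a send in progress.
func (r *registry) unregister(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()

	if ch, ok := r.subscribers[name]; ok {
		delete(r.subscribers, name)
		close(ch)
	}
}

func main() {
	r := &registry{subscribers: map[string]chan PodEvent{
		"pod-0": make(chan PodEvent), // unbuffered
	}}

	// Nobody is receiving, so the default case fires instead of blocking.
	fmt.Println("delivered without receiver:", r.send(PodEvent{PodName: "pod-0"}))

	r.unregister("pod-0")

	// The channel is no longer in the map, so nothing is sent on a closed channel.
	fmt.Println("delivered after unregister:", r.send(PodEvent{PodName: "pod-0"}))
}
```

In this sketch both calls report false and neither panics. The trade-off is that an event is silently dropped when no receiver is waiting, which matches the "do nothing" default case from the description.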

The PR changes the check and timeout setting for deleting and patching Pods. PatroniAPICheck* config values are used because their defaults correspond to the hard coded values of 1s and 5s that we used before. But PatroniAPICheck* constants should only be used when calling the Patroni REST API instead.

The PR also fixes the WaitGroup when the PostgresTeam informer is enabled. Thanks @dmvolod for the hint in #1874.


FxKu · 2022-05-16T13:36:39Z

👍

pkg/controller/controller.go

pkg/cluster/pod.go

sdudoladov · 2022-05-17T12:08:01Z

pkg/cluster/pod.go

 	}

-	close(c.podSubscribers[podName])
 	delete(c.podSubscribers, podName)


close() before delete() looks a bit safer here. (What does Golang do to ch when the entry that contains it is removed from c.podSubscribers?)

Delete only removes the channel from the c.podSubscribers map. We can still safely close it afterwards. Since the processPodEvent goroutine first gets the channel and then writes to it, the idea was to remove the channel from the map first and then close it, saving some nanoseconds in the race condition we faced.
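A minimal sketch of that ordering, to illustrate the reviewer's question: deleting a map entry only drops the map's reference, and the channel stays usable until it is closed explicitly. The `detach` helper and the string payload are hypothetical, and a buffered channel is used only so the example runs without a second goroutine.

```go
package main

import "fmt"

// detach removes the channel from the map first and closes it afterwards.
// It returns the channel so the caller can still drain it.
func detach(subs map[string]chan string, name string) (chan string, bool) {
	ch, ok := subs[name]
	if !ok {
		return nil, false
	}
	delete(subs, name) // only drops the map's reference; ch is still open
	close(ch)          // closing through the local reference is still valid
	return ch, true
}

func main() {
	subs := map[string]chan string{"pod-0": make(chan string, 1)}
	subs["pod-0"] <- "update"

	ch, _ := detach(subs, "pod-0")

	// A value buffered before the close is still readable afterwards.
	v, open := <-ch
	fmt.Println(v, open)

	// Once drained, a closed channel yields the zero value and false.
	_, open = <-ch
	fmt.Println(open)

	// New lookups no longer find the channel.
	_, present := subs["pod-0"]
	fmt.Println(present)
}
```

Removing the entry first means any later lookup finds nothing to send on, while a goroutine that already fetched the channel still holds a valid reference. That remaining window is why the lock ordering matters as well.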

pkg/cluster/pod.go

pkg/cluster/cluster.go

sdudoladov · 2022-05-17T16:03:11Z

👍

FxKu · 2022-05-17T16:09:57Z

👍

FxKu requested review from sdudoladov, Jan-M, CyberDem0n, jopadi and idanovinda as code owners April 29, 2022 19:30

FxKu added this to the 1.8.1 milestone Apr 29, 2022

removing inner goroutine in cluster.Switchover and resolve race betwe…

3641911

…en processPodEvent and unregisterPodSubscriber

FxKu force-pushed the switchover-subscriber-fix branch from c9b8dc7 to 3641911 Compare May 5, 2022 14:06

FxKu added 2 commits May 5, 2022 17:31

defer unlocking podSubscriber for Switchover

edd5d0a

panic vs. deadlock

125512e

This was referenced May 11, 2022

signal processPodEvent routine to close PodEvent channel #1888

Closed

Use second channel for podSubscribers to trigger its removal by cluster go routine #1891

Closed

unlock mutex after handling event, now with non-blocking default case

264e66d

sdudoladov reviewed May 17, 2022


reflect code review

fcb74e9

FxKu closed this May 17, 2022

FxKu reopened this May 17, 2022

FxKu merged commit 268a86a into master May 17, 2022
