removing inner goroutine in cluster.Switchover #1876
…en processPodEvent and unregisterPodSubscriber
c9b8dc7 to 3641911
}

close(c.podSubscribers[podName])
delete(c.podSubscribers, podName)
close() before delete() looks a bit safer here. (What does Go do to ch when the entry that contains it is removed from c.podSubscribers?)
delete only removes the channel from the c.podSubscribers map; the channel itself stays valid, so we can still safely close it afterwards. Since the goroutine processPodEvent first looks up the channel and then writes to it, the idea was to remove the channel from the map first and then close it, saving a few nanoseconds in the race condition we faced.
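To illustrate the answer to the question above, here is a minimal standalone sketch, assuming a plain map[string]chan string in place of the operator's real subscriber map (the helper deleteThenClose is hypothetical). Deleting the map entry does not affect the channel value; a copy obtained before the delete still refers to the same channel and can be closed afterwards.

```go
package main

import "fmt"

// deleteThenClose is a hypothetical helper mirroring the order discussed in
// the review: remove the channel from the map first, then close it.
// delete() only drops the map entry; the channel value held in ch is untouched.
func deleteThenClose(subscribers map[string]chan string, name string) chan string {
	ch := subscribers[name]
	delete(subscribers, name)
	close(ch)
	return ch
}

func main() {
	subscribers := map[string]chan string{"pod-0": make(chan string)}
	ch := deleteThenClose(subscribers, "pod-0")

	_, stillThere := subscribers["pod-0"]
	// Receiving from a closed channel returns the zero value and ok == false.
	v, ok := <-ch
	fmt.Println(stillThere, v == "", ok)
}
```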
fixes #1867
The Switchover function contains an inner goroutine that is not needed. I guess it was once introduced so that two goroutines would wait for each other when sending and consuming events on the podEvent channels. But even with a WaitGroup added later, the chance of a race condition between closing the channel (consumer side) and sending events was still high: the operator panicked in most of my tests. And since we wait for the goroutine to finish anyway, the need for concurrency in this part of the code is questionable.
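The two points above can be sketched in isolation, assuming simplified hypothetical helpers (withInnerGoroutine, sendRecovering) that are not part of the operator's code. Starting a goroutine and immediately calling wg.Wait() blocks the caller exactly as long as calling the work inline would, and a send on a channel the consumer has already closed panics, which is the failure mode seen in the tests.

```go
package main

import (
	"fmt"
	"sync"
)

// withInnerGoroutine starts work in a goroutine and then immediately waits for
// it, so the caller is blocked exactly as long as if it had called work() itself.
func withInnerGoroutine(work func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		work()
	}()
	wg.Wait()
}

// sendRecovering sends v on ch and converts the runtime panic caused by a
// send on a closed channel into an error, to make the failure mode visible.
func sendRecovering(ch chan int, v int) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	ch <- v
	return nil
}

func main() {
	// The goroutine buys no concurrency: the caller still waits for work to finish.
	finished := false
	withInnerGoroutine(func() { finished = true })
	fmt.Println(finished) // true

	// If the consumer closes the channel before the producer sends, the send panics.
	ch := make(chan int)
	close(ch)
	fmt.Println(sendRecovering(ch, 1)) // send on closed channel
}
```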
With the goroutine removed from Switchover, the chance of hitting the race condition is already much lower. To eliminate it completely, the PR moves c.podSubscribersMu.RUnlock() in processPodEvent to after the event has been handled. This way, processPodEvent always sends on the channel before unregisterPodSubscriber, called from Switchover, can close it. Since a send on an unbuffered channel blocks when there is no receiver (Switchover might already be done), the send has to be wrapped in a select block with an empty default case. See this Go example.
The PR changes the check interval and timeout settings used when deleting and patching Pods. The PatroniAPICheck* config values are used because their defaults correspond to the hard-coded values of 1s and 5s that we used before. The PatroniAPICheck* constants, however, should only be used when calling the Patroni REST API.

The PR also fixes the WaitGroup when the PostgresTeam informer is enabled. Thanks @dmvolod for the hint in #1874.