Skip to content

Conversation

FxKu
Copy link
Member

@FxKu FxKu commented May 11, 2022

This is a continuation of #1876 that already improved on the situation that operator runs into a panic during switchover. However, there's still no guarantee that Switchover closes the PodEvent channel when calling unregisterPodSubscriber while the processPodEvent routine still sends something.

The idea of this PR is to send a new PodEvent type to the processPodEvent routine and make it responsible for closing the channel. Unfortunately, the race condition remains the same, but instead of a panic we will face a deadlock. Switchover method can send the special END event which will be added to processPodEventQueue, but there might still be more events in the queue that will then be send to the channel. However, the consumer (in this case waitForPodLabel) is already finished. Now the go routine is blocked and will no proceed.

One solution could be to define a buffered channel when we do the switchover. Buffered channels do not block go routines when events sent to a channel are not consumed. However, using the correct size for a channel might be a challenge, since it's not known how long the actual switchover will take and how many events are really sent during one second. And when the channel is full we have a deadlock again.

stopCh := make(chan struct{})
ch := c.registerUnbufferedPodSubscriber(podName)
// send special end event to trigger removal of PodEvent channel by processPodEvent
defer c.ReceivePodEvent(PodEvent{PodName: types.NamespacedName(podName), EventType: PodEventEnd})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FxKu a bit confusing with the function name, as it sends event, not receives.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also stumbled over the name. But maybe I can get around sending artificial podEvents, after all, see #1891

@FxKu FxKu force-pushed the switchover-subscriber-fix-2 branch from 0b38d9b to 7951e9c Compare May 17, 2022 16:23
@FxKu FxKu closed this May 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants