Add lock to watcher hash map to prevent concurrent access panics #161
I rolled out Wave 0.7 to a cluster with about 1k deployments. The migration went smoothly. However, Wave crashed due to concurrent access to the new watcher hashmap. Apparently, watchers are separate goroutines and run concurrently:
I fixed that by adding mutexes. Obviously, access to this hashmap can be optimized further.
However, even with this crash it performs well. Restarts are fast enough and the other controller will take over within a few seconds. Even though it crashes occasionally, the overall CPU load is lower than before:
The same appears to be the case for memory usage:
The upgrade happened around 21:00, which explains the small CPU spike during that time.