backports for 0.13.3 #4570

Merged 5 commits on Nov 22, 2021


We don't expect to kexec on these actions, so there is no need to run
the kexec step, which might fail and block those two actions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit 030fd34)

Fixes siderolabs#4407, fixes siderolabs#4489

This PR started by enabling a simple restart of the `kubelet` service
via the services API, but it turned out there's a problem:

When kubelet restarts, CNI is already up, so there's an interface on the
host with the CNI node IP. The code which picks the kubelet node IPs
finds it and tries to add it to the list of kubelet node IPs, which
completely breaks kubelet.

The solution was easy: allow node IPs to be filtered out, e.g. we never
want the kubelet node IP to come from the pod CIDR.

But this filtering feature is also useful in other cases, so I added
that as well.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit a76f6d6)

As Talos now picks up node IPs by default, we need to make sure kubelet
never picks up the VIP as the node IP.

This issue doesn't show up with HA control plane nodes, as Talos
releases the VIP when kubelet restarts, but with single-node control
plane nodes the VIP stays on the node (as there's nowhere to move it
to), so we need to filter the VIP out.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit 2d11b59)

Fixes siderolabs#4557

When running `reset` for a node which was already deleted from
Kubernetes, we should ignore the failure to cordon and proceed with the
other actions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit c6a67b8)

Fixes siderolabs#4653

This fixes an issue where `etcd` gets stuck in the `Pre` state so that
even a `bootstrap` request doesn't kill it. From the outside, this looks
like a cluster which fails to bootstrap itself and gets stuck on reboot.

Even though `Watch()` aborts on cancel, the channel receive will block
forever, as no events are delivered.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit 58892cd)

@smira added this to the v0.13 milestone Nov 22, 2021

smira commented Nov 22, 2021

/approve


smira commented Nov 22, 2021

/m --ff

@talos-bot merged commit f375ba1 into siderolabs:release-0.13 on Nov 22, 2021