backports for 0.13.3 #4570

smira · 2021-11-22T13:30:17Z

PRs backported:


We don't expect to kexec on shutdown and reset, so there is no need to run kexec prepare, which might fail and block those two actions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit 030fd34)

Fixes siderolabs#4407
Fixes siderolabs#4489

This PR started by enabling a simple restart of the `kubelet` service via the services API, but a problem turned up: when kubelet restarts, CNI is already up, so the host has an interface with the CNI node IP. The code which picks kubelet node IPs finds that address and tries to add it to the list of kubelet node IPs, which completely breaks kubelet.

The solution was easy: allow node IPs to be filtered out, e.g. we never want a kubelet node IP to come from the pod CIDR. This filtering feature is also useful in other cases, so I added that as well.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit a76f6d6)

As Talos now picks up node IPs by default, we need to make sure kubelet never picks up the VIP as the node IP. This issue doesn't show up with HA control plane nodes, as Talos releases the VIP when kubelet restarts, but with a single-node control plane the VIP stays on the node (as there's nowhere to move it to), so we need to filter the VIP out.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit 2d11b59)

Fixes siderolabs#4557

When running `reset` for a node which was already deleted from Kubernetes, we should ignore the failure to cordon and proceed with the other actions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit c6a67b8)

Fixes siderolabs#4653

This fixes an issue where `etcd` gets stuck in the `Pre` state so that even a `bootstrap` request doesn't kill it. This looks like a cluster which fails to bootstrap itself and gets stuck on reboot.

Even though `Watch()` aborts on cancel, the channel receive will block forever as no events are delivered.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit 58892cd)

smira · 2021-11-22T13:30:38Z

/approve

smira · 2021-11-22T13:54:18Z

/m --ff

smira added 5 commits November 22, 2021 16:21

fix: don't run kexec prepare on shutdown and reset

0018fbf

We don't expect to kexec on shutdown and reset, so there is no need to run kexec prepare, which might fail and block those two actions.

Signed-off-by: Andrey Smirnov <andrey.smirnov@talos-systems.com>
(cherry picked from commit 030fd34)

smira added this to the v0.13 milestone Nov 22, 2021

talos-bot added the status/approved label Nov 22, 2021

Unix4ever approved these changes Nov 22, 2021

frezbo approved these changes Nov 22, 2021

talos-bot merged commit f375ba1 into siderolabs:release-0.13 Nov 22, 2021
