
Add script for kind with kubelet server certs CSRs #34923

Open. g-gaston wants to merge 1 commit into base: master

g-gaston
Contributor

@g-gaston g-gaston commented Jun 4, 2025

This lets nodes obtain serving certificates signed by the control plane (CP). As a result, any test that makes exec/logs/port-forward requests also exercises validation of the kubelet's serving certificate.

It's a fork of https://github.com/kubernetes-sigs/kind/blob/main/hack/ci/e2e-k8s.sh with:

  • Extra config for the kube-apiserver so it validates the kubelet's serving cert (kubelet-certificate-authority)
  • Extra config for the kubelet so it uses CSRs to get signed serving certs (rotate-server-certificates)
  • Auto-approve any CSR for kubelet serving certificates (implemented in approve_kubelet_serving_csrs)
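
For illustration only, the two config changes listed above might land in a kind cluster config roughly as follows. This is a minimal sketch, not the script from the PR: `render_kind_config` is a hypothetical helper name, the node layout is an assumption, and the map-style `extraArgs` mirrors the shape used in the upstream script, so other kubeadm API versions may expect a different layout.

```shell
# Sketch: print a minimal kind config carrying the API server and kubelet
# settings described above. render_kind_config is a hypothetical name used
# only for this illustration.
render_kind_config() {
  # CA bundle the API server uses to verify kubelet serving certs
  # (the default matches the path used in the PR).
  local ca_path="${1:-/etc/kubernetes/pki/ca.crt}"
  cat <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  apiServer:
    extraArgs:
      "kubelet-certificate-authority": "${ca_path}"
  ---
  kind: InitConfiguration
  nodeRegistration:
    kubeletExtraArgs:
      "rotate-server-certificates": "true"
  ---
  kind: JoinConfiguration
  nodeRegistration:
    kubeletExtraArgs:
      "rotate-server-certificates": "true"
EOF
}
```

The output could be written to a file and passed to `kind create cluster --config <file>`. Until the resulting kubelet-serving CSRs are approved, the kubelets have no signed serving certificate, which is why the third item in the list is needed.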

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 4, 2025
@k8s-ci-robot k8s-ci-robot requested review from aojea and BenTheElder June 4, 2025 19:26
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 4, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: g-gaston
Once this PR has been reviewed and has the lgtm label, please assign aojea for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Jun 4, 2025
@BenTheElder
Member

Maybe create a sub-folder for this under experiment and add yourself + anyone else working on it to OWNERS?

@aojea
Member

aojea commented Jun 11, 2025

> Maybe create a sub-folder for this under experiment and add yourself + anyone else working on it to OWNERS?

lol, I'm the last one standing here 😄

@g-gaston g-gaston force-pushed the kind-signed-kubelet-server-certs branch from b7a9c8c to 8dbd400 Compare June 12, 2025 17:26
@g-gaston
Contributor Author

> Maybe create a sub-folder for this under experiment and add yourself + anyone else working on it to OWNERS?

Done!

@enj
Member

enj commented Jun 12, 2025

> It's a fork of https://github.com/kubernetes-sigs/kind/blob/main/hack/ci/e2e-k8s.sh with:

--- /tmp/1	2025-06-12 15:08:52
+++ /tmp/2	2025-06-12 15:09:01
@@ -14,6 +14,7 @@
 # limitations under the License.
 
 # hack script for running a kind e2e
+# Enable kubelet server certificates bootstrap and CSR auto-approval
 # must be run with a kubernetes checkout in $PWD (IE from the checkout)
 # Usage: SKIP="ginkgo skip regex" FOCUS="ginkgo focus regex" kind-e2e.sh
 
@@ -105,7 +106,8 @@
   CLUSTER_LOG_FORMAT=${CLUSTER_LOG_FORMAT:-}
   scheduler_extra_args="      \"v\": \"${KIND_CLUSTER_LOG_LEVEL}\""
   controllerManager_extra_args="      \"v\": \"${KIND_CLUSTER_LOG_LEVEL}\""
-  apiServer_extra_args="      \"v\": \"${KIND_CLUSTER_LOG_LEVEL}\""
+  apiServer_extra_args="      \"v\": \"${KIND_CLUSTER_LOG_LEVEL}\"
+      \"kubelet-certificate-authority\": \"/etc/kubernetes/pki/ca.crt\""
   if [ -n "$CLUSTER_LOG_FORMAT" ]; then
       check_structured_log_support "CLUSTER_LOG_FORMAT"
       scheduler_extra_args="${scheduler_extra_args}
@@ -117,7 +119,8 @@
   fi
   kubelet_extra_args="      \"v\": \"${KIND_CLUSTER_LOG_LEVEL}\"
       \"container-log-max-files\": \"10\"
-      \"container-log-max-size\": \"100Mi\""
+      \"container-log-max-size\": \"100Mi\"
+      \"rotate-server-certificates\": \"true\""
   KUBELET_LOG_FORMAT=${KUBELET_LOG_FORMAT:-$CLUSTER_LOG_FORMAT}
   if [ -n "$KUBELET_LOG_FORMAT" ]; then
       check_structured_log_support "KUBECTL_LOG_FORMAT"
@@ -214,8 +217,110 @@
   # Patch kube-proxy to set the verbosity level
   kubectl patch -n kube-system daemonset/kube-proxy \
     --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/command/-", "value": "--v='"${KIND_CLUSTER_LOG_LEVEL}"'" }]'
+
+  approve_kubelet_serving_csrs
 }
 
+# Approve all the existing CSRs for kubelet serving certs
+approve_kubelet_serving_csrs() {
+  local NODE_NAMES
+  NODE_NAMES=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')
+  
+  echo "Approving kubelet server CSRs for nodes: ${NODE_NAMES}" >&2
+  
+  local approved_nodes=""
+  local approved_csrs=""
+  local max_retries=10
+  local retry_count=0
+  local sleep_interval=2
+  
+  while [ ${retry_count} -lt ${max_retries} ]; do
+    retry_count=$((retry_count + 1))
+    echo "Attempt ${retry_count}/${max_retries}: Checking for kubelet-serving CSRs..." >&2
+    
+    # Get all pending CSRs with signerName kubernetes.io/kubelet-serving
+    local csrs
+    csrs=$(kubectl get csr -o jsonpath='{range .items[?(@.spec.signerName=="kubernetes.io/kubelet-serving")]}{.metadata.name}{" "}{.spec.username}{"\n"}{end}' 2>/dev/null || true)
+    
+    if [ -z "${csrs}" ]; then
+      echo "No kubelet-serving CSRs found in this attempt" >&2
+      echo "Waiting ${sleep_interval} seconds before next attempt..." >&2
+      sleep ${sleep_interval}
+      continue
+    fi
+
+    while IFS=' ' read -r csr_name username; do
+      if [ -z "${csr_name}" ]; then
+        continue
+      fi
+      
+      # Check if this CSR was already approved
+      case " ${approved_csrs} " in
+        *" ${csr_name} "*)
+          echo "CSR ${csr_name} already approved, skipping" >&2
+          continue
+          ;;
+      esac
+      
+      echo "Checking CSR: ${csr_name} for user: ${username}" >&2
+      
+      # Check if username matches system:node:<node-name> for any of our nodes
+      for node_name in ${NODE_NAMES}; do
+        if [ "${username}" = "system:node:${node_name}" ]; then
+          echo "Approving CSR ${csr_name} for node ${node_name}" >&2
+          if kubectl certificate approve "${csr_name}" 2>/dev/null; then
+            echo "Successfully approved CSR ${csr_name}" >&2
+            # Track this CSR as approved
+            approved_csrs="${approved_csrs} ${csr_name}"
+            # Add node to approved list if not already there
+            case " ${approved_nodes} " in
+              *" ${node_name} "*)
+                # Node already in list
+                ;;
+              *)
+                approved_nodes="${approved_nodes} ${node_name}"
+                ;;
+            esac
+          else
+            echo "Failed to approve CSR ${csr_name}" >&2
+          fi
+          break
+        fi
+      done
+    done <<EOF
+${csrs}
+EOF
+    
+    # Check if all nodes have approved CSRs
+    local all_approved=true
+    for node_name in ${NODE_NAMES}; do
+      case " ${approved_nodes} " in
+        *" ${node_name} "*)
+          # Node is approved
+          ;;
+        *)
+          echo "Node ${node_name} still needs CSR approval" >&2
+          all_approved=false
+          ;;
+      esac
+    done
+    
+    if [ "${all_approved}" = "true" ]; then
+      echo "All nodes have approved kubelet-serving CSRs: ${approved_nodes}" >&2
+      return 0
+    fi
+    
+    if [ ${retry_count} -lt ${max_retries} ]; then
+      echo "Waiting ${sleep_interval} seconds before next attempt..." >&2
+      sleep ${sleep_interval}
+    fi
+  done
+  
+  echo "Warning: Not all nodes got approved CSRs after ${max_retries} attempts" >&2
+  echo "Approved nodes: ${approved_nodes}" >&2
+  return 1
+}
+
 # run e2es with ginkgo-e2e.sh
 run_tests() {
   # IPv6 clusters need some CoreDNS changes in order to work in k8s CI:
@@ -297,7 +402,7 @@
   kind version
 
   # build kubernetes
-  build
+  # build
   # in CI attempt to release some memory after building
   if [ -n "${KUBETEST_IN_DOCKER:-}" ]; then
     sync || true
@@ -307,8 +412,8 @@
   # create the cluster and run tests
   res=0
   create_cluster || res=$?
-  run_tests || res=$?
-  cleanup || res=$?
+  # run_tests || res=$?
+  # cleanup || res=$?
   exit $res
 }
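
The core matching step of `approve_kubelet_serving_csrs` in the diff above can be sketched in isolation as a filter. This is a simplified illustration, not the PR's code: `select_node_csrs` is a hypothetical name, and the retry loop and approval bookkeeping are left out.

```shell
# Sketch: pick the kubelet-serving CSRs that belong to known nodes.
# Reads "csr_name username" lines on stdin; node names are the arguments.
# Prints "csr_name node_name" for every CSR whose requesting user is
# system:node:<node_name>. select_node_csrs is a hypothetical helper name.
select_node_csrs() {
  local csr_name username node_name
  while IFS=' ' read -r csr_name username; do
    # Skip empty lines, as the loop in the diff does.
    [ -n "${csr_name}" ] || continue
    for node_name in "$@"; do
      if [ "${username}" = "system:node:${node_name}" ]; then
        printf '%s %s\n' "${csr_name}" "${node_name}"
        break
      fi
    done
  done
}
```

In the script's context, the input would come from the `kubectl get csr` jsonpath query shown in the diff, and each printed CSR name would be passed to `kubectl certificate approve`. That query filters on `signerName` only, so CSRs that are already approved can also appear in its output; the diff handles this by tracking approved names in `approved_csrs`.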

@enj
Member

enj commented Jun 12, 2025

@g-gaston did you mean to comment out build, run_tests, and cleanup?

This allows to have nodes obtain serving certificates signed by the CP.
Enables to exercise certificate validation on any test making
exec/logs/port-forward requests.
@g-gaston g-gaston force-pushed the kind-signed-kubelet-server-certs branch from 8dbd400 to 785edc5 Compare June 12, 2025 19:27
@enj
Member

enj commented Jun 12, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2025
@k8s-ci-robot
Contributor

@g-gaston: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-test-infra-unit-test-race-detector-nonblocking | 785edc5 | link | false | /test pull-test-infra-unit-test-race-detector-nonblocking |
| pull-test-infra-unit-test | 785edc5 | link | true | /test pull-test-infra-unit-test |
| pull-test-infra-verify-lint | 785edc5 | link | true | /test pull-test-infra-verify-lint |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

5 participants