
Race condition in webhook certificate renewal with cert-manager self-signed issuer without a dedicated CA certificate #4019

Open
@frittentheke


Describe the bug
The Helm chart allows cert-manager to be used to create and manage the certificate that serves the webhook endpoints (https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/2ff2e59711c8e749c93b46c62dd60975598115c3/helm/aws-load-balancer-controller/templates/webhook.yaml#L225C1-L250C11). To issue these certificates, a selfSigned issuer is used.

This causes cert-manager to self-sign each generated certificate instead of signing it with a dedicated CA certificate. With the default cert lifetime of 60 days and the resulting renewal every 30 days, the "CA" is therefore also replaced on every renewal.

The AWS Load Balancer Controller simply notices that the cert file has been updated and reloads it. The Kubernetes API server, however, relies on the mutatingwebhookconfiguration and validatingwebhookconfiguration named aws-load-balancer-webhook, which get their caBundle injected by cert-manager's ca-injector.

This injection happens independently of the certificate update and is therefore racy: the CA does not match for some time, until both mechanisms have converged. During that window, webhook invocations fail:

  • 2025/01/15 08:26:55 http: TLS handshake error from 127.0.0.1:12345: remote error: tls: bad certificate
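For context, a minimal sketch of the wiring that the ca-injector acts on. The `cert-manager.io/inject-ca-from` annotation is cert-manager's documented mechanism; the namespace and the Certificate name used here are assumptions for illustration and may differ from what the chart renders.

```yaml
# Fragment only: the webhooks list is omitted.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: aws-load-balancer-webhook
  annotations:
    # Format is <namespace>/<Certificate name>. Both values are assumed here.
    # The ca-injector copies the CA from that Certificate's secret into every
    # webhooks[].clientConfig.caBundle of this object.
    cert-manager.io/inject-ca-from: kube-system/aws-load-balancer-serving-cert
```

With a selfSigned issuer, the CA in that secret is the serving certificate itself, so every renewal changes the value that has to be injected.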

When NOT using cert-manager, a CA certificate with a 10-year lifetime is created, see:

{{- $cert := genSignedCert (include "aws-load-balancer-controller.fullname" .) nil $altNames 3650 $ca -}}

The same approach can and should be taken when cert-manager is used. The NGINX Ingress Controller Helm chart does exactly that: it has cert-manager issue a dedicated CA cert first and then signs the webhook certificate with it, see https://github.com/kubernetes/ingress-nginx/blob/8111b07adbe4ade4aba96bd52457b05fc737628f/charts/ingress-nginx/templates/admission-webhooks/cert-manager.yaml#L3-L28
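A minimal sketch of what such a setup could look like, modeled on the general self-signed bootstrap pattern rather than on the actual chart templates. All resource names, the namespace, the secret names and the DNS names below are assumptions for illustration; the 87600h duration mirrors the 3650 days used in the non-cert-manager path.

```yaml
# 1. Bootstrap issuer: only used to sign the long-lived CA certificate.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: aws-load-balancer-selfsigned-issuer   # hypothetical name
  namespace: kube-system                       # assumed namespace
spec:
  selfSigned: {}
---
# 2. Dedicated CA certificate with a long lifetime.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: aws-load-balancer-root-cert            # hypothetical name
  namespace: kube-system
spec:
  isCA: true
  commonName: aws-load-balancer-webhook-ca     # hypothetical
  secretName: aws-load-balancer-root-cert      # hypothetical
  duration: 87600h                             # 3650 days
  issuerRef:
    name: aws-load-balancer-selfsigned-issuer
    kind: Issuer
---
# 3. CA issuer backed by the secret of the CA certificate above.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: aws-load-balancer-root-issuer          # hypothetical name
  namespace: kube-system
spec:
  ca:
    secretName: aws-load-balancer-root-cert
---
# 4. Serving certificate for the webhook, signed by the CA issuer.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: aws-load-balancer-serving-cert         # assumed name
  namespace: kube-system
spec:
  secretName: aws-load-balancer-tls            # assumed secret name
  dnsNames:                                    # assumed service DNS names
    - aws-load-balancer-webhook-service.kube-system.svc
    - aws-load-balancer-webhook-service.kube-system.svc.cluster.local
  issuerRef:
    name: aws-load-balancer-root-issuer
    kind: Issuer
```

With this layout, renewing the serving certificate should leave the CA untouched, so the injected caBundle stays valid across renewals and only changes when the CA itself is rotated.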

Steps to reproduce

  • Use cert-manager to manage webhook certificates
  • Trigger certificate renewals while actively making requests to the webhooks (e.g. by scheduling pods)

Expected outcome
A concise description of what you expected to happen.

Environment

  • AWS Load Balancer controller version: 2.10.1
  • Kubernetes version: 1.30.x
  • Using EKS: yes

Additional Context:

Labels: good first issue, kind/bug, triage/needs-investigation