
cert renewal can't succeed if account key has been recreated #18251

@irbekrm

Since #15595 added the ARI 'replaces' extension field to cert renewal, renewal orders fail if the ACME account key used to create the renewal order does not match the one with which the old cert was issued; see the check in Boulder. Note that this check only runs for orders that carry the 'replaces' field.

Our issuance logic creates ACME accounts as needed, so it is possible to end up in a situation where the account key was deleted but the cert was not. When renewal time comes, we create a new account key, attempt to renew the cert with a 'replaces' order, and fail. Currently the only fix is to delete the old cert, which means downtime.

Users can hit this with the Kubernetes operator's HA Ingress. There, account keys are stored in a Secret associated with the hosts (ProxyGroup Pods), but the certs for each Ingress are stored in a separate Secret whose lifecycle is tied to the Ingress. If users delete and recreate the ProxyGroup but not the Ingress resources, the account key is recreated but the certs are not. (The reasoning is that we don't want to unnecessarily delete certs, whose issuance is rate limited, but we also don't want to leave old Secrets lying around.)
However, this flow currently leaves users with renewals that break further down the line and that they cannot recover from without deleting the certs.

What we could do:

  • figure out whether it is safe to add the replaces extension to the order in the core cert client.
    I don't think this can be done though: afaik we have no way to verify that a cert and an account key we hold are a match. We could retry without the field on errors/error string matches, but that seems too hacky.

  • add a config knob that lets us opt out of replaces in the operator. This would be the easiest to do.

  • put ProxyGroup account keys in a separate Secret that we never delete, not even when the ProxyGroup has been deleted. That would have to be one Secret per operator, because we cannot assume that users will recreate the ProxyGroup with the same name etc. We could do this as well, but it's probably not good enough UX. For example, if users explicitly deleted the account key Secret while debugging something, we could start failing renewals months later, with the only way to recover being to delete the certs.

  • opt out of the replaces extension altogether. Users could have deleted the account key for whatever reason, and failing renewals possibly months after that is not good UX.

As a side note, it would be nice if the ARI API also had some method that helps in cases like ours, perhaps one that lets a client check whether a cert belongs to a given account.
