Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

acme: don't store outdated certificates #8647

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

xaurx
Copy link

@xaurx xaurx commented Dec 20, 2021

What does this PR do?

Traefik keeps and tries to update outdated certificates forever even if they are no longer in use or/and can't be renewed anymore. This PR removes outdated certificates from the acme store to avoid polluting it.

Motivation

When one provisions white label app and manages domains on his behalf he inevitable removes some domains from time to time. But traefik keeps outdated certificates and moreover tries to update them with no reason.

@ldez ldez self-requested a review December 21, 2021 07:23
@rtribotte rtribotte added this to To review in v2 via automation Dec 21, 2021
@kevinpollet kevinpollet linked an issue Aug 8, 2022 that may be closed by this pull request
@ddtmachado ddtmachado self-assigned this Aug 25, 2022
@nmengin nmengin removed this from To review in v2 Sep 2, 2022
@ldez ldez assigned ldez and unassigned ddtmachado Sep 10, 2022
@psavva

This comment was marked as off-topic.

@Blackstareye
Copy link

Is there an update, regarding this issue ?
I think it would help many sys admins, if the certs would be removed automatically :)

@ldez
Copy link
Contributor

ldez commented Dec 2, 2022

This PR is very interesting but from the point of the design, we will try to create something with a more extensive scope (revocation, ...).
We will work on that soon, stay tuned.

As workaround you can use traefik-certs-cleaner.

@Blackstareye
Copy link

Blackstareye commented Dec 2, 2022

Thanks, for the quick reply! :)

Okay, so what is your recommended 'work around' for this problem, regarding this issue ?
something discussed in this issue #3376 or something else?

I have this problem right now in the company I work, and it would be nice, having a solution, that includes the larger scope of this project, you mentioned.

@nmengin nmengin added the kind/enhancement a new or improved feature. label Jan 11, 2023
@flopana
Copy link

flopana commented Jun 12, 2023

It has been 1 1/2 years. It would be really nice if Traefik would remove outdated certificates or not try to renew them. It's not remotely practical to manually remove unused certificates and restart Traefik every time a domain is no longer in use.

For example:
A client configures their own domain in my application. Traefik receives this configuration and obtains a certificate from LetsEncrypt.
This client moves on and stops using my service, changes the DNS records or lets the domain expire. This will cause unnecessary errors when Traefik tries to renew the certificate and will mask any real domain problems I might have by writing error logs every day.

@msnisha
Copy link

msnisha commented Apr 26, 2024

Any idea on when this issue going to be resolved?

@nmengin nmengin self-assigned this Apr 29, 2024
@emilevauge
Copy link
Member

emilevauge commented Apr 29, 2024

Hey here, on behalf of the maintainer team, I would like to apologize for this radio silence. This is not great. This PR was put on hold because we wanted to provide a more extensive solution, but at the end of the day, we never found the time to finalize this. If we had to do it again, we would instead move quickly with this PR, as a first step.
We will review it in the next few weeks and move forward as fast as possible.

@emilevauge emilevauge requested review from nmengin and removed request for ldez April 29, 2024 12:29
@ldez ldez self-requested a review April 29, 2024 13:26
@nmengin
Copy link
Contributor

nmengin commented Jun 4, 2024

Hello there,

First and foremost, please apologize again for the long time with no action on this PR. As Emile explained, we had planned to improve the TLS certificate management globally, so we haven’t moved forward on this PR.

Such a modification hasn’t been brought yet and it’s now time to review it.

Let’s talk about the PR 🙂

I speak on my behalf, and even if the use case makes sense, I don’t think this PR is the best way to tackle this topic.

My main concern is the condition checked to delete the certificates from the renewal list: this PR proposes to delete the certificates that are outdated for more than 7 days.

IMO, using the expiry date is not the right approach: if you delete a certificate that is used by a router, the situation will be worse: instead of failing during the certificate renewal once a day, Traefik will try to create this certificate during each dynamic configuration reload.

In my PoV, it would make more sense to check if a certificate is used by at least one router before renewing it. Thus Traefik could delete the unnecessary certificates, and serving an outdated certificate would indicate an error during the challenge resolution.

For this reason, I propose the following actions:

  • Closing this PR (it is pretty outdated and modifying it would be complicated now)
  • Opening a proposal describing another approach based on the router's domain check.

Before moving forward on it, I’d like to have your feedback. I may miss something.

Many thanks for considering my request.

@flopana
Copy link

flopana commented Jun 4, 2024

In my PoV, it would make more sense to check if a certificate is used by at least one router before renewing it. Thus Traefik could delete the unnecessary certificates, and serving an outdated certificate would indicate an error during the challenge resolution.

Hello @nmengin

This was also my first intuition to solving the problem. Just don't renew certificates that are no longer used by a router.
But I had a discussion with @ldez about this via email.
The concern he raised as to why renewing unused certificates is necessary is as follows.

If you have thousands of certificates that are used on and off, the problem is that when you add routers that use these certificates back in, Traefik may need to renew more certificates than the limits of Let's Encrypt allow.

However, if Traefik renews these certificates gradually over a period of time, you can stay under the limits.
This means that removing unused certificates would break the functionality of Traefik for some customers and it would take weeks or even months to get the certificates back.

The solution that we both agreed would work for everyone is to introduce a time limit (or failed attempts, which is basically the same thing).

Let's Encrypt certificates are valid for 90 days.
After 60 days, Traefik will try to renew them once a day. If the renewal fails, Traefik will try again the next day.
If you could configure a limit to the number of days (or attempts) that traefik will try to renew before removing the certificates, we wouldn't break traefik for some customers and outdated certificates would actually be removed.

Quote from ldez:

Example:

  • Expiration date: 2024/06/30
  • First attempt at renewal: 2024/05/31
  • Time limit: 2 days

-> The certificate will be removed the 2024/06/02.

ldez please correct me if I'm wrong, I wrote this partly from memory and didn't read our whole conversation again.

@pfaffman
Copy link

pfaffman commented Jun 4, 2024

That's an interesting point that people might need certs for domains they aren't using.

The opposite problem that can occur is if you are renewing certs for a former customer who no longer points their DNS to your traefik server then renewing that certificate will fail, so you really want not to try to renew that cert that isn't routed.

@flopana
Copy link

flopana commented Jun 4, 2024

The opposite problem that can occur is if you are renewing certs for a former customer who no longer points their DNS to your traefik server then renewing that certificate will fail, so you really want not to try to renew that cert that isn't routed.

Yes this is exactly the scenario I am facing in my company and was the reason why I spoke to ldez. The solution I mentioned earlier would still produce failed renewals but these would stop after n days. I personally would be ok with this.

The only solution I've come up with that does not break Traefik in some cases and still does not spam my logs with "failed" renewals is implementing a different storage option for certificates other than a json file.

If the certificates could be stored in for example HashiCorp Vault I could just remove the certificate from the store programmatically after my customer has deleted their domain from my service.

I know I could also edit the json file programmatically and restart traefik, but I don't want to do that every time a customer decides to delete a domain on my service.

If anyone has any further input on this, I would love to hear your thoughts.

@ldez ldez mentioned this pull request Jun 4, 2024
2 tasks
@ldez
Copy link
Contributor

ldez commented Jun 4, 2024

This PR #10782 should resolve the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not attempt to renew certificate no longer used