Detect TLS cert changes automatically #3083

Open
spawnia opened this issue Mar 27, 2018 · 16 comments
Labels
area/tls contributor/wanted Participation from an external contributor is highly requested kind/enhancement a new or improved feature. priority/P3 maybe

Comments

@spawnia

spawnia commented Mar 27, 2018

Do you want to request a feature or report a bug?

This could be seen as both: dynamically updating certificates on change would surely make Traefik better.

What did you do?

I am dynamically providing certificates, which I generate myself and put onto the server. Those should be hot-reloaded on change.

What did you expect to see?

When switching out a certificate on disk, I would expect traefik to pick up on this change and start delivering the new certificate.

What did you see instead?

The new certificate only got applied after either restarting traefik or making a change in the dynamic config file, which causes traefik to update.

Output of traefik version: (What version of Traefik are you using?)

Version:      v1.5.4
Codename:     cancoillotte
Go version:   go1.9.4
Built:        2018-03-15_01:35:21PM
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

CLI

docker run \
  -v=/var/run/docker.sock:/var/run/docker.sock \
  -v=/home/core/traefik-config:/traefik-config \
  -p=80:80 \
  -p=443:443 \
  -p=8080:8080 \
  traefik \
  --web \
  --docker \
  --docker.domain=XXX \
  --docker.exposedbydefault=false \
  --file.directory=traefik-config \
  --file.watch=true \
  --defaultEntryPoints='http,https' \
  --entryPoints='Name:http Address::80' \
  --entryPoints='Name:https Address::443 TLS'

Contents of the mounted folder:

core@localhost ~ $ ls -al traefik-config/
total 40
drwxrwxr-x. 2  100 users 4096 Mar 27 07:43 .
drwxr-xr-x. 4 core core  4096 Mar 23 08:59 ..
-rw-rw-r--. 1  100 users 1822 Mar 27 07:52 mll.crt
-rw-rw-r--. 1  100 users 1704 Mar 27 07:52 mll.key
-rw-rw-r--. 1  100 users 1407 Mar 27 07:52 rules.toml

Contents of rules.toml

<backends/frontends definitions>
...
[[tls]]
  entryPoints = ["https"]
  [tls.certificate]
    certFile = "traefik-config/mll.crt"
    keyFile = "traefik-config/mll.key"

If applicable, please paste the log output at DEBUG level (--logLevel=DEBUG switch)

Nothing happens when I change the certificate files. The changes do get picked up, though, when I make a change in rules.toml.

@nmengin
Contributor

nmengin commented Mar 27, 2018

Hello @spawnia.

If I understand your use case correctly, I presume it's not a bug but the expected behavior.
Indeed, Træfik only watches for modifications in the TOML files contained in file.directory.
That's why modifications to other files are not automatically detected.

Do you want to propose a feature to watch for modifications to the certificate files?

@spawnia
Author

spawnia commented Mar 27, 2018

Surely that would be a great feature to have. I cannot think of a use case where you update the certificate files but would like to hold off on publishing those changes.

Reflecting the change immediately makes sense and goes nicely with traefik's killer feature, the way it can dynamically update its configuration.

@dtomcej
Contributor

dtomcej commented Mar 27, 2018

One concern here is that if you update the certificate using a new key, the keypair will be mismatched for a period of time, which will cause a TLS failure. I'm wondering if there should be a delay after a triggered update to prevent this.

@spawnia
Author

spawnia commented Mar 27, 2018

Good point. There are a few ways to ensure no such mismatch happens. Whatever the mechanism, it should ensure that updates to the keypair are applied atomically.

Adding a delay has the advantage of being relatively simple, and it should cause little to no performance hit during the time in between, when only one part of the keypair has been changed.

Another way could be this flow:

  1. Key or Cert file is changed
  2. Go into a loop, periodically checking if the keypair matches
  3. Once it matches, apply the update
  4. After a certain period, timeout and stop looping and allow the failure to happen

This would make it so that a successful update could be reflected right away.

Both approaches have the disadvantage that if the user is unaware of this mechanism, a problematic key change will look like it worked fine when checking the page right after. Because of this, the delay/timeout should be kept relatively short.
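
The flow above can be sketched in a few lines. This is only an illustration in Python, not Traefik code: the names `keypair_matches` and `wait_for_matching_pair`, the 10 second timeout and the 0.5 second polling interval are all assumptions made for the example. The match check relies on `ssl.SSLContext.load_cert_chain`, which raises an error when the private key does not belong to the certificate.

```python
import ssl
import time
from typing import Callable


def keypair_matches(cert_file: str, key_file: str) -> bool:
    """Return True if the certificate and the private key load together.

    Assumes an unencrypted key file; a missing file or a mismatched
    pair both count as "no match yet".
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    except (ssl.SSLError, OSError):
        return False
    return True


def wait_for_matching_pair(
    check: Callable[[], bool],
    timeout: float = 10.0,   # hypothetical default, keep it short
    interval: float = 0.5,   # hypothetical polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it succeeds or `timeout` seconds have passed.

    Returns True as soon as the pair matches (steps 2 and 3 of the flow)
    and False on timeout (step 4), so the caller decides what to do then.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A file watcher would call something like `wait_for_matching_pair(lambda: keypair_matches(cert, key))` after the first change event and only apply the update when it returns True.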

@nmengin
Contributor

nmengin commented Mar 29, 2018

Hello @spawnia,

Many thanks for your proposals.
When the dynamic TLS configuration was implemented, we decided to only watch the configuration file, for the reason explained by @dtomcej.
This mechanism lets users change the certificate files and then notify Træfik with a simple touch command on the TOML file.

The solution you described is more complicated and, even if it can work, it has disadvantages and can cause misunderstandings for users, as you noticed.

I guess neither solution (the current one or yours) fully addresses the problem, which needs more investigation.

WDYT?

@spawnia
Author

spawnia commented Mar 30, 2018

I think that what needs to be figured out is how to make this the most convenient and clear for users.

Watching for TLS updates could be a configuration option, something like this:

[[entryPoints.https.tls.certificates]]
      certFile = "tests/traefik.crt"
      keyFile = "tests/traefik.key"
      watch = true

Choosing the correct default seems difficult. Having traefik pick up the changes automatically might lead to a nice experience and fits with traefik's dynamic configuration approach. In most cases this should not cause problems and should feel like it just works™.

The underlying challenge here is finding the right balance between convenience and magic. Anyways, users should be made aware of the underlying limitations of this option in a short paragraph.

For some use cases, manually refreshing seems like the most suitable approach. Calling touch on the configuration is a nice way to do this, however doing so is not immediately obvious. Do you think it might make sense to add a new command to traefik which refreshes the current configuration? If not, this could be put in the docs under the TLS section.

@ffilippopoulos
Contributor

ffilippopoulos commented Oct 4, 2018

@nmengin Updating certificates, putting them in place and simply calling a touch on the toml file doesn't seem to be enough. Doing so makes traefik log:

time="2018-10-04T11:34:14Z" level=info msg="Skipping same configuration for provider file"

It looks like an actual change to the definitions inside the toml file is needed to trigger a reload, which is quite awkward to automate.
I ended up killing traefik pods as well to successfully pick up new certs.
Am I missing something here?

Edit: looks like a known bug #3272
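
One plausible reading of the "Skipping same configuration" message is that the reload guard compares the parsed configuration, which only contains the certificate paths, so a touch changes nothing it can see. The toy model below is not Traefik's code and every name in it is hypothetical. It only shows how a path-only comparison misses a replaced certificate, while a comparison that also includes a digest of the file contents catches it.

```python
import hashlib
from pathlib import Path


def file_digest(path: str) -> str:
    """SHA-256 of a file's contents, used to detect replaced certificates."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def config_fingerprint(config: dict, include_cert_content: bool) -> tuple:
    """Build a comparable fingerprint of the parsed [[tls]] sections.

    With include_cert_content=False only the paths are compared, so a
    replaced file at the same path is invisible to the comparison.
    """
    entries = []
    for tls in config.get("tls", []):
        cert = tls["certificate"]
        entry = (tuple(tls.get("entryPoints", [])), cert["certFile"], cert["keyFile"])
        if include_cert_content:
            entry += (file_digest(cert["certFile"]), file_digest(cert["keyFile"]))
        entries.append(entry)
    return tuple(entries)


def should_reload(previous: tuple, current: tuple) -> bool:
    """Reload only when the fingerprint differs from the previous one."""
    return previous != current
```

Under this model, touching the TOML file re-parses an identical configuration and the reload is skipped, which matches the log line quoted above.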

@miro-grapeup

miro-grapeup commented Jun 22, 2019

Hi, I have faced a similar issue when trying to use traefik with dynamic TLS configuration. Instead of my certificate, traefik ends up serving its own generated certs. Here is my configuration (using traefik version 1.7.12):

configmap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-configmap
  namespace: traefik-ingress
data:
  traefik.toml: |
    defaultEntryPoints = ["http","https"]
    insecureSkipVerify = true

    [entryPoints]
      [entryPoints.http]
      address = ":80"

      [entryPoints.https]
      address = ":443"
        [entryPoints.https.tls]

      [entryPoints.traefik]
        address = ":8080"

    [kubernetes]
      [kubernetes.ingressEndpoint]
        publishedService = "traefik/traefik"

    [ping]
    entryPoint = "http"

    [api]
    entryPoint = "traefik"

    [file]

    [[tls]]
      entryPoints = ["https"]
      [tls.certificate]
        certFile = "/ssl/tls.crt"
        keyFile = "/ssl/tls.key"

and traefik itself as DaemonSet:

kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: traefik-ingress-controller
  namespace: traefik-ingress
  labels:
    k8s-app: traefik-ingress-lb
spec:
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress-lb
        name: traefik-ingress-lb
    spec:
      updateStrategy:
        type: RollingUpdate
      serviceAccountName: traefik-ingress-controller
      terminationGracePeriodSeconds: 60
      volumes:
      - name: traefik-tls-cert
        secret:
          secretName: traefik-tls-cert
      - name: traefik-configmap
        configMap:
          name: traefik-configmap
      containers:
      - image: traefik
        name: traefik-ingress-lb
        volumeMounts:
          - mountPath: "/ssl"
            name: "traefik-tls-cert"
          - mountPath: "/config"
            name: "traefik-configmap"
        ports:
        - name: http
          containerPort: 80
          hostPort: 80
        - name: https
          containerPort: 443
          hostPort: 443
        - name: admin
          containerPort: 8080
          hostPort: 8080
        securityContext:
          capabilities:
            drop:
            - ALL
            add:
            - NET_BIND_SERVICE
        args:
        - --logLevel=INFO
        - --configFile=/config/traefik.toml

My cert is stored in a k8s secret and, as you can see, it is mounted into the traefik pods. When it is changed, traefik does not pick up the update. It seems the fixes in #3272 and #4022 do not fully work.

@dduportal
Contributor

Hi @miro-grapeup, thanks for your interest in the project.

It looks like a question for the community support channel.
Could you join us at https://slacl.traefik.io and give us the reference of your issue + reproduction case, so the community will help you?

The reason is that we keep the issue trackers for issues, and this one is already "triaged" (ref. https://docs.traefik.io/v2.0/contributing/maintainers/#labels).

@Richard87

Richard87 commented Jul 3, 2019

I'm just jumping in to comment :)

(I have tried Traefik 1.7.12, which watches my ingresses, and CertManager, which creates/recreates missing/invalid certificates.)

I'm not using Traefik's ACME client.

When any ingress or secret is updated, Traefik updates its certificate store immediately:
https://github.com/containous/traefik/blob/06df6017dfc4464b81106e22bd7fcc61de5c3786/pkg/tls/tlsmanager.go#L67

But the server picks any certificate with the correct domain name and caches it for 1 hour:
https://github.com/containous/traefik/blob/f1b085fa364f3d3184bccd294974db408e77cbf3/pkg/tls/certificate_store.go#L27

In other words, wait at least 1 hour before checking whether Traefik has refreshed the certificate. (Also, Traefik's global default certificate is not cached, but cert-manager's temporary certificate will be cached for 1 hour by Traefik.)
https://docs.cert-manager.io/en/latest/tasks/issuing-certificates/#temporary-certificates-whilst-issuing
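
The effect described here can be modelled with a small time-to-live cache in front of the certificate store. The sketch below is a simplified illustration, not the code behind the links above: the class name `CertCache`, the plain dict used as a store and the expiry logic are assumptions. Only the 1 hour figure comes from this comment.

```python
import time
from typing import Callable, Dict, Tuple


class CertCache:
    """Serve certificates from a store, remembering each pick for `ttl` seconds."""

    def __init__(
        self,
        store: Dict[str, str],
        ttl: float = 3600.0,  # 1 hour, as reported in the comment above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.store = store
        self.ttl = ttl
        self.clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def get(self, domain: str) -> str:
        """Return the cached certificate until it expires, then re-read the store."""
        now = self.clock()
        hit = self._cache.get(domain)
        if hit is not None and now < hit[1]:
            return hit[0]  # may be stale if the store changed meanwhile
        cert = self.store[domain]
        self._cache[domain] = (cert, now + self.ttl)
        return cert
```

With such a cache, an update to the store becomes visible to clients at most `ttl` seconds later, which is why a check made right after the update can still see the old certificate.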

@sainipankaj90k

sainipankaj90k commented Aug 20, 2020

I agree with @spawnia.
A simple touch still amounts to manual intervention, which defeats the purpose of automation.

I would love it if traefik could watch for changes in volume for the certs it is consuming. There are already very good suggestions on how we can avoid the misconfiguration/wrong key-cert pair.

This feature is very much required.

@muru

muru commented Nov 18, 2020

@Richard87's comment put me on the right track. For context, we're using Traefik 1.7 in Kubernetes with cert-manager to create the certificates. Earlier, the certificate was mounted in the traefik pod, and we used [entryPoints.https.tls.certificates] to set static certificates pointing to the mounted certificate. However, following the advice in Dynamic certificates, I tried something similar to the corresponding example in the file provider:

[file]
...
# HTTPS certificates
[[tls]]
  entryPoints = ["https"]
  [tls.certificate]
    certFile = "path/to/my.cert"
    keyFile = "path/to/my.key"

The expectation was that changes to path/to/my.cert would be picked up, which of course they weren't.

The correct solution was what @Richard87 mentioned: use the tls configuration of a Kubernetes Ingress to specify the secret (which is updated by cert-manager). About an hour after doing that, the updated certificate started being used.

Oddly, the logs indicate that the certificates were skipped:

time="2020-11-18T09:35:40Z" level=warning msg="Skipping addition of certificate for domain(s) <...>, to EntryPoint https, as it already exists for this Entrypoint."
time="2020-11-18T09:35:40Z" level=warning msg="Skipping addition of certificate for domain(s) <...>, to EntryPoint http, as it already exists for this Entrypoint."
time="2020-11-18T09:35:40Z" level=info msg="Server configuration reloaded on :80"
time="2020-11-18T09:35:40Z" level=info msg="Server configuration reloaded on :443"
time="2020-11-18T09:35:40Z" level=info msg="Server configuration reloaded on :8080"

But the updated certificate was indeed used:

% curl -svI https://<my-site> |& grep start
*  start date: Nov 18 08:35:40 2020 GMT

(Note that the start date is intentionally 1 hour before the issue date for Let's Encrypt certificates, so the timestamps do match)
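
The same check can be scripted. The following Python sketch reads the served certificate's notBefore field and adds back the 1 hour backdating mentioned in the note above. The function names are made up for this example, and `served_not_before` assumes the certificate validates against the system trust store, so it will not work as-is for self-signed certificates.

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone


def parse_cert_time(value: str) -> datetime:
    """Parse a certificate timestamp such as 'Nov 18 08:35:40 2020 GMT'."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(value), tz=timezone.utc)


def served_not_before(host: str, port: int = 443, timeout: float = 5.0) -> datetime:
    """Connect to host:port and return the start date of the served certificate."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return parse_cert_time(cert["notBefore"])


def estimated_issue_time(
    not_before: datetime,
    backdate: timedelta = timedelta(hours=1),  # backdating per the note above
) -> datetime:
    """Estimate when the certificate was issued from its backdated start date."""
    return not_before + backdate
```

For the start date shown above, `estimated_issue_time(parse_cert_time("Nov 18 08:35:40 2020 GMT"))` gives 09:35:40 UTC on the same day, which lines up with the timestamps in the log excerpt.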

@Richard87

I wonder if this check could be changed to every 1 or 5 minutes instead of every 60 minutes without any notable performance change?

@nmengin
Contributor

nmengin commented Nov 28, 2023

Hello,

I am linking this issue to PR #9993, which is IMO a good enough workaround.
Indeed, sending a signal once the TLS certificates are updated allows Traefik to refresh the configuration.

@nmengin nmengin removed their assignment Nov 28, 2023
@rtribotte rtribotte self-assigned this Nov 30, 2023
@rtribotte
Member

Hello,

Just to clarify what @nmengin said in his last comment.
This issue is linked to the PR #9993, because, even if sending a signal doesn't solve the issue, it looks like a good enough workaround.

If any contributor finds a better solution, we would love community support to address it.
Let us know, and we will work with you to make sure you have all the information needed so that it can be merged.

@rtribotte rtribotte added the contributor/wanted Participation from an external contributor is highly requested label Nov 30, 2023
@rtribotte rtribotte removed their assignment Feb 15, 2024