ACME HTTP-01 challenge fails by timeout #2763

Closed
deargonaut opened this issue Jan 25, 2018 · 50 comments
Labels
area/acme kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. resolution/declined status/5-frozen-due-to-age


@deargonaut

deargonaut commented Jan 25, 2018

Do you want to request a feature or report a bug?

Bug

What did you do?

I am trying to fetch automatic certificates from Let's Encrypt with HTTP-01.

What did you expect to see?

Fetching certificates as before the TLS-SNI problems.

What did you see instead?

No new certificates.

Possible problems / fixes

It looks like it has something to do with the HTTP route added for each domain (domain.com/.well-known/acme-challenge/[token]). When I visit the same route over HTTPS I get a 404 immediately, but over HTTP the request times out.

https://github.com/containous/traefik/blob/5140bbe99a79b45f98c27fbb8e9b6833194af4cb/acme/challenge_http_provider.go#L22

On Slack, someone (maverick) tried my exact configuration, but with a Consul backend. Maybe it has something to do with that?

When checking the debug logs, it seems to clean up the token for that domain ("Challenge CleanUp") before hitting the timeout. Maybe it has something to do with that?

Output of traefik version: (What version of Traefik are you using?)

Traefik version v1.5.0 built on 2018-01-23_04:42:32PM

What is your environment & configuration (arguments, toml, provider, platform, ...)?

defaultEntryPoints = ["http", "https"]
debug = true
logLevel = "DEBUG"

[entryPoints]
  [entryPoints.http]
    address = ":80"
    compress = true
#    [entryPoints.http.redirect]
#    entryPoint = "https"
  [entryPoints.https]
    address = ":443"
    compress = true
    [entryPoints.https.tls]

[acme]
  email = "email@address.com"
  caServer = "https://acme-staging.api.letsencrypt.org/directory"
  # Tried it on production as well
  storage = "/etc/traefik/acme/acme.json"
  entryPoint = "https"
  OnHostRule = true
  acmeLogging = true
  [acme.httpChallenge]
    entryPoint = "http"

# Enable Docker configuration backend
[docker]
  endpoint = "unix:///var/run/docker.sock"
  domain = "sandbox.domain.com"
  watch = true
  swarmmode = true
  exposedbydefault = true

[api]
  entryPoint = "traefik"
  dashboard = true
  address = ":8080"

  [api.statistics]
    recentErrors = 10

docker-compose.yml

version: '3'
services:
  nginx:
    image: nginx:1.13
    volumes:
      - "../workspace:/srv"
      - "./nginx/default.conf:/etc/nginx/conf.d/default.conf"
    deploy:
      labels:
        - "traefik.backend=rest-api"
        - "traefik.port=80"
        - "traefik.frontend.rule=Host:rest-api.sandbox.domain.com"
        - "traefik.docker.network=frontend"
        - "traefik.backend.loadbalancer.method=drr"
    networks:
      - frontend
      - backend

  php:
    image: php-fpm:7.1
    volumes:
      - "../workspace:/srv"
    networks:
      - backend

networks:
  backend:
    external:
      name: rest-api
  frontend:
    external:
      name: frontend

If applicable, please paste the log output in debug mode (--debug switch)

logs
time="2018-01-25T10:05:56Z" level=debug msg="LoadCertificateForDomains [rest-api.sandbox.domain.com]..." 
time="2018-01-25T10:05:56Z" level=debug msg="Looking for provided certificate to validate [rest-api.sandbox.domain.com]..." 
time="2018-01-25T10:05:56Z" level=debug msg="No provided certificate found for domains [rest-api.sandbox.domain.com], get ACME certificate." 
time="2018-01-25T10:05:56Z" level=debug msg="Loading ACME certificates [rest-api.sandbox.domain.com]..." 
legolog: 2018/01/25 10:05:56 [INFO][rest-api.sandbox.domain.com] acme: Obtaining bundled SAN certificate
legolog: 2018/01/25 10:05:56 [INFO][rest-api.sandbox.domain.com] AuthURL: https://acme-staging.api.letsencrypt.org/acme/authz/w3M__oDqozE[...]T_SPCiF7p5CYLFI
legolog: 2018/01/25 10:05:56 [INFO][rest-api.sandbox.domain.com] acme: Could not find solver for: dns-01
legolog: 2018/01/25 10:05:56 [INFO][rest-api.sandbox.domain.com] acme: Trying to solve HTTP-01
time="2018-01-25T10:05:56Z" level=debug msg="Challenge Present rest-api.sandbox.domain.com" 
time="2018-01-25T10:06:07Z" level=debug msg="Challenge CleanUp rest-api.sandbox.domain.com" 
time="2018-01-25T10:06:07Z" level=error msg="map[rest-api.sandbox.domain.com:acme: Error 400 - urn:acme:error:connection - Fetching http://rest-api.sandbox.domain.com/.well-known/acme-challenge/GECQ9JRWb4pA[...]Bc3rmeveJd611YowU: Timeout
Error Detail:
	Validation for rest-api.sandbox.domain.com:80
	Resolved to:
		***.***.***.***
		***:*:*:*::*
	Used: ***:*:*:*::*

]" 
time="2018-01-25T10:06:07Z" level=error msg="Error getting ACME certificates [rest-api.sandbox.domain.com] : cannot obtain certificates map[rest-api.sandbox.domain.com:acme: Error 400 - urn:acme:error:connection - Fetching http://rest-api.sandbox.domain.com/.well-known/acme-challenge/GECQ9JRWb4pA0OlC[...]eJd611YowU: Timeout
Error Detail:
	Validation for rest-api.sandbox.domain.com:80
	Resolved to:
		***.***.***.***
		***:*:*:*::*
	Used: ***:*:*:*::*

]" 
time="2018-01-25T10:06:07Z" level=debug msg="LoadCertificateForDomains []..." 
legolog: 2018/01/25 10:06:07 [INFO][exceptions.sandbox.domain.com] acme: Obtaining bundled SAN certificate
time="2018-01-25T10:06:07Z" level=debug msg="LoadCertificateForDomains [exceptions.sandbox.domain.com]..." 
time="2018-01-25T10:06:07Z" level=debug msg="Looking for provided certificate to validate [exceptions.sandbox.domain.com]..." 
time="2018-01-25T10:06:07Z" level=debug msg="No provided certificate found for domains [exceptions.sandbox.domain.com], get ACME certificate." 
time="2018-01-25T10:06:07Z" level=debug msg="Loading ACME certificates [exceptions.sandbox.domain.com]..." 
legolog: 2018/01/25 10:06:07 [INFO][exceptions.sandbox.domain.com] AuthURL: https://acme-staging.api.letsencrypt.org/acme/authz/oUlowLzxA9hKGib[...]MpTqEWA4ksu345xc
legolog: 2018/01/25 10:06:07 [INFO][exceptions.sandbox.domain.com] acme: Could not find solver for: dns-01
legolog: 2018/01/25 10:06:07 [INFO][exceptions.sandbox.domain.com] acme: Trying to solve HTTP-01
time="2018-01-25T10:06:07Z" level=debug msg="Challenge Present exceptions.sandbox.domain.com" 
time="2018-01-25T10:06:09Z" level=debug msg="Filtering container without port and no traefik.port label traefik.1 : strconv.Atoi: parsing "": invalid syntax" 
time="2018-01-25T10:06:09Z" level=debug msg="Filtering container without port and no traefik.port label payment_php.1 : strconv.Atoi: parsing "": invalid syntax" 
time="2018-01-25T10:06:09Z" level=debug msg="Filtering container without port and no traefik.port label my_php.1 : strconv.Atoi: parsing "": invalid syntax" 
time="2018-01-25T10:06:09Z" level=debug msg="Filtering container without port and no traefik.port label webfrontend_php.1 : strconv.Atoi: parsing "": invalid syntax" 
time="2018-01-25T10:06:09Z" level=debug msg="Filtering container without port and no traefik.port label rest-api_php.1 : strconv.Atoi: parsing "": invalid syntax" 
time="2018-01-25T10:06:09Z" level=debug msg="Filtering container without port and no traefik.port label order_php.1 : strconv.Atoi: parsing "": invalid syntax" 
time="2018-01-25T10:06:09Z" level=debug msg="Filtering container without port and no traefik.port label catalog_php.1 : strconv.Atoi: parsing "": invalid syntax" 
time="2018-01-25T10:06:09Z" level=debug msg="Filtering container without port and no traefik.port label price_php.1 : strconv.Atoi: parsing "": invalid syntax" 
time="2018-01-25T10:06:09Z" level=debug msg="Filtering container without port and no traefik.port label notifications_php.1 : strconv.Atoi: parsing "": invalid syntax" 
time="2018-01-25T10:06:09Z" level=debug msg="Filtering container without port and no traefik.port label exceptions_php.1 : strconv.Atoi: parsing "": invalid syntax" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.whitelistSourceRange labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.entryPoints labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.auth.basic labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.whitelistSourceRange labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.auth.basic labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.whitelistSourceRange labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.entryPoints labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.auth.basic labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.whitelistSourceRange labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.entryPoints labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.auth.basic labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.whitelistSourceRange labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.entryPoints labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.auth.basic labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.whitelistSourceRange labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.entryPoints labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.auth.basic labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.whitelistSourceRange labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.entryPoints labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.auth.basic labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.whitelistSourceRange labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.entryPoints labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.auth.basic labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.whitelistSourceRange labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.entryPoints labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.auth.basic labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.whitelistSourceRange labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.entryPoints labels" 
time="2018-01-25T10:06:09Z" level=debug msg="Could not load traefik.frontend.auth.basic labels" 
@nmengin
Contributor

nmengin commented Jan 25, 2018

Hello @deargonaut.
Thanks for your interest in the project.

This kind of timeout is generated by LEGO (the Let's Encrypt Go library used by Træfik).
It happens when Let's Encrypt cannot reach Træfik to complete the HTTP challenge.

Even though the error is logged after the CleanUp line, it is generated earlier, during the challenge step, as you can see in the Træfik code.

Can you check that:

  • The subdomain rest-api.sandbox.domain.com is mapped to the host where Træfik is deployed
  • Port 80 on the host where Træfik is deployed is reachable by Let's Encrypt

Thanks in advance.

@deargonaut
Author

Hi @nmengin.
Thanks for your prompt reply.

For this setup everything is deployed on one node.
Traefik is deployed on sandbox.domain.com and is reachable on port 80, as is rest-api.sandbox.domain.com.

It only times out (also in the browser) when I request the specific ACME challenge token, like: http://rest-api.sandbox.domain.com/.well-known/acme-challenge/GECQ9JRWb4pABc3rmeveJd611YowU.
When I request any other token, my application immediately returns a 404.

Does this give you enough information?

@nmengin
Contributor

nmengin commented Jan 26, 2018

Hello @deargonaut .

Would it be possible for you to continue the discussion with the team on our Slack?
@juliens created a thread.

I think this more interactive format should make it easier to help you.

Thanks in advance.

@deargonaut
Author

While debugging with Juliens, we found a fix for this error.

It seems that when the challenge request tried to reach the .well-known/acme-challenge URL, it always went over IPv6. When we removed the IPv6 interface and removed its address from DNS, the domain was validated and I received my certificates.

The issue will remain open so that Julien can work out how to reproduce it and possibly fix it.

@emilevauge emilevauge added kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. priority/P2 need to be fixed in the future and removed status/0-needs-triage labels Feb 1, 2018
@juliens juliens removed their assignment Feb 1, 2018
@schasse

schasse commented Feb 7, 2018

Hi, I ran into the same issue and I am interested in the fix that @deargonaut described. I have two questions, though.

it always wanted to go via IPv6

What does "it" refer to? The Let's Encrypt client trying to reach the .well-known/acme-challenge URL?

we removed the IPv6-interface

From where did you remove the IPv6-interface? Did you remove it from the host?

@deargonaut
Author

deargonaut commented Feb 7, 2018

Hi @schasse,

Yes, it refers to the ACME mechanism. The client used IPv6 when performing the HTTP challenge.

Yes, I removed the IPv6 interface from the host. I am running instances on OpenStack and removed the net-public-ipv6 interface, which released the IPv6 address on eth0 (in my case).

Does this make sense?

@schasse

schasse commented Feb 8, 2018

Makes sense. Thanks for clarifying!

@glitchroy

glitchroy commented Feb 28, 2018

EDIT: The problem was on my end: port 80 was blocked by another firewall. It is open now, and the certificate was requested without a problem.


Hey, I have the same problem. I'm not using Docker Swarm or cluster mode, so there is only one instance of Traefik.
The output seems to be the same:

traefik    | time="2018-02-28T11:48:07Z" level=error msg="map[test.domain.com:acme: Error 400 - urn:acme:error:connection - Fetching http://test.domain.com/.well-known/acme-challenge/gA7GL[...]lhDA: Timeout
traefik    | Error Detail:
traefik    |    Validation for test.domain.com:80
traefik    |    Resolved to:
traefik    |            XXX.XXX.XXX.XX
traefik    |    Used: XXX.XXX.XXX.XX
traefik    |
traefik    | ]"

However, no IPv6 address is reported, so I'm guessing that's not the problem.
I don't know whether I should open a separate issue with my whole setup, since it's the same error after all. The .well-known path is not reachable from a browser either. I usually run an Apache instance, among other things, on port 80 from a different docker-compose file, but taking it down makes no difference.

@lawrencegripper
Contributor

I saw this issue when using Traefik on Azure ACI. Moving from the standard scratch-based Docker image to the “1.7-alpine” tag resolved it for me. I can’t say why, but it may help others.

@richsanram

I had this issue with the tag 2.0-alpine (I know it is still an alpha version), and I solved it by replacing /etc/resolv.conf with a custom resolv.conf file containing 'nameserver 1.1.1.1'.

After that, Traefik worked like a charm.

@ghost

ghost commented May 5, 2019

I have the same issue, and none of the above solved it. I don't have IPv6 and the ports are forwarded, but I still get the 400 timeout from Traefik, and a 404 when I request the URL myself.

@mephinet

mephinet commented May 6, 2019

I've been debugging an issue for a few days now: in a setup like https://docs.traefik.io/user-guide/examples/#onhostrule-option-and-provided-certificates-with-http-challenge, where we have a default wildcard certificate and use Let's Encrypt for all other domains, Traefik kept serving the wildcard certificate even for domains it did not match. The logs repeatedly showed:

level=error msg="Error getting challenge for token retrying in ...s"

I was able to solve the issue by temporarily disabling the HTTPS redirect (the [entryPoints.http.redirect] section).
Maybe someone who still has this issue can check whether this is in fact the root cause of the timeouts.

@ghost

ghost commented May 6, 2019 via email

@pimjansen

I'm also having this issue, but as far as I can see there is no IPv6 in the DNS, at least.

Is there any update or workaround for this?

@jonkristian

@mephinet I have this issue too, but on 2.1. It seems quite similar: we also have a default wildcard certificate for the main domain, use LE for other domains, and see the same errors. I've created a post on Discourse about it, but for now I am still at a loss.

https://community.containo.us/t/cannot-retrieve-the-acme-challenge-for-token/4391/5

Did you ever figure out what was going on?

@mephinet

Did you ever figure out what was going on?

No, unfortunately I never figured it out and finally switched to https://kubernetes.github.io/ingress-nginx/ ...

@jonkristian

That's too bad, thanks for replying :)

@trajano

trajano commented Jun 5, 2020

This just happened to me today. I've had a Traefik 1.7 setup for a while; I just rebooted to test something out, and now it's timing out.

@trajano

trajano commented Jun 5, 2020

I had this issue with the tag 2.0-alpine (I know that is an alpha version yet), and the way I solved this was replacing /etc/resolv.conf with a custom resolv.conf file, with 'nameserver 1.1.1.1'

Do you mean the /etc/resolv.conf inside the Traefik container or on the server itself?

@wbsouza

wbsouza commented Oct 10, 2020

I have the same error occurring with the v2.3.1 on AWS ECS running with Fargate.

After creating the cluster with Terraform, the HTTP proxy works fine, but when we call the app over HTTPS the browser reports the error SSL_ERROR_RX_RECORD_TOO_LONG (Traefik is responding with plain HTTP instead of HTTPS).

You can reproduce this error with this code:
git clone https://github.com/wbsouza/traefik-ecs

What is causing it?
HTTPS does not work because Traefik times out when Let's Encrypt tries to validate the domain.

Hypothetical domain: mycompany.com (replace it with a real domain in the variables.tf file):
app_hostname = "myrealdomain.com"

First I tried to use the hostname provided by AWS, and it seems that Let's Encrypt refuses it:

time="2020-10-10T00:10:29Z" level=error msg="Unable to obtain ACME certificate for domains \"api-126145927.sa-east-1.elb.amazonaws.com\": unable to generate a certificate for the domains [api-126145927.sa-east-1.elb.amazonaws.com]: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:rejectedIdentifier :: Error creating new order :: Cannot issue for \"api-126145927.sa-east-1.elb.amazonaws.com\": The ACME server refuses to issue a certificate for this domain name, because it is forbidden by policy, url: " providerName=le.acme routerName=whoami-secure@ecs rule="Host('api-126145927.sa-east-1.elb.amazonaws.com')"

Next, I added a CNAME record to my DNS to point my FQDN at the AWS address.
Initially I got an error because the DNS had not been updated yet:

time="2020-10-09T23:42:59Z" level=error msg="Unable to obtain ACME certificate for domains \"app.mycompany.com\": unable to generate a certificate for the domains [app.mycompany.com]: error: one or more domains had a problem:\n[app.mycompany.com] acme: error: 400 :: urn:ietf:params:acme:error:dns :: DNS problem: NXDOMAIN looking up A for app.mycompany.com - check that a DNS record exists for this domain, url: \n" rule="Host('app.mycompany.com')" providerName=le.acme routerName=whoami-secure@ecs

Later, once the DNS had been updated, I got another error:

time="2020-10-09T23:57:59Z" level=error msg="Unable to obtain ACME certificate for domains \"api.mycompany.com\": unable to generate a certificate for the domains [api.mycompany.com]: error: one or more domains had a problem:\n[api.mycompany.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized :: Invalid response from http://api.mycompany.com/.well-known/acme-challenge/STzh6jw8b6p7ZiyAiZAIe9IVYzXwInsnOYCs1hw0U_I [18.229.176.157]: 404, url: \n" routerName=whoami-secure@ecs rule="Host('app.mycompany.com')"

@pascalgross

I see the same error on multiple stacks. Obtaining LE certificates worked on other stacks with an almost identical Traefik config.

version: '3.7'

volumes:
    prometheus_data: {}
    grafana_data: {}

networks:
  monitor-network:
    driver: overlay
    name: inbound
  traefik-public:
    external: true

services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    networks:
      - monitor-network
      - traefik-public
    deploy:
      placement:
        constraints:
          - node.role==manager
      labels:
          - traefik.enable=true
          - traefik.docker.network=traefik-public
          - traefik.constraint-label=traefik-public
          - traefik.http.routers.prometheus-http.rule=Host(`prometheus.mycompany.com`)
          - traefik.http.routers.prometheus-http.entrypoints=web
          - traefik.http.routers.prometheus-http.middlewares=redirecttls
#          - traefik.http.routers.prometheus-http.middlewares=auth
          - traefik.http.routers.prometheus-https.rule=Host(`prometheus.mycompany.com`)
          - traefik.http.routers.prometheus-https.entrypoints=websecure
          - traefik.http.routers.prometheus-https.tls=true
          - traefik.http.routers.prometheus-https.tls.certresolver=letsencrypt
          - traefik.http.services.prometheus.loadbalancer.server.port=9090
          - traefik.http.routers.prometheus-https.middlewares=auth
      restart_policy:
        condition: on-failure

My traefik yaml file looks as follows:

version: '3'

services:
  reverse_proxy:
    image: traefik:v2.3.4
    command:
      # Docker swarm configuration
      - "--providers.docker.endpoint=unix:///var/run/docker.sock"
      - "--providers.docker.swarmMode=true"
      - "--providers.docker.exposedbydefault=false"
      - "--providers.docker.network=traefik-public"
      # Configure entrypoint
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      # SSL configuration
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
#      - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.email=ssl@mycompany.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
#      - "--certificatesresolvers.letsencrypt.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory"
      - "--api=true"
      - "--api.dashboard=true"
      - "--accesslog=true"
      - "--accesslog.filepath=/logs/access.log"
      - "--metrics.prometheus=true"
      - "--entryPoints.metrics.address=:8082"
      - "--metrics.prometheus.entryPoint=metrics"
      - "--metrics.prometheus.buckets=0.1,0.3,1.2,5.0"
      - "--pilot.token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    ports:
      - 80:80
      - 443:443
    volumes:
      # To persist certificates
      - traefik-certificates:/letsencrypt
      - traefik-logs:/logs
      # So that Traefik can listen to the Docker events
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - traefik-public
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.traefik.service=api@internal"
        - "traefik.http.routers.traefik.rule=Host(`traefik.mycompany.com`)"
        - "traefik.http.routers.traefik.entrypoints=web"
        - "traefik.http.routers.traefik.middlewares=redirecttls"
        - "traefik.http.services.traefik.loadbalancer.server.port=80"
        - "traefik.http.middlewares.redirecttls.redirectscheme.scheme=https"
        - "traefik.http.routers.traefiktls.service=api@internal"
        - "traefik.http.routers.traefiktls.rule=Host(`traefik.mycompany.com`)"
        - "traefik.http.routers.traefiktls.entrypoints=websecure"
        - "traefik.http.routers.traefiktls.tls.certresolver=letsencrypt"
        - "traefik.http.routers.traefiktls.middlewares=auth"
        - "traefik.http.services.traefik.loadbalancer.server.port=443"
        - "traefik.http.middlewares.auth.basicauth.users=abc:$$apr1$$xyz"
volumes:
  traefik-certificates:
  traefik-logs:
networks:
  traefik-public:
    external: true

Traefik logs the following error:

time="2020-12-18T06:46:02Z" level=error msg="Unable to obtain ACME certificate for domains "prometheus.mycompany.com": unable to generate a certificate for the domains [prometheus.mycompany.com]: error: one or more domains had a problem:\n[prometheus.mycompany.com] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://prometheus.mycompany.com/.well-known/acme-challenge/W1CkrdBQ552lsSo9H9rfQhb8rxuVDlorhGx-VLbC3jY: Timeout after connect (your server may be slow or overloaded), url: \n" routerName=prometheus-https@docker rule="Host(prometheus.mycompany.com)" providerName=letsencrypt.acme

The domain prometheus.mycompany.com resolves via a CNAME to both an A and an AAAA record:

dig prometheus.mycompany.com A prometheus.mycompany.com AAAA @8.8.8.8 +short
srv03.mycompany.com.
123.45.67.89
srv03.mycompany.com.
2aaa:f88:123:1234::2

@pascalgross

Removing the AAAA (IPv6) record from srv03.mycompany.com resolves the problem. How can that be?

@trajano

trajano commented Dec 19, 2020

@pascalgross can you confirm that accessing your server from the outside via its IPv6 address works correctly? Maybe that's why it failed.

@pascalgross

@trajano I can ping the server over IPv6 and SSH into it over IPv6, but accessing a web server (e.g. the Traefik instance) over IPv6 fails. So I guess it is either a) a configuration error or b) a bug in Traefik.

@trajano

trajano commented Dec 19, 2020

But can you access the HTTP port over IPv6 (not just HTTPS)? I guess something like curl -v http://ipv6address would show it.

@CarlQLange

I don't know if this helps anybody, but in Azure AKS, I needed to set "Outbound source network address translation" to "Outbound and inbound use the same IP. SNAT port exhaustion may occur." in the load balancer that pointed to Traefik. Otherwise I had this timeout issue.

@SmallhillCZ

SmallhillCZ commented Dec 24, 2021

Just spent a day on this one, so here is a summary for anyone with a similar problem:

  • If your domain has an AAAA DNS record (i.e. IPv6), Let's Encrypt will prefer it over IPv4 for ACME verification
  • Docker Swarm doesn't bind published ports on IPv6 interfaces

=> Let's Encrypt will not be able to access verification code at domain.com/.well-known/acme-challenge/[token]

Solutions:

@ddtmachado ddtmachado assigned rtribotte and unassigned mmatur Mar 22, 2022
@rtribotte
Member

Hello,

For v2, the challenge mechanism was rewritten in v2.4.0 by PR #7458.
Since the original author seems to have a fix, and since we think v2 is not affected by this bug, we are closing this issue.

Feel free to reopen if you can reproduce the issue with the latest v2 version.

@rtribotte rtribotte added resolution/declined and removed priority/P2 need to be fixed in the future labels Mar 22, 2022
@traefik traefik locked and limited conversation to collaborators Apr 22, 2022
Projects
Archived in project
Development

No branches or pull requests