ACME HTTP-01 challenge fails by timeout #2763
Comments
Hello @deargonaut. This kind of timeout is generated by LEGO (the Let's Encrypt Go library used by Træfik). Even though the log line appears after the CleanUp log, it is generated earlier, during the challenge step, as you can see in the Træfik code. Can you check if :
Thanks in advance. |
Hi @nmengin. For this setup everything is deployed on one node. It will only time-out (also in the browser) when I request the specific ACME hash, like: http://rest-api.sandbox.domain.com/.well-known/acme-challange/GECQ9JRWb4pABc3rmeveJd611YowU. Does this give you enough information? |
Hello @deargonaut. Would it be possible for you to continue the discussion with the team in our Slack? A more interactive channel should make it easier to help you. Thanks in advance |
While debugging with Julien we found a fix for this error. It seems that requests to the .well-known/acme-challenge URL always went via IPv6. When we removed the IPv6 interface and cleared its record from DNS, the challenge was authenticated and I received my certificates. The issue will remain open so Julien can work out how to reproduce and maybe fix this. |
Hi, I ran into the same issue and I am interested in the fix which @deargonaut described. I have two questions, though.
What is it? The Let's Encrypt client trying to reach .well-known/acme-challenge url?
From where did you remove the IPv6-interface? Did you remove it from the host? |
Hi @schasse, it refers to the ACME mechanism indeed. The client used IPv6 while trying the HTTP challenge. I removed the IPv6 interface from the host, yes. I am running instances on OpenStack and removed the net-public-ipv6 interface, which released the IPv6 address on eth0 (in my case). Does this make sense? |
Makes sense. Thanks for clarifying! |
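For anyone who wants to check whether their domain is in the same situation, here is a minimal sketch that groups the addresses a hostname resolves to by IP family. The function names are made up for illustration, and it only shows what the local resolver returns, which may differ from what the Let's Encrypt validation servers see. If an IPv6 address is listed but port 80 is not reachable over IPv6 from the outside, that would match the behaviour described above.

```python
import socket


def address_families(addrinfo):
    """Group addresses from socket.getaddrinfo() results by IP family."""
    families = {"ipv4": [], "ipv6": []}
    names = {socket.AF_INET: "ipv4", socket.AF_INET6: "ipv6"}
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        key = names.get(family)
        # sockaddr[0] is the address string for both IPv4 and IPv6 entries
        if key and sockaddr[0] not in families[key]:
            families[key].append(sockaddr[0])
    return families


def lookup(hostname, port=80):
    """Resolve hostname with the local resolver (performs a real DNS query)."""
    info = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return address_families(info)
```

Calling `lookup("rest-api.sandbox.domain.com")` would return a dict with an `ipv4` and an `ipv6` list; a non-empty `ipv6` list means an AAAA record is published for the name.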
EDIT: There was a problem on my end, port 80 was blocked by another firewall. It's opened now and the certificate was requested without a problem. Hey, I have the same problem. I'm not using docker swarm or cluster mode, so it's only one instance of traefik.
However, no IPv6 address is being reported, so I'm guessing that's not the problem. |
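A blocked port 80, as in the EDIT above, can be ruled out with a plain TCP connection test. This is only a sketch (the function name is hypothetical), and it is only meaningful when run from a machine outside your own network, since a firewall or router may treat internal and external traffic differently.

```python
import socket


def port_open(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

For example, `port_open("rest-api.sandbox.domain.com", 80)` run from an external host should return True before an HTTP-01 challenge has any chance of succeeding.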
I saw this issue when using Traefik on Azure ACI. Moving from the standard scratch based docker image to “1.7-alpine” tag resolved it for me. I can’t say why but may help others. |
I had this issue with the 2.0-alpine tag (I know that is still an alpha version). I solved it by replacing /etc/resolv.conf with a custom resolv.conf file containing 'nameserver 1.1.1.1'. After this, Traefik works like a charm. |
I have the same issue, and none of the above solved it. I don't have IPv6, ports are forwarded, still got the 400 timeout from Traefik, and 404 if I want to get the URL myself. |
I've been debugging an issue for a few days now: In a setup like https://docs.traefik.io/user-guide/examples/#onhostrule-option-and-provided-certificates-with-http-challenge where we have a default wildcard certificate and use Let's Encrypt for all other domains, Traefik constantly used the wildcard certificate even for domains that were not matched by the wildcard certificate. The logs were repeatedly showing
level=error msg="Error getting challenge for token retrying in ...s"
I was able to solve the issue by temporarily disabling the HTTPS redirect (the [entryPoints.http.redirect] section). |
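To see whether the challenge path is being caught by an HTTP-to-HTTPS redirect in a given setup, one option is to request it without following redirects and look at the status code and Location header. The sketch below assumes a placeholder token (`test-token`) and a hypothetical function name; a real token only exists while a challenge is in progress, so a 404 for the placeholder is expected. Whether a redirect is actually the cause of the timeout in your case is not something this check can prove.

```python
import http.client


def probe_challenge_path(host, port=80, token="test-token", timeout=5.0):
    """Request the HTTP-01 path over plain HTTP without following redirects.

    Returns (status, location); location is None unless the server
    answered with a Location header.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/.well-known/acme-challenge/" + token)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

A 301 or 302 with an https:// Location means the redirect is answering before anything else does, while a timeout points back at reachability problems such as the ones discussed earlier in the thread.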
Turned out in my case it was the router. The Linksys Velop (cursing the day I bought that) simply ignored my port forward on 80 so that it could show its admin page on the internal network and a custom 404 externally. I had to put my server in its DMZ (the default forward target for all ports) to register. On a _different_ page of its UI I could add the port forward, so I think it is okay now. Worst case, I need to manually put the server in the DMZ again once the certs expire.
On May 6, 2019, at 19:21, Philipp Gortan <notifications@github.com> wrote:
… I've been debugging an issue for a few days now: In a setup like https://docs.traefik.io/user-guide/examples/#onhostrule-option-and-provided-certificates-with-http-challenge where we have a default wildcard certificate and use Let's Encrypt for all other domains, Traefik constantly used the wildcard certificate even for domains that were not matched by the wildcard certificate. The logs were repeatedly showing
level=error msg="Error getting challenge for token retrying in ...s"
I was able to solve the issue by temporarily disabling the HTTPS redirect (the [entryPoints.http.redirect] section).
Maybe someone who still has this issue can try to check whether this is in fact the root cause for the timeouts...
|
I'm also having this issue, but as far as I can see there is no IPv6 in the DNS, at least. Is there any update or workaround for this? |
@mephinet I too have this issue, but on 2.1. I feel it's quite similar because we also have a default wildcard cert for main domain, and use LE for other domains and I am seeing the same errors. I've created a post in discourse about it but for now I am still at a loss. https://community.containo.us/t/cannot-retrieve-the-acme-challenge-for-token/4391/5 Did you ever figure out what was going on? |
No, unfortunately I never figured it out and finally switched to https://kubernetes.github.io/ingress-nginx/ ... |
That's too bad, thanks for replying :) |
This just happened to me today. I have had a Traefik 1.7 setup for a while; I just did a reboot to test something out, and now it's timing out. |
The /etc/resolv.conf in traefik or on the server itself? |
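Since the container and the host can use different resolvers, it may help to compare the two files before replacing either one. A small sketch, with a hypothetical function name, that pulls the nameserver entries out of resolv.conf-style text:

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';'
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers
```

Running `nameservers(open("/etc/resolv.conf").read())` once on the server and once inside the Traefik container shows which resolvers each of them actually uses.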
I have the same error with v2.3.1 on AWS ECS running on Fargate. After creating the cluster via Terraform, the HTTP proxy works fine, but when we try to call the app over HTTPS the browser reports the error code SSL_ERROR_RX_RECORD_TOO_LONG (Traefik is responding with HTTP instead of HTTPS).
With this code you can reproduce this error
What is causing it?
Hypothetical domain:
First I tried to use the hostname from AWS, and it seems that Let's Encrypt blocks it:
Next I tried adding a CNAME record to my DNS to point my FQDN at the AWS address
But later, once the DNS had updated, I got another error:
|
I see the same error on multiple stacks. Obtaining LE certificates worked on others, with almost identical traefik config.
My traefik yaml file looks as follows:
Traefik logs the following error:
The domain prometheus.mycompany.com resolved via CNAME to an A and AAAA record.
|
Removing the AAAA (IPv6) record from srv03.mycompany.com resolves the problem. How can that be? |
@pascalgross can you confirm that accessing your server from the outside using the IPV6 address works correctly? Maybe that's why it failed. |
@trajano I can ping the server using IPv6 and I can SSH in using IPv6, but accessing a web server (e.g. the Traefik instance) using IPv6 fails. So I guess there is either a) a configuration failure or b) a bug in Traefik. |
But can you access the HTTP port using IPV6? (not just HTTPS). I guess curl -v http://ipv6address somehow |
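One detail when testing this: an IPv6 literal has to be wrapped in square brackets inside a URL, so the request would look something like curl -v 'http://[2001:db8::1]/' rather than the bare address. A tiny sketch (hypothetical helper name, documentation-range addresses) that builds the URL correctly for either family:

```python
import ipaddress


def http_url(host, path="/"):
    """Build an http:// URL, wrapping IPv6 literals in square brackets."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal, so treat it as a hostname
    return "http://" + ("[" + host + "]" if is_v6 else host) + path
```

The resulting URL can be passed to curl or a browser from an external machine to confirm that port 80 answers over IPv6 specifically.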
I don't know if this helps anybody, but in Azure AKS, I needed to set "Outbound source network address translation" to "Outbound and inbound use the same IP. SNAT port exhaustion may occur." in the load balancer that pointed to Traefik. Otherwise I had this timeout issue. |
Just spent a day on this one, so to summarize for anyone with a similar problem:
=> Let's Encrypt will not be able to access the verification code at
Solutions:
|
Do you want to request a feature or report a bug?
Bug
What did you do?
I am trying to fetch automatic certificates from Let's Encrypt with HTTP-01.
What did you expect to see?
Certificates being fetched as they were before the TLS-SNI problems.
What did you see instead?
No new certificates.
Possible problems / fixes
It looks like it has something to do with adding the HTTP route to each domain (domain.com/.well-known/acme-challenge/[token]). When visiting the same route over HTTPS I receive a 404 immediately, but over HTTP the request times out.
https://github.com/containous/traefik/blob/5140bbe99a79b45f98c27fbb8e9b6833194af4cb/acme/challenge_http_provider.go#L22
Via Slack someone (maverick) tried my same configuration but with a consul backend. Maybe it has something to do with that?
When checking the debug logs, it seems it cleans up ("CleanUp") the token for that domain before hitting the timeout. Maybe it has something to do with that?
Output of traefik version: (What version of Traefik are you using?)
What is your environment & configuration (arguments, toml, provider, platform, ...)?
docker-compose.yml
If applicable, please paste the log output in debug mode (--debug switch)
logs