AWS_REGION is no longer optional #10195

Closed
2 tasks done
hpwjnijs opened this issue Oct 26, 2023 · 14 comments

Comments

@hpwjnijs

Welcome!

  • Yes, I've searched similar issues on GitHub and didn't find any.
  • Yes, I've searched similar issues on the Traefik community forum and didn't find any.

What did you do?

We updated to Traefik 2.10.5.
After that, certificates could no longer be renewed automatically.

What did you see instead?

linnaeus_ng-traefik-1       | time="2023-10-25T11:39:42Z" level=error msg="Unable to obtain ACME certificate for domains \"bseai.dryrun.link,malesianbutterflies.dryrun.link\": unable to generate a certificate for the domains [bseai.dryrun.link malesianbutterflies.dryrun.link]: error: one or more domains had a problem:\n[bseai.dryrun.link] [bseai.dryrun.link] acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to resolve service endpoint, an AWS region is required, but was not found\n[malesianbutterflies.dryrun.link] [malesianbutterflies.dryrun.link] acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to resolve service endpoint, an AWS region is required, but was not found\n" rule="Method(`GET`, `POST`) && Host(`bseai.dryrun.link`,`malesianbutterflies.dryrun.link`) && PathPrefix(`/`)" routerName=bare@docker providerName=route53.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory"

When adding the environment variable AWS_REGION, it worked again.
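
As a minimal sketch of that workaround, the `environment` block of the compose service shown under "What is your environment & configuration?" could gain one entry. The value `us-east-1` is an assumption for illustration; comments further down this thread report success with both `us-east-1` and `eu-central-1`.

```yaml
services:
  traefik:
    environment:
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:?Variable AWS_ACCESS_KEY_ID is empty}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:?Variable AWS_SECRET_ACCESS_KEY is empty}
      # Assumed value: substitute the region you actually use.
      AWS_REGION: us-east-1
```

The container must be recreated after the change so that Traefik picks up the new variable.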

What version of Traefik are you using?

/etc # traefik version
Version: 2.10.5
Codename: saintmarcelin
Go version: go1.21.3
Built: 2023-10-11T13:54:02Z
OS/Arch: linux/amd64

What is your environment & configuration?

  traefik:
    image: registry.gitlab.com/naturalis/bii/linnaeus/linnaeus_ng/traefik:${IMAGE_VERSION:?}
    restart: always
    command:
      - --global.sendAnonymousUsage=false
      - --providers.docker.exposedByDefault=false
      - --providers.docker.endpoint=tcp://docker-proxy:2375
      - --providers.docker.network=docker-proxy
      - --entrypoints.web.address=:80
      - --entrypoints.web.http.redirections.entryPoint.to=websecure
      - --entrypoints.web.http.redirections.entrypoint.permanent=true
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.route53.acme.dnschallenge=true
      - --certificatesresolvers.route53.acme.dnschallenge.provider=route53
      - --certificatesresolvers.route53.acme.storage=/letsencrypt/route53.json
    environment:
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:?Variable AWS_ACCESS_KEY_ID is empty}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:?Variable AWS_SECRET_ACCESS_KEY is empty}

If applicable, please paste the log output in DEBUG level

No response

@mmatur
Member

mmatur commented Oct 26, 2023

Since v4.14.0, lego has used the new aws-sdk-go-v2, introduced by the following PR

Lego has been updated to this version in the latest Traefik release.

@ldez Is AWS_REGION required with the new aws-sdk-go-v2?

@ldez
Contributor

ldez commented Oct 26, 2023

AWS_REGION is an internal env var of the SDK.
The region is required but not the env var.
As far as I remember, the region was already a requirement with SDK v1.

The region can be provided in several ways. I don't know the exhaustive list, but I know that some files are read to get this information.
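
One file-based option, sketched here under stated assumptions: the AWS SDK for Go v2 can read the region from a shared config file, and the `AWS_CONFIG_FILE` environment variable tells it where that file lives. The host path `./aws-config` and the container path `/aws/config` are hypothetical names chosen for illustration, and nobody in this thread has confirmed this route with Traefik.

```yaml
services:
  traefik:
    environment:
      # Points the AWS SDK at the shared config file (container path is an assumption).
      AWS_CONFIG_FILE: /aws/config
    volumes:
      # ./aws-config is a hypothetical file on the host containing:
      #   [default]
      #   region = us-east-1
      - ./aws-config:/aws/config:ro
```

Setting `AWS_REGION` directly is the simpler route and is the one reported to work in this thread; the file-based form may be preferable when the same AWS configuration is shared by several containers.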

@loganmarchione

loganmarchione commented Oct 31, 2023

I was also running into the same error

acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to resolve service endpoint, an AWS region is required, but was not found

I set AWS_REGION=us-east-1 in my docker-compose.yml file. Traefik stopped producing the error message, but now I'm getting this error.

time="2023-10-31T12:02:10-04:00" level=info msg="Renewing certificate from LE : {Main:nextcloud.internal.mydomain.com SANs:[]}" ACME CA="https://acme-v02.api.letsencrypt.org/directory" providerName=route53.acme
time="2023-10-31T12:02:10-04:00" level=debug msg="legolog: [INFO] [nextcloud.internal.mydomain.com] acme: Trying renewal with 432 hours remaining"
time="2023-10-31T12:02:10-04:00" level=debug msg="legolog: [INFO] [nextcloud.internal.mydomain.com] acme: Obtaining bundled SAN certificate"
time="2023-10-31T12:02:11-04:00" level=debug msg="legolog: [INFO] [nextcloud.internal.mydomain.com] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/279280525686"
time="2023-10-31T12:02:11-04:00" level=debug msg="legolog: [INFO] [nextcloud.internal.mydomain.com] acme: Could not find solver for: tls-alpn-01"
time="2023-10-31T12:02:11-04:00" level=debug msg="legolog: [INFO] [nextcloud.internal.mydomain.com] acme: Could not find solver for: http-01"
time="2023-10-31T12:02:11-04:00" level=debug msg="legolog: [INFO] [nextcloud.internal.mydomain.com] acme: use dns-01 solver"
time="2023-10-31T12:02:11-04:00" level=debug msg="legolog: [INFO] [nextcloud.internal.mydomain.com] acme: Preparing to solve DNS-01"
time="2023-10-31T12:02:12-04:00" level=debug msg="legolog: [INFO] Wait for route53 [timeout: 2m0s, interval: 4s]"
time="2023-10-31T12:04:15-04:00" level=debug msg="legolog: [INFO] [nextcloud.internal.mydomain.com] acme: Cleaning DNS-01 challenge"
time="2023-10-31T12:04:16-04:00" level=debug msg="legolog: [INFO] Wait for route53 [timeout: 2m0s, interval: 4s]"
time="2023-10-31T12:06:19-04:00" level=debug msg="legolog: [WARN] [nextcloud.internal.mydomain.com] acme: cleaning up failed: route53: route53: time limit exceeded: last error: unable to retrieve change: ID=/change/C0878160389IQSMCZOJEK "
time="2023-10-31T12:06:19-04:00" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/279280525686"
time="2023-10-31T12:06:19-04:00" level=error msg="Error renewing certificate from LE: {nextcloud.internal.mydomain.com []}" ACME CA="https://acme-v02.api.letsencrypt.org/directory" providerName=route53.acme error="error: one or more domains had a problem:\n[nextcloud.internal.mydomain.com] [nextcloud.internal.mydomain.com] acme: error presenting token: route53: route53: time limit exceeded: last error: unable to retrieve change: ID=/change/XXXXXXXXXXXXXXXXXXXXX\n"

Update: At some point overnight, the certs renewed 🤷‍♂️

@triplepoint

triplepoint commented Nov 12, 2023

Note that providing the AWS_REGION env var (which wasn't necessary before) just got me past that error - the next problem seems to still be AWS authentication related:

traefik | time="2023-11-12T00:19:12Z" level=error msg="Error renewing certificate from LE: {redacted.com []}" ACME CA="https://acme-v02.api.letsencrypt.org/directory" error="error: one or more domains had a problem:\n[redacted.com] [redacted.com] acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded\n" providerName=letsencrypt.acme

I am not setting the AWS environment variables (except now the region); I've been relying on the instance role IAM policy as described in the documentation. This worked fine until now, so clearly something has changed behavior.

@triplepoint

triplepoint commented Nov 12, 2023

Confirmed that rolling back from traefik 2.10.5 to traefik 2.10.4 unblocked this problem and allowed my credentials to update. Presumably this is due to the lego version bump in 2.10.5. There's a related ticket over there that I think is the same issue as this one.

I'd consider this a pretty significant change in behavior, for a .1 bugfix increment.

@ldez
Contributor

ldez commented Nov 12, 2023

Can you try to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY?

@meonkeys

meonkeys commented Nov 23, 2023

> Can you try to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY?

This worked for me.

Clarification: I already had AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, but I was missing AWS_REGION so I started seeing the same error reported in the original issue (failed to determine hosted zone ID). After adding AWS_REGION and restarting Traefik, the error went away and auto cert renewal worked again.

@hpwjnijs
Author

hpwjnijs commented Nov 24, 2023

> Can you try to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY?

We already (always) did that, before and after the issue. So no solution for us.

@nmengin
Contributor

nmengin commented Nov 27, 2023

Hey @hpwjnijs,

Could you provide a minimal reproducible case (for instance, full Docker manifest to reproduce the issue)?

@hpwjnijs
Author

hpwjnijs commented Dec 4, 2023

Dockerfile:

ARG TRAEFIK_VERSION=2.10
FROM traefik:${TRAEFIK_VERSION}
# ENV AWS_REGION="eu-central-1"
# hadolint ignore=DL3017
RUN apk update && apk upgrade --no-cache

docker buildx build -t traefik:testAWSREGION .

---
version: '3.8'

services:
  traefik:
    image: traefik:testAWSREGION
    container_name: "traefik"
    restart: ${CONTAINER_RESTART:-unless-stopped}
    networks:
      - web
      - docker-socket-proxy
    depends_on:
      - whoami
    ports:
      - "${HOST_IP:-0.0.0.0}:80:80"
      - "${HOST_IP:-0.0.0.0}:443:443"
      - "127.0.0.1:${TRAEFIK_API_PORT:-8079}:8080"
    volumes:
      - "./letsencrypt:/letsencrypt"
    command:
      - "--global.sendAnonymousUsage=false"
      - "--log.level=${TRAEFIK_LOG_LEVEL:-ERROR}"
      - "--api=true"
      - "--api.insecure=true"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--entrypoints.web.http.redirections.entrypoint.permanent=true"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.route53.acme.dnschallenge=true"
      - "--certificatesresolvers.route53.acme.dnschallenge.provider=route53"
      - "--certificatesresolvers.route53.acme.storage=/letsencrypt/route53.json"
      - "--providers.docker=true"
      - "--providers.docker.endpoint=tcp://docker-socket-proxy:2375"
      - "--providers.docker.network=docker-socket-proxy"
      - "--providers.docker.exposedbydefault=false"
    env_file:
      - .env

  docker-socket-proxy:
    image: tecnativa/docker-socket-proxy:${DOCKERPROXY_VERSION:-latest}
    container_name: "docker-socket-proxy"
    restart: unless-stopped
    networks:
      - docker-socket-proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro,delegated
    environment:
      CONTAINERS: 1

  whoami:
    image: traefik/whoami  
    ports: 
      - 2001:2001
    networks:
      -  web
    command:
       # It tells whoami to start listening on 2001 instead of 80
       - --port=2001
       - --name=iamfoo
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=web"
      - "traefik.http.services.homepage.loadbalancer.server.port=2001"
      - "traefik.http.routers.homepage.entrypoints=websecure"
      - "traefik.http.routers.homepage.rule=Host(`whoami.dryrun.link`)"
      - "traefik.http.routers.homepage.tls.certresolver=route53"
networks:
  web:
    name: web
    internal: false
  default:
    internal: false
  docker-socket-proxy:
    internal: true

In the .env file I provide the AWS credentials for the Route 53 domain dryrun.link.
I get:

root@dev-huub2:/opt/compose/test/compose# docker compose logs -f traefik
traefik  | time="2023-12-04T14:19:14Z" level=info msg="Configuration loaded from flags."
traefik  | time="2023-12-04T14:19:18Z" level=error msg="Unable to obtain ACME certificate for domains \"whoami.dryrun.link\": unable to generate a certificate for the domains [whoami.dryrun.link]: error: one or more domains had a problem:\n[whoami.dryrun.link] [whoami.dryrun.link] acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to resolve service endpoint, an AWS region is required, but was not found\n" routerName=homepage@docker rule="Host(`whoami.dryrun.link`)" providerName=route53.acme ACME CA="https://acme-v02.api.letsencrypt.org/directory"

If I uncomment the line ENV AWS_REGION="eu-central-1", I get a certificate.

Thanks in advance for your time

@benz0li

benz0li commented Dec 7, 2023

> Can you try to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY?

@ldez I already have AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set. Only setting a valid AWS_REGION in addition resolves this issue.

@benz0li

benz0li commented Dec 7, 2023

@ldez The problem must have occurred with the v2.10.6 release because I got an email containing

Your certificate (or certificates) for the names listed below will expire in 19 days (on 2023-12-27)

(traefik v2.10.6 was released on 2023-11-28)

@jspdown
Contributor

jspdown commented Dec 7, 2023

Thanks for reporting the bug. I have been able to reproduce it locally.

I dug into the issue and there is indeed a regression in lego v4.14.0 with the upgrade to aws-sdk-go-v2.

The prior version, aws-sdk-go v1.39.0, didn't force us to set AWS_REGION for some services, and Route53 is one of them.

The new version, aws-sdk-go-v2 v1.19.0, is more restrictive on the matter and forces an explicit AWS_REGION to be set.

I opened an issue on Lego to report the regression. I will keep this issue open until it's addressed on Lego's side.

jspdown added the area/acme and kind/bug/confirmed labels and removed the status/0-needs-triage label on Dec 7, 2023
@jspdown
Contributor

jspdown commented Dec 7, 2023

The issue won't be fixed on Lego's side. The v1 and v2 of aws-sdk-go are incompatible on this matter and the upgrade won't be reverted.

Therefore, I will have to close this issue.
