acme resolver not working with persistence enabled #396
Comments
Seems the lego provider is nicely writing with 600 permissions.
I now figured that on initial creation the permissions are there as 600. When deleting the pod, the deployment automatically creates a new one and mounts the already existing volume with the acme.json file. Now the file has 660 permissions and breaks… Then, using the CLI on the pod itself, I can chmod it back, but when the pod is replaced with another one the permissions revert again. I am running the latest version of the Helm chart now. As soon as I disable persistence, ACME works again. However, without persistence it is very easy to hit the Let's Encrypt API rate limits.
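(For anyone wanting to reproduce the symptom, a minimal check along these lines should work; the namespace, deployment name, and label below are assumptions based on the chart defaults, not taken from this issue.)

```sh
# Check the current mode of acme.json inside the running Traefik pod
kubectl exec -n traefik deploy/traefik -- ls -l /data/acme.json

# Delete the pod; the Deployment recreates it and remounts the existing volume
kubectl delete pod -n traefik -l app.kubernetes.io/name=traefik

# Check again once the new pod is Ready; on affected setups the mode has changed
kubectl exec -n traefik deploy/traefik -- ls -l /data/acme.json
```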
I found a solution by adding an initContainer:

```yaml
persistence:
  enabled: true

# this is required to ensure the acme.json has the required 600 permissions when remounting the volume
deployment:
  initContainers:
    - name: fix-permissions-acme-json
      image: alpine:3.12.4
      command:
        - chmod
        - "600"
        - /data/acme.json
      volumeMounts:
        - name: data
          mountPath: /data
```

Would a PR be accepted that adds this initContainer by default when persistence is enabled?
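(Side note: to try this, the values above would go into a values file and be applied with a regular Helm upgrade. The release name, namespace, and chart repo below are assumptions, not something specified in this thread.)

```sh
helm repo add traefik https://traefik.github.io/charts
helm repo update
helm upgrade --install traefik traefik/traefik --namespace traefik -f values.yaml
```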
Hello @marcofranssen. As a workaround you can try to enable the initContainers section:

```yaml
initContainers:
  - name: volume-permissions
    image: busybox:1.31.1
    command: ["sh", "-c", "chmod -Rv 600 /data/*"]
    volumeMounts:
      - name: data
        mountPath: /data
```

Each time a pod is recycled, the correct permissions (600) will be set.
@jakubhajek thanks, I found the solution 2 minutes before you answered here :)
Experiencing the same problem here; I posted it just a few minutes before in the wrong repo 👆. I can confirm the workaround works 👍
Both of the proposed solutions in fact do not work. @marcofranssen's solution leads to the error "no such file: /data/acme.json" and @jakubhajek's solution leads to "chmod: /data/lost+found: Operation not permitted".
@Haribo112 See https://github.com/traefik/traefik-helm-chart/blob/master/EXAMPLES.md#use-traefik-native-lets-encrypt-integration-without-cert-manager for a detailed working example.
When I follow those exact instructions I end up with a pod that cannot start, because the init container keeps crashing. The logs for the init container reveal "touch: /data/acme.json: Permission denied". Also, other issues on this repo and elsewhere seem to indicate that one should never try to …
Experiencing the same issue as @Haribo112; the odd thing is I also tried to create … When I run without … Then, I test creating … I've tried many things, including putting …

Traefik: 2.10.1
Hi. I experienced the same issue as @huedaya and @Haribo112. I found where the issue is, but I'm not able to fix it right now.

TL;DR: The /data (or /ssl-certs for me) folder has 600 permissions and is owned by root:root, but the busybox and Traefik containers run as non-root, so they don't get permission to access the JSON files.

Here are the steps I took to investigate. I ran the following commands in the busybox initContainer:

```console
$ id root
uid=0(root) gid=0(root) groups=0(root),10(wheel)
$ ls -l / | grep ssl-certs
drwxr-xr-x 3 root root 4096 Jul 8 14:47 ssl-certs
$ ls -l /ssl-certs
drwx------ 2 root root 16384 Jul 8 14:47 lost+found
$ whoami
whoami: unknown uid 65532
$ touch /ssl-certs/acme-staging.json
touch: /ssl-certs/acme-staging.json: Permission denied
$ chmod -v 600 /ssl-certs/acme-staging.json
chmod: /ssl-certs/acme-staging.json: No such file or directory
```

We can see that the issue comes from the running user not having permission on the /ssl-certs/ folder to perform changes. I tried to force the init container to run as root:

```yaml
deployment:
  initContainers:
    # The "volume-permissions" init container is required if you run into permission issues.
    # Related issue: https://github.com/containous/traefik/issues/6972
    - name: volume-permissions
      image: busybox:1.31.1
      command: ["sh", "-c", "id root;ls -l /; ls -l /ssl-certs; whoami; touch /ssl-certs/acme-staging.json; chmod -v 600 /ssl-certs/acme-staging.json; touch /ssl-certs/acme-production.json; chmod -v 600 /ssl-certs/acme-production.json"]
      securityContext:
        runAsUser: 0
        runAsNonRoot: false
        allowPrivilegeEscalation: false
      volumeMounts:
        - name: ssl-certs
          mountPath: /ssl-certs
```

And now the chmod commands are executed correctly:

```
mode of '/ssl-certs/acme-staging.json' changed to 0600 (rw-------)
mode of '/ssl-certs/acme-production.json' changed to 0600 (rw-------)
```

But now the issue is on the Traefik container:

```
time="2023-07-09T07:17:47Z" level=error msg="The ACME resolver \"staging\" is skipped from the resolvers list because: unable to get ACME account: open /ssl-certs/acme-staging.json: permission denied"
time="2023-07-09T07:17:47Z" level=error msg="The ACME resolver \"production\" is skipped from the resolvers list because: unable to get ACME account: open /ssl-certs/acme-production.json: permission denied"
```
I am just now experiencing the same issue as @huedaya and @Haribo112 and @monnierant before me. Trying to run the initContainer fails because …
I just decided to do something else. I am not sure if it solved the issue though, because I am not yet trying to actually save a certificate in the file. But the initContainer appears to be working now.

```yaml
initContainers:
  - name: volume-permissions
    image: traefik:v2.10.4
    command:
      [
        "sh",
        "-c",
        "touch /data/acme.json; chown 65532 /data/acme.json; chmod -v 600 /data/acme.json",
      ]
    securityContext:
      runAsNonRoot: false
      runAsGroup: 0
      runAsUser: 0
    volumeMounts:
      - name: traefik-data
        mountPath: /data
```

The initContainer no longer crashes at least. I will let people know once I am actually able to verify that I can store my certs this way.
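(If it helps, a rough way to check that this fix survives a pod replacement; the namespace and deployment name are assumptions based on the chart defaults:)

```sh
# acme.json should exist, be owned by uid 65532 and have mode 600
kubectl exec -n traefik deploy/traefik -- ls -ln /data/acme.json

# Recreate the pod and confirm ownership/mode are unchanged on the remounted volume
kubectl rollout restart -n traefik deploy/traefik
kubectl rollout status -n traefik deploy/traefik
kubectl exec -n traefik deploy/traefik -- ls -ln /data/acme.json
```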
Hi @icodeforyou-dot-net, just in case you fail to solve the issue: I ended up using cert-manager instead of Traefik's built-in features.
I am not totally sure because my experience with Traefik is still limited, but at least I am past the error from inside the Traefik container. I'd say it stores the cert now. I also killed the pod and it is still going. I'd say it is working now.

Edit: My domain appears to have a working wildcard cert now. So I am even more inclined to say it is working 😄
This seems to be working for me, thanks for the suggestion.
I might create a pull request in a week or two to update the info/comments in the Helm chart itself. The old solution that isn't working for everyone is still to be found there. That way everyone will know in the future how to get around this. I just have to do some more tests.
I am having the same issue. Is there any update on this one?
@rsinha29 Which "same issue" exactly? Did you try my workaround? I have not had time to work on the Helm chart itself so far.
@icodeforyou-dot-net, forgot to mention here: I used your workaround and it worked.
Is the "workaround" currently the answer to this issue? It seems like a long time to have this open, but it definitely still exists and has wasted a lot of my day so far.
Well, it is my answer at least 😄 It is working, right? I believe it should technically also work with the …, but I am using "should" here for a reason: I didn't actually check any of that. I don't really know when I will have the leisure to open a pull request on this one. I would need to test this with the …
I have tried it with the busybox image (config below) and it works:

```console
✗ kubectl logs traefik-dsfkj874-hfjei -n kube-system -c volume-permissions
mode of '/data/acme.json' changed to 0600 (rw-------)
```

```yaml
initContainers:
  - name: volume-permissions
    image: busybox:latest
    command:
      [
        "sh",
        "-c",
        "touch /data/acme.json; chown 65532 /data/acme.json; chmod -v 600 /data/acme.json",
      ]
    securityContext:
      runAsNonRoot: false
      runAsGroup: 0
      runAsUser: 0
    volumeMounts:
      - name: data
        mountPath: /data
```
It works perfectly when using the Traefik image itself in the initContainers.
TL;DR: the proposed workaround doesn't work in chart version 26.0.0 (possibly others too), since setting uid 0 violates the pod security context, per the details below.

This doesn't work for me, since the default chart values (which I haven't changed) for chart version 26.0.0 specify a pod security context, which makes the uid of 0 not allowed for the volume-permissions init container.

The values:

```yaml
podSecurityContext:
  # /!\ When setting fsGroup, Kubernetes will recursively change ownership and
  # permissions for the contents of each volume to match the fsGroup. This can
  # be an issue when storing sensitive content like TLS Certificates /!\
  # fsGroup: 65532
  # -- Specifies the policy for changing ownership and permissions of volume contents to match the fsGroup.
  fsGroupChangePolicy: "OnRootMismatch"
  # -- The ID of the group for all containers in the pod to run as.
  runAsGroup: 65532
  # -- Specifies whether the containers should run as a non-root user.
  runAsNonRoot: true
  # -- The ID of the user for all containers in the pod to run as.
  runAsUser: 65532
```

The error: …
Per my last comment, I just figured out the solution.

Uncomment fsGroup:

```yaml
podSecurityContext:
  # /!\ When setting fsGroup, Kubernetes will recursively change ownership and
  # permissions for the contents of each volume to match the fsGroup. This can
  # be an issue when storing sensitive content like TLS Certificates /!\
  fsGroup: 65532
  # -- Specifies the policy for changing ownership and permissions of volume contents to match the fsGroup.
  fsGroupChangePolicy: "OnRootMismatch"
  # -- The ID of the group for all containers in the pod to run as.
  runAsGroup: 65532
  # -- Specifies whether the containers should run as a non-root user.
  runAsNonRoot: true
  # -- The ID of the user for all containers in the pod to run as.
  runAsUser: 65532
```

Don't use uid 0 anymore; just use 65532:

```yaml
initContainers:
  # The "volume-permissions" init container is required if you run into permission issues.
  # Related issue: https://github.com/traefik/traefik-helm-chart/issues/396
  - name: volume-permissions
    image: busybox:latest
    command: ["sh", "-c", "touch /data/acme.json; chmod -v 600 /data/acme.json"]
    securityContext:
      runAsNonRoot: true
      runAsGroup: 65532
      runAsUser: 65532
    volumeMounts:
      - name: data
        mountPath: /data
```

Then, we can see success; `k logs -n traefik pods/<your_pod> -c volume-permissions` outputs …

More on pod security contexts and action/PR follow-up: if a PR were submitted to add the comment "uncomment podSecurityContext.fsGroup above and make sure it matches the runAs.. users here" right above the initContainer, this would solve the problem for people.
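(For what it's worth, with fsGroup: 65532 and fsGroupChangePolicy: OnRootMismatch the kubelet should change the volume's group ownership to 65532 on mount, which is why no root init container is needed anymore. A quick sanity check; the namespace is an assumption:)

```sh
# The data mount and acme.json should show group 65532 and be group-accessible
kubectl exec -n traefik deploy/traefik -- ls -ln /data
```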
I did not manage to make @life5ign's solution work for me (26.0.0), but I got it working another way. For reference, I'm doing this for EKS Fargate:

```yaml
persistence:
  # -- Enable persistence using Persistent Volume Claims
  # ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
  # It can be used to store TLS certificates, see `storage` in certResolvers
  enabled: true
  size: 128Mi
  storageClass: efs
  accessMode: ReadWriteOnce
...
deployment:
  ...
  initContainers:
    # The "volume-permissions" init container is required if you run into permission issues.
    # Related issue: https://github.com/traefik/traefik-helm-chart/issues/396
    - name: volume-permissions
      image: busybox:1.36
      command:
        ["sh", "-c", "touch /data/acme.json; chmod -v 600 /data/acme.json"]
      securityContext:
        runAsNonRoot: false
        runAsGroup: 0
        runAsUser: 0
      volumeMounts:
        - name: data
          mountPath: /data
...
podSecurityContext:
  # /!\ When setting fsGroup, Kubernetes will recursively change ownership and
  # permissions for the contents of each volume to match the fsGroup. This can
  # be an issue when storing sensitive content like TLS Certificates /!\
  # fsGroup: 65532
  # -- Specifies the policy for changing ownership and permissions of volume contents to match the fsGroup.
  fsGroupChangePolicy: "OnRootMismatch"
  # -- The ID of the group for all containers in the pod to run as.
  runAsGroup: 65532
  # -- Specifies whether the containers should run as a non-root user.
  runAsNonRoot: true
  # -- The ID of the user for all containers in the pod to run as.
  runAsUser: 65532
```

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs
provisioner: efs.csi.aws.com
---
# Traefik will use this to store TLS certificates
apiVersion: v1
kind: PersistentVolume
metadata:
  name: traefik-efs
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce # Many is only available to dynamic volumes
  persistentVolumeReclaimPolicy: Retain # so the disk is not deleted after pod deletion
  storageClassName: efs
  csi:
    driver: efs.csi.aws.com
    # As of today, Amazon EFS CSI driver doesn't support dynamic provisioning on Fargate,
    # so we're manually deploying an EFS instance and hardcoding it here 😭
    #
    # Ref: https://github.com/kubernetes-sigs/aws-efs-csi-driver
    volumeHandle: fs-0cbf8b13198606563
```
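(One extra check that may save time with this static EFS setup: the PV has to actually bind to the PVC created by the chart before any of the permission fixes matter. Resource names and namespace below are assumptions based on the manifests above and the chart defaults:)

```sh
kubectl get pv traefik-efs
kubectl get pvc -n traefik
# Both should report STATUS "Bound"; a Pending PVC usually means the storageClassName,
# accessModes, or capacity don't match the statically provisioned volume.
```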
Nvm, I removed the initContainers part and just used the …
The ACME resolver isn't working with persistence enabled due to file permissions. See the log below: …

This is an abstract of my values.yml: …
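(The log and the values.yml excerpt from the original report were not captured here. For orientation only, a hypothetical values fragment for this chart that exercises the same code path — persistence plus a Let's Encrypt cert resolver storing acme.json on the volume — looks roughly like this; the resolver name, email, and size are made up:)

```yaml
certResolvers:
  letsencrypt:
    email: admin@example.com   # hypothetical contact address
    tlsChallenge: true
    storage: /data/acme.json   # the file whose permissions this issue is about

persistence:
  enabled: true
  path: /data
  size: 128Mi
```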