
cadvisor init failure in kubernetes #195

Open
dragonTour opened this issue Mar 3, 2023 · 8 comments

@dragonTour

Describe this problem

The cadvisor pods fail to run:

[root@xxx ~]# kubectl get po -n holoinsight-example
NAME                           READY   STATUS             RESTARTS      AGE
cadvisor-kpxbl                 0/1     CrashLoopBackOff   3 (35s ago)   90s
cadvisor-zwc4q                 0/1     CrashLoopBackOff   3 (16s ago)   90s
ceresdb-0                      1/1     Running            0             91s
clusteragent-0                 1/1     Running            0             91s
daemonagent-7xk4d              1/1     Running            0             90s
daemonagent-8n5gg              1/1     Running            0             90s
holoinsight-server-example-0   0/1     Running            0             91s
mongo-0                        1/1     Running            0             91s
mysql-0                        0/1     Running            0             90s

Describing the pod (cadvisor-kpxbl):

[root@host-10-19-37-88 ~]# kubectl describe po cadvisor-kpxbl -n holoinsight-example
...
Containers:
  cadvisor:
    Container ID:  docker://7a3b2aab591d147b4dbf9e804e7b1837817696e50cd540ce1f63aff1ca27dac1
    Image:         gcr.io/cadvisor/cadvisor:v0.44.0
    Image ID:      docker-pullable://gcr.io/cadvisor/cadvisor@sha256:ef1e224267584fc9cb8d189867f178598443c122d9068686f9c3898c735b711f
    Port:          8080/TCP
    Host Port:     0/TCP
    Args:
      --allow_dynamic_housekeeping=false
      --housekeeping_interval=5s
      --max_housekeeping_interval=5s
      --storage_duration=2m
      --enable_metrics=cpu,memory,network,tcp,disk,diskIO,cpuLoad
      --enable_load_reader=true
      --store_container_labels=false
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/69dee914b194de362188cb07318446b62fa3559fc5cb03a54c1169e0cf4bda4c/merged/run/secrets: read-only file system: unknown

...
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  5m8s                    default-scheduler  Successfully assigned holoinsight-example/cadvisor-kpxbl to host-10-19-37-88
  Warning  Failed     4m55s                   kubelet            Error: failed to start container "cadvisor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/ce6a7eebbaedec94eb7fdaf3a4f1427526613fb4cf7be485908299219d32ac4c/merged/run/secrets: read-only file system: unknown
  Warning  Failed     4m52s                   kubelet            Error: failed to start container "cadvisor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/944ffd264146660a5f9ede7638c677a1c47b98061838741cfb29bd3241c5babf/merged/run/secrets: read-only file system: unknown
  Warning  Failed     4m37s                   kubelet            Error: failed to start container "cadvisor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/f441c927545adff71fcbdc8c5056ebeaa2b441a112c321e8205b7bc2c5eadb0d/merged/run/secrets: read-only file system: unknown
  Warning  Failed     4m6s                    kubelet            Error: failed to start container "cadvisor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/bde54086f8560ea1fcbf9488fe25c83d8567d5bd0d9645ca5e59dd6c4940ffea/merged/run/secrets: read-only file system: unknown
  Normal   Pulled     3m21s (x5 over 5m1s)    kubelet            Container image "gcr.io/cadvisor/cadvisor:v0.44.0" already present on machine
  Normal   Created    3m20s (x5 over 5m)      kubelet            Created container cadvisor
  Warning  Failed     3m19s                   kubelet            Error: failed to start container "cadvisor": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/kubelet/pods/ab235440-bbca-45b0-94db-eb859ffdf763/volumes/kubernetes.io~projected/kube-api-access-hkwwk" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount" caused: mkdir /data/docker/overlay2/8ba07a89c4755c857fbc1e11e48169dde0ac9c6a8aca0c4384c792d91e961f0a/merged/run/secrets: read-only file system: unknown
  Warning  BackOff    2m51s (x10 over 4m51s)  kubelet            Back-off restarting failed container

Steps to reproduce

kubernetes version: 1.23
docker version: 20.10.6
linux kernel: 4.18.0-1.el7.elrepo.x86_64

Expected behavior

No response

Additional Information

No response

@dragonTour dragonTour added the bug Something isn't working label Mar 3, 2023
@xzchaoo xzchaoo self-assigned this Mar 3, 2023

xzchaoo commented Mar 3, 2023

Is your k8s cluster a real cluster, or a minikube one?


xzchaoo commented Mar 3, 2023

I can't find an environment exactly like yours to reproduce the problem in the short term.
Maybe you can try modifying cadvisor.yaml (e.g. commenting out some configuration) and redeploying it.

@dragonTour (Author)

I used kubeadm to bootstrap the cluster.


dragonTour commented Mar 3, 2023

I changed Docker's data directory so that it is no longer under /var/. Could that be the cause?

[root@]# more /etc/docker/daemon.json 
{
    "data-root": "/data/docker",
    "exec-opts": [
        "native.cgroupdriver=systemd"
    ]
}
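
For reference, the effective data root can be confirmed directly on the host; docker info exposes it as a field:

# print Docker's data root (should show /data/docker in this setup)
docker info --format '{{ .DockerRootDir }}'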


xzchaoo commented Mar 3, 2023

Is the original value of data-root '/var/lib/docker'?
If so, you may need to change cadvisor.yaml from:

      volumes:
...
      - name: docker
        hostPath:
          path: /var/lib/docker
     ...

to

      volumes:
...
      - name: docker
        hostPath:
          path: /data/docker
     ...
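
If that is the change needed, a quick way to redeploy and confirm (assuming the manifest is saved locally as cadvisor.yaml, with the namespace and label used in this thread):

# re-apply the DaemonSet with the corrected hostPath
kubectl apply -f cadvisor.yaml
# watch the cadvisor pods restart
kubectl -n holoinsight-example get po -l app=cadvisor -w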

@dragonTour (Author)

Changing readOnly to false made it run successfully:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: holoinsight-example
spec:
  selector:
    matchLabels:
      app: cadvisor
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: cadvisor
        hi_common_version: '3'
    spec:
      restartPolicy: Always
      containers:
      - name: cadvisor
        image: gcr.io/cadvisor/cadvisor:v0.44.0
        args:
        - --allow_dynamic_housekeeping=false
        - --housekeeping_interval=5s
        - --max_housekeeping_interval=5s
        - --storage_duration=2m
        - --enable_metrics=cpu,memory,network,tcp,disk,diskIO,cpuLoad
        - --enable_load_reader=true
        - --store_container_labels=false
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: false
        - name: var-run
          mountPath: /var/run
          readOnly: false
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker
          readOnly: false
        - name: disk
          mountPath: /dev/disk
          readOnly: true
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP

        resources:
          requests:
            cpu: "0"
            memory: "0"
          limits:
            cpu: "0.25"
            memory: "256Mi"
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /data/docker
      - name: disk
        hostPath:
          path: /dev/disk
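
A quick way to verify the fix after re-applying this manifest (commands assume the namespace and label above):

# wait for the DaemonSet rollout to finish, then check pod status
kubectl -n holoinsight-example rollout status ds/cadvisor
kubectl -n holoinsight-example get po -l app=cadvisor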


xzchaoo commented Mar 16, 2023

The volumeMounts config in the cadvisor yaml is copied from the official cadvisor repository without any changes, and our internal deployments (on Aliyun k8s clusters) all succeed with this config. I think there is something specific to your k8s cluster that is causing the deployment failure.

If you would like to explore the root cause of this issue and contribute a corresponding solution, that would be very welcome.

@gigi-at-zymergen

dragonTour is not alone. I'm seeing the same issue on EKS 1.24, which uses the containerd runtime.

cadvisor:
    Container ID:   containerd://80ad9ce8b85e077f50dd9c1bfd1e248801afa3126f94793b91bbdb5ea33acf29
    Image:          gcr.io/cadvisor/cadvisor:v0.49.1
    Image ID:       gcr.io/cadvisor/cadvisor@sha256:3cde6faf0791ebf7b41d6f8ae7145466fed712ea6f252c935294d2608b1af388
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       StartError
      Message:      failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/kubelet/pods/882dfec1-613f-4a83-8705-424230f18271/volumes/kubernetes.io~projected/kube-api-access-phx22" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount": mkdir /run/containerd/io.containerd.runtime.v2.task/k8s.io/80ad9ce8b85e077f50dd9c1bfd1e248801afa3126f94793b91bbdb5ea33acf29/rootfs/run/secrets: read-only file system: unknown
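
The overlay paths differ under containerd, but the failure mode is the same mkdir on a read-only filesystem. A quick check, based on the readOnly fix earlier in this thread (DaemonSet name assumed to be cadvisor; substitute your namespace), is to list which of the container's volumeMounts are read-only:

# list each volumeMount of the cadvisor container with its readOnly flag
kubectl -n <namespace> get ds cadvisor \
  -o jsonpath='{range .spec.template.spec.containers[0].volumeMounts[*]}{.name}{"\t"}{.mountPath}{"\t"}{.readOnly}{"\n"}{end}'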
