tigera-operator pod on AKS 1.20.7 in CrashLoopBackOff with timeout to API server #1377

Closed
abarqawi opened this issue Jul 7, 2021 · 1 comment

Comments


abarqawi commented Jul 7, 2021

Expected Behavior

We deployed an AKS cluster using Calico with kubenet.
We used the same pipeline to deploy many other AKS clusters.
We expect the tigera-operator pod to be in the Running state, not crashing.

Current Behavior

The tigera-operator pod is in CrashLoopBackOff with the logs below:

2021/07/05 10:53:40 [INFO] Version: v1.17.1

2021/07/05 10:53:40 [INFO] Go Version: go1.15.2

2021/07/05 10:53:40 [INFO] Go OS/Arch: linux/amd64

{"level":"error","ts":1625482450.1487107,"logger":"controller-runtime.manager","msg":"Failed to get API Group-Resources","error":"Get "https://aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io:443/api?timeout=32s\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/manager.New\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.0/pkg/manager/manager.go:317\nmain.main\n\t/go/src/github.com/tigera/operator/main.go:157\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204"}

{"level":"error","ts":1625482450.1493962,"logger":"setup","msg":"unable to start manager","error":"Get "https://aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io:443/api?timeout=32s\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\nmain.main\n\t/go/src/github.com/tigera/operator/main.go:175\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204"}

* We tested traffic from the node hosting the tigera-operator pod toward the API server FQDN above, and it works fine.
* If the operator could not reach the API server, we would expect all nodes to be NotReady, but they are all in the Ready state.
* We have a firewall, but it allows traffic to the API server FQDN, and all Calico pods are running except tigera-operator (see the checks sketched below).
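As a rough sketch, connectivity can also be checked from inside the cluster with commands like the following (the test pod names and images here are illustrative, not the exact ones we used). The first command checks whether the API server FQDN resolves through cluster DNS; the second checks raw HTTPS reachability (even an unauthenticated 401/403 response proves the connection works):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io

kubectl run -it --rm api-test --image=curlimages/curl --restart=Never -- curl -k -m 10 https://aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io/api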

Below is the kubectl describe output for the tigera-operator pod:

C:\WINDOWS\system32>kubectl describe pod tigera-operator-64bd78b58-99lmc -n tigera-operator

Name: tigera-operator-64bd78b58-99lmc

Namespace: tigera-operator

Priority: 0

Node: aks-systempool-14727861-vmss000000/10.248.56.4

Start Time: Fri, 02 Jul 2021 18:03:40 +0200

Labels: k8s-app=tigera-operator

          name=tigera-operator

          pod-template-hash=64bd78b58

Annotations:

Status: Running

IP: 10.248.56.4

IPs:

IP: 10.248.56.4

Controlled By: ReplicaSet/tigera-operator-64bd78b58

Containers:

tigera-operator:

Container ID:  containerd://15091f8c3c039accf9d559695ff97bbdf28a7ce95ce9823e30e79b487b9dfa7a

Image:         mcr.microsoft.com/oss/tigera/operator:v1.17.1

Image ID:      sha256:bcba4d5a252ae36cbf5909e31e3fed19ec6efb1ef62afa58b74f2687fea87b5b

Port:          <none>

Host Port:     <none>

Command:

  operator

State:          Waiting

  Reason:       CrashLoopBackOff

Last State:     Terminated

  Reason:       Error

  Exit Code:    1

  Started:      Mon, 05 Jul 2021 12:53:40 +0200

  Finished:     Mon, 05 Jul 2021 12:54:10 +0200

Ready:          False

Restart Count:  718

Environment Variables from:

  kubernetes-services-endpoint  ConfigMap  Optional: true

Environment:

  WATCH_NAMESPACE:

  POD_NAME:                            tigera-operator-64bd78b58-99lmc (v1:metadata.name)

  OPERATOR_NAME:                       tigera-operator

  TIGERA_OPERATOR_INIT_IMAGE_VERSION:  v1.17.1

  KUBERNETES_PORT_443_TCP_ADDR:        aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io

  KUBERNETES_PORT:                     tcp://aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io:443

  KUBERNETES_PORT_443_TCP:             tcp://aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io:443

  KUBERNETES_SERVICE_HOST:             aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io

Mounts:

  /var/lib/calico from var-lib-calico (ro)

  /var/run/secrets/kubernetes.io/serviceaccount from tigera-operator-token-4sv2d (ro)

Conditions:

Type Status

Initialized True

Ready False

ContainersReady False

PodScheduled True

Volumes:

var-lib-calico:

Type:          HostPath (bare host directory volume)

Path:          /var/lib/calico

HostPathType:

tigera-operator-token-4sv2d:

Type:        Secret (a volume populated by a Secret)

SecretName:  tigera-operator-token-4sv2d

Optional:    false

QoS Class: BestEffort

Node-Selectors: kubernetes.io/os=linux

Tolerations: :NoExecute op=Exists

             :NoSchedule op=Exists

             CriticalAddonsOnly op=Exists

Events:

Type Reason Age From Message


Normal Pulled 46m (x711 over 2d18h) kubelet Container image "mcr.microsoft.com/oss/tigera/operator:v1.17.1" already present on machine

Warning BackOff 66s (x16784 over 2d18h) kubelet Back-off restarting failed container
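Worth noting from the output above: KUBERNETES_SERVICE_HOST is set (via the kubernetes-services-endpoint ConfigMap) to the API server FQDN rather than to the in-cluster service IP, so the operator has to resolve that name through cluster DNS before it can reach the API server. The ConfigMap can be inspected with something like this (namespace assumed to match the operator's):

kubectl get configmap kubernetes-services-endpoint -n tigera-operator -o yaml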

Context

The tigera-operator pod keeps failing.

Your Environment

AKS cluster, Kubernetes 1.20.7

abarqawi commented Jul 9, 2021

The issue was resolved after restarting the CoreDNS pods.
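For anyone hitting the same symptom: on AKS, CoreDNS runs as the coredns deployment in kube-system, so the restart looks roughly like this (a sketch, not an exact transcript of what we ran):

kubectl -n kube-system rollout restart deployment coredns    # recreate all CoreDNS pods
kubectl -n kube-system rollout status deployment coredns     # wait for the new pods to become ready
kubectl -n tigera-operator get pods -w                        # watch the operator pod recover

Once CoreDNS is back and the API server FQDN resolves again, the tigera-operator pod recovers on its next restart attempt without any further changes.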

abarqawi closed this as completed Jul 9, 2021