tigera-operator pod on AKS 1.20.7 in CrashLoopBackOff with timeout to API server #1377

Closed
abarqawi opened this issue Jul 7, 2021 · 1 comment

Comments


abarqawi commented Jul 7, 2021

Expected Behavior

We deployed an AKS cluster using Calico with kubenet.
We used the same pipeline to deploy many other AKS clusters.
We expect the tigera-operator pod to be in the Running state, not crashing.

Current Behavior

The tigera-operator pod is in CrashLoopBackOff with the logs below:

2021/07/05 10:53:40 [INFO] Version: v1.17.1

2021/07/05 10:53:40 [INFO] Go Version: go1.15.2

2021/07/05 10:53:40 [INFO] Go OS/Arch: linux/amd64

{"level":"error","ts":1625482450.1487107,"logger":"controller-runtime.manager","msg":"Failed to get API Group-Resources","error":"Get "https://aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io:443/api?timeout=32s\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/manager.New\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.7.0/pkg/manager/manager.go:317\nmain.main\n\t/go/src/github.com/tigera/operator/main.go:157\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204"}

{"level":"error","ts":1625482450.1493962,"logger":"setup","msg":"unable to start manager","error":"Get "https://aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io:443/api?timeout=32s\": dial tcp: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132\nmain.main\n\t/go/src/github.com/tigera/operator/main.go:175\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:204"}

* We tested traffic from the node hosting the tigera-operator pod toward the API server FQDN above, and it works fine.
* If the operator could not reach the API server, we would expect all nodes to be NotReady, but they are all in the Ready state.
* We have a firewall, but it allows traffic to the API server FQDN, and all Calico pods are running except tigera-operator (see the checks sketched below).
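As a rough sketch, connectivity can also be checked from inside the cluster with commands like the following (the test pod names and images here are illustrative, not the exact ones we used). The first command checks whether the API server FQDN resolves through cluster DNS; the second checks raw HTTPS reachability (even an unauthenticated 401/403 response proves the connection works):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io

kubectl run -it --rm api-test --image=curlimages/curl --restart=Never -- curl -k -m 10 https://aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io/api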

Below is the kubectl describe output for the tigera-operator pod:

C:\WINDOWS\system32>kubectl describe pod tigera-operator-64bd78b58-99lmc -n tigera-operator

Name: tigera-operator-64bd78b58-99lmc

Namespace: tigera-operator

Priority: 0

Node: aks-systempool-14727861-vmss000000/10.248.56.4

Start Time: Fri, 02 Jul 2021 18:03:40 +0200

Labels: k8s-app=tigera-operator

          name=tigera-operator

          pod-template-hash=64bd78b58

Annotations:

Status: Running

IP: 10.248.56.4

IPs:

IP: 10.248.56.4

Controlled By: ReplicaSet/tigera-operator-64bd78b58

Containers:

tigera-operator:

Container ID:  containerd://15091f8c3c039accf9d559695ff97bbdf28a7ce95ce9823e30e79b487b9dfa7a

Image:         mcr.microsoft.com/oss/tigera/operator:v1.17.1

Image ID:      sha256:bcba4d5a252ae36cbf5909e31e3fed19ec6efb1ef62afa58b74f2687fea87b5b

Port:          <none>

Host Port:     <none>

Command:

  operator

State:          Waiting

  Reason:       CrashLoopBackOff

Last State:     Terminated

  Reason:       Error

  Exit Code:    1

  Started:      Mon, 05 Jul 2021 12:53:40 +0200

  Finished:     Mon, 05 Jul 2021 12:54:10 +0200

Ready:          False

Restart Count:  718

Environment Variables from:

  kubernetes-services-endpoint  ConfigMap  Optional: true

Environment:

  WATCH_NAMESPACE:

  POD_NAME:                            tigera-operator-64bd78b58-99lmc (v1:metadata.name)

  OPERATOR_NAME:                       tigera-operator

  TIGERA_OPERATOR_INIT_IMAGE_VERSION:  v1.17.1

  KUBERNETES_PORT_443_TCP_ADDR:        aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io

  KUBERNETES_PORT:                     tcp://aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io:443

  KUBERNETES_PORT_443_TCP:             tcp://aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io:443

  KUBERNETES_SERVICE_HOST:             aks-mirai-acc-dns-d9c58f46.hcp.westeurope.azmk8s.io

Mounts:

  /var/lib/calico from var-lib-calico (ro)

  /var/run/secrets/kubernetes.io/serviceaccount from tigera-operator-token-4sv2d (ro)

Conditions:

Type Status

Initialized True

Ready False

ContainersReady False

PodScheduled True

Volumes:

var-lib-calico:

Type:          HostPath (bare host directory volume)

Path:          /var/lib/calico

HostPathType:

tigera-operator-token-4sv2d:

Type:        Secret (a volume populated by a Secret)

SecretName:  tigera-operator-token-4sv2d

Optional:    false

QoS Class: BestEffort

Node-Selectors: kubernetes.io/os=linux

Tolerations: :NoExecute op=Exists

             :NoSchedule op=Exists

             CriticalAddonsOnly op=Exists

Events:

Type Reason Age From Message


Normal Pulled 46m (x711 over 2d18h) kubelet Container image "mcr.microsoft.com/oss/tigera/operator:v1.17.1" already present on machine

Warning BackOff 66s (x16784 over 2d18h) kubelet Back-off restarting failed container
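Worth noting from the output above: KUBERNETES_SERVICE_HOST is set (via the kubernetes-services-endpoint ConfigMap) to the API server FQDN rather than to the in-cluster service IP, so the operator has to resolve that name through cluster DNS before it can reach the API server. The ConfigMap can be inspected with something like this (namespace assumed to match the operator's):

kubectl get configmap kubernetes-services-endpoint -n tigera-operator -o yaml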

Context

The tigera-operator pod keeps failing.

Your Environment

AKS cluster, Kubernetes 1.20.7

abarqawi commented Jul 9, 2021

The issue was resolved after restarting the CoreDNS pods.
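For anyone hitting the same symptom: on AKS, CoreDNS runs as the coredns deployment in kube-system, so the restart looks roughly like this (a sketch, not an exact transcript of what we ran):

kubectl -n kube-system rollout restart deployment coredns    # recreate all CoreDNS pods
kubectl -n kube-system rollout status deployment coredns     # wait for the new pods to become ready
kubectl -n tigera-operator get pods -w                        # watch the operator pod recover

Once CoreDNS is back and the API server FQDN resolves again, the tigera-operator pod recovers on its next restart attempt without any further changes.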

abarqawi closed this as completed Jul 9, 2021