Description
/kind bug
1. What kops version are you running? The command `kops version` will display this information.
Client version: 1.29.0 (git-v1.29.0)
2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running, or provide the Kubernetes version specified as a `kops` flag.
1.28.7
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Upgrade kops from 1.28.4 to 1.29.0 and roll the cluster, or create a new cluster with kops 1.29.0 that has node-local-dns enabled and uses the Cilium CNI.
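A minimal reproduction sketch, assuming the (redacted) cluster name and state store from the manifest below:

```sh
# Cluster was previously managed with kops 1.28.4; switch to the kops 1.29.0 binary.
export KOPS_STATE_STORE=s3://example/k8s.tmp-test.example
kops update cluster k8s.tmp-test.example --yes
kops rolling-update cluster k8s.tmp-test.example --yes
# After the nodes are replaced, pods on the new nodes can no longer reach node-local-dns.
```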
5. What happened after the commands executed?
Pods on the updated nodes cannot reach the node-local-dns pods.
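A hedged sketch of how the failure shows up from a pod's point of view (the test pod name, image, and query are arbitrary; 169.254.20.10 is the default node-local-dns address in kops):

```sh
# Run a throwaway pod and query node-local-dns directly.
# A nodeSelector may be needed to make sure the pod lands on an affected node.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local 169.254.20.10
# On affected nodes the query times out; on healthy nodes it returns the service IP.
```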
6. What did you expect to happen?
Pods can reach the node-local-dns pods, as they did before the upgrade.
7. Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2024-05-31T07:47:47Z"
  name: k8s.tmp-test.example
spec:
  additionalSans:
  - api-internal.k8s.tmp-test.example
  - api.internal.k8s.tmp-test.example
  api:
    loadBalancer:
      type: Internal
      useForInternalApi: true
  authentication: {}
  authorization:
    rbac: {}
  certManager:
    enabled: true
    managed: false
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
  cloudProvider: aws
  clusterAutoscaler:
    enabled: true
  configBase: s3://example/k8s.tmp-test.example
  containerd:
    registryMirrors:
      '*':
      - https://nexus-proxy.example.io
      docker.io:
      - https://nexus-proxy.example.io
      k8s.gcr.io:
      - https://nexus-proxy.example.io
      public.ecr.aws:
      - https://nexus-proxy.example.io
      quay.io:
      - https://nexus-proxy.example.io
      registry.example.io:
      - https://registry.example.io
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-1a
      name: a
    - instanceGroup: master-1b
      name: b
    - instanceGroup: master-1c
      name: c
    manager:
      backupRetentionDays: 90
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:2379
      - name: ETCD_METRICS
        value: basic
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 1d
      - name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
        value: 1d
      - name: ETCD_MAX_REQUEST_BYTES
        value: "1572864"
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-1a
      name: a
    - instanceGroup: master-1b
      name: b
    - instanceGroup: master-1c
      name: c
    manager:
      backupRetentionDays: 90
      env:
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 1d
      - name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
        value: 1d
      - name: ETCD_MAX_REQUEST_BYTES
        value: "1572864"
    memoryRequest: 100Mi
    name: events
  fileAssets:
  - content: |
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
      - level: RequestResponse
        userGroups:
        - "/devops"
        - "/developers"
        - "/teamleads"
        - "/k8s-full"
        - "/sre"
        - "/support"
        - "/qa"
        - "system:serviceaccounts"
    name: audit-policy-config
    path: /etc/kubernetes/audit/policy-config.yaml
    roles:
    - ControlPlane
  iam:
    legacy: false
  kubeAPIServer:
    auditLogMaxAge: 10
    auditLogMaxBackups: 1
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /etc/kubernetes/audit/policy-config.yaml
    oidcClientID: kubernetes
    oidcGroupsClaim: groups
    oidcIssuerURL: https://sso.example.io/auth/realms/example
    serviceAccountIssuer: https://api-internal.k8s.tmp-test.example
    serviceAccountJWKSURI: https://api-internal.k8s.tmp-test.example/openid/v1/jwks
  kubeDNS:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kops.k8s.io/instancegroup
              operator: In
              values:
              - infra-nodes
    nodeLocalDNS:
      cpuRequest: 25m
      enabled: true
      memoryRequest: 5Mi
    provider: CoreDNS
    tolerations:
    - effect: NoSchedule
      key: dedicated/infra
      operator: Exists
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    evictionHard: memory.available<7%,nodefs.available<3%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
    evictionMaxPodGracePeriod: 30
    evictionSoft: memory.available<12%
    evictionSoftGracePeriod: memory.available=200s
  kubernetesApiAccess:
  - 10.170.0.0/16
  kubernetesVersion: 1.28.7
  masterPublicName: api.k8s.tmp-test.example
  metricsServer:
    enabled: false
    insecure: true
  networkCIDR: 10.170.0.0/16
  networkID: vpc-xxxx
  networking:
    cilium:
      enableNodePort: true
      enablePrometheusMetrics: true
      ipam: eni
  nodeProblemDetector:
    enabled: true
  nodeTerminationHandler:
    enableSQSTerminationDraining: false
    enabled: true
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 10.170.0.0/16
  subnets:
  - cidr: 10.170.140.0/24
    name: kops-k8s-1a
    type: Private
    zone: eu-central-1a
  - cidr: 10.170.142.0/24
    name: kops-k8s-eni-1a
    type: Private
    zone: eu-central-1a
  - cidr: 10.170.141.0/24
    name: kops-k8s-utility-1a
    type: Utility
    zone: eu-central-1a
  - cidr: 10.170.143.0/24
    name: kops-k8s-1b
    type: Private
    zone: eu-central-1b
  - cidr: 10.170.145.0/24
    name: kops-k8s-eni-1b
    type: Private
    zone: eu-central-1b
  - cidr: 10.170.144.0/24
    name: kops-k8s-utility-1b
    type: Utility
    zone: eu-central-1b
  - cidr: 10.170.146.0/24
    name: kops-k8s-1c
    type: Private
    zone: eu-central-1c
  - cidr: 10.170.148.0/24
    name: kops-k8s-eni-1c
    type: Private
    zone: eu-central-1c
  - cidr: 10.170.147.0/24
    name: kops-k8s-utility-1c
    type: Utility
    zone: eu-central-1c
  topology:
    dns:
      type: Private
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:48Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: graviton-nodes
spec:
  autoscale: false
  image: ami-0192de4261c8ff06a
  machineType: t4g.small
  maxSize: 1
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t4g.small
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: lowest-price
    spotInstancePools: 3
  nodeLabels:
    kops.k8s.io/instancegroup: graviton-nodes
  role: Node
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1a
  - kops-k8s-eni-1a
  sysctlParameters:
  - net.netfilter.nf_conntrack_max = 1048576
  - net.core.netdev_max_backlog = 30000
  - net.core.rmem_max = 134217728
  - net.core.wmem_max = 134217728
  - net.ipv4.tcp_wmem = 4096 87380 67108864
  - net.ipv4.tcp_rmem = 4096 87380 67108864
  - net.ipv4.tcp_mem = 187143 249527 1874286
  - net.ipv4.tcp_max_syn_backlog = 8192
  - net.ipv4.ip_local_port_range = 10240 65535
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:48Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: infra-nodes
spec:
  autoscale: false
  image: ami-035f7f826413ac489
  machineType: t3.small
  maxSize: 1
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3.small
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: lowest-price
    spotInstancePools: 3
  nodeLabels:
    kops.k8s.io/instancegroup: infra-nodes
  role: Node
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1a
  - kops-k8s-eni-1a
  sysctlParameters:
  - net.netfilter.nf_conntrack_max = 1048576
  - net.core.netdev_max_backlog = 30000
  - net.core.rmem_max = 134217728
  - net.core.wmem_max = 134217728
  - net.ipv4.tcp_wmem = 4096 12582912 16777216
  - net.ipv4.tcp_rmem = 4096 12582912 16777216
  - net.ipv4.tcp_mem = 187143 249527 1874286
  - net.ipv4.tcp_max_syn_backlog = 8192
  - net.ipv4.ip_local_port_range = 10240 65535
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:47Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: master-1a
spec:
  image: ami-035f7f826413ac489
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-1a
  role: Master
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1a
  - kops-k8s-eni-1a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:47Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: master-1b
spec:
  image: ami-035f7f826413ac489
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-1b
  role: Master
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1b
  - kops-k8s-eni-1b
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:48Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: master-1c
spec:
  image: ami-035f7f826413ac489
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-1c
  role: Master
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1c
  - kops-k8s-eni-1c
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:48Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: nodes
spec:
  image: ami-035f7f826413ac489
  machineType: t3.small
  maxSize: 1
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3.small
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: lowest-price
    spotInstancePools: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1a
  - kops-k8s-eni-1a
  sysctlParameters:
  - net.netfilter.nf_conntrack_max = 1048576
  - net.core.netdev_max_backlog = 30000
  - net.core.rmem_max = 134217728
  - net.core.wmem_max = 134217728
  - net.ipv4.tcp_wmem = 4096 87380 67108864
  - net.ipv4.tcp_rmem = 4096 87380 67108864
  - net.ipv4.tcp_mem = 187143 249527 1874286
  - net.ipv4.tcp_max_syn_backlog = 8192
  - net.ipv4.ip_local_port_range = 10240 65535
8. Please run the commands with most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know?
We found a workaround that fixes the issue on a single node.
We noticed that the nodelocaldns interface is in a DOWN state on the nodes (the same can be observed on older kops versions, where node-local-dns works fine). After executing

ip link set dev nodelocaldns up

the interface comes up, and the Cilium agent logs on that node show:

time="2024-05-31T09:23:08Z" level=info msg="Node addresses updated" device=nodelocaldns node-addresses="169.254.20.10 (nodelocaldns)" subsys=node-address

After these actions, all pods on the node can reach node-local-dns without any problems.
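For completeness, a hedged sketch of how the workaround can be applied to every node; the SSH user and direct node access are assumptions about our environment, not something kops provides:

```sh
# Bring the nodelocaldns dummy interface up on each node.
# "ubuntu" and plain SSH access are placeholders for our environment.
for node in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  ssh ubuntu@"$node" 'ip -br link show dev nodelocaldns; sudo ip link set dev nodelocaldns up'
done
```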