Error Multiple pods found #107
The multiple pods error is reported when the cache of pods (maintained through both a watch against the API server and periodic syncs) has two pods with the same IP address that are both in non-completed states (i.e. Running or Pending). It's difficult to suggest much to avoid it; it would be useful to know how frequently this happens. The default sync interval is a minute (which would explain why it corrected itself), so one option is to increase the sync frequency.
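The lookup described above can be sketched roughly like this. This is a minimal Go sketch, not Kiam's actual code: `Pod`, `findPodByIP`, and the phase strings are illustrative stand-ins for the real cache types.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a minimal stand-in for the cached Kubernetes pod object.
type Pod struct {
	Name  string
	IP    string
	Phase string // "Running", "Pending", "Succeeded", "Failed"
}

var errMultiplePods = errors.New("multiple running pods found")

// findPodByIP mimics the cache lookup: it fails when more than one
// non-completed pod shares the same IP address.
func findPodByIP(cache []Pod, ip string) (*Pod, error) {
	var matches []*Pod
	for i := range cache {
		p := &cache[i]
		if p.IP != ip {
			continue
		}
		// Completed pods (Succeeded/Failed) are ignored, so a conflict
		// only arises while both pods are still active.
		if p.Phase == "Running" || p.Phase == "Pending" {
			matches = append(matches, p)
		}
	}
	switch len(matches) {
	case 0:
		return nil, fmt.Errorf("no pod found for %s", ip)
	case 1:
		return matches[0], nil
	default:
		return nil, errMultiplePods
	}
}

func main() {
	// Two active pods share one IP, as during the stale-cache window.
	cache := []Pod{
		{Name: "old", IP: "172.36.119.98", Phase: "Running"},
		{Name: "new", IP: "172.36.119.98", Phase: "Pending"},
	}
	_, err := findPodByIP(cache, "172.36.119.98")
	fmt.Println(err) // the lookup refuses to pick one of the two pods
}
```

Note that completed pods are deliberately excluded, which is why a finished pod whose IP gets reused does not trigger the error.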
Hey @pingles, just wanted to make sure it wasn't a known bug. I could try changing the sync interval and see what happens?
Ah ok, cool. I don't believe there's a bug, but unfortunately, because of the design and the way Kiam and Kubernetes work, it's a possibility. You could change the sync interval, but it's worth keeping an eye on your server load to make sure you're not consuming too much of the API server / etcd resources.
Yeah I had checked the etcd server and apiserver latencies/resources etc in prometheus - all seemed clear at the time. I will adjust the sync interval and see what happens! cheers! |
We saw that issue once when upgrading our cluster. I'd inadvertently created two different networks that were separate from each other but had the same CIDR, so new pods would get created on the new nodes on a different network but with the possibility of having the same IP as an old pod. This resolved itself once the old nodes were gone.
@Joseph-Irving Interesting. Since we're using the VPC CNI from Amazon, our pods have a "real" IP address inside AWS. I'm thinking this is a symptom of deployments coming up really quickly, before the sync interval hits. Our deployments are pretty quick (some of them come up in just over 30s!)
The watcher should deliver events between syncs (the sync just ensures that the latest state is updated), but I'm not sure there are especially strong guarantees that events are delivered, or in what order. So ideally it shouldn't happen very often, but I don't think (given the current implementation/model) it's possible to guarantee it won't happen; this is one of the reasons for opening kubernetes/community#2329.
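The watch-plus-periodic-resync model above can be illustrated with a toy simulation. This is a deliberately simplified sketch: `store` and `resync` are hypothetical names, and the real cache is maintained by client-go's informer machinery, not a plain map.

```go
package main

import "fmt"

// store models the pod cache; watch events update it incrementally and
// a periodic resync replaces it wholesale with the API server's state.
type store map[string]string // pod name -> IP

// resync overwrites the cache with the authoritative state. This is why
// the "multiple pods" error self-corrects within one sync interval.
func resync(s store, authoritative store) {
	for k := range s {
		delete(s, k)
	}
	for k, v := range authoritative {
		s[k] = v
	}
}

func main() {
	cache := store{"old-pod": "172.36.119.98"}

	// The DELETED watch event for old-pod is lost, but the replacement
	// pod's ADDED event arrives: two pods now share one IP in the cache.
	cache["new-pod"] = "172.36.119.98"
	fmt.Println(len(cache)) // 2: the "multiple pods" window is open

	// The next periodic resync restores the true state.
	resync(cache, store{"new-pod": "172.36.119.98"})
	fmt.Println(len(cache)) // 1: the window closes
}
```

Shortening the sync interval shrinks this window, at the cost of more frequent list calls against the API server.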
Hi,
When we did a RollingUpdate deployment we received the below:
Jul 04 13:46:39 kiam-agent-ltjpw kiam: {"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = multiple running pods found","pod.ip":"172.36.119.98","time":"2018-07-04T12:46:39Z"}
kops + k8s 1.8.1 running AWS VPC CNI.
Any idea what we can do to solve this? It caused credentials for the container to be unavailable for around a minute:
Jul 04 13:46:39 our_pod_with_error:
[Aws\Exception\CredentialsException]
Error retrieving credentials from the instance profile metadata server. (cURL error 28: Operation timed out after 1000 milliseconds with 0 bytes received (see http://curl.haxx.se/libcurl/c/libcurl-errors.html))
Thanks,