
Error Multiple pods found #107

Closed
vpm-bradleyhession opened this issue Jul 4, 2018 · 7 comments

@vpm-bradleyhession

Hi,

When we did a RollingUpdate deployment, we received the following error:

Jul 04 13:46:39 kiam-agent-ltjpw kiam: {"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = multiple running pods found","pod.ip":"172.36.119.98","time":"2018-07-04T12:46:39Z"}

kops + k8s 1.8.1 running AWS VPC CNI.

Any idea what we can do to solve this? It caused credentials for the container to be unavailable for around a minute:

Jul 04 13:46:39 our_pod_with_error:
[Aws\Exception\CredentialsException]
Error retrieving credentials from the instance profile metadata server. (cURL error 28: Operation timed out after 1000 milliseconds with 0 bytes received (see http://curl.haxx.se/libcurl/c/libcurl-errors.html))

Thanks,

@pingles
Contributor

pingles commented Jul 4, 2018

The multiple pods error is reported when the cache of Pods (maintained through both a watch against the API server and periodic syncs) has two pods that share the same IP address and are both in non-completed states (i.e. Running or Pending).

It's difficult to suggest much to avoid it; it'd be useful to know how frequently this happens. The default sync interval is a minute (which would explain why it corrected itself), so one option is to increase the frequency of that sync?
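To make that concrete, here is a minimal Go sketch (illustrative only, not kiam's actual code; the package, function and variable names are made up) of the kind of lookup described above: a cache indexed by pod IP that refuses to pick between two non-completed pods sharing an address and instead surfaces an error like the one in the log.

```go
// Illustrative sketch only, not kiam's implementation: how a pod cache
// indexed by IP can end up reporting "multiple running pods found".
// Pod types come from k8s.io/api/core/v1; everything else is hypothetical.
package cache

import (
	"errors"
	"fmt"

	v1 "k8s.io/api/core/v1"
)

var ErrMultipleRunningPods = errors.New("multiple running pods found")

// isActive mirrors the "non-completed" test described above:
// anything Pending or Running still counts as owning its IP.
func isActive(p *v1.Pod) bool {
	return p.Status.Phase == v1.PodPending || p.Status.Phase == v1.PodRunning
}

// FindPodForIP returns the single active pod using ip. If the cache
// (fed by watch events plus periodic syncs) momentarily holds two active
// pods with the same IP, there is no safe answer, so an error is returned
// rather than guessing which pod's role to assume.
func FindPodForIP(podsByIP map[string][]*v1.Pod, ip string) (*v1.Pod, error) {
	var active []*v1.Pod
	for _, p := range podsByIP[ip] {
		if isActive(p) {
			active = append(active, p)
		}
	}
	switch len(active) {
	case 0:
		return nil, fmt.Errorf("no running pods found for ip %s", ip)
	case 1:
		return active[0], nil
	default:
		return nil, ErrMultipleRunningPods
	}
}
```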

@vpm-bradleyhession
Author

Hey @pingles
Thanks for the prompt response! This is the only occurrence of this we've had so far (we've been running kiam, having moved from kube2iam, for about 3 months now).

Just wanted to make sure it wasn't a known bug. I could try changing the sync interval and see what happens?

@pingles
Contributor

pingles commented Jul 4, 2018

Ah ok, cool. I don't believe there's a bug, but unfortunately, because of the design and the way Kiam and Kubernetes work, it's a possibility.

You could change the sync interval, but it's worth keeping an eye on your server load to make sure you're not consuming too much of the API server's / etcd's resources.
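For illustration, here's a hedged Go sketch of the periodic-sync half of the "watch plus periodic syncs" design described earlier: a ticker-driven full pod List against the API server, with the interval exposed as a duration flag. The flag name, package layout and function calls here are assumptions for illustration, not kiam's actual CLI or implementation; a shorter interval narrows the window in which the cache can be stale, at the cost of more List calls against the API server, which is the load caveat above.

```go
// Illustrative sketch only: one common shape for a periodic cache sync.
// A watch (not shown) keeps the cache updated event-by-event, while this
// ticker-driven full List papers over missed or re-ordered watch events.
package main

import (
	"context"
	"flag"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Hypothetical flag name; the real option depends on the kiam release.
	syncInterval := flag.Duration("sync", time.Minute, "pod cache sync interval")
	flag.Parse()

	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	ticker := time.NewTicker(*syncInterval)
	defer ticker.Stop()

	for range ticker.C {
		// Full re-list: reconciles the local cache with the API server's view.
		// Each iteration is a real API server (and therefore etcd) request.
		pods, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			log.Printf("sync failed: %v", err)
			continue
		}
		log.Printf("synced %d pods", len(pods.Items))
		// ...rebuild the IP -> pod index from pods.Items here...
	}
}
```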

@vpm-bradleyhession
Author

Yeah, I had checked the etcd and apiserver latencies/resources etc. in Prometheus; all seemed clear at the time.

I will adjust the sync interval and see what happens! Cheers!

@Joseph-Irving
Contributor

We saw this issue once when upgrading our cluster. I'd inadvertently created two separate networks that had the same CIDR, so new pods created on the new nodes were on a different network but could end up with the same IP as an old pod. This resolved itself once the old nodes were gone.

@vpm-bradleyhession
Author

@Joseph-Irving Interesting. Since we're using the VPC CNI from Amazon, our pods have a "real" IP address inside AWS. I'm thinking this is a symptom of deployments coming up really quickly, before the sync interval hits. Our deployments are pretty quick (some of them come up in just over 30s!).

@pingles
Contributor

pingles commented Jul 5, 2018

The watcher should deliver events between syncs (the sync just ensures that the latest state is updated), but I'm not sure there are especially strong guarantees that events are delivered, or of their order. So ideally it shouldn't happen very often, but I don't think (given the current implementation/model) it's possible to guarantee it won't happen; that's one of the reasons for opening kubernetes/community#2329.
