
Error Multiple pods found #107

Closed
vpm-bradleyhession opened this issue Jul 4, 2018 · 7 comments

@vpm-bradleyhession

Hi,

When we did a RollingUpdate deployment, we received the following error:

Jul 04 13:46:39 kiam-agent-ltjpw kiam: {"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = multiple running pods found","pod.ip":"172.36.119.98","time":"2018-07-04T12:46:39Z"}

kops + k8s 1.8.1 running AWS VPC CNI.

Any idea what we can do to solve this? It caused credentials for the container to be unavailable for around a minute:

Jul 04 13:46:39 our_pod_with_error:
[Aws\Exception\CredentialsException]
Error retrieving credentials from the instance profile metadata server. (cURL error 28: Operation timed out after 1000 milliseconds with 0 bytes received (see http://curl.haxx.se/libcurl/c/libcurl-errors.html))

Thanks,

@pingles
Contributor

pingles commented Jul 4, 2018

The multiple pods error is reported when the cache of Pods (maintained through both a watch against the API server and periodic syncs) has two pods that share the same IP address and are both in non-completed states (i.e. Running or Pending).

It's difficult to suggest much to avoid it; it'd be useful to know how frequently this happens. The default sync interval is a minute (which would explain why it corrected itself), so one option is to increase the frequency of that sync?
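To make that concrete, here is a minimal Go sketch (illustrative only, not kiam's actual code; the package, function and variable names are made up) of the kind of lookup described above: a cache indexed by pod IP that refuses to pick between two non-completed pods sharing an address and instead surfaces an error like the one in the log.

```go
// Illustrative sketch only, not kiam's implementation: how a pod cache
// indexed by IP can end up reporting "multiple running pods found".
// Pod types come from k8s.io/api/core/v1; everything else is hypothetical.
package cache

import (
	"errors"
	"fmt"

	v1 "k8s.io/api/core/v1"
)

var ErrMultipleRunningPods = errors.New("multiple running pods found")

// isActive mirrors the "non-completed" test described above:
// anything Pending or Running still counts as owning its IP.
func isActive(p *v1.Pod) bool {
	return p.Status.Phase == v1.PodPending || p.Status.Phase == v1.PodRunning
}

// FindPodForIP returns the single active pod using ip. If the cache
// (fed by watch events plus periodic syncs) momentarily holds two active
// pods with the same IP, there is no safe answer, so an error is returned
// rather than guessing which pod's role to assume.
func FindPodForIP(podsByIP map[string][]*v1.Pod, ip string) (*v1.Pod, error) {
	var active []*v1.Pod
	for _, p := range podsByIP[ip] {
		if isActive(p) {
			active = append(active, p)
		}
	}
	switch len(active) {
	case 0:
		return nil, fmt.Errorf("no running pods found for ip %s", ip)
	case 1:
		return active[0], nil
	default:
		return nil, ErrMultipleRunningPods
	}
}
```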

@vpm-bradleyhession
Author

Hey @pingles
Thanks for the prompt response! This is the only occurrence of this we've had so far (we've been running kiam, having moved from kube2iam, for about 3 months now).

Just wanted to make sure it wasn't a known bug. I could try changing the sync interval and see what happens?

@pingles
Contributor

pingles commented Jul 4, 2018

Ah ok, cool. I don't believe there's a bug, but unfortunately, because of the design and the way Kiam and Kubernetes work, it's a possibility.

You could change the sync interval, but it's worth keeping an eye on your server load to make sure you're not consuming too much of the API server's / etcd's resources.
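For illustration, here's a hedged Go sketch of the periodic-sync half of the "watch plus periodic syncs" design described earlier: a ticker-driven full pod List against the API server, with the interval exposed as a duration flag. The flag name, package layout and function calls here are assumptions for illustration, not kiam's actual CLI or implementation; a shorter interval narrows the window in which the cache can be stale, at the cost of more List calls against the API server, which is the load caveat above.

```go
// Illustrative sketch only: one common shape for a periodic cache sync.
// A watch (not shown) keeps the cache updated event-by-event, while this
// ticker-driven full List papers over missed or re-ordered watch events.
package main

import (
	"context"
	"flag"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Hypothetical flag name; the real option depends on the kiam release.
	syncInterval := flag.Duration("sync", time.Minute, "pod cache sync interval")
	flag.Parse()

	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	ticker := time.NewTicker(*syncInterval)
	defer ticker.Stop()

	for range ticker.C {
		// Full re-list: reconciles the local cache with the API server's view.
		// Each iteration is a real API server (and therefore etcd) request.
		pods, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			log.Printf("sync failed: %v", err)
			continue
		}
		log.Printf("synced %d pods", len(pods.Items))
		// ...rebuild the IP -> pod index from pods.Items here...
	}
}
```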

@vpm-bradleyhession
Author

Yeah, I had checked the etcd and apiserver latencies/resources etc. in Prometheus; all seemed clear at the time.

I will adjust the sync interval and see what happens! Cheers!

@Joseph-Irving
Contributor

We saw this issue once when upgrading our cluster. I'd inadvertently created two separate networks that had the same CIDR, so new pods created on the new nodes were on a different network but could end up with the same IP as an old pod. This resolved itself once the old nodes were gone.

@vpm-bradleyhession
Author

@Joseph-Irving Interesting. Since we're using the VPC CNI from Amazon, our pods have a "real" IP address inside AWS. I'm thinking this is a symptom of deployments coming up really quickly, before the sync interval hits. Our deployments are pretty quick (some of them come up in just over 30s!).

@pingles
Contributor

pingles commented Jul 5, 2018

The watcher should deliver events between syncs (the sync just ensures that the latest state is updated), but I'm not sure there are especially strong guarantees that events are delivered, or of their order. So ideally it shouldn't happen very often, but I don't think (given the current implementation/model) it's possible to guarantee it won't happen; that's one of the reasons for opening kubernetes/community#2329.
