Node deletion does not clear up the IPs #3372
When a node is deleted or terminated, whether by the autoscaler or manually, its Pod IPs should be cleared from IPAM. But this is not happening.
Related - #2797 (comment)
What you expected to happen?
I expected the dead nodes' IPs to be cleared.
The nodes were deleted, but their IPs still showed as allocated.
How to reproduce it?
Delete a node
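To make the reproduction step concrete, here is a minimal sketch assuming kubectl access to the cluster; the node-name argument is a placeholder, and on AWS the instance can equally be terminated directly, which is what the autoscaler does:

```shell
# Sketch: reproduce by deleting a node from the cluster.
# The node name is a placeholder supplied by the caller.
delete_node() {
    node="$1"
    # Remove the node object; its Weave peer becomes unreachable,
    # and the question in this issue is whether its IPs are reclaimed.
    kubectl delete node "$node"
}
```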
Anything else we need to know?
@bboreham I would like to contribute to fixing the problem here. Let me know if I can take this up.
@alok87 I don't think there is any impact.
They are not exactly unreachable. Once a node goes down, depending on the nature of the deployment, its pods will be rescheduled to other nodes, and the application will continue to work.
Once a new node joins, there is a readjustment (unused/unreachable IP address ranges are reclaimed), so 100% of the subnet is usable for pods.
For example, with a kops-provisioned cluster whose auto-scaling group is set to a minimum of 3 instances, this is the IPAM status I see once I delete a node, and after the node is re-provisioned.
Do you see anything problematic?
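For anyone wanting to check the same state on their own cluster, a sketch of how to inspect it, assuming the `weave` script is installed on the node (`weave status ipam` is the documented subcommand):

```shell
# Sketch: inspect Weave's IPAM ring on a node (assumes the `weave`
# script is on PATH).
show_ipam() {
    # Lists each peer and the share of the allocation range it owns;
    # peers belonging to deleted nodes keep appearing here until their
    # ranges are reclaimed.
    weave status ipam
}
```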
From our experience, there is an impact. When pods have already been allocated IPs from some subnet block and the node responsible for that subnet then goes down, those pods can no longer communicate with the others.
@bboreham Sure, I will get back with more info if I find a node provisioned with the same IP whose addresses are not reclaimed.
But aren't such nodes lying around in the IPAM ring a problem? Isn't it a problem for other peers to carry so many dead nodes in their peer-to-peer connection lists that they can never connect to?
No solid evidence yet, but what we have observed is that once we cross 50 nodes, with that many unreachable nodes, the overhead added by the Kubernetes network tends to increase by around 20 ms.
I will come back with real evidence for this observation.
You showed one node, not "so many".
I know of no mechanism to connect unreachable nodes to packet latency.
I'll close this for now; please re-open or open a new issue when you have evidence of a problem.
We faced increased latency in the Kubernetes network for 4-5 hours and spent a couple of hours fighting it.
It looks like deleted nodes, if not removed from Weave, result in network latency. I'm not sure whether they should or shouldn't, but removing them did work for us.
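The thread does not say which commands were used, so here is a sketch of how a dead peer can be removed with the standard Weave CLI; `weave rmpeer` and `weave forget` are the documented subcommands, and the peer-name argument is a placeholder:

```shell
# Sketch (not taken verbatim from this thread): remove a dead peer so
# its address range is reclaimed and reconnection attempts stop.
# The peer name/ID is a placeholder supplied by the caller.
remove_dead_peer() {
    peer="$1"
    weave rmpeer "$peer"   # transfer the dead peer's IPAM ranges to this peer
    weave forget "$peer"   # drop it from the list of peers to reconnect to
}
```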
How did you come to this conclusion? Is this easily reproducible, and at what scale?
In general, dealing with deleted nodes is a control-plane aspect of Weave; I don't see any reason why it should have an impact on the data plane.