
Kubernetes | Pods | Deleting evicted pods #37

Closed
cbharathnoor opened this issue Feb 14, 2019 · 10 comments

Comments

@cbharathnoor

cbharathnoor commented Feb 14, 2019

Hi,

Pod-reaper is not deleting pods which are in evicted state.
Is it the expected behavior? If yes, then can we have a feature in place which deletes pods which are in evicted state.
Please let us know your inputs. @brianberzins @hblanks

Thanks.

@brianberzins
Collaborator

I haven't explicitly tried this with evicted pods. I'll set up a test and see if I can get this to work (either as is or with a code change) as soon as I can!

@cbharathnoor
Author

@brianberzins Sure, appreciate your quick response. This would be really helpful: Kubernetes starts evicting pods once the cluster reaches its max capacity in order to accommodate higher-priority pods, which leaves a lot of evicted pods lingering in the cluster.

@brianberzins
Collaborator

brianberzins commented Feb 15, 2019

Okay. I found a way to replicate this without COMPLETELY messing with a cluster (also because you can't exactly drain a node on a single node minikube setup).

Basically, I created a deployment that just runs sleeps with an emptyDir volume mount, exec-ed into the pod, and cat-ed /dev/urandom into the mounted directory until it used up all available space -- after which the pod was evicted. Note: it appears that emptyDir.sizeLimit is not currently being honored, as per kubernetes/kubernetes#63641.
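
Roughly, the setup looked something like this (the names, image, and sizes here are just illustrative, not the exact manifest I used):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
        - name: busybox
          image: busybox
          command: ["sh", "-c", "sleep 86400"]
          volumeMounts:
            - name: scratch
              mountPath: /scratch
      volumes:
        - name: scratch
          emptyDir:
            sizeLimit: 256Mi  # not honored at the time, per kubernetes/kubernetes#63641

Then exec in and fill the volume (for example, kubectl exec -it <pod-name> -- sh -c "cat /dev/urandom > /scratch/fill") and wait for the kubelet to evict the pod once the node runs low on disk.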

Now I should be able to test this properly.

-- more details here --
I just confirmed that, much to my surprise, it actually is skipping over Evicted pods. I suspect that something is preventing them from being returned by the call to get pods, since an explicit delete pod command (which is what pod-reaper does) usually does clean up evicted pods. More to come, but it looks like this will require a code change of some variety.

@brianberzins
Collaborator

Alright. I know what's going on.

To summarize: say you run kubectl get pods and get something that looks like this:

NAME                            READY     STATUS      RESTARTS   AGE
busybox-6fc7f6b4cf-ncwhk        0/1       Evicted     0          6h
busybox-6fc7f6b4cf-qnfv2        0/1       Error       0          6h
busybox-6fc7f6b4cf-m6vw6        1/1       Running     0          6h

The STATUS column in this case is populated from 3 different places in code. The CONTAINER_STATUSES option of pod-reaper is currently capable of finding the Error pod because that Error is actually a "container status reason" (specifically a ContainerStateTerminated.Reason).

The Evicted status is different: it actually comes directly from the pod itself (specifically PodStatus.Reason).
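
For illustration, the field paths below are the actual API fields; the surrounding values are just representative of what kubectl get pod <name> -o yaml shows:

# Evicted pod: the reason lives on the pod status itself
status:
  phase: Failed
  reason: Evicted
  message: 'The node was low on resource: ephemeral-storage.'
---
# Error pod: the reason lives on a container status
status:
  containerStatuses:
    - name: busybox
      state:
        terminated:
          reason: Error
          exitCode: 1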

So here's the plan: I'm going to make another rule specifically for the pod status. The logic is similar, but it's looking at a different field even though the two look the same in the kubectl get pods output.

I built an image to prove this out. Here's the log line of interest:
{"level":"info","msg":"reaping pod","pod":"busybox-6fc7f6b4cf-ncwhk","reasons":["has pod status Evicted"],"time":"2019-02-16T03:38:18Z"}

From here, it's just a matter of adding documentation and a bit of code cleanup. I'm hoping to have this all wrapped up with a new version for you in the next couple hours.

Nice find 👍

@brianberzins
Collaborator

@cbharathnoor version 2.3.0 and a new latest include the ability to kill evicted pods!
Readme has been updated to reflect the new configuration you can use to kill those pesky pods: https://github.com/target/pod-reaper#pod-status

Let me know if this works for you! I did a full functional test with the new version and it killed the pod that I forced into an Evicted status.

@cbharathnoor
Author

cbharathnoor commented Feb 19, 2019

@brianberzins Thank you, pod-reaper is able to delete pods which are in the "Evicted" state; it's working absolutely fine!
One observation: when we configure container statuses along with pod statuses in a single deployment template (please see the reference template below), pod-reaper is not able to delete pods based on container statuses. At any point in time, pod-reaper deletes pods based on either the pod status or the container status, and running separate cleanup pods for each of them may result in resource overhead. Is there a way we can have a single pod-reaper in place that deletes pods based on pod status as well as container status?
Kindly share your inputs.

Example:

containers:
  - name: pod-cleanup
    image: target/pod-reaper:2.3.0
    env:
      # Check pod status every 3 minutes
      - name: SCHEDULE
        value: "*/3 * * * *"
      - name: POD_STATUSES
        value: "Evicted"
      - name: CONTAINER_STATUSES
        value: "Completed,Error,ImagePullBackOff,ErrImagePull"
restartPolicy: Always
terminationGracePeriodSeconds: 30

@brianberzins
Collaborator

This is easily the most counter-intuitive part of pod-reaper. In order for a pod to be reaped, EVERY loaded rule needs to flag the pod (the rules are ANDed together). So in order to get OR behavior, I have been running multiple pod-reaper containers (with different configurations) in the same pod:

  containers:
  - image: target/pod-reaper:2.3.0
    name: pod-reaper-pod-status
    env:
      - name: POD_STATUSES
        value: Evicted
      - name: SCHEDULE
        value: "*/3 * * * *"
  - image: target/pod-reaper:2.3.0
    name: pod-reaper-container-status
    env:
      - name: CONTAINER_STATUSES
        value: "Completed,Error,ImagePullBackOff,ErrImagePull"
      - name: SCHEDULE
        value: "*/3 * * * *"

Given that you've been running into quota limits: note that pod-reaper is literally just a Linux binary installed on top of scratch (a completely empty base image), so you can keep the resources you give it very small.
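
For example, something like this on each pod-reaper container is usually plenty (the numbers are illustrative, not measured):

    resources:
      requests:
        cpu: 10m
        memory: 16Mi
      limits:
        cpu: 50m
        memory: 32Mi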

Think this will work for you?

@brianberzins
Collaborator

@cbharathnoor

Checking in: how's this working for you?

@cbharathnoor
Author

@brianberzins Hey, apologies for the late response, I was not around for a few days. The above implementation works fine for me: pod-reaper is able to delete pods based on both container and pod statuses. I have tested this out on GKE and the behavior looks fine.

@brianberzins
Collaborator

@cbharathnoor Awesome!
Glad I could help!
