<center><img src="https://storage.googleapis.com/unskript-website/assets/favicon.png" alt="unSkript.com" width="100" height="100">
<h1 id="-unSkript-Runbooks-">unSkript Runbooks&nbsp;</h1>
<div class="alert alert-block alert-success">
<h3 id="-Objective">Objective</h3>
<br><strong style="color: #000000;"><em>To identify and delete failing Kubernetes pods from jobs to mitigate IP exhaustion issues in the cluster.</em></strong></div>
</center>
<p>&nbsp;</p>
<center>
<h2 id="Delete-Evicted-Pods-From-Namespaces">IP Exhaustion Mitigation: Failing K8s Pod Deletion from Jobs</h2>
</center>
<h1 id="Steps-Overview">Steps Overview</h1>
<p>1)<a href="#1" target="_self" rel="noopener"> Get failing pods from all jobs.</a><br>2)<a href="#2" target="_self" rel="noopener"> Delete the pod&nbsp;</a></p>

In [None]:
if namespace is None:
    namespace = ''

<h3 id="Show-All-Evicted-Pods-From-All-Namespaces"><a id="1" target="_self" rel="nofollow"></a>Get failing Pods From all jobs</h3>
<p>If a job doesn&rsquo;t exit cleanly (whether it finished successfully or not), its pod is left in a terminated or errored state. After several runs, these leftover pods can quickly exhaust the IP addresses available to pods in the cluster. This action fetches all pods that belong to a job and are not in the <code>Running</code> state.</p>
<blockquote>
<p>Input parameters: <code>namespace (Optional)</code></p>
</blockquote>
<blockquote>
<p>Output variable: <code>unhealthy_pods</code></p>
</blockquote>
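<p>The filtering described above can be sketched as a small standalone function. This is only an illustration, not the action's implementation: the helper name <code>extract_non_running_pods</code> and the extra <code>phase</code> field are assumptions introduced here, and the input is assumed to be the JSON text produced by <code>kubectl get pods -o json</code>.</p>

```python
import json
from typing import Dict, List


def extract_non_running_pods(pods_json: str, job_name: str) -> List[Dict[str, str]]:
    """Return one record per pod whose phase is anything other than Running.

    Hypothetical helper: it expects the JSON text of a Kubernetes pod list
    and mirrors the record layout used by the action (job_name, namespace,
    pod_name), adding the phase for easier inspection.
    """
    pod_list = json.loads(pods_json)
    records = []
    for pod in pod_list.get("items", []):
        # A pod with no reported phase is treated as Unknown.
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase == "Running":
            continue
        records.append({
            "job_name": job_name,
            "namespace": pod["metadata"]["namespace"],
            "pod_name": pod["metadata"]["name"],
            "phase": phase,
        })
    return records
```

<p>Note that "not Running" covers the <code>Pending</code>, <code>Succeeded</code>, <code>Failed</code> and <code>Unknown</code> phases, so pods that completed successfully are also reported. Keeping the phase in each record makes it possible to narrow the list before deleting anything.</p>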

In [None]:
#
# Copyright (c) 2023 unSkript.com
# All rights reserved.
#

import pprint
from typing import Tuple, Optional
from pydantic import BaseModel, Field
from kubernetes.client.rest import ApiException
import json


from beartype import beartype
@beartype
def k8s_get_error_pods_from_all_jobs_printer(output):
    if output is None:
        return
    pprint.pprint(output)


@beartype
def k8s_get_error_pods_from_all_jobs(handle, namespace: str = '') -> Tuple:
    """k8s_get_error_pods_from_all_jobs This check function uses the handle's native command
       method to execute a pre-defined kubectl command and returns the output of list of error pods
       from all jobs.

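       :type handle: Object
       :param handle: Object returned from the task.validate(...) function

       :type namespace: str
       :param namespace: Optional namespace to search; all namespaces are searched when empty

       :rtype: Tuple Result in tuple format.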
    """
    action_op = []
    if handle.client_side_validation is not True:
        raise ApiException(f"K8S Connector is invalid {handle}")

    if not namespace:
        kubectl_command = "kubectl get jobs --all-namespaces -o json"
    else:
        kubectl_command = f"kubectl get jobs -n {namespace} -o json"
    result = handle.run_native_cmd(kubectl_command)
    if result.stderr:
        raise ApiException(f"Error occurred while executing command {kubectl_command} {result.stderr}")
    job_names = []
    if result.stdout:
        op = json.loads(result.stdout)
        for jobs in op["items"]:
            job_dict = {}
            job_dict["job_name"] = jobs["metadata"]["name"]
            job_dict["namespace"] = jobs["metadata"]["namespace"]
            job_names.append(job_dict)
    if job_names:
        for job in job_names:
            # List this job's pods that are in any phase other than Running
            command = f"""kubectl get pods --selector=job-name={job["job_name"]} -n {job["namespace"]} --field-selector=status.phase!=Running -o json"""
            pod_result = handle.run_native_cmd(command)
            if pod_result.stderr:
                raise ApiException(f"Error occurred while executing command {command} {pod_result.stderr}")
            if pod_result.stdout:
                pod_op = json.loads(pod_result.stdout)
                for pods in pod_op["items"]:
                    pod_dict = {}
                    pod_dict["job_name"] = job["job_name"]
                    pod_dict["namespace"] = job["namespace"]
                    pod_dict["pod_name"] = pods["metadata"]["name"]
                    action_op.append(pod_dict)
    if len(action_op) != 0:
        return (False, action_op)
    else:
        return (True, None)


task = Task(Workflow())
task.configure(inputParamsJson='''{
    "namespace": "namespace"
    }''')
task.configure(outputName="unhealthy_pods")

task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(k8s_get_error_pods_from_all_jobs, lego_printer=k8s_get_error_pods_from_all_jobs_printer, hdl=hdl, args=args)

<h3 id="Create-List-of-Old-EBS-Snapshots">Create list of errored pods</h3>
<p>This action builds a list of all unhealthy pod objects from the output of Step 1.</p>
<blockquote>
<p>This action takes the following parameters: <code>None</code></p>
</blockquote>
<blockquote>
<p>This action captures the following output: <code>all_unhealthy_pods</code></p>
</blockquote>
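<p>As a rough sketch of this step, the check output from Step 1 is a <code>(passed, objects)</code> tuple, and the list is only populated when the check did not pass. The helper name <code>failed_objects</code> is hypothetical and used purely for illustration.</p>

```python
from typing import Any, List, Optional, Tuple


def failed_objects(check_output: Tuple[bool, Optional[List[Any]]]) -> List[Any]:
    """Return the objects reported by a failed check, or an empty list.

    Hypothetical helper: assumes the (passed, objects) tuple shape that
    Step 1 returns, where objects is None when the check passes.
    """
    passed, objects = check_output
    if passed or not objects:
        return []
    # Copy the list so later changes do not alter the check output.
    return list(objects)
```

<p>For example, <code>failed_objects((True, None))</code> yields an empty list, so the deletion step has nothing to iterate over when no unhealthy pods were found.</p>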

In [None]:
all_unhealthy_pods = []
# Step 1 returns (False, [pods]) when unhealthy pods were found
if not unhealthy_pods[0] and unhealthy_pods[1]:
    all_unhealthy_pods = unhealthy_pods[1]
print(all_unhealthy_pods)
task.configure(outputName="all_unhealthy_pods")

<h3 id="Delete-Evicted-Pods-From-All-Namespaces"><a id="2" target="_self" rel="nofollow"></a>Delete the Pod</h3>
<p>This action deletes the pods found in Step 1.</p>
<blockquote>
<p>Input parameters: <code>podname, namespace</code></p>
</blockquote>
<blockquote>
<p>Output parameters:<span style="font-family: monospace;"> None</span></p>
</blockquote>
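<p>The action is run once per entry in <code>all_unhealthy_pods</code>, with <code>podname</code> and <code>namespace</code> taken from each entry. The sketch below imitates that behaviour in plain Python under stated assumptions: <code>delete_each</code> and its <code>delete_fn</code> argument are hypothetical names, and this is not how the platform itself implements iteration. A failure on one pod is recorded and the loop moves on, similar in spirit to <code>continueOnError=True</code>.</p>

```python
from typing import Callable, Dict, List


def delete_each(pods: List[Dict[str, str]],
                delete_fn: Callable[[str, str], object]) -> List[object]:
    """Call delete_fn(namespace, podname) for every pod record.

    Hypothetical helper: delete_fn stands in for the real delete action.
    Errors are captured per pod so one failure does not stop the rest.
    """
    results: List[object] = []
    for item in pods:
        try:
            results.append(delete_fn(item.get("namespace"), item.get("pod_name")))
        except Exception as exc:
            results.append(f"failed: {exc}")
    return results
```

<p>Passing a function that only prints its arguments as <code>delete_fn</code> gives a simple dry run, which can be useful for reviewing the list before any pod is actually removed.</p>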

In [None]:
#
# Copyright (c) 2021 unSkript.com
# All rights reserved.
#
import pprint
from typing import Dict
from pydantic import BaseModel, Field
from kubernetes import client
from kubernetes.client.rest import ApiException

from beartype import beartype
@beartype
def k8s_delete_pod_printer(output):
    if output is None:
        return

    pprint.pprint(output)


@beartype
def k8s_delete_pod(handle, namespace: str, podname: str):
    """k8s_delete_pod delete a Kubernetes POD in a given Namespace

        :type handle: object
        :param handle: Object returned from the Task validate method

        :type namespace: str
        :param namespace: Kubernetes namespace

        :type podname: str
        :param podname: K8S Pod Name

        :rtype: Dict of POD info
    """
    coreApiClient = client.CoreV1Api(api_client=handle)

    try:
        resp = coreApiClient.delete_namespaced_pod(
            name=podname, namespace=namespace)
    except ApiException as e:
        resp = f"An exception occurred while executing the command: {e.reason}"

    return resp


task = Task(Workflow())
task.configure(continueOnError=True)
task.configure(inputParamsJson='''{
    "podname": "iter.get(\\"pod_name\\")",
    "namespace": "iter.get(\\"namespace\\")"
    }''')
task.configure(iterJson='''{
    "iter_enabled": true,
    "iter_list_is_const": false,
    "iter_list": "all_unhealthy_pods",
    "iter_parameter": ["podname","namespace"]
    }''')

task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(k8s_delete_pod, lego_printer=k8s_delete_pod_printer, hdl=hdl, args=args)

<p>This runbook identified and deleted failing pods left behind by Kubernetes jobs, which were contributing to IP exhaustion in the cluster. Regularly monitoring jobs and proactively deleting their failing pods helps keep the cluster stable and available. Running this runbook as part of routine operations helps ensure efficient resource utilization and minimizes disruptions caused by IP exhaustion. To view the full platform capabilities of unSkript, please visit <a href="https://us.app.unskript.io" target="_blank" rel="noopener">https://us.app.unskript.io</a>.</p>