<hr><center><img src="https://storage.googleapis.com/unskript-website/assets/favicon.png" alt="unSkript.com" width="100" height="100">
<h1 id="unSkript-Runbooks">unSkript Runbooks</h1>
<div class="alert alert-block alert-success">
<h3 id="Objective">Objective</h3>
<br><strong style="color: #000000;"><em>Fix K8s Pod in CrashLoopBackOff State</em></strong></div>
</center>
<p>&nbsp;</p>
<center>
<h2 id="Terminate-EC2-Instances-Without-Valid-Lifetime-Tag"><u>K8s Pod in CrashLoopBackOff State</u></h2>
</center>
<h1 id="Steps-Overview">Steps Overview</h1>
<p>1)&nbsp;<a href="#1">Get list of pods in CrashLoopBackOff State</a><br>2)&nbsp;<a href="#List-all-AWS-Regions">Gather information about the pods</a><br>3)&nbsp;<a href="#Collect-pod-exit-code">Collect pod exit codes</a></p>
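<p>A <code>CrashLoopBackOff</code> status means that a container in the pod keeps crashing after it starts, so Kubernetes restarts it repeatedly and waits an increasing back-off delay between restart attempts.</p>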
<p>When you run the <code>kubectl get pods</code> command, you see something like this:</p>
<pre><code>NAME                     READY     STATUS             RESTARTS   AGE
nginx-7ef9efa7cd-qasd2   0/1       CrashLoopBackOff   2          1m

Or

NAME                     READY     STATUS                  RESTARTS   AGE
pod1-7ef9efa7cd-qasd2    0/2       Init:CrashLoopBackOff   2          1m
</code></pre>
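<p>The "BackOff" in the status refers to the delay the kubelet waits before restarting the crashed container. According to the Kubernetes Pod lifecycle documentation, this delay starts at 10 seconds, doubles after each restart, and is capped at five minutes (it resets once the container runs cleanly for 10 minutes). The sketch below only computes that schedule; the function name <code>restart_backoff_schedule</code> is hypothetical, and its defaults assume the documented 10 s and 300 s values, which may differ in other Kubernetes versions.</p>

```python
from typing import List


def restart_backoff_schedule(restarts: int, initial: int = 10, cap: int = 300) -> List[int]:
    """Return the delay in seconds before each of the next `restarts` restarts.

    Assumes the documented kubelet behavior: start at `initial` seconds,
    double the delay after every restart, and never exceed `cap` seconds.
    """
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


print(restart_backoff_schedule(7))  # [10, 20, 40, 80, 160, 300, 300]
```

<p>This is why the RESTARTS count of a pod in CrashLoopBackOff grows more slowly over time: after a handful of crashes, the kubelet retries only about every five minutes.</p>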
<hr>

<h3 id="Convert-namespace-to-String-if-empty&para;&para;">Convert namespace to String if empty<a class="jp-InternalAnchorLink" href="#Convert-namespace-to-String-if-empty&para;&para;" target="_self">&para;</a></h3>
<p>This custom action sets <code>namespace</code> to an empty string when no namespace is given (that is, when it is <code>None</code>), so the following actions always receive a string.</p>

In [None]:
# An empty string tells the next action to search all namespaces
if namespace is None:
    namespace = ''

<h3 id="Get-List-of-Pods-in-CrashLoopBackOff-State"><a id="1" target="_self" rel="nofollow"></a>Get List of Pods in CrashLoopBackOff State<a class="jp-InternalAnchorLink" href="#Get-List-of-Pods-in-CrashLoopBackOff-State" target="_self">&para;</a></h3>
<p>This action fetches the list of pods in the CrashLoopBackOff state. If no <code>namespace</code> is given, it searches <strong>all</strong> namespaces.</p>
<blockquote>
<p>This action takes the following parameters (Optional):&nbsp;<code>namespace</code></p>
</blockquote>
<blockquote>
<p>This action captures the following output: <code>crashloopbackoff_pods</code></p>
</blockquote>

In [4]:
#
# Copyright (c) 2022 unSkript.com
# All rights reserved.
#

from pydantic import BaseModel, Field
from typing import Optional, Tuple
from unskript.legos.utils import CheckOutput, CheckOutputStatus
from collections import defaultdict
import json
import pprint
import re

from beartype import beartype
@beartype
def k8s_get_pods_in_crashloopbackoff_state_printer(output):
    if output is None:
        return
    if isinstance(output, CheckOutput):
        print(output.json())
    else:
        pprint.pprint(output)


@beartype
def k8s_get_pods_in_crashloopbackoff_state(handle, namespace: Optional[str] = None) -> Tuple:
    """k8s_get_pods_in_crashloopbackoff_state executes the given kubectl command to find pods in CrashLoopBackOff State

        :type handle: object
        :param handle: Object returned from the Task validate method

        :type namespace: Optional[str]
        :param namespace: Namespace to get the pods from. Eg:"logging", if not given all namespaces are considered

        :rtype: Status, List of pods in CrashLoopBackOff State
    """
    if handle.client_side_validation != True:
        # Raise instead of returning str(), which would violate the declared -> Tuple return type
        raise Exception(f"K8S Connector is invalid: {handle}")
    if namespace:
        # With a namespace, keep only the NAME column
        kubectl_command = "kubectl get pods -n " + namespace + " | grep CrashLoopBackOff | cut -d' ' -f 1 | tr -d ' '"
    else:
        # Across all namespaces, keep the NAMESPACE and NAME columns
        kubectl_command = "kubectl get pods --all-namespaces | grep CrashLoopBackOff | tr -s ' ' | cut -d ' ' -f 1,2"
    response = handle.run_native_cmd(kubectl_command)
    if response is None or hasattr(response, "stderr") is False or response.stderr is None:
        # getattr avoids an AttributeError when response itself is None
        raise Exception(
            f"Error while executing command ({kubectl_command}): {getattr(response, 'stderr', None)}")
    temp = response.stdout
    res = defaultdict(list)
    if namespace:
        # One pod name per line: group them all under the requested namespace
        for pod in re.findall(r"(\S+).*", temp):
            res[namespace].append(pod)
    else:
        # One "<namespace> <pod>" pair per line: group pod names by namespace
        all_namespaces = re.findall(r"(\S+).*", temp)
        all_unhealthy_pods = re.findall(r"\S+\s+(.*)", temp)
        for ns, pod in zip(all_namespaces, all_unhealthy_pods):
            res[ns].append(pod)
    if len(res) != 0:
        # False means the check failed: at least one pod is in CrashLoopBackOff
        return (False, [dict(res)])
    return (True, None)


task = Task(Workflow())
task.configure(inputParamsJson='''{
    "namespace": "namespace"
    }''')
task.configure(outputName="crashloopbackoff_pods")

task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(k8s_get_pods_in_crashloopbackoff_state, lego_printer=k8s_get_pods_in_crashloopbackoff_state_printer, hdl=hdl, args=args)
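<p>The namespace grouping performed by this action can be tried in plain Python without a cluster. The sketch below is an illustration, not the unSkript implementation: the function name <code>group_pods_by_namespace</code> and the sample input are hypothetical, and it splits each line on whitespace instead of using the regular expressions above.</p>

```python
from collections import defaultdict
from typing import Dict, List


def group_pods_by_namespace(kubectl_output: str) -> Dict[str, List[str]]:
    """Group '<namespace> <pod>' lines into {namespace: [pod, ...]}.

    Assumes input shaped like the output of
    `kubectl get pods --all-namespaces | grep CrashLoopBackOff | tr -s ' ' | cut -d ' ' -f 1,2`.
    """
    grouped = defaultdict(list)
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        namespace, pod = fields[0], fields[1]
        grouped[namespace].append(pod)
    return dict(grouped)


# Hypothetical command output; the third pod name is made up for illustration
sample = "default nginx-7ef9efa7cd-qasd2\nlogging pod1-7ef9efa7cd-qasd2\ndefault api-5c9d8b7f6-x2k4j\n"
print(group_pods_by_namespace(sample))
# {'default': ['nginx-7ef9efa7cd-qasd2', 'api-5c9d8b7f6-x2k4j'], 'logging': ['pod1-7ef9efa7cd-qasd2']}
```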

<h3 id="Examine-the-Events">Create List of commands to get Events<a class="jp-InternalAnchorLink" href="#Examine-the-Events" target="_self">&para;</a></h3>
<p>Using the output of Step 1👆, build one <code>kubectl describe</code> command per namespace that shows the Events of every pod found in the CrashLoopBackOff state.</p>
<blockquote>
<p>This action captures the following output:&nbsp;<code>all_unhealthy_pods</code></p>
</blockquote>

In [5]:
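all_unhealthy_pods = []
# crashloopbackoff_pods is the (status, result) tuple from Step 1; result is a
# list of {namespace: [pod, ...]} dicts, or None when no pods were found
for each_pod_dict in crashloopbackoff_pods:
    if isinstance(each_pod_dict, list):
        for pod in each_pod_dict:
            for k, v in pod.items():
                if len(v) != 0:
                    nspace = k
                    # kubectl describe accepts several pod names in one call
                    u_pod = ' '.join([str(each_pod) for each_pod in v])
                    cmd = "kubectl describe pod " + u_pod + " -n " + nspace + " | grep -A 10 Events"
                    all_unhealthy_pods.append(cmd)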
print(all_unhealthy_pods)
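<p>Because these commands are assembled by string concatenation and contain a shell pipeline, quoting the pod and namespace names is a cheap safeguard. A minimal sketch of the same construction using Python's <code>shlex.quote</code>; <code>build_describe_commands</code> is a hypothetical helper, not an unSkript API.</p>

```python
import shlex
from typing import Dict, List


def build_describe_commands(pods_by_namespace: Dict[str, List[str]]) -> List[str]:
    """Build one `kubectl describe pod` command per namespace.

    Pod and namespace names are shell-quoted with shlex.quote before being
    placed into the pipeline string.
    """
    commands = []
    for namespace, pods in pods_by_namespace.items():
        if not pods:
            continue  # nothing to describe in this namespace
        pod_args = " ".join(shlex.quote(p) for p in pods)
        commands.append(
            f"kubectl describe pod {pod_args} -n {shlex.quote(namespace)} | grep -A 10 Events"
        )
    return commands


print(build_describe_commands({"default": ["nginx-7ef9efa7cd-qasd2"], "logging": []}))
# ['kubectl describe pod nginx-7ef9efa7cd-qasd2 -n default | grep -A 10 Events']
```

<p>Ordinary Kubernetes names contain only lowercase letters, digits, <code>-</code> and <code>.</code>, so <code>shlex.quote</code> leaves them unchanged; it only adds quotes when a name contains characters the shell would interpret.</p>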

<h3 id="List-all-AWS-Regions">Gather information about the pods</h3>
<p>This action runs each command in <code>all_unhealthy_pods</code> to collect the Events of the unhealthy pods found in Step 1.</p>
<blockquote>
<p>This action takes the following parameter:&nbsp;<code>kubectl_command</code> (iterated over <code>all_unhealthy_pods</code>)</p>
</blockquote>
<blockquote>
<p>This action captures the following output: <code>describe_output</code></p>
</blockquote>

In [6]:
#
# Copyright (c) 2021 unSkript.com
# All rights reserved.
#

from pydantic import BaseModel, Field


from beartype import beartype
@beartype
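def k8s_kubectl_command(handle, kubectl_command: str) -> str:

    result = handle.run_native_cmd(kubectl_command)
    if result is None or hasattr(result, "stderr") is False or result.stderr is None:
        # getattr avoids an AttributeError when result itself is None
        print(
            f"Error while executing command ({kubectl_command}): {getattr(result, 'stderr', None)}")
        # Return an empty string: returning None would violate the declared -> str type
        return str()

    return result.stdout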


task = Task(Workflow())
task.configure(continueOnError=True)
task.configure(printOutput=True)
task.configure(inputParamsJson='''{
    "kubectl_command": "iter_item"
    }''')
task.configure(iterJson='''{
    "iter_enabled": true,
    "iter_list_is_const": false,
    "iter_list": "all_unhealthy_pods",
    "iter_parameter": "kubectl_command"
    }''')
task.configure(conditionsJson='''{
    "condition_enabled": true,
    "condition_cfg": "len(all_unhealthy_pods)!=0",
    "condition_result": true
    }''')
task.configure(outputName="describe_output")
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(k8s_kubectl_command, hdl=hdl, args=args)

if hasattr(task, 'output'):
    if isinstance(task.output, (list, tuple)):
        for item in task.output:
            print(f'item: {item}')
    elif isinstance(task.output, dict):
        for item in task.output.items():
            print(f'item: {item}')
    else:
        print(f'Output for {task.name}')
        print(task.output)
    w.tasks[task.name]= task.output

<h3 id="Examine-the-Events">Convert to String<a class="jp-InternalAnchorLink" href="#Examine-the-Events" target="_self">&para;</a></h3>
<p>Serialize the <code>describe_output</code> dict from the previous step👆 to a JSON string so that it can be searched for known error messages.</p>
<blockquote>
<p>This action captures the following output: <code>all_describe_info</code></p>
</blockquote>

In [26]:
import json

all_describe_info = json.dumps(describe_output)
print(all_describe_info)

<h3 id="Examine-the-Events">Examine the Events<a class="jp-InternalAnchorLink" href="#Examine-the-Events" target="_self">&para;</a></h3>
<p>Examine the Events collected above👆 and note any pods whose Events include <code>Back-off restarting failed container</code>.</p>

In [27]:
import re

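"""
This custom action searches all_describe_info (the JSON string built from
describe_output in the previous step) for known error messages.
"""


def check_msg(msg):
    # re.escape treats the message as literal text rather than as a regex
    return re.search(re.escape(msg), all_describe_info)

# The variable produced by the previous step is all_describe_info
if 'all_describe_info' not in globals():
    pass
else:
    print("Processing Events...")
    result = check_msg("Back-off restarting failed container")
    if result is not None: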
        print("Confirming the POD(s) is in Back-Off restarting state")
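<p>Searching the single JSON string confirms that some pod hit the back-off, but not which one. The sketch below checks each command's output separately. It assumes that <code>describe_output</code> maps each kubectl command to its stdout (the same shape the exit-code step relies on when it iterates <code>exit_code.items()</code>); <code>commands_with_backoff</code> and the sample outputs are hypothetical.</p>

```python
from typing import Dict, List, Optional

# The event message this runbook looks for
BACKOFF_MESSAGE = "Back-off restarting failed container"


def commands_with_backoff(outputs: Dict[str, Optional[str]]) -> List[str]:
    """Return the commands whose output mentions the back-off event.

    Assumes `outputs` maps each kubectl command to its stdout, which may be
    None or empty when the command produced nothing.
    """
    return [cmd for cmd, text in outputs.items() if text and BACKOFF_MESSAGE in text]


# Hypothetical outputs of two describe commands
sample_outputs = {
    "kubectl describe pod nginx-7ef9efa7cd-qasd2 -n default | grep -A 10 Events":
        "Events:\n  Warning  BackOff  1m  kubelet  Back-off restarting failed container\n",
    "kubectl describe pod web-0 -n prod | grep -A 10 Events": "Events:  <none>\n",
}
print(commands_with_backoff(sample_outputs))
```

<p>In the runbook, the same call could be made as <code>commands_with_backoff(describe_output)</code>, provided the output really has that dict shape.</p>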

<h3 id="Examine-the-Events">Create List of commands to get Exit Code<a class="jp-InternalAnchorLink" href="#Examine-the-Events" target="_self">&para;</a></h3>
<p>Using the output of Step 1👆, build one <code>kubectl describe</code> command per namespace that extracts the exit codes of the failing pods, so that the reason for each failure can be examined.</p>
<blockquote>
<p>This action captures the following output: <code>all_pods_exit_code</code></p>
</blockquote>

In [7]:
all_pods_exit_code = []
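# crashloopbackoff_pods is the (status, result) tuple from Step 1; result is a
# list of {namespace: [pod, ...]} dicts, or None when no pods were found
for each_pod_dict in crashloopbackoff_pods:
    if isinstance(each_pod_dict, list):
        for pod in each_pod_dict:
            for k, v in pod.items():
                if len(v) != 0:
                    nspace = k
                    u_pod = ' '.join([str(each_pod) for each_pod in v])
                    # Quote the pattern so grep searches for the literal text "Exit Code",
                    # then keep only the numeric value after the colon
                    cmd = "kubectl describe pod " + u_pod + " -n " + nspace + " | grep 'Exit Code' | cut -d':' -f 2 | tr -d ' '"
                    all_pods_exit_code.append(cmd)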
print(all_pods_exit_code)
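<p>An alternative to the <code>grep</code>/<code>cut</code>/<code>tr</code> pipeline is to fetch the full <code>kubectl describe pod</code> output and extract the codes in Python, which keeps each code separate when a pod has several containers. This is a hedged sketch: <code>parse_exit_codes</code>, the regular expression, and the sample text (modeled on the usual <code>kubectl describe</code> layout) are assumptions.</p>

```python
import re
from typing import List

# Matches lines such as "      Exit Code:    137" in `kubectl describe pod` output
EXIT_CODE_RE = re.compile(r"^[ \t]*Exit Code:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def parse_exit_codes(describe_text: str) -> List[int]:
    """Return every exit code reported in `kubectl describe pod` output, in order."""
    return [int(code) for code in EXIT_CODE_RE.findall(describe_text)]


# Hypothetical fragment of a describe output for a crashing container
sample = (
    "    Last State:     Terminated\n"
    "      Reason:       Error\n"
    "      Exit Code:    1\n"
    "    Restart Count:  5\n"
)
print(parse_exit_codes(sample))  # [1]
```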

<h3 id="Collect-pod-exit-code">Collect pod exit code<a class="jp-InternalAnchorLink" href="#Collect-pod-exit-code" target="_self">&para;</a></h3>
<p>This action runs each command in <code>all_pods_exit_code</code> and collects the exit codes reported for the failing pods.</p>
<blockquote>
<p>This action captures the following output: <code>exit_code</code></p>
</blockquote>

In [31]:
#
# Copyright (c) 2022 unSkript.com
# All rights reserved.
#

from pydantic import BaseModel, Field


from beartype import beartype
@beartype
def k8s_kubectl_command_printer(output):
    if output is None:
        return
    print(output)


@beartype
def k8s_kubectl_command(handle, kubectl_command: str) -> str:
    """k8s_kubectl_command executes the given kubectl command through the K8s handle

        :type handle: object
        :param handle: Object returned from the Task validate method

        :type kubectl_command: str
        :param kubectl_command: The kubectl command to run, e.g. kubectl get ns

        :rtype: String, Output of the command in python string format or Empty String in case of Error.
    """
    if handle.client_side_validation != True:
        print(f"K8S Connector is invalid: {handle}")
        return str()

    result = handle.run_native_cmd(kubectl_command)
    if result is None or hasattr(result, "stderr") is False or result.stderr is None:
        # getattr avoids an AttributeError when result itself is None
        print(
            f"Error while executing command ({kubectl_command}): {getattr(result, 'stderr', None)}")
        return str()

    return result.stdout


task = Task(Workflow())
task.configure(continueOnError=True)
task.configure(inputParamsJson='''{
    "kubectl_command": "iter_item"
    }''')
task.configure(iterJson='''{
    "iter_enabled": true,
    "iter_list_is_const": false,
    "iter_list": "all_pods_exit_code",
    "iter_parameter": "kubectl_command"
    }''')
task.configure(conditionsJson='''{
    "condition_enabled": true,
    "condition_cfg": "len(all_pods_exit_code)!=0",
    "condition_result": true
    }''')
task.configure(outputName="exit_code")

task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(k8s_kubectl_command, lego_printer=k8s_kubectl_command_printer, hdl=hdl, args=args)

<h3 id="Examine-the-Events">Create List of Exit Codes<a class="jp-InternalAnchorLink" href="#Examine-the-Events" target="_self">&para;</a></h3>
<p>Collect the command outputs from the previous step👆 into a list of exit codes to analyze in the next step.</p>
<blockquote>
<p>This action captures the following output: <code>all_exit_code_info</code></p>
</blockquote>

In [45]:
import json
all_exit_code_info = []
for k,v in exit_code.items():
    all_exit_code_info.append(v)
print(all_exit_code_info)

<h3 id="Examine-the-Events">Examine Exit Codes<a class="jp-InternalAnchorLink" href="#Examine-the-Events" target="_self">&para;</a></h3>
<p>Examine each entry in the <code>all_exit_code_info</code> list👆 and display the likely cause of the failure.</p>

In [52]:
from IPython.display import Markdown as md, display

if 'all_exit_code_info' not in globals():
    pass
else:
    for ec in all_exit_code_info:
        # Each entry is raw command output: it may be empty (no terminated container yet)
        # or hold one exit code per line when several pods or containers were described
        codes = ec.split() if ec else []
        if codes and codes[0].isdigit():
            exitCode = int(codes[0])  # examine the first exit code reported
        else:
            exitCode = 323400  # sentinel: no exit code found, so the generic advice below is shown
        if exitCode == 0:
            display(md("Exit code 0 means that the container command completed successfully, "
                       "but it exits too often for Kubernetes to accept the container as working."))
            display(md("Did you fail to specify a command in the Pod spec, so that the container ran "
                       "a default shell command that exited immediately? If so, you need to fix the command."))
        elif exitCode == 1:
            display(md("The container failed to run its command successfully and returned exit code 1. "
                       "This is an application failure: the process started, but exited with a "
                       "failing exit code some time later."))
            display(md("If this is happening with all pods on your cluster, there may be a problem "
                       "with your nodes. Check that the nodes are healthy with `kubectl get nodes -o wide`."))
        elif exitCode == 2:
            display(md("An exit code of 2 indicates either that the application chose to return "
                       "that error code, or that a shell builtin was misused. Check the Pod's command "
                       "specification to make sure the command is correct. If you think it is correct, "
                       "try running the image locally with a shell and run the command directly."))
        elif exitCode == 128:
            display(md("An exit code of 128 indicates that the container could not run. Confirm this "
                       "with `kubectl describe pod`: check whether the Last State reason is `ContainerCannotRun`."))
        elif exitCode == 137:
            display(md("Exit code 137 indicates that the container was killed with signal 9 (SIGKILL). "
                       "This can be due to one of these reasons:\n\n"
                       "1. The container ran out of memory.\n"
                       "2. The OOM killer killed the container.\n"
                       "3. The liveness probe failed. Check the liveness and readiness probes."))
        else:
            display(md("Some common application problems to consider are:\n\n"
                       "1. Privileged access: the application may need privileges to function; "
                       "check the `allowPrivilegeEscalation` setting.\n"
                       "2. SELinux or AppArmor controls may be preventing your application from running."))
        

    display(md(">You can use kubectl get pods command to verify after you fix the issue"))
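<p>Exit codes above 128 usually follow the common shell and container-runtime convention that a process terminated by signal N reports status 128 + N; that is why 137 (128 + 9) above means SIGKILL. The helper below, <code>signal_name_for_exit_code</code>, is a hypothetical sketch that decodes such codes with Python's standard <code>signal</code> module; signal numbers are platform-specific, so results assume a Linux host.</p>

```python
import signal
from typing import Optional


def signal_name_for_exit_code(exit_code: int) -> Optional[str]:
    """Return the signal name for exit codes of the form 128 + N, else None.

    Assumes the 128 + N convention (e.g. 137 = 128 + 9 = SIGKILL,
    143 = 128 + 15 = SIGTERM).
    """
    if exit_code <= 128:
        return None  # a normal exit status, not a signal
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None  # N is not a known signal on this platform


for code in (1, 137, 143):
    print(code, signal_name_for_exit_code(code))
```

<p>A 143 (SIGTERM) usually means the container was asked to stop, for example during a restart, while a 137 means it was killed outright.</p>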

<h3 id="Conclusion">Conclusion<a class="jp-InternalAnchorLink" href="#Conclusion" target="_self">&para;</a></h3>
<p>In this Runbook, we identified pods stuck in the CrashLoopBackOff state and examined the events and exit codes that point to the cause of their failure, using unSkript's K8s actions. To view the full platform capabilities of unSkript, please visit <a href="https://us.app.unskript.io" target="_blank" rel="noopener">us.app.unskript.io</a></p>