***
<img src="https://unskript.com/assets/favicon.png" alt="unSkript.com" width="100" height="100"/> 
<h1> unSkript Runbooks </h1>
<div class="alert alert-block alert-success">
    <b> This runbook demonstrates the use of the K8S kubectl action.
    With this action, we can query a K8S cluster and find pods stuck in the CrashLoopBackOff state.
    </b>
</div>

<br>
<center><h2>K8S Pod stuck in CrashLoopBackOff State</h2></center>

A `CrashLoopBackOff` error occurs when a pod startup fails repeatedly in Kubernetes.

    When running a kubectl get pods command, you would see something like this:
    
    NAME                     READY     STATUS             RESTARTS   AGE
    nginx-7ef9efa7cd-qasd2   0/1       CrashLoopBackOff   2          1m
    
    Or
    
    NAME                     READY     STATUS                  RESTARTS   AGE
    pod1-7ef9efa7cd-qasd2    0/2       Init:CrashLoopBackOff   2          1m
    
### Initial Steps
    1. Create List of Pods in CrashLoopBackOff State
    2. Gather Events information for these Pods
    3. Examine Events
    4. Collect Pod Exit Code
    4.1. Examine Exit Code


The original runbook was written by [Ian Miell](https://containersolutions.github.io/runbooks/posts/kubernetes/crashloopbackoff/)
***

### 1 Create List of Pods in CrashLoopBackOff State

Here we use unSkript's `k8s_kubectl_command` Lego to execute `kubectl get pods -n {namespace} | grep CrashLoopBackOff`.
This generates a list of pods that are in the `CrashLoopBackOff` state. The list is saved as
`unhealthyPods`, which we use in the following cells.

In [2]:
#
# Copyright (c) 2021 unSkript.com
# All rights reserved.
#

from pydantic import BaseModel, Field


from beartype import beartype

def k8s_kubectl_command_printer(output):
    if output is None:
        return
    print(output)

@beartype
def k8s_kubectl_command(handle, kubectl_command: str) -> str:

    result = handle.run_native_cmd(kubectl_command)
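    if result is None or hasattr(result, "stderr") is False or result.stderr is None:
        # result may be None here, so read stderr defensively instead of dereferencing it
        print(
            f"Error while executing command ({kubectl_command}): {getattr(result, 'stderr', None)}")
        # Return an empty string so the `-> str` contract enforced by beartype holds
        return ""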

    return result.stdout


task = Task(Workflow())
task.configure(inputParamsJson='''{
    "kubectl_command": "f\\"kubectl get pods -n {namespace} | grep -i CrashLoopBackOff | cut -d' ' -f 1 | tr -d ' '\\" "
    }''')
task.configure(outputName="unhealthyPods")
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(k8s_kubectl_command, hdl=hdl, args=args, lego_printer=k8s_kubectl_command_printer)
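
The shell pipeline used in the cell above can also be expressed in Python, which makes the filtering step easier to test on its own. The following is a minimal sketch and not part of the unSkript Lego: the function name `pods_in_crashloop` is an assumption made for illustration, the first two sample rows come from the example output at the top of this runbook, and the `Running` row is hypothetical. Unlike `grep`, which matches anywhere on the line, this sketch only inspects the STATUS column.

```python
from typing import List


def pods_in_crashloop(get_pods_output: str) -> List[str]:
    """Return names of pods whose STATUS column mentions CrashLoopBackOff."""
    names = []
    for line in get_pods_output.splitlines():
        columns = line.split()
        # Skip blank or malformed lines and the header row
        if len(columns) < 3 or columns[0] == "NAME":
            continue
        # STATUS is the third column; compare case-insensitively, like `grep -i`
        if "crashloopbackoff" in columns[2].lower():
            names.append(columns[0])
    return names


# Sample `kubectl get pods` output; the last row is a hypothetical healthy pod
sample = """\
NAME                     READY     STATUS                  RESTARTS   AGE
nginx-7ef9efa7cd-qasd2   0/1       CrashLoopBackOff        2          1m
pod1-7ef9efa7cd-qasd2    0/2       Init:CrashLoopBackOff   2          1m
web-5d4f8c7b9-abcde      1/1       Running                 0          5m
"""

print(pods_in_crashloop(sample))
```

Returning a list rather than a newline-separated string would make it straightforward to run the later steps once per pod when more than one pod is unhealthy.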

### 2 Gather Event information for these Pods

Here we use the unSkript `k8s_kubectl_command` Lego to run `kubectl describe pod` for each of the
`unhealthyPods`, grep the Events portion of the output, and store it in the `describeOutput` variable.

This cell also uses the unSkript framework's `Start Condition` feature, so it runs only when `unhealthyPods` is not empty.

In [3]:
#
# Copyright (c) 2021 unSkript.com
# All rights reserved.
#

from pydantic import BaseModel, Field


from beartype import beartype

def k8s_kubectl_command_printer(output):
    if output is None:
        return
    print(output)

@beartype
def k8s_kubectl_command(handle, kubectl_command: str) -> str:

    result = handle.run_native_cmd(kubectl_command)
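    if result is None or hasattr(result, "stderr") is False or result.stderr is None:
        # result may be None here, so read stderr defensively instead of dereferencing it
        print(
            f"Error while executing command ({kubectl_command}): {getattr(result, 'stderr', None)}")
        # Return an empty string so the `-> str` contract enforced by beartype holds
        return ""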

    return result.stdout


task = Task(Workflow())
task.configure(inputParamsJson='''{
    "kubectl_command": "f\\"kubectl describe pod {unhealthyPods.strip()} -n {namespace} | grep -A 10 Events\\" "
    }''')
task.configure(conditionsJson='''{
    "condition_enabled": true,
    "condition_cfg": "unhealthyPods != ''",
    "condition_result": true
    }''')
task.configure(outputName="describeOutput")
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(k8s_kubectl_command, hdl=hdl, args=args, lego_printer=k8s_kubectl_command_printer)
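
For readers who prefer to post-process the full `kubectl describe pod` output in Python instead of piping it through `grep -A 10 Events`, the sketch below shows one way to do it. It is an illustration under assumptions: the function name `events_section` and the sample text are hypothetical, and unlike `grep -A` it returns only the first match.

```python
def events_section(describe_output: str, max_lines: int = 10) -> str:
    """Return the first line containing 'Events' plus up to max_lines lines after it."""
    lines = describe_output.splitlines()
    for index, line in enumerate(lines):
        if "Events" in line:
            # Slice end is exclusive, so +1 keeps the matching line itself
            return "\n".join(lines[index:index + max_lines + 1])
    # No Events section found
    return ""


# Hypothetical, shortened `kubectl describe pod` output for illustration
sample = """\
Name:         nginx-7ef9efa7cd-qasd2
Namespace:    default
Events:
  Type     Reason   Age   From     Message
  ----     ------   ----  ----     -------
  Warning  BackOff  1m    kubelet  Back-off restarting failed container
"""

print(events_section(sample))
```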


### 3 Examine Events

Here we construct a `custom cell` that examines the `describeOutput` variable from step 2.

In [4]:
import re

"""
This Custom Action searches for known errors in the describeOutput variable.
"""


def check_msg(msg):
    return re.search(msg, describeOutput)

if ('describeOutput' not in globals()):
    pass
else:
    print("Processing Events...")
    result = check_msg("Back-off restarting failed container")
    if result is not None:
        print("Confirming the POD(s) is in Back-Off restarting state")
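
The cell above passes the message to `re.search` as a regular expression, so characters such as `.` or `(` in a message would be treated as metacharacters. A hedged variation is sketched below: it escapes each message with `re.escape` and reports the matching event lines. The function name `matching_event_lines`, the sample text, and the second entry in `KNOWN_MESSAGES` are assumptions for illustration; only the first message comes from this runbook.

```python
import re
from typing import Dict, List

# Only the first message is taken from this runbook; the second is an illustrative assumption
KNOWN_MESSAGES = [
    "Back-off restarting failed container",
    "Liveness probe failed",
]


def matching_event_lines(describe_output: str, messages: List[str]) -> Dict[str, List[str]]:
    """Map each message that occurs in the output to the lines that contain it."""
    found: Dict[str, List[str]] = {}
    for message in messages:
        # re.escape makes the message match literally, not as a pattern
        pattern = re.compile(re.escape(message))
        hits = [line.strip() for line in describe_output.splitlines() if pattern.search(line)]
        if hits:
            found[message] = hits
    return found


# Hypothetical Events excerpt for illustration
sample = """\
Events:
  Type     Reason   Age   From     Message
  Warning  BackOff  1m    kubelet  Back-off restarting failed container
"""

for message, lines in matching_event_lines(sample, KNOWN_MESSAGES).items():
    print(f"{message}: {len(lines)} matching event(s)")
```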

### 4 Collect Pod Exit Code

Here we again use the unSkript `k8s_kubectl_command` Lego to run `kubectl describe pod` and
extract the Pod exit code. The exit code gives us clues as to why the Pod is stuck in
the `CrashLoopBackOff` state. This cell also uses the `Start Condition` feature.

In [5]:
#
# Copyright (c) 2021 unSkript.com
# All rights reserved.
#

from pydantic import BaseModel, Field


from beartype import beartype

def k8s_kubectl_command_printer(output):
    if output is None:
        return
    print(output)

@beartype
def k8s_kubectl_command(handle, kubectl_command: str) -> str:

    result = handle.run_native_cmd(kubectl_command)
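    if result is None or hasattr(result, "stderr") is False or result.stderr is None:
        # result may be None here, so read stderr defensively instead of dereferencing it
        print(
            f"Error while executing command ({kubectl_command}): {getattr(result, 'stderr', None)}")
        # Return an empty string so the `-> str` contract enforced by beartype holds
        return ""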

    return result.stdout


task = Task(Workflow())
task.configure(inputParamsJson='''{
    "kubectl_command": "f\\"kubectl describe pod {unhealthyPods.strip()} -n {namespace} | grep 'Exit Code' | cut -d':' -f 2 | tr -d ' '\\""
    }''')
task.configure(conditionsJson='''{
    "condition_enabled": true,
    "condition_cfg": "unhealthyPods != ''",
    "condition_result": true
    }''')
task.configure(outputName="exitCode")

(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(k8s_kubectl_command, hdl=hdl, args=args, lego_printer=k8s_kubectl_command_printer)
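
A pod with several containers may report more than one `Exit Code` line, in which case the `grep | cut | tr` pipeline yields several values. The sketch below shows one possible way to collect all of them in Python. The function name `exit_codes` and the sample text are assumptions for illustration, not output captured from a real cluster.

```python
import re
from typing import List


def exit_codes(describe_output: str) -> List[int]:
    """Return every integer that follows an 'Exit Code:' label, in order of appearance."""
    return [int(code) for code in re.findall(r"Exit Code:\s*(-?\d+)", describe_output)]


# Hypothetical, shortened `kubectl describe pod` output with two containers
sample = """\
Containers:
  app:
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
  sidecar:
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
"""

print(exit_codes(sample))
```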

### 4.1 Examine the Exit Code

In this `custom cell` we go through the common values of the exit code and identify
the possible causes for the Pod being in the errored state. We also display
helpful information related to each exit code.

In [6]:
# Import display explicitly so the cell does not rely on the notebook's implicit global
from IPython.display import Markdown as md, display

# if repoLocation is not None:
#     display(md(f"**Please verify {repoLocation} is accessible from the K8S POD**"))

if 'exitCode' not in globals():
    pass
else:
    exitCode = int(exitCode)
    if exitCode == 0:
        display(md("Exit code 0 implies that the specified container command completed"))
        display(md("successfully, but too often for Kubernetes to accept as working."))
        display(md(""))
        display(md("Did you fail to specify a command in the POD Spec, and the container ran"))
        display(md("a default shell command that failed? If so, you will need to fix the command"))
    elif exitCode == 1:
        display(md("The container failed to run its command successfully, and returned"))
        display(md("an exit code 1. This is an application failure within the process"))
        display(md("that was started, but returned with a failing exit code some time after."))
        display(md(""))
        display(md("If this is happening with all pods running on your cluster, then"))
        display(md("there may be a problem with your nodes. Check that the nodes on your cluster"))
        display(md("are OK with the `kubectl get nodes -o wide` command"))
    elif exitCode == 2:
        display(md("An exit code of 2 indicates either that the application chose to return"))
        display(md("that error code, or there was a misuse of a shell builtin. Check your"))
        display(md("pod's command specification to ensure that the command is correct."))
        display(md("If you think it is correct, try running the image locally with a shell"))
        display(md("and run the command directly."))
    elif exitCode == 128:
        display(md("An exit code of 128 indicates that the container could not run. To check this,"))
        display(md("run the `kubectl describe pod` command and see if the Last State Reason is"))
        display(md("ContainerCannotRun."))
    elif exitCode == 137:
        display(md("This indicates that the container was killed with signal 9."))
        display(md("This can be due to one of these reasons:"))
        display(md("    1. Container ran out of Memory"))
        display(md("    2. The OOMKiller killed the container"))
        display(md("    3. The liveness probe failed. Check liveness and readiness probes"))
    else:
        display(md("Some common application problems to consider are:"))
        display(md("    1. The application may need privileged access to function; check the allowPrivilegeEscalation setting"))
        display(md("    2. SELinux or AppArmor controls may be preventing your application from running"))


    display(md("> You can use the `kubectl get pods` command to verify after you fix the issue"))
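
The `if`/`elif` chain above could also be written as a lookup table, which keeps the messages in one place and makes it easy to add a fallback for signal-based exit codes. The sketch below is an illustration, not the runbook's implementation: the names `EXIT_CODE_SUMMARIES`, `signal_from_exit_code`, and `summarize` are hypothetical, and the summaries are condensed from the messages in the cell above. It relies on the common shell convention that an exit code above 128 means the process was terminated by signal `exit_code - 128`, which is why 137 corresponds to signal 9.

```python
from typing import Optional

# Short summaries condensed from the messages displayed by the cell above
EXIT_CODE_SUMMARIES = {
    0: "Command completed successfully, but too often for Kubernetes to accept as working",
    1: "Application failure within the process that was started",
    2: "Application-chosen error code or misuse of a shell builtin",
    128: "Container could not run",
    137: "Container was killed with signal 9",
}


def signal_from_exit_code(exit_code: int) -> Optional[int]:
    """Return the signal number implied by an exit code above 128, else None."""
    if 128 < exit_code < 256:
        return exit_code - 128
    return None


def summarize(exit_code: int) -> str:
    """Look up a known exit code, falling back to the 128 + signal convention."""
    summary = EXIT_CODE_SUMMARIES.get(exit_code)
    if summary is not None:
        return summary
    signal_number = signal_from_exit_code(exit_code)
    if signal_number is not None:
        return f"Process was probably terminated by signal {signal_number}"
    return "Unrecognized exit code; check the application logs"


for code in (0, 137, 143, 3):
    print(code, "->", summarize(code))
```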

### Conclusion

This runbook demonstrated the use of the unSkript `k8s_kubectl_command` Lego with the `Start Condition` feature to
build a non-trivial workflow. You can find more such runbooks and useful information about
the platform's capabilities at `https://unskript.com`