<center><img src="https://storage.googleapis.com/unskript-website/assets/favicon.png" alt="unSkript.com" width="100" height="100">
<h1 id="-unSkript-Runbooks-">unSkript Runbooks</h1>
<div class="alert alert-block alert-success">
<h3 id="-Objective">Objective</h3>
<br><strong style="color: #000000;"><em>Perform rolling restart for a node in an Elasticsearch cluster</em></strong></div>
</center>
<p>&nbsp;</p>
<center>
<h2 id="Elasticsearch-Rolling-Restart"><u>Elasticsearch Rolling Restart</u></h2>
</center>
<h1 id="Steps-Overview">Steps Overview</h1>
<p>1) <a href="#6" target="_self" rel="noopener">Cluster Health Check</a><br>2)&nbsp;<a href="#1" target="_self" rel="noopener">Disable shard allocation</a><br>3)<a href="#2" target="_self" rel="noopener"> Shut down node</a><br>4)<a href="#3" target="_self" rel="noopener"> Perform changes/maintenance</a><br>5)<a href="#4" target="_self" rel="noopener"> Start the node</a><br>6)<a href="#5" target="_self" rel="noopener"> Re-enable shard allocation</a><br>7)&nbsp;<a href="#Check-Cluster-Health" target="_self" rel="noopener">Cluster Health Check</a></p>

<h3 id="Check-Cluster-Health&para;"><a id="6" target="_self" rel="nofollow"></a>Check Cluster Health</h3>
<p>This action checks the status of the Elasticsearch cluster to decide whether a rolling restart is needed. Ideally, the cluster reports <span style="color: green;">Green</span> and the action returns <span style="color: rgb(45, 194, 107);">None</span>, in which case Step 2 and the restart steps that follow it are skipped. These are the cluster statuses that you may encounter:</p>
<ol>
<li>Unassigned primary shards = <span style="color: red;">Red</span> Status</li>
<li>Unassigned replica shards = <span style="color: #ffbf00;">Yellow</span> Status</li>
<li>All shards assigned = <span style="color: green;">Green</span> Status, for which the action returns <span style="color: rgb(45, 194, 107);">None</span></li>
</ol>
<blockquote>
<p>This action takes the following parameters: <code>None</code></p>
</blockquote>

In [10]:
##
# Copyright (c) 2021 unSkript, Inc
# All rights reserved.
##
import subprocess
import pprint
from pydantic import BaseModel, Field
from typing import Dict, Tuple
from subprocess import PIPE
import json


from beartype import beartype
@beartype
def elasticsearch_check_health_status_printer(output):
    if output is None:
        return
    print(output)


@beartype
def elasticsearch_check_health_status(handle) -> Tuple:
    """elasticsearch_check_health_status checks the status of an Elasticsearch cluster.

            :type handle: object
            :param handle: Object returned from Task Validate

            :rtype: Tuple of (True, None) when the cluster is green, otherwise
                    (False, [{cluster_name: status}])
    """
    result = []
    cluster_health = {}

    output = handle.web_request("/_cluster/health?pretty",  # Path
                                "GET",                      # Method
                                None)                       # Data
    # Report the cluster only when it is not green
    if output['status'] != 'green':
        cluster_health[output['cluster_name']] = output['status']
        result.append(cluster_health)
    if len(result) != 0:
        return (False, result)
    return (True, None)


task = Task(Workflow())
task.configure(outputName="cluster_health")

task.configure(credentialsJson='''{
    "credential_name": "DevESCred",
    "credential_type": "CONNECTOR_TYPE_ELASTICSEARCH",
    "credential_id": "4d5b3053-55fc-4b17-9c7b-27b86c35503f"
}''')
task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(elasticsearch_check_health_status, lego_printer=elasticsearch_check_health_status_printer, hdl=hdl, args=args)

In [12]:
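# cluster_health is the (ok, result) tuple produced by the previous step.
# A green cluster is recorded as the string 'None' because the later steps
# test cluster_health_status != 'None' in their conditions.
cluster_health_status = 'None'
for item in cluster_health:
    if isinstance(item, list):
        for entry in item:
            for status in entry.values():
                cluster_health_status = status
print(cluster_health_status)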
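
<p>The flattening above can also be written as a small helper that makes the two possible outcomes of the health check explicit. This is a minimal sketch: <code>status_from_check</code> is a hypothetical name that is not part of unSkript, and it assumes the <code>(ok, result)</code> tuple shape returned by <code>elasticsearch_check_health_status</code> above.</p>

```python
from typing import Any, Tuple


def status_from_check(check_result: Tuple[bool, Any]) -> str:
    """Return the cluster status string, or 'None' when the check passed.

    Hypothetical helper: assumes check_result is (True, None) for a green
    cluster and (False, [{cluster_name: status}]) otherwise.
    """
    ok, result = check_result
    if ok or not result:
        return 'None'
    # Use the status of the last reported cluster, as the loop above does.
    last_entry = result[-1]
    return list(last_entry.values())[-1]


print(status_from_check((True, None)))
print(status_from_check((False, [{'es-dev': 'yellow'}])))
```

<p>The string <code>'None'</code> (rather than the Python value <code>None</code>) is kept on purpose, because the conditions on the later steps compare against that string.</p>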

<h3 id="Disable-Shard-Allocation"><a id="1" target="_self" rel="nofollow"></a>Disable Shard Allocation<a class="jp-InternalAnchorLink" href="#Disable-Shard-Allocation" target="_self">&para;</a></h3>
<p>unSkript's Elasticsearch Disable Shard Allocation action disables shard allocation so that the cluster does not start rebuilding the stopped node's shards on other nodes while the node is down. With allocation disabled, no shards are assigned until allocation is re-enabled after the node restarts.</p>
<blockquote>
<p>This action takes the following parameters: <code>None</code></p>
</blockquote>
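
<p>The settings body that this action sends can be sketched as a small builder. This is an illustration only: <code>allocation_settings</code> is a hypothetical helper, not an unSkript function. The four modes listed are the values Elasticsearch accepts for <code>cluster.routing.allocation.enable</code>. The action in this runbook sends <code>none</code> in the <code>transient</code> scope; Elastic's rolling-restart documentation uses <code>primaries</code> instead, which may be worth considering.</p>

```python
import json

# Values Elasticsearch accepts for cluster.routing.allocation.enable.
VALID_MODES = ("all", "primaries", "new_primaries", "none")


def allocation_settings(mode: str, scope: str = "transient") -> dict:
    """Build the body for PUT /_cluster/settings (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown allocation mode: {mode!r}")
    if scope not in ("transient", "persistent"):
        raise ValueError(f"unknown settings scope: {scope!r}")
    return {scope: {"cluster.routing.allocation.enable": mode}}


# Body sent by this step, and the body that re-enables allocation later.
print(json.dumps(allocation_settings("none")))
print(json.dumps(allocation_settings("all")))
```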

In [12]:
##
# Copyright (c) 2021 unSkript, Inc
# All rights reserved.
##
import subprocess
import pprint
from pydantic import BaseModel, Field
from typing import List, Dict
from subprocess import PIPE, run
import json


from beartype import beartype
@beartype
def elasticsearch_disable_shard_allocation_printer(output):
    if output is None:
        return
    print("Shard allocation disabled for all kinds of shards")
    print(output)


@beartype
def elasticsearch_disable_shard_allocation(handle) -> Dict:
    """elasticsearch_disable_shard_allocation disallows shard allocations for any indices.

            :type handle: object
            :param handle: Object returned from Task Validate

            :rtype: Result Dict of result
    """

    es_dict = {"transient": {"cluster.routing.allocation.enable": "none"}}
    output = handle.web_request("/_cluster/settings?pretty",  # Path
                                "PUT",                        # Method
                                es_dict)                      # Data

    return output


task = Task(Workflow())
task.configure(conditionsJson='''{
    "condition_enabled": true,
    "condition_cfg": "cluster_health_status!='None'",
    "condition_result": true
    }''')

task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(elasticsearch_disable_shard_allocation, lego_printer=elasticsearch_disable_shard_allocation_printer, hdl=hdl, args=args)

<h3><a id='2'>Shut down node</a></h3>
unSkript's SSH Execute Remote Command action can shut down a single node: it connects over SSH to the node's IP address and runs the command that stops the Elasticsearch service.

   If you are running Elasticsearch with systemd:

    sudo systemctl stop elasticsearch.service

   If you are running Elasticsearch with SysV init:

    sudo -i service elasticsearch stop
    
>This action takes the following parameters: `host_for_ssh` (takes a list of hosts, but only one is needed), `cmd_stop_elasticsearch`, `run_with_sudo`
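
The two command variants above can be chosen programmatically. The following is a minimal sketch, assuming a hypothetical helper named `service_command` and hypothetical labels `"systemd"` and `"sysv"` for the init system; neither is part of unSkript. The returned string is the kind of value the stop (or start) command parameter of this runbook expects.

```python
def service_command(action: str, init_system: str = "systemd") -> str:
    """Return the shell command that starts or stops Elasticsearch.

    Hypothetical helper: the init_system labels are illustrative only.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    if init_system == "systemd":
        return f"sudo systemctl {action} elasticsearch.service"
    if init_system == "sysv":
        return f"sudo -i service elasticsearch {action}"
    raise ValueError(f"unsupported init system: {init_system!r}")


print(service_command("stop"))
print(service_command("stop", "sysv"))
```

The same helper covers the start step later in the runbook by passing `"start"` as the action.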


In [37]:
##
# Copyright (c) 2021 unSkript, Inc
# All rights reserved.
##
from pydantic import BaseModel, Field
from typing import List, Optional, Dict
import pprint


from beartype import beartype
@beartype
def ssh_execute_remote_command_printer(output):
    if output is None:
        return
    print("Elasticsearch Service successfully STOPPED")
    print("\n")
    pprint.pprint(output)


@beartype
def ssh_execute_remote_command(sshClient, hosts: List[str], command: str, sudo: bool = False) -> Dict:

    client = sshClient(hosts)
    runCommandOutput = client.run_command(command=command, sudo=sudo)
    client.join()
    res = {}

    for host_output in runCommandOutput:
        hostname = host_output.host
        output = []
        for line in host_output.stdout:
            output.append(line)

        o = "\n".join(output)
        res[hostname] = o

    return res


task = Task(Workflow())
task.configure(printOutput=True)
task.configure(inputParamsJson='''{
    "command": "cmd_stop_elasticsearch",
    "hosts": "host_for_ssh",
    "sudo": "run_with_sudo"
    }''')
task.configure(conditionsJson='''{
    "condition_enabled": true,
    "condition_cfg": "cluster_health_status!='None' and len(host_for_ssh)!=0",
    "condition_result": true
    }''')

(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(ssh_execute_remote_command, lego_printer=ssh_execute_remote_command_printer, hdl=hdl, args=args)

<h3 id="Perform-changes/-maintenance"><a id="3" target="_self" rel="nofollow"></a>Perform changes/ maintenance<a class="jp-InternalAnchorLink" href="#Perform-changes/-maintenance" target="_self">&para;</a></h3>
<p>In this step we can perform maintenance jobs, install updates, or modify <code>elasticsearch.yml</code>. To do so, create a custom action as required (click the <span style="background-color: rgb(230, 126, 35);"><strong>Add</strong></span> button at the top) and add it to this step.</p>
<p>The Elasticsearch documentation describes some common cluster issues and how to fix them: <strong><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/fix-common-cluster-issues.html">Fix common cluster issues</a></strong></p>
<pre><code>Note: Make sure that the configuration changes do not cause the node restart in the next step to fail.
</code></pre>
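
<p>One way to reduce the risk mentioned in the note is to back up the configuration file before editing it. The sketch below only builds a shell command that a custom action could run on the node through the SSH action used elsewhere in this runbook. <code>backup_command</code> is a hypothetical helper, and the path in the example is an assumption about where <code>elasticsearch.yml</code> lives on the node; adjust it to your installation.</p>

```python
import shlex
import time
from typing import Optional


def backup_command(config_path: str, stamp: Optional[str] = None) -> str:
    """Build a `cp` command that copies a file to a timestamped backup.

    Hypothetical helper: it returns the command string and runs nothing.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    # Quote both paths so spaces or shell metacharacters are handled safely.
    source = shlex.quote(config_path)
    target = shlex.quote(f"{config_path}.{stamp}.bak")
    return f"cp -p {source} {target}"


# Assumed config location; replace with the path used on your node.
print(backup_command("/etc/elasticsearch/elasticsearch.yml"))
```

<p>If the node fails to start after the change, the backup can be copied back before retrying the start step.</p>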

<h3><a id='4'>Start the node</a></h3>
This action starts the node after performing changes on the node.


   If you are running Elasticsearch with systemd:

    sudo systemctl start elasticsearch.service

   If you are running Elasticsearch with SysV init:

    sudo -i service elasticsearch start

>This action takes the following parameters: `host_for_ssh`, `cmd_start_elasticsearch`, `run_with_sudo`


In [32]:
##
# Copyright (c) 2021 unSkript, Inc
# All rights reserved.
##
from pydantic import BaseModel, Field
from typing import List, Optional, Dict
import pprint


from beartype import beartype
@beartype
def ssh_execute_remote_command_printer(output):
    if output is None:
        return
    print("Elasticsearch Service successfully STARTED")
    print("\n")
    pprint.pprint(output)


@beartype
def ssh_execute_remote_command(sshClient, hosts: List[str], command: str, sudo: bool = False) -> Dict:

    client = sshClient(hosts)
    runCommandOutput = client.run_command(command=command, sudo=sudo)
    client.join()
    res = {}

    for host_output in runCommandOutput:
        hostname = host_output.host
        output = []
        for line in host_output.stdout:
            output.append(line)

        o = "\n".join(output)
        res[hostname] = o

    return res


task = Task(Workflow())
task.configure(printOutput=True)
task.configure(inputParamsJson='''{
    "command": "cmd_start_elasticsearch",
    "hosts": "host_for_ssh",
    "sudo": "run_with_sudo"
    }''')
task.configure(conditionsJson='''{
    "condition_enabled": true,
    "condition_cfg": "cluster_health_status!='None' and len(host_for_ssh)!=0",
    "condition_result": true
    }''')

(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(ssh_execute_remote_command, lego_printer=ssh_execute_remote_command_printer, hdl=hdl, args=args)

<h3><a id='5'>Elasticsearch Reenable Shard Allocation</a></h3>
This action re-enables shard allocation and makes the node ready to use.

>This action takes the following parameters: `None`

In [None]:
##
# Copyright (c) 2021 unSkript, Inc
# All rights reserved.
##
import subprocess
import pprint
from pydantic import BaseModel, Field
from typing import List, Dict
from subprocess import PIPE, run
import json


from beartype import beartype
@beartype
def elasticsearch_enable_shard_allocation_printer(output):
    if output is None:
        return
    print("Shard allocations enabled for all kinds of shards")
    print(output)


@beartype
def elasticsearch_enable_shard_allocation(handle) -> Dict:
    """elasticsearch_enable_shard_allocation enables shard allocations for any shards for any indices.

            :type handle: object
            :param handle: Object returned from Task Validate

            :rtype: Result Dict of result
    """
    es_dict = {"transient": {"cluster.routing.allocation.enable": "all"}}
    output = handle.web_request("/_cluster/settings?pretty",  # Path
                                "PUT",                        # Method
                                es_dict)                      # Data

    return output


task = Task(Workflow())
task.configure(conditionsJson='''{
    "condition_enabled": true,
    "condition_cfg": "cluster_health_status!='None'",
    "condition_result": true
    }''')

task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(elasticsearch_enable_shard_allocation, lego_printer=elasticsearch_enable_shard_allocation_printer, hdl=hdl, args=args)

<h3 id="Check-Cluster-Health"><a id="6" target="_self" rel="nofollow"></a>Check Cluster Health<a class="jp-InternalAnchorLink" href="#Check-Cluster-Health" target="_self">&para;</a></h3>
<p>This action checks the status of the Elasticsearch cluster after the restart. Ideally, the cluster should show <span style="color: green;">Green</span> status after a successful restart. These are the cluster statuses that you may encounter:</p>
<ol>
<li>Unassigned primary shards = <span style="color: red;">Red</span> Status</li>
<li>Unassigned replica shards = <span style="color: #ffbf00;">Yellow</span> Status</li>
<li>All shards assigned = <span style="color: green;">Green</span> Status</li>
</ol>
<blockquote>
<p>This action takes the following parameters: <code>None</code></p>
</blockquote>
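
<p>After allocation is re-enabled, the cluster may need some time to return to green, so a single check can report yellow even when the restart went well. The sketch below polls until the wanted status appears. It is an illustration under stated assumptions: <code>wait_for_status</code> and its parameters are hypothetical names, and <code>fetch_health</code> stands for any callable that returns the parsed <code>/_cluster/health</code> response as a dict.</p>

```python
import time
from typing import Callable, Dict


def wait_for_status(fetch_health: Callable[[], Dict],
                    wanted: str = "green",
                    attempts: int = 30,
                    delay_seconds: float = 10.0) -> bool:
    """Poll fetch_health() until it reports the wanted status.

    Returns True as soon as the status matches, or False once all
    attempts are used up. Hypothetical helper, not part of unSkript.
    """
    for attempt in range(attempts):
        if fetch_health().get("status") == wanted:
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


# Possible use with the handle from this runbook (same call as in Step 1):
# wait_for_status(lambda: handle.web_request("/_cluster/health", "GET", None))
```

<p>Passing the fetch function in as an argument keeps the polling logic independent of the connection handle, which also makes it easy to try out with a stub.</p>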

In [8]:
##
##  Copyright (c) 2021 unSkript, Inc
##  All rights reserved.
##
from pydantic import BaseModel


from beartype import beartype
@beartype
def elasticsearch_get_handle(handle):
    """elasticsearch_get_handle returns the elasticsearch client handle.

       :rtype: elasticsearch client handle.
    """
    return handle


def unskript_default_printer(output):
    if isinstance(output, (list, tuple)):
        for item in output:
            print(f'item: {item}')
    elif isinstance(output, dict):
        for item in output.items():
            print(f'item: {item}')
    else:
        print(f'Output for {task.name}')
        print(output)

task = Task(Workflow())
task.configure(outputName="handle")
task.configure(printOutput=True)
(err, hdl, args) = task.validate(vars=vars())
if err is None:
    task.execute(elasticsearch_get_handle, lego_printer=unskript_default_printer, hdl=hdl, args=args)

In [15]:
from unskript.legos.elasticsearch.elasticsearch_check_health_status.elasticsearch_check_health_status import elasticsearch_check_health_status

output = elasticsearch_check_health_status(handle=handle)
# output is (True, None) for a green cluster, otherwise
# (False, [{cluster_name: status}])
cluster_health_status = 'green'
for item in output:
    if isinstance(item, list):
        for entry in item:
            for status in entry.values():
                cluster_health_status = status
print("Cluster Status: ", cluster_health_status)

<h3 id="Conclusion">Conclusion<a class="jp-InternalAnchorLink" href="#Conclusion" target="_self">&para;</a></h3>
<p>In this Runbook, we performed a rolling restart of a node in an Elasticsearch cluster using unSkript's Elasticsearch and SSH legos. This runbook can be re-triggered for multiple clusters in sequence. To view the full platform capabilities of unSkript, please visit <a href="https://us.app.unskript.io" target="_blank" rel="noopener">us.app.unskript.io</a></p>