# Analyze Elasticsearch index templates

This notebook provides insights into index templates:
- Unused component templates
- Overlapping templates

## Prepare environment

### Install required Python packages

In [None]:
pip install pandas~=2.2 elasticsearch~=8.15

### Restart Jupyter kernel

In [None]:
get_ipython().kernel.do_shutdown(True)

### Import packages

In [None]:
import getpass
import re
from pathlib import Path
from elasticsearch import Elasticsearch
from IPython.display import display, FileLink
import pandas as pd

### Input Elasticsearch connection settings

To connect to an Elasticsearch instance, its hostname and a valid API key are required.

An API key can be created in Kibana: [https://www.elastic.co/guide/en/kibana/current/api-keys.html](https://www.elastic.co/guide/en/kibana/current/api-keys.html)

In [None]:
elasticsearch_host = input("Enter Elasticsearch hostname: ").strip()

In [None]:
elasticsearch_api_key = getpass.getpass("Enter Elasticsearch API key: ").strip()
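As far as I know, the 8.x Python client expects each entry in `hosts` to be a URL with a scheme rather than a bare hostname. The sketch below shows one way to normalize the entered value before passing it to the client. The helper name `normalize_host` and its defaults (HTTPS, port 9200) are assumptions for illustration and are not part of the original notebook.

```python
from urllib.parse import urlsplit


def normalize_host(raw, default_scheme="https", default_port=9200):
    """Turn user input such as 'es.example.internal' into 'https://es.example.internal:9200'.

    Hypothetical helper: the default scheme and port are assumptions.
    Any path component in the input is dropped.
    """
    raw = raw.strip()
    if "://" not in raw:
        # No scheme was given, so prepend the assumed default
        raw = f"{default_scheme}://{raw}"
    parts = urlsplit(raw)
    if not parts.hostname:
        raise ValueError(f"Cannot extract a hostname from {raw!r}")
    port = parts.port or default_port
    return f"{parts.scheme}://{parts.hostname}:{port}"


print(normalize_host("es.example.internal"))
print(normalize_host("http://localhost:9200"))
```

In the notebook, the result could be assigned back to `elasticsearch_host` before the client is created.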

### Create Elasticsearch client and connect to the cluster

In [None]:
client = Elasticsearch(
    hosts=elasticsearch_host,
    api_key=elasticsearch_api_key,
    verify_certs=False,             # The cluster certificate is signed by a non-public authority, so certificate verification is disabled
    ssl_show_warn=False             # Unverified SSL/TLS connections produce many warnings, so they are suppressed
)

In [None]:
print(client.cat.health())

## Analyze index templates in the cluster

### Analyze overlapping templates

> [https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html)

In Elasticsearch, index configurations, including settings and field mappings, are determined by an index template. When data arrives at the cluster and the destination index or data stream does not exist, Elasticsearch can automatically create the index. This is achieved by comparing the index name from the client's request with the index templates in the cluster, selecting the one with the highest priority.
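The selection rule described above can be sketched in plain Python without a cluster. This is a simplified illustration, not Elasticsearch's actual implementation: the template names, patterns, and priorities are made up, and the helper functions `matches` and `select_template` are hypothetical.

```python
import re


def matches(pattern, index_name):
    """Return True if index_name matches a wildcard pattern ('*' = any run of characters)."""
    regex = "^" + re.escape(pattern).replace(r"\*", ".*") + "$"
    return re.match(regex, index_name) is not None


def select_template(templates, index_name):
    """Pick the matching template with the highest priority, or None if nothing matches."""
    candidates = [
        t for t in templates
        if any(matches(p, index_name) for p in t["index_patterns"])
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda t: t["priority"])


# Hypothetical templates, for illustration only
templates = [
    {"name": "logs-generic", "index_patterns": ["logs-*"], "priority": 100},
    {"name": "logs-nginx", "index_patterns": ["logs-nginx-*"], "priority": 200},
]

print(select_template(templates, "logs-nginx-2024.10.01")["name"])  # logs-nginx
print(select_template(templates, "logs-app-2024.10.01")["name"])    # logs-generic
print(select_template(templates, "metrics-cpu"))                    # None
```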

Over time, and without proper management, the number of templates can increase. This can lead to situations where multiple index templates match the same index patterns, but only the one with the highest priority is used.

In the long term, it becomes challenging to determine which index patterns are still required.

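The code below identifies index templates that are 'overlapped' by others with higher priority, meaning that a higher-priority template has the same index pattern or a broader one that covers it.

**Output:** The code writes a CSV file listing all index templates. The 'overlapped_with' field names the higher-priority template that overlaps each one and is empty when there is none.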

In [None]:
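def pattern_to_regex(pattern):
    """Convert a wildcard pattern to an anchored regex ('*' matches any run of characters)."""
    escaped_pattern = re.escape(pattern)
    regex_pattern = "^{0}$".format(escaped_pattern.replace(r'\*', '.*'))
    return regex_pattern

def check_overlap(pattern1, pattern2):
    """Check the type of overlap between two patterns.

    Returns:
        2 if the patterns are identical,
        1 if pattern1 covers pattern2,
        3 if pattern2 covers pattern1,
        0 if neither pattern covers the other.
    """
    if pattern1 == pattern2:
        return 2

    regex1 = pattern_to_regex(pattern1)
    regex2 = pattern_to_regex(pattern2)

    # The other pattern is tested as a literal string against each regex
    if re.match(regex1, pattern2):
        return 1
    elif re.match(regex2, pattern1):
        return 3

    return 0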

# Fetch index templates from Elasticsearch client
response = client.indices.get_index_template()
print(f"Total index templates found: {len(response['index_templates'])}")

# Prepare data for DataFrame
data = [{
    "name": template["name"],
    "index_patterns": template["index_template"]["index_patterns"],
    "priority": template["index_template"].get("priority", 0),
    "managed": template["index_template"].get("_meta", {}).get("managed", False),
    "overlapped_with": ""
} for template in response["index_templates"]]

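# Compare every pair of templates once (j > i), pattern by pattern.
# Note: if a template is overlapped by several others, only the last match is recorded.
n = len(data)
for i in range(n):
    for p1 in data[i]["index_patterns"]:
        for j in range(i + 1, n):
            for p2 in data[j]["index_patterns"]:
                status = check_overlap(p1, p2)
                priority_diff = data[i]["priority"] - data[j]["priority"]

                # Template i has higher priority and its pattern equals or covers p2
                if status in (1, 2) and priority_diff > 0:
                    data[j]["overlapped_with"] = data[i]["name"]
                # Template j has higher priority and its pattern equals or covers p1
                elif status in (3, 2) and priority_diff < 0:
                    data[i]["overlapped_with"] = data[j]["name"]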

# Create DataFrame and sort by 'overlapped_with'
df = pd.DataFrame(data).sort_values(by="overlapped_with", ascending=False).reset_index(drop=True)

# Save DataFrame to CSV
output_dir = Path('temp')
output_dir.mkdir(parents=True, exist_ok=True)
csv_path = output_dir / "overlapping_templates.csv"
df.to_csv(csv_path, index=False)

# Display a link to download the CSV
display(FileLink(csv_path, result_html_prefix="Open CSV file: "))
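The check above reports only cases where one pattern fully covers another. Two wildcard patterns can also intersect partially: for example, `logs-*-prod` and `logs-app-*` both match `logs-app-prod`, yet neither covers the other, so `check_overlap` returns 0 for that pair. The sketch below is one possible way to detect such partial intersections. The function `patterns_intersect` is a hypothetical addition and assumes `*` is the only wildcard character.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def patterns_intersect(a, b):
    """Return True if at least one string matches both wildcard patterns.

    Hypothetical helper; assumes '*' is the only wildcard.
    """
    if not a and not b:
        return True
    if a[:1] == "*":
        # The '*' in a matches nothing, or absorbs the next character of b
        return patterns_intersect(a[1:], b) or (bool(b) and patterns_intersect(a, b[1:]))
    if b[:1] == "*":
        # Symmetric case for a '*' in b
        return patterns_intersect(a, b[1:]) or (bool(a) and patterns_intersect(a[1:], b))
    if not a or not b:
        return False
    # Both patterns start with a literal character, which must be the same
    return a[0] == b[0] and patterns_intersect(a[1:], b[1:])


print(patterns_intersect("logs-*-prod", "logs-app-*"))  # True
print(patterns_intersect("logs-a*", "logs-b*"))         # False
```

A partial intersection does not make either template redundant, so it is better treated as a hint for manual review than as an 'overlapped' verdict.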

### Analyze unused component templates

> [https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-component-template.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-component-template.html)

Index templates in Elasticsearch can be composed of component templates, which are pieces of an index template configuration used as building blocks for the resulting index template.

Without careful management of Elasticsearch entities, unused component templates can accumulate in the cluster. The code below identifies such obsolete component templates.
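The underlying idea is a set difference: every component template in the cluster minus those referenced by any index template. A minimal sketch with made-up data follows. The template names and the helper `find_unused_components` are hypothetical and only illustrate the logic.

```python
def find_unused_components(index_templates, component_template_names):
    """Return the sorted names of component templates not referenced by any index template."""
    in_use = {
        component
        for template in index_templates
        for component in template.get("composed_of", [])
    }
    return sorted(set(component_template_names) - in_use)


# Hypothetical data, for illustration only
index_templates = [
    {"name": "logs", "composed_of": ["logs-mappings", "logs-settings"]},
    {"name": "metrics", "composed_of": ["metrics-mappings"]},
    {"name": "legacy"},  # an index template without component templates
]
component_template_names = [
    "logs-mappings", "logs-settings", "metrics-mappings", "old-settings",
]

print(find_unused_components(index_templates, component_template_names))  # ['old-settings']
```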

In [None]:
index_template_response = client.indices.get_index_template()
print(f"Total index templates found: {len(index_template_response['index_templates'])}")

component_template_response = client.cluster.get_component_template()
print(f"Total component templates found: {len(component_template_response['component_templates'])}")

# Extract component templates in use from index templates
component_templates_in_use = []
for template in index_template_response["index_templates"]:
    components = template["index_template"].get("composed_of", [])
    for component in components:
        component_templates_in_use.append({"name": component})

# Extract all component templates in the cluster
component_templates_in_cluster = [
    {"name": template["name"]} for template in component_template_response["component_templates"]
]

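# Create DataFrames (explicit columns keep the merge working even if a list is empty)
df1 = pd.DataFrame(component_templates_in_cluster, columns=["name"])
df2 = pd.DataFrame(component_templates_in_use, columns=["name"]).drop_duplicates(keep='first')

# Find component templates that are not in use:
# rows marked 'left_only' exist in the cluster but are not referenced by any index template
diff = pd.merge(df1, df2, on=['name'], how='left', indicator=True)
unused_components = diff[diff['_merge'] == 'left_only']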

# Save DataFrame to CSV
output_dir = Path('temp')
output_dir.mkdir(parents=True, exist_ok=True)
csv_path = output_dir / "unused_component_templates.csv"
unused_components.to_csv(csv_path, index=False)

# Display a link to download the CSV
display(FileLink(csv_path, result_html_prefix="Open CSV file: "))