# Elasticsearch

**Elasticsearch** is a distributed search and analytics engine. In observability it is most commonly used to **index and search logs** (JSON events) and run aggregations for dashboards and investigations.


## Goals
- Understand what Elasticsearch stores (JSON documents) and how it makes them searchable (inverted indexes).
- Know the core objects: **index**, **document**, **mapping**, **shard/replica**.
- See how logs are ingested and queried with examples.


## What is it?
- A **document store** for JSON-like data.
- A **full-text search engine** (inverted index) with fast filtering.
- A **distributed system** (clusters, shards, replication) for scaling throughput and storage.

In a logs use case, each log event becomes a document with fields like `service`, `level`, `http.status`, `trace_id`.
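
To make the inverted index idea concrete, here is a toy sketch in plain Python. It is not how Lucene or Elasticsearch is implemented (real analyzers, postings lists, and scoring are far more involved); the documents, IDs, and the `build_inverted_index` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical log documents, keyed by a made-up document ID.
docs = {
    1: {"service": "checkout", "level": "error", "msg": "upstream timeout"},
    2: {"service": "checkout", "level": "info", "msg": "retry succeeded"},
    3: {"service": "search", "level": "error", "msg": "timeout reading index"},
}


def build_inverted_index(docs, field):
    """Map each lowercased token in `field` to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in str(doc.get(field, "")).lower().split():
            index[token].add(doc_id)
    return index


msg_index = build_inverted_index(docs, "msg")
level_index = build_inverted_index(docs, "level")

# msg contains "timeout" AND level is "error": a set intersection, not a full scan.
hits = msg_index["timeout"] & level_index["error"]
print(sorted(hits))
```

The point of the structure is that a lookup starts from the term and goes straight to the matching documents, which is why filtering stays fast as the number of documents grows.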


## Why is it used?
- **Fast interactive search** ("find all errors for service X in region Y during deploy Z").
- **Aggregations** for analytics (counts over time, top endpoints, error breakdowns).
- Works well with **semi-structured** data (logs with lots of fields).

Tradeoffs:
- Great for search/aggregations, but **storage can get expensive** at high volume.
- Requires attention to **mappings**, **shards**, and **retention**.


## Core concepts
- **Document**: one JSON event (a single log line after parsing).
- **Index**: a collection of documents (often time-based like `logs-2026.01.07`).
- **Mapping**: schema for fields (types like keyword/text/date/long).
- **Shard**: a partition of an index; shards are spread across nodes so storage and indexing throughput can scale out.
- **Replica**: a copy of a shard (resilience + read scaling).
- **Ingest pipeline**: server-side processing steps (parse/enrich) at index time.
- **ILM / retention**: index lifecycle management (ILM) rolls over to a new index and deletes old ones on a schedule to control cost.
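
As a rough illustration of how mapping, shard, and replica fit together, the sketch below builds a create-index request body (the kind sent with `PUT /<index-name>`) as a Python dict. The field names come from the examples on this page; the shard and replica counts are placeholders, and `field_type` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json

# Sketch of a create-index request body.
# Shard/replica counts are placeholder values, not recommendations.
index_body = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
    },
    "mappings": {
        "properties": {
            "ts": {"type": "date"},
            "level": {"type": "keyword"},      # exact-match filtering and aggregations
            "service": {"type": "keyword"},
            "trace_id": {"type": "keyword"},   # IDs are matched exactly, never analyzed
            "msg": {"type": "text"},           # analyzed for full-text search
            "http": {"properties": {"status": {"type": "long"}}},
        }
    },
}


def field_type(body, dotted_path):
    """Look up the mapped type of a field such as 'http.status'."""
    node = body["mappings"]
    for part in dotted_path.split("."):
        node = node["properties"][part]
    return node["type"]


print(json.dumps(index_body, indent=2))
print(field_type(index_body, "http.status"))
```

Choosing `keyword` for identifiers and `text` only for free-form messages is the main design decision here; it determines which fields can be filtered exactly and aggregated.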


## How it is used (logs pipeline)
A common end-to-end flow:

```
app -> stdout/file -> shipper (Filebeat/Fluent Bit) -> Logstash -> Elasticsearch -> Kibana
```

Key idea: the shipper or Logstash turns raw text into **structured fields**; Elasticsearch stores and indexes those fields so you can search and aggregate on them.
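
The "raw text to structured fields" step can be sketched in a few lines of Python. The line format, the regular expression, and `parse_line` are assumptions made up for this example; real shippers and Logstash use their own parsers and configuration.

```python
import json
import re

# Hypothetical raw line format:
#   "<timestamp> <LEVEL> <service> status=<code> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)"
    r"\s+status=(?P<status>\d{3})\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a structured event dict, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "ts": m["ts"],
        "level": m["level"].lower(),
        "service": m["service"],
        "msg": m["msg"],
        "http": {"status": int(m["status"])},  # numeric, so range queries work
    }


event = parse_line("2026-01-07T20:13:11Z ERROR checkout status=504 timeout")
print(json.dumps(event))
```

Converting `status` to an integer at parse time matters: a numeric field supports range filters such as "500 and above", while a string field would not.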


## Example: indexing log events (Bulk API)
Elasticsearch supports efficient ingestion with the NDJSON bulk format:

```bash
# Each event takes two lines: an action line naming the target index, then the document itself.
curl -XPOST 'http://es:9200/_bulk' \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary @- <<'NDJSON'
{ "index": { "_index": "logs-checkout-2026.01.07" } }
{ "ts":"2026-01-07T20:13:11Z", "level":"error", "service":"checkout", "msg":"timeout", "http": {"status": 504} }
{ "index": { "_index": "logs-checkout-2026.01.07" } }
{ "ts":"2026-01-07T20:13:12Z", "level":"info", "service":"checkout", "msg":"retry", "http": {"status": 200} }
NDJSON
```

In practice you rarely do this manually; an agent/Logstash usually sends events.
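
For a sense of what such an agent does before sending, here is a minimal sketch that serializes events into the same NDJSON bulk body. `build_bulk_body` is a hypothetical helper; it only builds the string and does not talk to a cluster.

```python
import json


def build_bulk_body(index_name, events):
    """Serialize events as NDJSON: one action line, then one source line, per event."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(event))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


events = [
    {"ts": "2026-01-07T20:13:11Z", "level": "error", "service": "checkout",
     "msg": "timeout", "http": {"status": 504}},
    {"ts": "2026-01-07T20:13:12Z", "level": "info", "service": "checkout",
     "msg": "retry", "http": {"status": 200}},
]

body = build_bulk_body("logs-checkout-2026.01.07", events)
print(body, end="")
```

The resulting string is what would go in the request body of `POST /_bulk` with the `application/x-ndjson` content type shown above.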


## Example: searching + aggregating
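A typical query filters by service, time, and status, then aggregates by status. The body below would be sent to a search endpoint such as `POST /logs-checkout-*/_search`; the `term` filter assumes `service` is mapped as `keyword`: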

```json
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"service": "checkout"}},
        {"range": {"http.status": {"gte": 500}}},
        {"range": {"ts": {"gte": "now-15m"}}}
      ]
    }
  },
  "aggs": {
    "by_status": {"terms": {"field": "http.status"}}
  }
}
```

This is the kind of query Kibana builds behind the scenes.
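
To clarify what the query computes, the sketch below applies the same three filters and the same per-status count to an in-memory list. It is only a plain-Python analogy: the sample events, the fixed `now`, and `errors_by_status` are invented for illustration, and Elasticsearch evaluates the query against its indexes rather than by scanning.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    # Timestamps on this page use a trailing "Z" for UTC.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def errors_by_status(events, service, now, window=timedelta(minutes=15)):
    """Apply the three filters, then count matching events per http.status."""
    counts = Counter()
    for e in events:
        if (
            e["service"] == service                 # term filter
            and e["http"]["status"] >= 500          # range filter on http.status
            and parse_ts(e["ts"]) >= now - window   # range filter on ts (now-15m)
        ):
            counts[e["http"]["status"]] += 1        # terms aggregation bucket
    return dict(counts)


events = [
    {"ts": "2026-01-07T20:13:11Z", "service": "checkout", "http": {"status": 504}},
    {"ts": "2026-01-07T20:13:12Z", "service": "checkout", "http": {"status": 200}},
    {"ts": "2026-01-07T20:14:00Z", "service": "checkout", "http": {"status": 504}},
    {"ts": "2026-01-07T19:00:00Z", "service": "checkout", "http": {"status": 500}},
    {"ts": "2026-01-07T20:14:30Z", "service": "search", "http": {"status": 503}},
]

now = parse_ts("2026-01-07T20:20:00Z")
print(errors_by_status(events, "checkout", now))
```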


## Pitfalls and operational notes
- **Mapping mistakes** are painful later (e.g., treating IDs as `text` instead of `keyword`).
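- **Too many shards**: every shard adds memory and cluster-state overhead, so many small shards cost more than a few well-sized ones; Elastic's general guidance is to aim for shards of roughly 10 to 50 GB.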
- **Hot vs warm data**: recent logs need fast storage; older logs can move to cheaper tiers.
- **Retention**: always configure ILM with rollover and delete phases; otherwise logs grow without bound.
- **Security**: enable auth/TLS and restrict who can read logs.
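
As one possible shape for the hot/warm and retention notes above, the sketch below builds an ILM policy body of the kind sent with `PUT _ilm/policy/<name>`. The ages and sizes are placeholder values, `build_ilm_policy` is a hypothetical helper, and the available rollover conditions depend on your Elasticsearch version, so check the ILM docs before using it.

```python
import json


def build_ilm_policy(warm_after, delete_after, rollover_max_age="1d",
                     rollover_max_primary_shard_size="50gb"):
    """Build an ILM policy body with hot, warm, and delete phases."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new index when either condition is met.
                        "rollover": {
                            "max_age": rollover_max_age,
                            "max_primary_shard_size": rollover_max_primary_shard_size,
                        }
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {"set_priority": {"priority": 50}},
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


# Placeholder ages; pick values that match your own retention requirements.
policy = build_ilm_policy(warm_after="3d", delete_after="30d")
print(json.dumps(policy, indent=2))
```

When rollover is used, the `min_age` of later phases is measured from the time the index rolled over, not from when it was created.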

## Exercises
- Design an index naming + retention policy (7d hot, 30d warm, delete after 90d).
- Pick 5 fields you would standardize across all services (service, env, region, version, trace_id).

## References
- Elasticsearch docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
- Data streams & ILM: https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html
