# Logstash

**Logstash** is a data processing pipeline in the Elastic Stack. It ingests events from many sources, **parses/transforms/enriches** them, and ships them to one or more destinations (often Elasticsearch).


## Goals
- Understand the Logstash pipeline model: **inputs -> filters -> outputs**.
- Know why Logstash is useful in logging architectures.
- See a realistic configuration example (Nginx access logs).


## Why is it used?
- Turn raw logs into **structured fields** (JSON documents).
- **Normalize** data across many teams/services.
- **Enrich** events (geoip, user-agent parsing, add env/version, lookups).
- Route to multiple outputs (Elasticsearch + S3 archive + stdout for debugging).
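A minimal sketch of the routing case, assuming an S3 bucket is used as the archive. The bucket name, region, prefix, and index name are placeholders, and AWS credentials are assumed to come from the environment (for example an IAM role). Unless a conditional guards it, every event reaches every output in the block:

```conf
output {
  # searchable copy in Elasticsearch, one index per day (index name is a placeholder)
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "app-%{+YYYY.MM.dd}"
  }

  # long-term archive; bucket, region, and prefix are placeholders
  s3 {
    bucket => "example-log-archive"
    region => "us-east-1"
    prefix => "app/"
    codec  => "json_lines"
  }

  # human-readable dump of each event, for debugging only
  stdout { codec => rubydebug }
}
```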

If your applications already emit structured JSON, you may not need Logstash at all (agents such as Filebeat can ship directly to Elasticsearch), but it remains useful for central enrichment and routing.


## How it works (mental model)
Pipeline pseudocode:

```text
for event in input_stream:
  event = parse(event)         # grok/json/dissect
  event = enrich(event)        # add service/env/geoip/user_agent
  event = normalize(event)     # rename fields, types, timestamps
  if should_drop(event):
    continue
  send(event, outputs)
```

Config is declarative: you list plugins in the `input`, `filter`, and `output` sections, and filters run in the order they are written.
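The pseudocode steps map onto filter plugins roughly as follows. This sketch assumes the application writes one JSON object per line into `message`; the field names (`lvl`, `level`, `duration_ms`), the metadata values, and the drop rule are hypothetical:

```conf
filter {
  # parse: decode the JSON line in "message" into top-level fields
  json {
    source => "message"
  }

  # enrich: attach static metadata (values are illustrative)
  mutate {
    add_field => {
      "service" => "checkout"
      "env"     => "prod"
    }
  }

  # normalize: rename one field and cast another to an integer
  mutate {
    rename  => { "lvl" => "level" }
    convert => { "duration_ms" => "integer" }
  }

  # should_drop: discard debug-level events entirely
  if [level] == "debug" {
    drop { }
  }
}
```

Because filters run top to bottom, the conditional sees the renamed `level` field. Separate `mutate` blocks keep the steps readable; within a single `mutate`, operations run in a fixed internal order rather than the order they are written.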


## Example: parse Nginx access logs -> Elasticsearch
A minimal pipeline that receives Nginx access logs from Filebeat, parses them, and writes them to daily Elasticsearch indices:

```conf
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }

  # convert the parsed timestamp into @timestamp
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }

  # one add_field hash can set several fields
  mutate {
    add_field => {
      "service" => "edge"
      "env"     => "prod"
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "nginx-%{+YYYY.MM.dd}"
  }
  # print every event to the console (useful while debugging; remove in production)
  stdout { codec => rubydebug }
}
```

Notes:
- The **Beats input** commonly receives events from Filebeat.
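- `grok` turns free-form text into fields; prefer structured logs when possible. Nginx's default `combined` log format has the same layout as Apache's combined format, which is why `%{COMBINEDAPACHELOG}` works here.
- Field names depend on grok's ECS compatibility mode: Logstash 8 defaults to ECS-style names such as `[source][address]` and `[http][response][status_code]`, while legacy mode produces `clientip`, `response`, and so on. The `timestamp` field read by the `date` filter exists in both modes.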
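Lines that grok cannot match are tagged `_grokparsefailure` (the plugin's default `tag_on_failure`). One way to keep them out of the main index without losing them is a conditional output that would take the place of the `output` section in the example; the file path is an assumption:

```conf
output {
  if "_grokparsefailure" in [tags] {
    # unparsed lines go to a local file for later inspection (path is an assumption)
    file {
      path => "/var/log/logstash/nginx-grok-failures.log"
    }
  } else {
    # successfully parsed events are indexed as before
    elasticsearch {
      hosts => ["http://elasticsearch:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
  }
}
```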


## Practical tips
- Prefer **structured JSON logging** in apps; use Logstash mainly for enrichment and routing.
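- Use `dissect` when every line has the same delimiter-separated layout: it splits on literal delimiters instead of running regular expressions, so it is usually faster than grok.
- Standardize field names (`service`, `env`, `region`, `version`, `trace_id`), or adopt the Elastic Common Schema (ECS).
- Treat parsing like code: keep sample log lines next to the pipeline, check syntax with `bin/logstash -f <config> --config.test_and_exit`, and confirm each sample parses as expected.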
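A sketch combining the `dissect` and testing tips: a throwaway pipeline that reads sample lines from stdin, parses them with `dissect`, and prints the result. The log format and field names are hypothetical; `convert_datatype` is a `dissect` option that casts a field after extraction:

```conf
# test.conf: paste sample lines into the terminal and inspect the parsed events
input {
  stdin { }
}

filter {
  # hypothetical app log line:
  # 2024-05-01T12:00:00Z INFO checkout order_id=42 duration_ms=118
  dissect {
    mapping => {
      "message" => "%{ts} %{level} %{service} order_id=%{order_id} duration_ms=%{duration_ms}"
    }
    # cast the extracted string to an integer
    convert_datatype => {
      "duration_ms" => "int"
    }
  }
}

output {
  stdout { codec => rubydebug }
}
```

Run it with `bin/logstash -f test.conf`. Lines that do not fit the mapping are tagged `_dissectfailure`, which makes format drift easy to spot.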


## Operational notes
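- **Backpressure**: when Elasticsearch slows down, Logstash's outputs block, its queue fills, and inputs stop accepting events; Beats then slow down and retry, so the delay propagates back to the shippers. Plan buffering for that case.
- **Persistent queues** (`queue.type: persisted` in `logstash.yml`) buffer events on disk, so queued events survive a Logstash restart or crash and bursts are absorbed; the default in-memory queue loses its contents if Logstash crashes.
- Scale horizontally by running multiple Logstash instances, either behind a TCP load balancer or by listing all of them in Filebeat's `output.logstash.hosts` with `loadbalance: true`.
- Grok is regex-based and CPU-intensive: watch CPU usage on grok-heavy pipelines and check per-plugin timings in the node stats API (`_node/stats/pipelines`, port 9600 by default).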
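Two grok tweaks that can help keep CPU in check, shown on the Nginx pattern from the example. Anchoring the pattern with `^` stops the regex engine from retrying the match at every offset of a non-matching line, and `timeout_millis` (default 30000) caps the time spent on a single event; the 500 ms value is illustrative, not a recommendation:

```conf
filter {
  grok {
    # ^ anchors the match at the start of the line so non-matching lines fail quickly
    match => { "message" => "^%{COMBINEDAPACHELOG}" }
    # give up on a single event after 500 ms (illustrative); such events get the _groktimeout tag
    timeout_millis => 500
  }
}
```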

## Exercises
- Write a grok pattern to parse a custom app log format.
- Add geoip enrichment and visualize top countries in Kibana.

## References
- Logstash docs: https://www.elastic.co/guide/en/logstash/current/introduction.html
- Grok filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
