# Kibana

**Kibana** is the UI layer of the Elastic Stack. It is used to **search, visualize, and alert** on data stored in Elasticsearch (commonly logs).


## Goals
- Know what Kibana is used for day-to-day (Discover, dashboards, alerting).
- Learn the basic workflow: connect -> data view -> search -> visualize -> alert.
- See example queries you would use during an incident.


## What is it?
- A web UI for Elasticsearch data.
- Includes:
  - **Discover**: interactive log search and filtering.
  - **Visualize/Lens**: build charts and aggregations.
  - **Dashboards**: compose multiple visualizations.
  - **Alerting**: trigger notifications based on queries.


## Why is it used?
- Lets people **explore logs** interactively: filter, pivot, and aggregate.
- Creates shared dashboards for operations and incident response.
- Lets teams define alerts on log patterns (e.g., spikes in 5xx, specific error messages).


## How it is used (workflow)
1. Create a **Data View** (formerly "index pattern") that matches your indices (example: `logs-*`).
2. Use **Discover** to filter by time range and fields.
3. Save useful searches.
4. Build visualizations (counts over time, top services, errors by route).
5. Combine into dashboards and share.
6. Add alert rules when patterns should page someone.
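
Step 1 can also be scripted instead of clicked through. The sketch below only builds the HTTP request; it does not send it. It assumes the Kibana 8.x data views endpoint (`POST /api/data_views/data_view`) and a local Kibana at `http://localhost:5601`. The helper name `build_data_view_request` is hypothetical, and the endpoint and payload shape should be checked against the docs for your Kibana version.

```python
import json


def build_data_view_request(kibana_url: str, title: str,
                            time_field: str = "@timestamp") -> dict:
    """Describe the HTTP request that would create a data view.

    Assumption: Kibana 8.x data views API. Nothing is sent over the network.
    """
    return {
        "method": "POST",
        "url": f"{kibana_url.rstrip('/')}/api/data_views/data_view",
        # Kibana rejects API writes that lack the kbn-xsrf header.
        "headers": {"kbn-xsrf": "true", "Content-Type": "application/json"},
        "body": json.dumps(
            {"data_view": {"title": title, "timeFieldName": time_field}}
        ),
    }


# Hypothetical local instance; replace with your own URL and credentials.
request = build_data_view_request("http://localhost:5601", "logs-*")
print(request["method"], request["url"])
print(request["body"])
```

Any HTTP client can send the resulting request once authentication is added.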


## Example queries (KQL-style)
During an incident you often start broad, then narrow:

```text
service: "checkout" and env: "prod"
service: "checkout" and http.status >= 500
service: "checkout" and message: "timeout"
trace_id: "4bf92f3577b34da6a3ce929d0e0e4736"
```

If logs are structured, you can drill down by `region`, `version`, `pod`, `customer_tier`, etc.
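
The same "start broad, then narrow" pattern can be expressed outside Kibana as an Elasticsearch Query DSL body, for example when scripting an investigation. This is a rough sketch, not what Kibana generates internally: the function name `incident_query` is hypothetical, the field names are the ones used in the queries above, and it assumes `service` is mapped as a keyword field and the time field is `@timestamp`.

```python
import json


def incident_query(service, min_status=None, message=None, since="now-15m"):
    """Build a search body that narrows as more arguments are supplied."""
    filters = [
        {"term": {"service": service}},              # assumes a keyword field
        {"range": {"@timestamp": {"gte": since}}},   # relative time window
    ]
    if min_status is not None:
        filters.append({"range": {"http.status": {"gte": min_status}}})
    if message is not None:
        filters.append({"match_phrase": {"message": message}})
    return {"query": {"bool": {"filter": filters}}}


# Broad first, then narrower.
broad = incident_query("checkout")
narrow = incident_query("checkout", min_status=500, message="timeout")
print(json.dumps(narrow, indent=2))
```

Each extra argument adds one clause to the `filter` list, which mirrors adding another `and` term to the KQL query.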


## Example: alert on error spike (pseudocode)
A typical rule idea:

```text
every 1m:
  errors = count(logs where service='checkout' and http.status >= 500 over last 5m)
  if errors > 200:
    notify(oncall, "checkout 5xx spike")
```

In practice, Kibana stores this as a rule made up of a query, a threshold, a check schedule, and one or more actions.
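
To make the threshold logic concrete, here is a minimal Python sketch of one evaluation of that rule over in-memory log entries. It is not how Kibana evaluates rules. The entry shape (dicts with `service`, `http.status`, and `@timestamp`) and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def count_recent_errors(logs, now, service="checkout",
                        window=timedelta(minutes=5)):
    """Count 5xx entries for one service inside the lookback window."""
    cutoff = now - window
    return sum(
        1
        for entry in logs
        if entry["service"] == service
        and entry["http.status"] >= 500
        and cutoff <= entry["@timestamp"] <= now
    )


def should_notify(error_count, threshold=200):
    """Mirror the pseudocode: fire only when the count exceeds the threshold."""
    return error_count > threshold


# Synthetic data: 250 recent errors, 50 stale errors, 30 recent successes.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent, stale = now - timedelta(minutes=1), now - timedelta(minutes=10)
logs = (
    [{"service": "checkout", "http.status": 503, "@timestamp": recent}] * 250
    + [{"service": "checkout", "http.status": 500, "@timestamp": stale}] * 50
    + [{"service": "checkout", "http.status": 200, "@timestamp": recent}] * 30
)

errors = count_recent_errors(logs, now)
print(errors, should_notify(errors))  # 250 True
```

The stale errors fall outside the 5-minute window and the 200 responses fail the status filter, so only the 250 recent errors count toward the threshold.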


## Pitfalls
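- Free-form log strings make Kibana searches brittle, because every query depends on exact message wording. Prefer structured (JSON) logs with named fields.
- If timestamps are wrong (skewed clocks, missing or mixed time zones), time filtering becomes misleading because events land outside the selected window.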
- Without consistent fields (`service`, `env`, `version`), dashboards become hard to reuse.
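
One way to avoid all three pitfalls is to emit JSON logs with a fixed set of fields and a UTC timestamp. The sketch below uses only the Python standard library. The class name `JsonFormatter` and the example values (`checkout`, `prod`, `1.4.2`) are illustrative assumptions, and shipping the output to Elasticsearch is out of scope here.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with consistent fields."""

    def __init__(self, service, env, version):
        super().__init__()
        # Fields every log line should carry so dashboards can be reused.
        self.static_fields = {"service": service, "env": env, "version": version}

    def format(self, record):
        entry = {
            # Explicit UTC avoids ambiguous local-time timestamps.
            "@timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            **self.static_fields,
        }
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(service="checkout", env="prod", version="1.4.2"))
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("timeout calling %s", "payments")
```

Because `service`, `env`, and `version` are attached by the formatter rather than by each call site, they stay consistent across the whole application.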

## Exercises
- Build a dashboard showing errors per service over time.
- Add a filter control for `env` and `region`.

## References
- Kibana docs: https://www.elastic.co/guide/en/kibana/current/introduction.html
- KQL guide: https://www.elastic.co/guide/en/kibana/current/kuery-query.html
