tiroq/ember

🔥 Ember

A quiet system for revealing hidden heat and load.

Ember is a standalone Linux host observability stack designed to diagnose fan noise, thermal spikes, and system bottlenecks using host and container metrics. Built for Mini PCs and homelab environments where understanding thermal behavior and resource utilization is critical.

Ember Logo

Features

  • Thermal Monitoring: Track CPU temperatures, thermal zones, and fan speeds via hwmon
  • CPU Analysis: Per-core utilization, iowait detection, load averages
  • Memory & Swap: Real-time memory breakdown and swap usage tracking
  • Disk I/O: Throughput, IOPS, and filesystem usage monitoring
  • Network: Bandwidth utilization, errors, and packet drops
  • Container Insights: Top containers by CPU/memory, restart detection
  • Process Visibility: Per-process resource consumption (optional)

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Linux Host                               │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────────┐ │
│  │ node_exporter│  │   cAdvisor   │  │   process-exporter     │ │
│  │   :9100      │  │    :8080     │  │        :9256           │ │
│  └──────┬───────┘  └──────┬───────┘  └───────────┬────────────┘ │
│         │                 │                      │              │
│         └────────────┬────┴──────────────────────┘              │
│                      ▼                                          │
│              ┌───────────────┐                                  │
│              │  Prometheus   │──── 15 day retention             │
│              │    :9090      │                                  │
│              └───────┬───────┘                                  │
│                      │                                          │
│                      ▼                                          │
│              ┌───────────────┐                                  │
│              │    Grafana    │──── Auto-provisioned dashboards  │
│              │    :3000      │                                  │
│              └───────────────┘                                  │
└─────────────────────────────────────────────────────────────────┘

Prerequisites

  • Docker >= 20.10
  • Docker Compose >= 2.0 (or docker-compose v1.29+)
  • Linux host with access to /proc, /sys, and /dev
  • lm-sensors (recommended for temperature metrics)

Verify Docker Installation

docker --version
docker compose version

Quick Start

1. Clone or Create the Project

cd /path/to/ember

2. Configure Environment

Copy the example environment file and optionally generate a new secure password:

# Option A: use the provided .env (it ships with a pre-generated secure password)
# Option B: create your own from the template (note: > replaces the entire file):
cp .env.example .env
echo "GF_SECURITY_ADMIN_PASSWORD=$(openssl rand -base64 24)" > .env
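
Before starting the stack, it can help to confirm the password variable actually made it into .env. A minimal guard function (a sketch; only the GF_SECURITY_ADMIN_PASSWORD variable name comes from the steps above, the messages are illustrative):

```shell
# check_env: verify that GF_SECURITY_ADMIN_PASSWORD is set and non-empty
# in the given env file (defaults to .env in the current directory).
check_env() {
  local f="${1:-.env}"
  if grep -Eq '^GF_SECURITY_ADMIN_PASSWORD=.+' "$f" 2>/dev/null; then
    echo "ok: admin password is set"
  else
    echo "error: GF_SECURITY_ADMIN_PASSWORD missing or empty in $f" >&2
    return 1
  fi
}
```

Run check_env before docker compose up -d; if the variable is missing, Grafana falls back to its default admin password.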

3. Start the Stack

docker compose up -d

4. Verify All Services Are Running

docker compose ps

Expected output shows all 5 services as "healthy" or "running":

  • ember-prometheus
  • ember-grafana
  • ember-node-exporter
  • ember-cadvisor
  • ember-process-exporter

Accessing the Interfaces

Service      URL                     Default Credentials
Grafana      http://localhost:3000   admin / (see .env)
Prometheus   http://localhost:9090   -

Note: All services bind to 127.0.0.1 (localhost only) by default for security.

External Application Integration

Ember can scrape metrics from external applications running in Docker containers via an external Docker network. This is configured using override files that are gitignored, keeping the main Ember configuration clean.

Prerequisites

  • External application stack must be running
  • External Docker network must exist (e.g., <app>_default, <app>_network)

Setup

  1. Verify external network exists:

    docker network ls | grep <network-name>
  2. Configure the override files (gitignored):

    • docker-compose.override.yml - Adds external network connectivity
    • prometheus/prometheus.local.yml - Adds scrape jobs for external services
  3. Update network name in docker-compose.override.yml:

    networks:
      <external-network-name>:
        external: true
  4. Start Ember (override auto-applied):

    docker compose up -d
  5. Start WITHOUT external integration (ignore override):

    docker compose -f docker-compose.yml up -d
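
The scrape side of the integration lives in prometheus/prometheus.local.yml. A sketch of what a job there can look like, assuming a hypothetical external service myapp exposing /metrics on port 8000 (the service name, port, and how the local file is merged into the main config are assumptions to adapt):

```yaml
# prometheus/prometheus.local.yml (gitignored) — example scrape job.
# "myapp" and port 8000 are placeholders for your external service.
scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp:8000']
```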

Verify Targets

  1. Open http://localhost:9090/targets
  2. Look for your custom job targets
  3. UP = service is running and metrics are being scraped
  4. DOWN = service is not running (expected if external app is stopped)

Test Connectivity from Prometheus Container

# Replace <service-name> and <port> with actual values
docker exec ember-prometheus wget -qO- http://<service-name>:<port>/metrics | head -20

Troubleshooting

If connectivity fails, verify:

  1. External stack is running: docker ps | grep <app-name>
  2. External network exists: docker network ls | grep <network-name>
  3. Ember is connected to the network: docker network inspect <external-network-name>

Adding Alert Rules

Alert rules let Prometheus fire alerts when their conditions hold for the configured duration. Firing alerts are displayed in the Prometheus UI; without Alertmanager, no notifications are sent.

  1. Copy the example rules file:

    cp prometheus/rules/example.rules.yml prometheus/rules/my-app.rules.yml
  2. Edit and uncomment rules in prometheus/rules/my-app.rules.yml

  3. Validate rules:

    docker run --rm --entrypoint promtool \
      -v "$(pwd)/prometheus:/etc/prometheus:ro" \
      prom/prometheus:v2.51.0 check config /etc/prometheus/prometheus.yml
  4. Reload Prometheus (no restart needed):

    curl -X POST http://localhost:9090/-/reload
  5. View alerts: http://localhost:9090/alerts

Note: Rules files (*.rules.yml) are gitignored except example.rules.yml. This allows environment-specific alert configurations.
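
A rules file must wrap its rules in a groups: block to pass promtool validation; the snippets in the next section are bare rules. A minimal skeleton (group name, file name, and annotation text are illustrative):

```yaml
# prometheus/rules/my-app.rules.yml
groups:
  - name: my-app
    rules:
      - alert: ServiceDown
        expr: up{job="my-service"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "my-service target has been down for 1 minute"
```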

Common Alert Patterns

# Service down
- alert: ServiceDown
  expr: up{job="my-service"} == 0
  for: 1m
  labels:
    severity: critical

# High latency (histogram)
- alert: HighLatency
  expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 1
  for: 5m
  labels:
    severity: warning

# Message queue backlog (NATS JetStream example)
- alert: ConsumerBacklogHigh
  expr: jetstream_consumer_num_pending > 1000
  for: 5m
  labels:
    severity: warning

Verification Steps

Check Prometheus Targets

  1. Open http://localhost:9090/targets
  2. Verify all targets show UP status:
    • node-exporter (1/1 up)
    • cadvisor (1/1 up)
    • process-exporter (1/1 up)
    • prometheus (1/1 up)

Check Grafana Dashboards

  1. Open http://localhost:3000
  2. Login with admin and the password from your .env file
  3. Navigate to Dashboards in the left sidebar
  4. Verify two dashboards are present:
    • Host Health + Thermals
    • Containers Overview

Verify Metrics Collection

In Prometheus (http://localhost:9090/graph), try these queries:

# CPU temperature (requires lm-sensors)
node_hwmon_temp_celsius

# CPU usage
rate(node_cpu_seconds_total{mode="user"}[1m])

# Container memory
container_memory_usage_bytes{name!=""}

# Process CPU
namedprocess_namegroup_cpu_seconds_total

Enabling Temperature Metrics

Temperature metrics require lm-sensors to be installed and configured on the host.

Install lm-sensors

Debian/Ubuntu:

sudo apt update
sudo apt install lm-sensors

Fedora/RHEL:

sudo dnf install lm_sensors

Arch Linux:

sudo pacman -S lm_sensors

Detect Sensors

Run the sensor detection wizard:

sudo sensors-detect

  • Answer YES to probe for the various sensor chips
  • Answer YES to add the detected modules to /etc/modules when prompted
  • Reboot, or load the modules without rebooting:

sudo systemctl restart systemd-modules-load.service

Verify Sensors

sensors

Expected output shows temperature readings:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +45.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +43.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +44.0°C  (high = +100.0°C, crit = +100.0°C)

Check hwmon Files

Verify sensor data is exposed in sysfs:

ls /sys/class/hwmon/
cat /sys/class/hwmon/hwmon*/temp*_input
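
The raw temp*_input files report millidegrees Celsius. A small helper that walks the hwmon tree and prints readable per-chip readings (a sketch; the root directory is an argument so it can be pointed at any tree, and the output format is illustrative):

```shell
# print_temps: walk a hwmon tree and print each chip's temperature
# readings in °C (hwmon exposes millidegrees Celsius).
print_temps() {
  local root="${1:-/sys/class/hwmon}"
  local hw t name milli
  for hw in "$root"/hwmon*; do
    [ -d "$hw" ] || continue
    # Each hwmon directory has a "name" file identifying the chip
    name=$(cat "$hw/name" 2>/dev/null || echo unknown)
    for t in "$hw"/temp*_input; do
      [ -f "$t" ] || continue
      milli=$(cat "$t")
      printf '%s %s: %d.%d°C\n' "$name" "$(basename "$t")" \
        $((milli / 1000)) $(((milli % 1000) / 100))
    done
  done
}

print_temps "$@"
```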

Restart node_exporter

After configuring lm-sensors, restart the stack to pick up new sensors:

docker compose restart node-exporter

If Temperature Metrics Are Missing

  1. Check if hwmon is exposed:

    ls -la /sys/class/hwmon/
  2. Verify node_exporter can read hwmon:

    curl -s http://localhost:9100/metrics | grep hwmon
  3. Check for thermal_zone metrics (alternative):

    curl -s http://localhost:9100/metrics | grep thermal_zone
  4. Ensure kernel modules are loaded:

    lsmod | grep -E 'coretemp|k10temp|nct|it87'
  5. Common issues:

    • Some Mini PCs don't expose fan RPM via hwmon
    • Virtual machines typically don't have hwmon sensors
    • BIOS/UEFI settings may disable sensor reporting

Commands Reference

Start Stack

docker compose up -d

Stop Stack

docker compose down

View Logs

# All services
docker compose logs -f

# Specific service
docker compose logs -f prometheus
docker compose logs -f grafana
docker compose logs -f node-exporter

Restart Services

docker compose restart

Update Images

docker compose pull
docker compose up -d

Check Resource Usage

docker stats

Remove Everything (including data)

docker compose down -v

Data Persistence

Data is stored in Docker named volumes:

Volume            Purpose
prometheus_data   Prometheus TSDB (15-day retention)
grafana_data      Grafana config & state

To backup volumes:

docker run --rm -v ember_prometheus_data:/data -v $(pwd):/backup alpine tar czf /backup/prometheus-backup.tar.gz /data
docker run --rm -v ember_grafana_data:/data -v $(pwd):/backup alpine tar czf /backup/grafana-backup.tar.gz /data

Security Notes

Localhost Binding

All services bind to 127.0.0.1 by default, making them accessible only from the local machine. This is intentional for security.

To expose externally (not recommended without additional security):

Edit docker-compose.yml and change port bindings:

ports:
  - "0.0.0.0:3000:3000"  # Exposes to all interfaces

Secrets Management

  • Never commit .env to version control (it's in .gitignore)
  • The .env.example file shows the required format without real secrets
  • Generate strong passwords: openssl rand -base64 24

Container Privileges

  • node-exporter: Runs with host PID namespace for accurate process metrics
  • cadvisor: Runs privileged to access container metrics
  • process-exporter: Runs privileged to read /proc

These are required for accurate metrics collection.

Troubleshooting

host.docker.internal Not Resolving

On Linux, host.docker.internal requires explicit configuration. The docker-compose.yml includes:

extra_hosts:
  - "host.docker.internal:host-gateway"

If you still have issues:

  1. Verify Docker version >= 20.10
  2. Check the host gateway IP:
    docker run --rm alpine ip route | grep default
  3. Manually specify the host IP in prometheus.yml if needed

Prometheus Can't Scrape Targets

  1. Check target status: http://localhost:9090/targets
  2. Verify containers are running: docker compose ps
  3. Check container logs: docker compose logs <service>
  4. Test connectivity from Prometheus container:
    docker compose exec prometheus wget -qO- http://node-exporter:9100/metrics | head

Grafana Dashboard Shows "No Data"

  1. Verify Prometheus datasource: Grafana → Connections → Data sources → Prometheus
  2. Check if Prometheus has data: http://localhost:9090/graph
  3. Ensure time range is appropriate (default: last 1 hour)
  4. Check for metric name changes between versions

cAdvisor Memory Issues

cAdvisor can be memory-intensive. To limit:

cadvisor:
  deploy:
    resources:
      limits:
        memory: 512M

High Disk Usage

Prometheus stores 15 days of data. To reduce:

  1. Edit docker-compose.yml:
    command:
      - '--storage.tsdb.retention.time=7d'
  2. Restart Prometheus:
    docker compose restart prometheus
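
To pick a retention value, the usual capacity rule of thumb is disk ≈ retention × ingested samples per second × bytes per sample (roughly 1-2 bytes per sample after compression). A back-of-the-envelope helper (the series count and scrape interval below are assumptions; substitute your own):

```shell
# Estimate Prometheus TSDB disk usage:
#   bytes ≈ retention_days*86400 * (active_series / scrape_interval_s) * bytes_per_sample
# bytes_per_sample=2 is a conservative post-compression figure.
estimate_tsdb_bytes() {
  local days=$1 series=$2 interval=$3 bytes_per_sample=2
  awk -v d="$days" -v s="$series" -v i="$interval" -v b="$bytes_per_sample" \
    'BEGIN { printf "%.0f MiB\n", d * 86400 * (s / i) * b / (1024 * 1024) }'
}

# e.g. 15-day retention, ~1000 active series scraped every 5s:
estimate_tsdb_bytes 15 1000 5   # → 494 MiB
```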

process-exporter High Cardinality

If you have many unique processes, edit process-exporter/process-exporter.yml to group more aggressively or exclude noisy processes.
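
process-exporter only tracks processes that match a process_names entry, so the simplest way to cut cardinality is to group by executable name and list only the commands you care about. A sketch (the comm values are examples, not Ember defaults):

```yaml
# process-exporter/process-exporter.yml
process_names:
  # Track only these executables, grouped by name; everything else is ignored
  - name: "{{.Comm}}"
    comm:
      - dockerd
      - prometheus
      - grafana
```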

Customization

Changing Scrape Intervals

Edit prometheus/prometheus.yml:

scrape_configs:
  - job_name: 'node-exporter'
    scrape_interval: 10s  # Change from 5s

Adding Custom Dashboards

  1. Create JSON dashboard file in grafana/dashboards/
  2. Dashboards are auto-loaded within 30 seconds
  3. Or restart Grafana: docker compose restart grafana

Modifying Retention

Edit retention in docker-compose.yml:

prometheus:
  command:
    - '--storage.tsdb.retention.time=30d'  # 30 days instead of 15

Stack Versions

Component          Version
Prometheus         v2.51.0
Grafana            10.4.1
node_exporter      v1.7.0
cAdvisor           v0.47.2
process-exporter   0.8.2

License

This project is provided as-is for personal and educational use.

