Netprobe is a toolset for debugging networking issues. The netprobe-ping binary can test ICMP traffic to both IPv4 and IPv6 addresses, perform ARP ping against IPv4 addresses, and perform Neighbour Solicitation against IPv6 addresses.
The daemon binary (netprobe) runs those checks automatically and exposes the results through a Prometheus exporter.
- Multiple Ping Methods: Supports both ICMP and ARP ping protocols out of the box
- Scalable Design: Processes targets in parallel batches to handle thousands of endpoints
- Pluggable Target Sources: Database-driven targets with extensible source architecture
- Prometheus Integration: Exposes metrics in standard Prometheus format
- Rich Metrics: Tracks packet loss percentage and min/max/average latency per target
- Dimensional Analysis: Labels for customer_id, VLAN, pod, and host for anomaly detection
Build the project:

make build

Create a config.yaml based on config.example.yaml:
exporter:
  listen_address: "0.0.0.0"
  listen_port: 9090
ping_interval_seconds: 60
batch_size: 100
max_parallel_workers: 10
icmp:
  enabled: true
  timeout_ms: 5000
  count: 1
arp:
  enabled: true
  timeout_ms: 5000
database:
  type: "postgresql"
  host: "localhost"
  port: 5432
  database: "network_db"
  user: "exporter"
  password: "${DB_PASSWORD}"
  max_open_conns: 25
  max_idle_conns: 5
  conn_max_lifetime_seconds: 600

Run the daemon:

./netprobe --config config.yaml

Metrics will be available at http://localhost:9090/metrics
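
The `${DB_PASSWORD}` placeholder in the database section suggests that the password is taken from the environment rather than stored in the file. How netprobe itself resolves the placeholder is not documented here, so the following is only a minimal sketch of the idea, assuming shell-style `${NAME}` substitution. The helper name `expand_env` is hypothetical and not part of netprobe.

```python
import os
from string import Template


def expand_env(value, env=None):
    """Replace ${NAME} placeholders in a config value.

    Hypothetical helper for illustration only. Raises KeyError when a
    referenced variable is not set, so a missing secret fails loudly
    instead of silently becoming an empty password.
    """
    env = os.environ if env is None else env
    return Template(value).substitute(env)


if __name__ == "__main__":
    # Explicit mapping used here so the example does not depend on the shell.
    print(expand_env("${DB_PASSWORD}", {"DB_PASSWORD": "s3cret"}))
```

One caveat of this sketch: `string.Template` treats every `$` as special, so a literal dollar sign in a value would have to be written as `$$`.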
Run the tests:

make test

Packet loss metric:
- Name: netprobe_packet_loss_percent
- Type: Gauge
- Labels: destination_ip, method, customer_id, vlan, pod, host
- Value: 0-100 (percentage of lost packets)

Latency metrics:
- Names: netprobe_latency_min_ms, netprobe_latency_max_ms, netprobe_latency_avg_ms
- Type: Gauge
- Labels: Same as packet loss
- Value: Response time in milliseconds
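
To make the metric definitions concrete, the sketch below derives a loss percentage and min/max/average latency from the round-trip times of one probe run. It illustrates the arithmetic only and is not netprobe's code: the function name `summarize` and the convention of marking a lost probe with `None` are assumptions. What netprobe reports for latency when every probe is lost is not stated in this document; the sketch simply omits the latency keys in that case.

```python
from typing import Dict, Optional, Sequence


def summarize(rtts_ms: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarize one probe run.

    rtts_ms holds one entry per probe sent; None marks a lost probe.
    """
    sent = len(rtts_ms)
    if sent == 0:
        raise ValueError("no probes were sent")

    received = [r for r in rtts_ms if r is not None]
    stats = {"packet_loss_percent": 100.0 * (sent - len(received)) / sent}

    # Latency is only defined when at least one reply came back.
    if received:
        stats["latency_min_ms"] = min(received)
        stats["latency_max_ms"] = max(received)
        stats["latency_avg_ms"] = sum(received) / len(received)
    return stats


if __name__ == "__main__":
    # Four probes sent, one lost.
    print(summarize([2.0, None, 3.0, 5.0]))
```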
netprobe_packet_loss_percent{destination_ip="10.0.0.1",method="icmp",customer_id="acme",vlan="prod",pod="us-west-2",host="server-01"} 0.0
netprobe_latency_avg_ms{destination_ip="10.0.0.1",method="icmp",customer_id="acme",vlan="prod",pod="us-west-2",host="server-01"} 2.5
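
When inspecting the output by hand or in a quick script, it can help to split a sample line into its name, labels, and value, for example to group results by customer_id or vlan. The parser below is a deliberately small sketch: the name `parse_sample` is hypothetical, and it assumes label values contain no escaped quotes and that lines carry no timestamp. For anything beyond ad hoc checks, a full parser such as the one shipped with the official Prometheus Python client is the safer choice.

```python
import re

# metric_name{label="value",...} value
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Return (name, labels, value) for one labelled sample line."""
    match = _SAMPLE.match(line.strip())
    if match is None:
        raise ValueError("not a labelled sample line: %r" % line)
    labels = dict(_LABEL.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


if __name__ == "__main__":
    line = (
        'netprobe_latency_avg_ms{destination_ip="10.0.0.1",method="icmp",'
        'customer_id="acme",vlan="prod",pod="us-west-2",host="server-01"} 2.5'
    )
    name, labels, value = parse_sample(line)
    print(name, labels["customer_id"], value)
```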
The exporter uses a scalable, concurrent architecture:
- Target Fetching: Periodically loads targets from the configured database source
- Batch Scheduling: Divides targets into batches to prevent resource exhaustion
- Parallel Execution: Workers process multiple targets concurrently within each batch
- Non-blocking Collection: Results collected concurrently to prevent deadlocks
- Metrics Storage: Thread-safe in-memory metrics storage
- HTTP Exposition: Prometheus-compatible /metrics endpoint
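
The batching and worker steps above can be sketched as follows. This is an illustration of the general pattern, not netprobe's actual implementation: `batches`, `MetricsStore`, and `run_cycle` are hypothetical names, `probe` stands in for whatever performs the ICMP or ARP check, and the defaults mirror the batch_size and max_parallel_workers values from the example configuration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def batches(targets, size):
    """Yield consecutive slices of at most `size` targets."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(targets), size):
        yield targets[start:start + size]


class MetricsStore:
    """In-memory result store guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set(self, target, result):
        with self._lock:
            self._data[target] = result

    def snapshot(self):
        # Return a copy so readers never see a dict that is being mutated.
        with self._lock:
            return dict(self._data)


def run_cycle(targets, probe, store, batch_size=100, max_workers=10):
    """Probe all targets once, one batch at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batches(targets, batch_size):
            # pool.map runs probe() concurrently within the batch and
            # yields results in input order; the loop finishes the batch
            # before the next one is submitted.
            for target, result in zip(batch, pool.map(probe, batch)):
                store.set(target, result)


if __name__ == "__main__":
    store = MetricsStore()
    demo_targets = ["10.0.0.%d" % i for i in range(250)]
    run_cycle(demo_targets, probe=len, store=store)  # len() is a stand-in probe
    print(len(store.snapshot()))
```

Capping each batch bounds how many probes are queued at once, while the worker count bounds how many run at the same moment, so the two settings can be tuned independently.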