Description
Is your feature request related to a problem? Please describe.
Currently, the project provides excellent observability support through Docker Compose with Prometheus and Grafana, including a pre-built dashboard (deploy/llm-router-dashboard.json). However, users deploying the semantic router in Kubernetes environments lack comprehensive guidance on how to set up Grafana dashboards to monitor their deployments effectively.
The existing Kubernetes deployment documentation focuses on the core application deployment but doesn't cover the observability stack setup, leaving users to figure out:
- How to deploy Prometheus and Grafana in Kubernetes
- How to configure service discovery for the semantic router metrics endpoint
- How to import and configure the existing dashboard in a Kubernetes environment
- Best practices for persistent storage and configuration management
Describe the solution you'd like
Add a comprehensive guide for deploying Grafana dashboards in Kubernetes environments that includes:
- Kubernetes Manifests for Observability Stack:
  - Prometheus deployment with proper RBAC and service discovery
  - Grafana deployment with persistent storage configuration
  - ConfigMaps for Prometheus scrape configs and Grafana datasources
  - Services and ingress configurations for external access
- Dashboard Integration:
  - Instructions for importing the existing deploy/llm-router-dashboard.json
  - ConfigMap-based dashboard provisioning for automated deployment
  - Dashboard customization guidelines for Kubernetes-specific metrics
- Configuration Examples:
  - Prometheus configuration for discovering semantic router metrics endpoints
  - Grafana datasource configuration pointing to Kubernetes Prometheus
  - RBAC configurations for proper service discovery permissions
- Deployment Instructions:
  - Step-by-step deployment guide using kubectl/kustomize
  - Integration with existing semantic router Kubernetes deployment
  - Verification steps to ensure metrics collection is working
- Best Practices:
  - Resource requirements and scaling considerations
  - Security configurations (authentication, authorization)
  - Persistent storage recommendations for metrics retention
  - Monitoring stack maintenance and updates
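To make the Grafana datasource item more concrete, here is a rough sketch of what a provisioned datasource could look like when shipped as a ConfigMap. The ConfigMap name, the observability namespace, and the prometheus Service on port 9090 are all assumptions for illustration; none of them exist in the repository today.

```yaml
# Hypothetical ConfigMap; name, namespace, and Prometheus URL are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources          # assumed name
  namespace: observability           # assumed namespace
data:
  datasources.yaml: |
    # Grafana datasource provisioning file format
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        # Assumes a Service named "prometheus" in the "observability"
        # namespace that exposes Prometheus on port 9090.
        url: http://prometheus.observability.svc:9090
        isDefault: true
```

Grafana reads datasource provisioning files from /etc/grafana/provisioning/datasources by default, so the Grafana Deployment would mount this ConfigMap at that path.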
Additional context
The project already has:
- A working Grafana dashboard (deploy/llm-router-dashboard.json) with semantic router metrics
- Kubernetes deployment manifests with metrics endpoint exposed on port 9190
- Docker Compose observability setup as a reference implementation
- Prometheus configuration (config/prometheus.yaml) that can be adapted for Kubernetes
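As a starting point for the adaptation, a scrape job using Prometheus's Kubernetes pod discovery might look roughly like the sketch below. The job name, the semantic-router namespace, and the app=semantic-router pod label are assumptions; only port 9190 comes from the existing manifests. It also assumes the container declares port 9190 in its pod spec and serves metrics on the default /metrics path.

```yaml
# Sketch of a scrape job; job name, namespace, and pod label are assumptions.
scrape_configs:
  - job_name: semantic-router              # assumed job name
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - semantic-router              # assumed namespace
    relabel_configs:
      # Keep only pods carrying the (assumed) label app=semantic-router.
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: semantic-router
      # Keep only the declared container port that serves metrics (9190).
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        action: keep
        regex: "9190"
      # Copy namespace and pod name onto each series for use in dashboards.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

For pod discovery to work, the ServiceAccount that Prometheus runs under needs get, list, and watch permissions on pods in the target namespace, which is where the RBAC manifests requested above come in.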
This guide should complement the existing deploy/kubernetes/README.md and provide users with a complete observability solution for production Kubernetes deployments.
The guide should be placed in the deploy/kubernetes/observability/ directory with:
- README.md - Main deployment guide
- prometheus/ - Prometheus manifests and configurations
- grafana/ - Grafana manifests and dashboard configurations
- kustomization.yaml - Kustomize configuration for easy deployment
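For reference, a top-level kustomization.yaml for this layout could be as small as the sketch below. The namespace and ConfigMap name are assumptions, and the sketch assumes that prometheus/ and grafana/ each contain their own kustomization.yaml.

```yaml
# deploy/kubernetes/observability/kustomization.yaml (hypothetical contents)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: observability               # assumed namespace
resources:
  - prometheus/                        # assumes a kustomization.yaml inside
  - grafana/                           # assumes a kustomization.yaml inside
configMapGenerator:
  - name: grafana-dashboards           # assumed name
    files:
      # Assumes a copy of deploy/llm-router-dashboard.json is placed in this
      # directory; by default Kustomize refuses to load files from outside
      # the kustomization root.
      - llm-router-dashboard.json
```

With something like this in place, the whole stack could be applied with kubectl apply -k deploy/kubernetes/observability/.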