Add Kubernetes Grafana Dashboard Deployment Guide #279

@Xunzhuo

Is your feature request related to a problem? Please describe.

The project currently provides observability support through Docker Compose with Prometheus and Grafana, including a pre-built dashboard (deploy/llm-router-dashboard.json). However, users who deploy the semantic router on Kubernetes lack guidance on setting up Grafana dashboards to monitor their deployments.

The existing Kubernetes deployment documentation focuses on the core application deployment but doesn't cover the observability stack setup, leaving users to figure out:

  • How to deploy Prometheus and Grafana in Kubernetes
  • How to configure service discovery for the semantic router metrics endpoint
  • How to import and configure the existing dashboard in a Kubernetes environment
  • Best practices for persistent storage and configuration management

Describe the solution you'd like

Add a comprehensive guide for deploying Grafana dashboards in Kubernetes environments that includes:

  1. Kubernetes Manifests for Observability Stack:

    • Prometheus deployment with proper RBAC and service discovery
    • Grafana deployment with persistent storage configuration
    • ConfigMaps for Prometheus scrape configs and Grafana datasources
    • Services and ingress configurations for external access
  2. Dashboard Integration:

    • Instructions for importing the existing deploy/llm-router-dashboard.json
    • ConfigMap-based dashboard provisioning for automated deployment
    • Dashboard customization guidelines for Kubernetes-specific metrics
  3. Configuration Examples:

    • Prometheus configuration for discovering semantic router metrics endpoints
    • Grafana datasource configuration pointing to Kubernetes Prometheus
    • RBAC configurations for proper service discovery permissions
  4. Deployment Instructions:

    • Step-by-step deployment guide using kubectl/kustomize
    • Integration with existing semantic router Kubernetes deployment
    • Verification steps to ensure metrics collection is working
  5. Best Practices:

    • Resource requirements and scaling considerations
    • Security configurations (authentication, authorization)
    • Persistent storage recommendations for metrics retention
    • Monitoring stack maintenance and updates
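
As a starting point for the dashboard integration and datasource items above, Grafana provisioning could be sketched with two ConfigMaps like the following. This is only an illustration, not an existing manifest in the repository: the ConfigMap names, the `observability` namespace, the Prometheus Service URL, and the dashboard mount path are all assumptions. It also assumes the Grafana Deployment mounts these ConfigMaps under `/etc/grafana/provisioning/datasources` and `/etc/grafana/provisioning/dashboards`, and mounts the dashboard JSON at the path named in the provider.

```yaml
# Hypothetical ConfigMap: Grafana datasource provisioning.
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources          # assumed name
  namespace: observability           # assumed namespace
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        # Assumed in-cluster Service name and port for Prometheus.
        url: http://prometheus.observability.svc:9090
        isDefault: true
---
# Hypothetical ConfigMap: file-based dashboard provider.
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-provider   # assumed name
  namespace: observability
data:
  dashboards.yaml: |
    apiVersion: 1
    providers:
      - name: semantic-router
        orgId: 1
        type: file
        disableDeletion: false
        options:
          # Assumed mount path for llm-router-dashboard.json.
          path: /var/lib/grafana/dashboards
```

With this layout, the existing deploy/llm-router-dashboard.json would be loaded at startup from a mounted ConfigMap instead of being imported by hand through the UI.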

Additional context

The project already has:

  • A working Grafana dashboard (deploy/llm-router-dashboard.json) with semantic router metrics
  • Kubernetes deployment manifests with metrics endpoint exposed on port 9190
  • Docker Compose observability setup as a reference implementation
  • Prometheus configuration (config/prometheus.yaml) that can be adapted for Kubernetes
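
To make the adaptation of config/prometheus.yaml more concrete, here is a minimal sketch of what a Kubernetes-side scrape configuration might look like. It is not taken from the repository: the namespaces (`observability`, `semantic-router`), the ConfigMap, Role, and ServiceAccount names, and the default `/metrics` path are assumptions; only port 9190 comes from the existing manifests. It also assumes the router container declares 9190 as a `containerPort`, since pod discovery generates one target per declared port.

```yaml
# Hypothetical ConfigMap holding the Prometheus configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config            # assumed name
  namespace: observability           # assumed namespace
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: semantic-router
        # metrics_path defaults to /metrics (assumed to match the router).
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - semantic-router    # assumed namespace of the router
        relabel_configs:
          # Keep only targets on the router's metrics port.
          - source_labels: [__meta_kubernetes_pod_container_port_number]
            regex: "9190"
            action: keep
          # Expose the pod name as a label on every scraped series.
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
---
# Role that lets Prometheus discover pods in the router namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-pod-discovery     # assumed name
  namespace: semantic-router
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-pod-discovery
  namespace: semantic-router
subjects:
  - kind: ServiceAccount
    name: prometheus                 # assumed ServiceAccount used by Prometheus
    namespace: observability
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-pod-discovery
```

A namespaced Role is enough here because discovery is restricted to a single namespace; cluster-wide discovery would need a ClusterRole and ClusterRoleBinding instead.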

This guide should complement the existing deploy/kubernetes/README.md and provide users with a complete observability solution for production Kubernetes deployments.

The guide should be placed in the deploy/kubernetes/observability/ directory, which would contain:

  • README.md - Main deployment guide
  • prometheus/ - Prometheus manifests and configurations
  • grafana/ - Grafana manifests and dashboard configurations
  • kustomization.yaml - Kustomize configuration for easy deployment
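
For illustration, the top-level kustomization.yaml for this layout might look roughly like the sketch below. Everything beyond the directory names listed above is an assumption: the `observability` namespace, the `grafana-dashboards` ConfigMap name, and the idea that `prometheus/` and `grafana/` each carry their own kustomization.yaml. Because Kustomize by default only loads files from inside the kustomization root, the sketch also assumes a copy of the dashboard JSON is placed under `grafana/`.

```yaml
# Hypothetical deploy/kubernetes/observability/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: observability             # assumed namespace

resources:
  - prometheus/                      # assumed to contain its own kustomization.yaml
  - grafana/                         # assumed to contain its own kustomization.yaml

configMapGenerator:
  # Packages the dashboard JSON as a ConfigMap for Grafana to mount.
  - name: grafana-dashboards         # assumed name
    files:
      - grafana/llm-router-dashboard.json   # assumed copy of deploy/llm-router-dashboard.json

generatorOptions:
  # Keep a stable ConfigMap name so the Grafana Deployment can reference it directly.
  disableNameSuffixHash: true
```

Under these assumptions the whole stack could be applied with `kubectl apply -k deploy/kubernetes/observability/`.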
