Kaftain is a real-time Kafka consumer lag monitoring and auto-scaling solution designed for Kubernetes environments. It provides visibility into consumer group performance and automatically scales Kubernetes deployments based on lag thresholds.
- Real-time Lag Monitoring: Track consumer group lag across multiple Kafka clusters
- Auto-scaling: Automatically scale Kubernetes deployments based on configurable lag thresholds
- Multi-cluster Support: Monitor and manage multiple Kafka clusters from a single interface
- Historical Data: Store and visualize lag trends over time (1h, 6h, 24h views)
- Beautiful UI: Modern React-based dashboard with real-time updates
- Flexible Configuration: Per-consumer group scaling policies with min/max replicas
- Kubernetes Native: Seamless integration with Kubernetes API for deployment management
Kaftain consists of three main components:
- Frontend: React + TypeScript dashboard for visualization and configuration
- Backend: Node.js + Express API server that orchestrates monitoring and scaling
- Data Layer: PostgreSQL for storing configuration, lag history, and scaling events
The system integrates with:
- Kafka Exporter: Prometheus-compatible metrics endpoint for Kafka consumer lag
- Kubernetes API: For scaling deployments based on lag
graph TB
subgraph "Kaftain Architecture"
subgraph "Frontend (React + TypeScript) "
UI[Dashboard UI]
Charts[Lag Timeline Charts]
Config[Cluster/Monitor Config]
end
subgraph "Backend (Node.js + Express)"
API[REST API]
Monitor[Monitor Service]
Lag[Lag Service]
Scale[Scaling Service]
K8s[K8s Controller]
end
subgraph "Data Storage"
Postgresql[(Postgresql)]
end
subgraph "External Systems"
Kafka[Kafka Clusters]
KafkaExp[Kafka Exporter]
K8sAPI[Kubernetes API]
Deploy[Consumer Deployments]
end
end
UI --> API
Charts --> API
Config --> API
API --> Monitor
API --> Lag
Monitor --> Lag
Monitor --> Scale
Scale --> K8s
Lag --> KafkaExp
KafkaExp --> Kafka
K8s --> K8sAPI
K8sAPI --> Deploy
Monitor --> Postgresql
Lag --> Postgresql
Scale --> Postgresql
API --> Postgresql
- Node.js 18+
- PostgreSQL 15+
- Kubernetes cluster with:
- Kafka consumer deployments
- Kafka Exporter deployed
- RBAC permissions for deployment scaling
- Access to Kafka cluster metrics
- Clone the repository:
git clone https://github.com/yourusername/kaftain.git
cd kaftain
- Install dependencies:
npm install
cd server && npm install && cd ..
- Set up environment variables:
# Create .env file in server directory
cat > server/.env << EOF
DATABASE_URL=postgresql://localhost:5432/kaftain
PORT=3001
NAMESPACE=default
DEPLOYMENT_NAME=kafka-consumer
EOF
- Start PostgreSQL locally:
docker run -d -p 5432:5432 --name kaftain-postgres postgres:15
- Run the application:
npm run dev-all # Starts both frontend and backend
- Build the Docker image:
docker build -t your-registry/kaftain:latest .
docker push your-registry/kaftain:latest
- Create Kubernetes resources:
# kaftain-deployment.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: kaftain-config
data:
DATABASE_URL: "postgresql://postgres:5432/kaftain"
NAMESPACE: "default"
DEPLOYMENT_NAME: "kafka-consumer"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kaftain
spec:
replicas: 1
selector:
matchLabels:
app: kaftain
template:
metadata:
labels:
app: kaftain
spec:
serviceAccountName: kaftain
containers:
- name: kaftain
image: your-registry/kaftain:latest
ports:
- containerPort: 3001
- containerPort: 5173
envFrom:
- configMapRef:
name: kaftain-config
---
apiVersion: v1
kind: Service
metadata:
name: kaftain
spec:
selector:
app: kaftain
ports:
- name: api
port: 3001
targetPort: 3001
- name: ui
port: 5173
targetPort: 5173
type: LoadBalancer
- Create RBAC permissions:
# kaftain-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: kaftain
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kaftain
rules:
- apiGroups: ["apps"]
resources: ["deployments", "deployments/scale"]
verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kaftain
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kaftain
subjects:
- kind: ServiceAccount
name: kaftain
namespace: default
- Apply the manifests:
kubectl apply -f kaftain-rbac.yaml
kubectl apply -f kaftain-deployment.yaml
Ensure your Kafka consumer deployments have appropriate labels and are in the namespace configured in Kaftain.
- Access the Kaftain UI (http://your-kaftain-url:5173)
- Click "Add Cluster" in the sidebar
- Enter:
- Cluster Name: A friendly name for your Kafka cluster
- URL: The Kafka Exporter metrics endpoint (e.g.,
http://kafka-exporter:9308/metrics
)
- Select your cluster from the sidebar
- Click "Add Monitor"
- Choose a consumer group to monitor
- Configure scaling parameters:
- Min Replicas: Minimum number of consumer pods
- Max Replicas: Maximum number of consumer pods
- Scaling Factor: Lag per replica (e.g., 1000 = scale up by 1 replica per 1000 lag)
- Cluster Sidebar: Switch between multiple Kafka clusters
- Active Monitors: View and manage running monitors
- Lag Timeline: Visualize consumer lag trends over time
- Consumer Groups Table: See current lag for all consumer groups
The auto-scaler calculates optimal replicas using:
replicas = ceil(current_lag / scaling_factor)
Bounded by min_replicas
and max_replicas
.
Example:
- Current lag: 5000
- Scaling factor: 1000
- Min replicas: 1
- Max replicas: 10
- Result: 5 replicas
GET /api/cluster-config
- List all clustersPOST /api/cluster-config
- Add a new clusterDELETE /api/cluster-config/:id
- Remove a cluster
POST /api/service/start
- Start monitoring a consumer groupPOST /api/service/stop
- Stop monitoringGET /api/service/monitors
- List active monitorsDELETE /api/service/monitors/:id
- Delete a monitor
GET /api/lag/records
- Get historical lag dataGET /api/consumer-groups
- List consumer groups for a cluster
Each monitor can be configured with:
- minReplicas: Minimum pod count (default: 1)
- maxReplicas: Maximum pod count (default: 10)
- scalingFactor: Lag messages per replica (default: 1000)
- "Failed to fetch consumer groups"
- Verify Kafka Exporter is running and accessible
- Check the cluster URL is correct
- Ensure network policies allow communication
- "Failed to scale deployment"
- Check RBAC permissions for the service account
- Verify deployment exists in the configured namespace
- Check Kubernetes API server logs
- "No lag data available"
- Ensure consumer groups are actively consuming
- Verify Kafka Exporter is scraping the correct Kafka cluster
- Check if consumer group names match exactly
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature
) - Commit your changes (
git commit -m 'Add amazing feature'
) - Push to the branch (
git push origin feature/amazing-feature
) - Open a Pull Request
- Kafka Exporter for metrics
- Recharts for beautiful charts
- Kubernetes Client for K8s integration