Kaftain - Kafka Consumer Lag Monitor & Auto-Scaler

Kaftain is a real-time Kafka consumer lag monitoring and auto-scaling solution designed for Kubernetes environments. It provides visibility into consumer group performance and automatically scales Kubernetes deployments based on lag thresholds.

🚀 Features

Real-time Lag Monitoring: Track consumer group lag across multiple Kafka clusters
Auto-scaling: Automatically scale Kubernetes deployments based on configurable lag thresholds
Multi-cluster Support: Monitor and manage multiple Kafka clusters from a single interface
Historical Data: Store and visualize lag trends over time (1h, 6h, 24h views)
Beautiful UI: Modern React-based dashboard with real-time updates
Flexible Configuration: Per-consumer group scaling policies with min/max replicas
Kubernetes Native: Seamless integration with Kubernetes API for deployment management

🏗️ Architecture

Kaftain consists of three main components:

Frontend: React + TypeScript dashboard for visualization and configuration
Backend: Node.js + Express API server that orchestrates monitoring and scaling
Data Layer: PostgreSQL for storing configuration, lag history, and scaling events

The system integrates with:

Kafka Exporter: Prometheus-compatible metrics endpoint for Kafka consumer lag
Kubernetes API: For scaling deployments based on lag

graph TB
    subgraph "Kaftain Architecture"
        subgraph "Frontend (React + TypeScript)"
            UI[Dashboard UI]
            Charts[Lag Timeline Charts]
            Config[Cluster/Monitor Config]
        end
        
        subgraph "Backend (Node.js + Express)"
            API[REST API]
            Monitor[Monitor Service]
            Lag[Lag Service]
            Scale[Scaling Service]
            K8s[K8s Controller]
        end
        
        subgraph "Data Storage"
            Postgresql[(PostgreSQL)]
        end
        
        subgraph "External Systems"
            Kafka[Kafka Clusters]
            KafkaExp[Kafka Exporter]
            K8sAPI[Kubernetes API]
            Deploy[Consumer Deployments]
        end
    end
    
    UI --> API
    Charts --> API
    Config --> API
    
    API --> Monitor
    API --> Lag
    Monitor --> Lag
    Monitor --> Scale
    Scale --> K8s
    
    Lag --> KafkaExp
    KafkaExp --> Kafka
    K8s --> K8sAPI
    K8sAPI --> Deploy
    
    Monitor --> Postgresql
    Lag --> Postgresql
    Scale --> Postgresql
    API --> Postgresql

📋 Prerequisites

- Node.js 18+
- PostgreSQL 15+
- Kubernetes cluster with:
  - Kafka consumer deployments
  - Kafka Exporter deployed
  - RBAC permissions for deployment scaling
- Access to Kafka cluster metrics

🔧 Installation

Local Development

Clone the repository:

git clone https://github.com/yourusername/kaftain.git
cd kaftain

Install dependencies:

npm install
cd server && npm install && cd ..

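Set up environment variables:

# Create .env file in server directory
# (the credentials are local-development examples and must match the PostgreSQL container below)
cat > server/.env << EOF
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/kaftain
PORT=3001
NAMESPACE=default
DEPLOYMENT_NAME=kafka-consumer
EOF

Start PostgreSQL locally:

# The official postgres image refuses to start without POSTGRES_PASSWORD;
# POSTGRES_DB creates the kaftain database on first start.
docker run -d -p 5432:5432 --name kaftain-postgres \
  -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=kaftain postgres:15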

Run the application:

npm run dev-all  # Starts both frontend and backend
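
The variables written to server/.env could be read and validated at startup along these lines. This is a minimal sketch, not Kaftain's actual startup code: the loadConfig function, the ServerConfig shape, and the fallback values are assumptions that simply mirror the example .env file.

```typescript
// Hypothetical config loader; names and defaults mirror the example .env, not Kaftain's source.
interface ServerConfig {
  databaseUrl: string;
  port: number;
  namespace: string;
  deploymentName: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): ServerConfig {
  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) {
    throw new Error("DATABASE_URL is required");
  }
  // Fall back to the port used in the example .env when PORT is unset.
  const port = Number(env.PORT ?? "3001");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }
  return {
    databaseUrl,
    port,
    namespace: env.NAMESPACE ?? "default",
    deploymentName: env.DEPLOYMENT_NAME ?? "kafka-consumer",
  };
}

// Example input; in a Node.js server this would typically be process.env.
const exampleConfig = loadConfig({
  DATABASE_URL: "postgresql://localhost:5432/kaftain",
  PORT: "3001",
});
console.log(exampleConfig);
```

Failing fast on a missing DATABASE_URL or a malformed PORT surfaces configuration mistakes at startup rather than on the first database query.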

Production Deployment (Kubernetes)

Build the Docker image:

docker build -t your-registry/kaftain:latest .
docker push your-registry/kaftain:latest

Create Kubernetes resources:

# kaftain-deployment.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kaftain-config
data:
  DATABASE_URL: "postgresql://postgres:5432/kaftain"
  NAMESPACE: "default"
  DEPLOYMENT_NAME: "kafka-consumer"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kaftain
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kaftain
  template:
    metadata:
      labels:
        app: kaftain
    spec:
      serviceAccountName: kaftain
      containers:
      - name: kaftain
        image: your-registry/kaftain:latest
        ports:
        - containerPort: 3001
        - containerPort: 5173
        envFrom:
        - configMapRef:
            name: kaftain-config
---
apiVersion: v1
kind: Service
metadata:
  name: kaftain
spec:
  selector:
    app: kaftain
  ports:
  - name: api
    port: 3001
    targetPort: 3001
  - name: ui
    port: 5173
    targetPort: 5173
  type: LoadBalancer

Create RBAC permissions:

# kaftain-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kaftain
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kaftain
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kaftain
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kaftain
subjects:
- kind: ServiceAccount
  name: kaftain
  namespace: default

Apply the manifests:

kubectl apply -f kaftain-rbac.yaml
kubectl apply -f kaftain-deployment.yaml
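
The deployments/scale resource granted in the RBAC manifest corresponds to the scale subresource of a Deployment in the Kubernetes API. The sketch below shows the shape of a request that changes the replica count through that subresource with a merge patch. The buildScaleRequest helper is hypothetical and only describes the request; a Kubernetes client library would normally construct and send it, and this is not necessarily how Kaftain issues the call.

```typescript
// Hypothetical helper: describes the HTTP request for the Deployment scale subresource.
interface ScaleRequest {
  method: "PATCH";
  path: string;
  headers: Record<string, string>;
  body: string;
}

function buildScaleRequest(namespace: string, deployment: string, replicas: number): ScaleRequest {
  if (!Number.isInteger(replicas) || replicas < 0) {
    throw new Error(`replicas must be a non-negative integer, got ${replicas}`);
  }
  return {
    method: "PATCH",
    // Path of the scale subresource for a Deployment in the apps/v1 API group.
    path:
      `/apis/apps/v1/namespaces/${encodeURIComponent(namespace)}` +
      `/deployments/${encodeURIComponent(deployment)}/scale`,
    // A JSON merge patch only needs the fields being changed.
    headers: { "Content-Type": "application/merge-patch+json" },
    body: JSON.stringify({ spec: { replicas } }),
  };
}

// Namespace and deployment name are taken from the example ConfigMap.
const scaleRequest = buildScaleRequest("default", "kafka-consumer", 5);
console.log(scaleRequest.method, scaleRequest.path, scaleRequest.body);
```

Patching the scale subresource rather than the whole Deployment means the service account needs only the narrow permissions listed in the ClusterRole.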

🔌 Integration with Your Kafka Environment

1. Configure Your Consumer Deployments

Ensure your Kafka consumer deployments have appropriate labels and run in the namespace configured for Kaftain (the NAMESPACE value; "default" in the examples above).

2. Add Kafka Cluster to Kaftain

Access the Kaftain UI (http://your-kaftain-url:5173)
Click "Add Cluster" in the sidebar
Enter:
- Cluster Name: A friendly name for your Kafka cluster
- URL: The Kafka Exporter metrics endpoint (e.g., http://kafka-exporter:9308/metrics)
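
To illustrate what the metrics endpoint provides, the sketch below sums per-partition lag by consumer group from Prometheus text-format output. It assumes the exporter publishes a kafka_consumergroup_lag metric with a consumergroup label, as the commonly used kafka_exporter does; the sumLagByGroup function and the sample data are hypothetical and are not taken from Kaftain's source.

```typescript
// Sketch: sum per-partition lag by consumer group from Prometheus text exposition format.
// Assumes the metric name "kafka_consumergroup_lag" and a "consumergroup" label.
function sumLagByGroup(metricsText: string): Map<string, number> {
  const totals = new Map<string, number>();
  // Matches: kafka_consumergroup_lag{<labels>} <value>
  // Simplification: label values containing "}" are not handled.
  const sample = /^kafka_consumergroup_lag\{([^}]*)\}\s+(\S+)/;
  for (const line of metricsText.split("\n")) {
    const match = sample.exec(line.trim());
    if (!match) continue; // skips "# HELP" / "# TYPE" comments and other metrics
    const group = /consumergroup="([^"]*)"/.exec(match[1]);
    const value = Number(match[2]);
    if (!group || Number.isNaN(value)) continue;
    totals.set(group[1], (totals.get(group[1]) ?? 0) + value);
  }
  return totals;
}

// Illustrative sample, not real exporter output.
const sampleMetrics = [
  "# TYPE kafka_consumergroup_lag gauge",
  'kafka_consumergroup_lag{consumergroup="orders",partition="0",topic="orders-v1"} 1200',
  'kafka_consumergroup_lag{consumergroup="orders",partition="1",topic="orders-v1"} 800',
  'kafka_consumergroup_lag{consumergroup="billing",partition="0",topic="invoices"} 15',
  'kafka_consumergroup_current_offset{consumergroup="orders",partition="0",topic="orders-v1"} 99',
].join("\n");

const lagByGroup = sumLagByGroup(sampleMetrics);
console.log(lagByGroup.get("orders"), lagByGroup.get("billing"));
```

If the URL entered for a cluster does not return lines of this form, consumer groups cannot be listed for it.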

3. Configure Monitoring & Auto-scaling

Select your cluster from the sidebar
Click "Add Monitor"
Choose a consumer group to monitor
Configure scaling parameters:
- Min Replicas: Minimum number of consumer pods
- Max Replicas: Maximum number of consumer pods
- Scaling Factor: Messages of lag handled per replica (e.g., 1000 = one replica for every 1,000 messages of lag)

📊 Usage

Dashboard Features

Cluster Sidebar: Switch between multiple Kafka clusters
Active Monitors: View and manage running monitors
Lag Timeline: Visualize consumer lag trends over time
Consumer Groups Table: See current lag for all consumer groups

Scaling Logic

The auto-scaler calculates optimal replicas using:

replicas = ceil(current_lag / scaling_factor)

The result is then clamped to the range between min_replicas and max_replicas.

Example:

Current lag: 5000
Scaling factor: 1000
Min replicas: 1
Max replicas: 10
Result: ceil(5000 / 1000) = 5 replicas (within the bounds of 1 and 10)
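
The formula and its bounds can be sketched as a small function. The name desiredReplicas and the choice to treat negative lag as zero are assumptions made for illustration; this is not Kaftain's actual implementation.

```typescript
// Sketch of the documented formula: ceil(lag / scalingFactor), clamped to [minReplicas, maxReplicas].
function desiredReplicas(
  currentLag: number,
  scalingFactor: number,
  minReplicas: number,
  maxReplicas: number,
): number {
  if (scalingFactor <= 0) {
    throw new Error("scalingFactor must be positive");
  }
  if (minReplicas > maxReplicas) {
    throw new Error("minReplicas must not exceed maxReplicas");
  }
  // Assumption: a negative lag reading is treated as zero.
  const raw = Math.ceil(Math.max(currentLag, 0) / scalingFactor);
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

console.log(desiredReplicas(5000, 1000, 1, 10));  // 5: the example above
console.log(desiredReplicas(0, 1000, 1, 10));     // 1: the lower bound applies
console.log(desiredReplicas(25000, 1000, 1, 10)); // 10: the upper bound applies
```

Because of the ceiling, any lag above a multiple of the scaling factor rounds up: a lag of 5001 with a factor of 1000 yields 6 replicas.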

🔌 API Endpoints

Cluster Configuration

GET /api/cluster-config - List all clusters
POST /api/cluster-config - Add a new cluster
DELETE /api/cluster-config/:id - Remove a cluster

Monitoring

POST /api/service/start - Start monitoring a consumer group
POST /api/service/stop - Stop monitoring
GET /api/service/monitors - List active monitors
DELETE /api/service/monitors/:id - Delete a monitor

Lag Data

GET /api/lag/records - Get historical lag data
GET /api/consumer-groups - List consumer groups for a cluster
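
As a rough illustration of calling these endpoints from a script, the sketch below maps a few of them to full URLs and fetches the cluster list. Only the methods and paths come from the list above; the base URL (port 3001 from the example configuration), the helper names, and the untyped response are assumptions, and request bodies and query parameters are not documented here, so none are shown.

```typescript
// Hypothetical client helpers; only the methods and paths come from the endpoint list.
type Method = "GET" | "POST" | "DELETE";

interface Endpoint {
  method: Method;
  path: string; // may contain an ":id" placeholder
}

const endpoints = {
  listClusters: { method: "GET", path: "/api/cluster-config" },
  removeCluster: { method: "DELETE", path: "/api/cluster-config/:id" },
  listMonitors: { method: "GET", path: "/api/service/monitors" },
  deleteMonitor: { method: "DELETE", path: "/api/service/monitors/:id" },
} as const;

// Builds the full URL, substituting ":id" when the path has one.
function resolveUrl(baseUrl: string, endpoint: Endpoint, id?: string | number): string {
  const needsId = endpoint.path.includes(":id");
  if (needsId && id === undefined) {
    throw new Error(`${endpoint.method} ${endpoint.path} requires an id`);
  }
  const path = needsId
    ? endpoint.path.replace(":id", encodeURIComponent(String(id)))
    : endpoint.path;
  return baseUrl.replace(/\/+$/, "") + path;
}

// Uses the global fetch available in Node.js 18+; the response shape is not documented, hence unknown.
async function fetchClusters(baseUrl: string): Promise<unknown> {
  const response = await fetch(resolveUrl(baseUrl, endpoints.listClusters));
  if (!response.ok) {
    throw new Error(`GET /api/cluster-config failed with status ${response.status}`);
  }
  return response.json();
}

console.log(resolveUrl("http://localhost:3001", endpoints.listMonitors));
console.log(resolveUrl("http://localhost:3001/", endpoints.removeCluster, 42));
```

Calling fetchClusters("http://localhost:3001") against a running backend would return whatever JSON the server sends for the cluster list.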

🔧 Configuration

Scaling Configuration

Each monitor can be configured with:

minReplicas: Minimum pod count (default: 1)
maxReplicas: Maximum pod count (default: 10)
scalingFactor: Messages of lag per replica (default: 1000)
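
A monitor's settings and their defaults might be modeled as follows. The field names and default values come from the list above; the ScalingConfig type, the resolveScalingConfig helper, and the validation rules are assumptions for illustration rather than Kaftain's actual code.

```typescript
// Hypothetical shape; field names and defaults come from the list above.
interface ScalingConfig {
  minReplicas: number;
  maxReplicas: number;
  scalingFactor: number;
}

const DEFAULT_SCALING: ScalingConfig = {
  minReplicas: 1,
  maxReplicas: 10,
  scalingFactor: 1000,
};

// Fills in defaults for omitted fields and rejects inconsistent values.
function resolveScalingConfig(overrides: Partial<ScalingConfig> = {}): ScalingConfig {
  const config: ScalingConfig = {
    minReplicas: overrides.minReplicas ?? DEFAULT_SCALING.minReplicas,
    maxReplicas: overrides.maxReplicas ?? DEFAULT_SCALING.maxReplicas,
    scalingFactor: overrides.scalingFactor ?? DEFAULT_SCALING.scalingFactor,
  };
  // Assumed rules: whole, non-negative replica counts; min <= max; positive factor.
  if (!Number.isInteger(config.minReplicas) || config.minReplicas < 0) {
    throw new Error("minReplicas must be a non-negative integer");
  }
  if (!Number.isInteger(config.maxReplicas) || config.maxReplicas < config.minReplicas) {
    throw new Error("maxReplicas must be an integer greater than or equal to minReplicas");
  }
  if (!(config.scalingFactor > 0)) {
    throw new Error("scalingFactor must be positive");
  }
  return config;
}

console.log(resolveScalingConfig());                    // all defaults
console.log(resolveScalingConfig({ maxReplicas: 20 })); // one override
```

Validating when a monitor is created keeps an inverted min/max pair from reaching the scaling step.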

🐛 Troubleshooting

Common Issues

"Failed to fetch consumer groups"

Verify Kafka Exporter is running and accessible
Check the cluster URL is correct
Ensure network policies allow communication

"Failed to scale deployment"

Check RBAC permissions for the service account
Verify deployment exists in the configured namespace
Check Kubernetes API server logs

"No lag data available"

Ensure consumer groups are actively consuming
Verify Kafka Exporter is scraping the correct Kafka cluster
Check if consumer group names match exactly

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

🙏 Acknowledgments

Kafka Exporter for metrics
Recharts for beautiful charts
Kubernetes Client for K8s integration

oslabs-beta/Kaftain
