
πŸš€ DataOps Copilot

Enterprise AI-Powered DataOps Platform with Multi-Model Routing

An intelligent data operations platform that automatically profiles, cleans, and analyzes data using Claude Sonnet 4.5, GPT-5 mini, Gemini 2.0 Flash, and Azure GPT-4o-mini with smart model routing.

🌐 LIVE DEMO | πŸ“š Documentation | πŸ’Ό Portfolio



✨ Features

πŸ€– Multi-Model AI Routing

  • Claude Sonnet 4.5 for complex reasoning and data analysis
  • GPT-5 mini for fast structured outputs (latest OpenAI!)
  • Gemini 2.0 Flash for vision and multimodal tasks (FREE!)
  • Azure GPT-4o-mini for enterprise compliance
  • Automatic fallback routing with LiteLLM
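
The fallback idea can be sketched in a few lines. This is an illustrative, stdlib-only version of the pattern (the repo's actual routing lives in `llm_router.py` and uses LiteLLM; the stub functions below stand in for real provider calls):

```python
# Fallback routing sketch: try each model in priority order, return the
# first success. Illustrative only -- not the repo's llm_router.py.
from typing import Callable

def route_with_fallback(
    prompt: str,
    models: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (model_name, caller) pair in order; return (model_name, response)."""
    errors = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # provider outage, rate limit, etc.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))

# Stub callers standing in for real API calls:
def claude_stub(p: str) -> str:
    raise TimeoutError("rate limited")

def gemini_stub(p: str) -> str:
    return f"gemini answer to: {p}"

name, answer = route_with_fallback(
    "profile this CSV",
    [("claude-sonnet-4.5", claude_stub), ("gemini-2.0-flash", gemini_stub)],
)
# Claude's stub fails, so routing falls through to Gemini.
```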

πŸ“Š Core Capabilities

  • Auto Data Profiling: Upload CSV/Excel/JSON β†’ instant quality analysis
  • LLM-Powered Insights: AI explains your data and suggests improvements
  • Smart SQL Generation: Natural language β†’ production SQL (coming soon)
  • Dashboard Vision: Upload screenshots β†’ extract metrics (coming soon)
  • Conversational BI: Ask questions about your data (coming soon)

πŸ—οΈ Architecture

  • Frontend: Next.js 14 (TypeScript, Tailwind CSS, React Query)
  • Backend: FastAPI (Python 3.11+)
  • Database: PostgreSQL with DuckDB for analytics
  • Deployment: Docker + Cloud Run ready
  • Cost: ~$10-15/month during active use (see Cost Breakdown below)

πŸš€ Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.11+
  • Node.js 18+
  • API Keys (at least one):
    • Anthropic (Claude)
    • OpenAI (GPT-5 mini)
    • Google AI (Gemini) - Recommended for free tier!

1. Clone the Repository

git clone <your-repo-url>
cd dataops-copilot

2. Set Up Environment Variables

# Copy example env file
cp backend/.env.example backend/.env

# Edit with your API keys
nano backend/.env

Minimum required in .env:

ANTHROPIC_API_KEY=sk-ant-xxxxx
OPENAI_API_KEY=sk-xxxxx
GOOGLE_API_KEY=xxxxx

3. Start with Docker Compose

# Start all services (backend, postgres, redis)
docker-compose up -d

# View logs
docker-compose logs -f backend

The backend will be available at: http://localhost:8000

4. Set Up Frontend

cd frontend
npm install
npm run dev

The frontend will be available at: http://localhost:3000


🎯 Usage

Upload and Profile Data

  1. Navigate to http://localhost:3000
  2. Click "Launch App"
  3. Upload a CSV, Excel, or JSON file
  4. Toggle "Use AI insights" (recommended)
  5. Click "Analyze File"
  6. View comprehensive profiling results with:
    • Basic statistics (rows, columns, nulls)
    • Column-level analysis
    • Data quality issues
    • AI-generated insights and recommendations

Example Data

Use the included sample_data/sales_data.csv for testing.
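
To give a feel for what the profiler computes, here is a minimal stdlib-only sketch of the same kind of analysis (the real backend uses Pandas/Polars in `data_profiler.py`; the inline CSV is made-up sample data):

```python
# Minimal data-profiling sketch: row count, column names, and null counts,
# using only the stdlib csv module. Illustrative of the platform's output.
import csv
import io

sample = io.StringIO("region,revenue\nWest,1200\nEast,\nWest,900\n")
rows = list(csv.DictReader(sample))

profile = {
    "rows": len(rows),
    "columns": list(rows[0].keys()),
    # Count empty cells per column as "nulls":
    "nulls": {col: sum(1 for r in rows if not r[col]) for col in rows[0]},
}
print(profile)
# {'rows': 3, 'columns': ['region', 'revenue'], 'nulls': {'region': 0, 'revenue': 1}}
```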


πŸ“ Project Structure

dataops-copilot/
β”œβ”€β”€ backend/                    # FastAPI Backend
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ main.py            # FastAPI app entry
β”‚   β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”‚   └── config.py      # Configuration
β”‚   β”‚   β”œβ”€β”€ routers/           # API endpoints
β”‚   β”‚   β”‚   β”œβ”€β”€ data.py        # Data upload & profiling
β”‚   β”‚   β”‚   └── health.py      # Health checks
β”‚   β”‚   β”œβ”€β”€ services/          # Business logic
β”‚   β”‚   β”‚   β”œβ”€β”€ llm_router.py  # Multi-model routing (LiteLLM)
β”‚   β”‚   β”‚   └── data_profiler.py # Data analysis
β”‚   β”‚   └── models/            # Pydantic schemas
β”‚   β”œβ”€β”€ requirements.txt       # Python dependencies
β”‚   └── Dockerfile
β”‚
β”œβ”€β”€ frontend/                   # Next.js Frontend
β”‚   β”œβ”€β”€ app/                   # Next.js 14 App Router
β”‚   β”‚   β”œβ”€β”€ page.tsx           # Landing page
β”‚   β”‚   β”œβ”€β”€ dashboard/         # Main app
β”‚   β”‚   └── layout.tsx
β”‚   β”œβ”€β”€ components/
β”‚   β”‚   └── features/          # Feature components
β”‚   β”œβ”€β”€ lib/
β”‚   β”‚   └── api.ts             # API client
β”‚   └── package.json
β”‚
└── docker-compose.yml         # Local development

πŸ› οΈ Development

Backend Development

Run without Docker:

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the server
uvicorn app.main:app --reload --port 8000

API documentation is served automatically by FastAPI once the server is running: interactive Swagger UI at http://localhost:8000/docs and ReDoc at http://localhost:8000/redoc.

Frontend Development

cd frontend

# Install dependencies
npm install

# Run dev server
npm run dev

# Build for production
npm run build
npm start

Testing

# Backend tests
cd backend
pytest

# Frontend tests
cd frontend
npm test

🎨 Tech Stack

Backend

  • FastAPI - Modern async Python web framework
  • LiteLLM - Unified API for multiple LLM providers
  • Pandas/Polars - Data manipulation
  • DuckDB - In-memory SQL analytics
  • SQLAlchemy - ORM
  • Redis - Caching and task queue
  • Pydantic - Data validation

Frontend

  • Next.js 14 - React framework with App Router
  • TypeScript - Type safety
  • Tailwind CSS - Utility-first CSS
  • React Query - Server state management
  • Axios - HTTP client
  • Lucide React - Icon library

AI Models

  • Claude Sonnet 4.5 - Complex reasoning ($3/$15 per 1M tokens)
  • GPT-5 mini - Latest OpenAI model ($0.15/$0.60 per 1M tokens - estimated)
  • Gemini 2.0 Flash - FREE during preview! ($0/$0 per 1M tokens)
  • Azure GPT-4o-mini - Enterprise option ($0.165/$0.66 per 1M tokens)
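
Using the per-1M-token prices above (input, output), a quick back-of-envelope cost check looks like this (the GPT-5 mini figures are estimates, as noted):

```python
# Per-request cost from the pricing table above:
# (input_price, output_price) in USD per 1M tokens.
PRICES = {
    "claude-sonnet-4.5": (3.00, 15.00),
    "gpt-5-mini": (0.15, 0.60),       # estimated
    "gemini-2.0-flash": (0.00, 0.00), # free during preview
    "azure-gpt-4o-mini": (0.165, 0.66),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-1M-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A 2,000-token prompt with a 500-token reply on Claude:
cost = request_cost("claude-sonnet-4.5", 2000, 500)
# (2000 * 3.00 + 500 * 15.00) / 1e6 = 0.0135, i.e. about 1.4 cents
```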

🚒 Deployment

Deploy to Railway (Free Tier)

Backend:

# Install Railway CLI
npm install -g @railway/cli

# Login
railway login

# Deploy
railway up

Frontend (Vercel):

# Install Vercel CLI
npm install -g vercel

# Deploy
cd frontend
vercel --prod

Environment Variables for Production

Backend (Railway):

ANTHROPIC_API_KEY=sk-ant-xxxxx
OPENAI_API_KEY=sk-xxxxx
GOOGLE_API_KEY=xxxxx
DATABASE_URL=postgresql://...
REDIS_URL=redis://...
DEBUG=False

Frontend (Vercel):

NEXT_PUBLIC_API_URL=https://your-railway-backend.up.railway.app

πŸ’° Cost Breakdown

Development (Testing)

  • LLM APIs: ~$1-3/month (Gemini 2.0 Flash is FREE!)
  • Hosting: $0 (free tiers)
  • Total: ~$1-3/month

Production (Active Use)

  • LLM APIs: ~$5-10/month (mostly using free/cheap models)
  • Railway: $5/month (500 hrs)
  • Database: $0 (Supabase free tier)
  • Total: ~$10-15/month

Cost Optimization:

  • Use Gemini 2.0 Flash for most tasks (FREE!)
  • Use GPT-5 mini for fast structured outputs (latest OpenAI!)
  • Reserve Claude for complex reasoning only
  • Enable prompt caching
  • Use DuckDB for in-memory analytics (free)
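
The routing rules above boil down to a small task-to-model mapping. A hypothetical sketch (model names mirror this README; the repo's real selection logic lives in `llm_router.py`):

```python
# Task-based model selection following the cost-optimization rules above.
# Illustrative sketch -- not the repo's actual routing code.
def pick_model(task: str) -> str:
    if task in ("vision", "ocr"):       # multimodal work -> Gemini (free)
        return "gemini-2.0-flash"
    if task == "structured_output":     # fast JSON/schema output -> GPT-5 mini
        return "gpt-5-mini"
    if task == "complex_reasoning":     # reserve Claude for hard analysis
        return "claude-sonnet-4.5"
    return "gemini-2.0-flash"           # default to the free tier
```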

🎯 Roadmap

Phase 1: MVP (Week 1-2) βœ…

  • Multi-model routing setup
  • Data profiling with LLM insights
  • Next.js frontend with file upload
  • Docker development environment

Phase 2: SQL & BI (Week 3)

  • Natural language to SQL generation
  • Query execution with DuckDB
  • Interactive chart generation

Phase 3: Vision & Dashboards (Week 4)

  • Dashboard screenshot OCR (Gemini)
  • Metric extraction
  • Auto-dashboard generation

Phase 4: Advanced Features (Week 5+)

  • Data cleaning workflows
  • Schema mapping
  • Export to Power BI/Tableau
  • User authentication
  • Database integration (Snowflake, BigQuery)

🀝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

πŸ“ License

MIT License - feel free to use this for your portfolio or commercial projects!


πŸ‘¨β€πŸ’Ό Built By

William Kim - AI Engineer


⭐ Show Your Support

If this helped you land interviews, give it a ⭐️!


Note: This is a portfolio project demonstrating full-stack AI engineering skills. For production use, add proper authentication, rate limiting, and monitoring.
