
πŸš€ DataOps Copilot

Enterprise AI-Powered DataOps Platform with Multi-Model Routing

An intelligent data operations platform that automatically profiles, cleans, and analyzes data using Claude Sonnet 4.5, GPT-5 mini, Gemini 2.0 Flash, and Azure GPT-4o-mini with smart model routing.

🌐 LIVE DEMO | πŸ“š Documentation | πŸ’Ό Portfolio



✨ Features

πŸ€– Multi-Model AI Routing

  • Claude Sonnet 4.5 for complex reasoning and data analysis
  • GPT-5 mini for fast structured outputs (latest OpenAI!)
  • Gemini 2.0 Flash for vision and multimodal tasks (FREE!)
  • Azure GPT-4o-mini for enterprise compliance
  • Automatic fallback routing with LiteLLM
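
The fallback idea can be sketched in a few lines. This is an illustrative, stdlib-only version of the pattern (the repo's actual routing lives in `llm_router.py` and uses LiteLLM; the stub functions below stand in for real provider calls):

```python
# Fallback routing sketch: try each model in priority order, return the
# first success. Illustrative only -- not the repo's llm_router.py.
from typing import Callable

def route_with_fallback(
    prompt: str,
    models: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (model_name, caller) pair in order; return (model_name, response)."""
    errors = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # provider outage, rate limit, etc.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))

# Stub callers standing in for real API calls:
def claude_stub(p: str) -> str:
    raise TimeoutError("rate limited")

def gemini_stub(p: str) -> str:
    return f"gemini answer to: {p}"

name, answer = route_with_fallback(
    "profile this CSV",
    [("claude-sonnet-4.5", claude_stub), ("gemini-2.0-flash", gemini_stub)],
)
# Claude's stub fails, so routing falls through to Gemini.
```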

πŸ“Š Core Capabilities

  • Auto Data Profiling: Upload CSV/Excel/JSON β†’ instant quality analysis
  • LLM-Powered Insights: AI explains your data and suggests improvements
  • Smart SQL Generation: Natural language β†’ production SQL (coming soon)
  • Dashboard Vision: Upload screenshots β†’ extract metrics (coming soon)
  • Conversational BI: Ask questions about your data (coming soon)

πŸ—οΈ Architecture

  • Frontend: Next.js 14 (TypeScript, Tailwind CSS, React Query)
  • Backend: FastAPI (Python 3.11+)
  • Database: PostgreSQL with DuckDB for analytics
  • Deployment: Docker + Cloud Run ready
  • Cost: ~$10-15/month during active use (see Cost Breakdown below)

πŸš€ Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.11+
  • Node.js 18+
  • API Keys (at least one):
    • Anthropic (Claude)
    • OpenAI (GPT-5 mini)
    • Google AI (Gemini) - Recommended for free tier!

1. Clone the Repository

git clone <your-repo-url>
cd dataops-copilot

2. Set Up Environment Variables

# Copy example env file
cp backend/.env.example backend/.env

# Edit with your API keys
nano backend/.env

Minimum required in .env:

ANTHROPIC_API_KEY=sk-ant-xxxxx
OPENAI_API_KEY=sk-xxxxx
GOOGLE_API_KEY=xxxxx

3. Start with Docker Compose

# Start all services (backend, postgres, redis)
docker-compose up -d

# View logs
docker-compose logs -f backend

The backend will be available at: http://localhost:8000

4. Set Up Frontend

cd frontend
npm install
npm run dev

The frontend will be available at: http://localhost:3000


🎯 Usage

Upload and Profile Data

  1. Navigate to http://localhost:3000
  2. Click "Launch App"
  3. Upload a CSV, Excel, or JSON file
  4. Toggle "Use AI insights" (recommended)
  5. Click "Analyze File"
  6. View comprehensive profiling results with:
    • Basic statistics (rows, columns, nulls)
    • Column-level analysis
    • Data quality issues
    • AI-generated insights and recommendations

Example Data

Use the included sample_data/sales_data.csv for testing.
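
To give a feel for what the profiler computes, here is a minimal stdlib-only sketch of the same kind of analysis (the real backend uses Pandas/Polars in `data_profiler.py`; the inline CSV is made-up sample data):

```python
# Minimal data-profiling sketch: row count, column names, and null counts,
# using only the stdlib csv module. Illustrative of the platform's output.
import csv
import io

sample = io.StringIO("region,revenue\nWest,1200\nEast,\nWest,900\n")
rows = list(csv.DictReader(sample))

profile = {
    "rows": len(rows),
    "columns": list(rows[0].keys()),
    # Count empty cells per column as "nulls":
    "nulls": {col: sum(1 for r in rows if not r[col]) for col in rows[0]},
}
print(profile)
# {'rows': 3, 'columns': ['region', 'revenue'], 'nulls': {'region': 0, 'revenue': 1}}
```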


πŸ“ Project Structure

dataops-copilot/
β”œβ”€β”€ backend/                    # FastAPI Backend
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ main.py            # FastAPI app entry
β”‚   β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”‚   └── config.py      # Configuration
β”‚   β”‚   β”œβ”€β”€ routers/           # API endpoints
β”‚   β”‚   β”‚   β”œβ”€β”€ data.py        # Data upload & profiling
β”‚   β”‚   β”‚   └── health.py      # Health checks
β”‚   β”‚   β”œβ”€β”€ services/          # Business logic
β”‚   β”‚   β”‚   β”œβ”€β”€ llm_router.py  # Multi-model routing (LiteLLM)
β”‚   β”‚   β”‚   └── data_profiler.py # Data analysis
β”‚   β”‚   └── models/            # Pydantic schemas
β”‚   β”œβ”€β”€ requirements.txt       # Python dependencies
β”‚   └── Dockerfile
β”‚
β”œβ”€β”€ frontend/                   # Next.js Frontend
β”‚   β”œβ”€β”€ app/                   # Next.js 14 App Router
β”‚   β”‚   β”œβ”€β”€ page.tsx           # Landing page
β”‚   β”‚   β”œβ”€β”€ dashboard/         # Main app
β”‚   β”‚   └── layout.tsx
β”‚   β”œβ”€β”€ components/
β”‚   β”‚   └── features/          # Feature components
β”‚   β”œβ”€β”€ lib/
β”‚   β”‚   └── api.ts             # API client
β”‚   └── package.json
β”‚
└── docker-compose.yml         # Local development

πŸ› οΈ Development

Backend Development

Run without Docker:

cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the server
uvicorn app.main:app --reload --port 8000

API documentation is served automatically by FastAPI once the server is running: interactive Swagger UI at http://localhost:8000/docs and ReDoc at http://localhost:8000/redoc.

Frontend Development

cd frontend

# Install dependencies
npm install

# Run dev server
npm run dev

# Build for production
npm run build
npm start

Testing

# Backend tests
cd backend
pytest

# Frontend tests
cd frontend
npm test

🎨 Tech Stack

Backend

  • FastAPI - Modern async Python web framework
  • LiteLLM - Unified API for multiple LLM providers
  • Pandas/Polars - Data manipulation
  • DuckDB - In-memory SQL analytics
  • SQLAlchemy - ORM
  • Redis - Caching and task queue
  • Pydantic - Data validation

Frontend

  • Next.js 14 - React framework with App Router
  • TypeScript - Type safety
  • Tailwind CSS - Utility-first CSS
  • React Query - Server state management
  • Axios - HTTP client
  • Lucide React - Icon library

AI Models

  • Claude Sonnet 4.5 - Complex reasoning ($3/$15 per 1M tokens)
  • GPT-5 mini - Latest OpenAI model ($0.15/$0.60 per 1M tokens - estimated)
  • Gemini 2.0 Flash - FREE during preview! ($0/$0 per 1M tokens)
  • Azure GPT-4o-mini - Enterprise option ($0.165/$0.66 per 1M tokens)
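
Using the per-1M-token prices above (input, output), a quick back-of-envelope cost check looks like this (the GPT-5 mini figures are estimates, as noted):

```python
# Per-request cost from the pricing table above:
# (input_price, output_price) in USD per 1M tokens.
PRICES = {
    "claude-sonnet-4.5": (3.00, 15.00),
    "gpt-5-mini": (0.15, 0.60),       # estimated
    "gemini-2.0-flash": (0.00, 0.00), # free during preview
    "azure-gpt-4o-mini": (0.165, 0.66),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-1M-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A 2,000-token prompt with a 500-token reply on Claude:
cost = request_cost("claude-sonnet-4.5", 2000, 500)
# (2000 * 3.00 + 500 * 15.00) / 1e6 = 0.0135, i.e. about 1.4 cents
```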

🚒 Deployment

Deploy to Railway (Free Tier)

Backend:

# Install Railway CLI
npm install -g @railway/cli

# Login
railway login

# Deploy
railway up

Frontend (Vercel):

# Install Vercel CLI
npm install -g vercel

# Deploy
cd frontend
vercel --prod

Environment Variables for Production

Backend (Railway):

ANTHROPIC_API_KEY=sk-ant-xxxxx
OPENAI_API_KEY=sk-xxxxx
GOOGLE_API_KEY=xxxxx
DATABASE_URL=postgresql://...
REDIS_URL=redis://...
DEBUG=False

Frontend (Vercel):

NEXT_PUBLIC_API_URL=https://your-railway-backend.up.railway.app

πŸ’° Cost Breakdown

Development (Testing)

  • LLM APIs: ~$1-3/month (Gemini 2.0 Flash is FREE!)
  • Hosting: $0 (free tiers)
  • Total: ~$1-3/month

Production (Active Use)

  • LLM APIs: ~$5-10/month (mostly using free/cheap models)
  • Railway: $5/month (500 hrs)
  • Database: $0 (Supabase free tier)
  • Total: ~$10-15/month

Cost Optimization:

  • Use Gemini 2.0 Flash for most tasks (FREE!)
  • Use GPT-5 mini for fast structured outputs (latest OpenAI!)
  • Reserve Claude for complex reasoning only
  • Enable prompt caching
  • Use DuckDB for in-memory analytics (free)
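
The routing rules above boil down to a small task-to-model mapping. A hypothetical sketch (model names mirror this README; the repo's real selection logic lives in `llm_router.py`):

```python
# Task-based model selection following the cost-optimization rules above.
# Illustrative sketch -- not the repo's actual routing code.
def pick_model(task: str) -> str:
    if task in ("vision", "ocr"):       # multimodal work -> Gemini (free)
        return "gemini-2.0-flash"
    if task == "structured_output":     # fast JSON/schema output -> GPT-5 mini
        return "gpt-5-mini"
    if task == "complex_reasoning":     # reserve Claude for hard analysis
        return "claude-sonnet-4.5"
    return "gemini-2.0-flash"           # default to the free tier
```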

🎯 Roadmap

Phase 1: MVP (Week 1-2) βœ…

  • Multi-model routing setup
  • Data profiling with LLM insights
  • Next.js frontend with file upload
  • Docker development environment

Phase 2: SQL & BI (Week 3)

  • Natural language to SQL generation
  • Query execution with DuckDB
  • Interactive chart generation

Phase 3: Vision & Dashboards (Week 4)

  • Dashboard screenshot OCR (Gemini)
  • Metric extraction
  • Auto-dashboard generation

Phase 4: Advanced Features (Week 5+)

  • Data cleaning workflows
  • Schema mapping
  • Export to Power BI/Tableau
  • User authentication
  • Database integration (Snowflake, BigQuery)

🀝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

πŸ“ License

MIT License - feel free to use this for your portfolio or commercial projects!


πŸ‘¨β€πŸ’Ό Built By

William Kim - AI Engineer


⭐ Show Your Support

If this helped you land interviews, give it a ⭐️!


Note: This is a portfolio project demonstrating full-stack AI engineering skills. For production use, add proper authentication, rate limiting, and monitoring.
