A full-stack application that intelligently correlates columns between two CSV files using AI-driven context collection, semantic matching, and statistical analysis. Reduces false positives by 30-50% through business context awareness.
Project Euler helps you automatically discover relationships between columns in two different CSV files, even when they have different names, formats, or structures. By collecting business context about your datasets, it improves correlation accuracy and provides a confidence score for each match.
Perfect for:
- Data migration and ETL pipelines
- Database schema mapping
- Data integration projects
- Business intelligence workflows
- Legacy system modernization
- AI-Driven Question Generation: Automatically creates relevant questions based on your data
- Multi-Step Wizard: Collects business context about datasets (purpose, domain, entities)
- Smart Matching: Uses context to filter false positives and boost confidence scores
- Custom Mappings: Define specific column pairs, which are always assigned 95% confidence
- Column Exclusions: Filter out debug/temporary columns from analysis
- Statistical Analysis: Correlation coefficients for numeric data
- Semantic Matching: AI-powered name similarity and meaning analysis
- Distribution Comparison: Matches columns with similar data patterns
- Confidence Scoring: 0-100% confidence for each column pair
- Interactive Visualization: Flow diagram showing relationships with color-coded confidence
- API Key Encryption: AES-GCM encryption for localStorage (Web Crypto API)
- Security Headers: CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy
- Rate Limiting: Sliding window algorithm with HTTP 429 responses
- HTTPS Enforcement: Production SSL/TLS support with Nginx reverse proxy
- CORS Protection: Configurable allowed origins for production
- React Portal Modal: Full-screen context wizard with smooth animations
- Two-Panel Layout: Vertical stepper + questionnaire for intuitive navigation
- Progress Indicators: Real-time feedback on context collection progress
- Export Functionality: Download correlation mappings as JSON
- Responsive Design: Works seamlessly on desktop and tablet
- Local LLM: Ollama support (Llama3, Mistral, Qwen, etc.)
- Cloud LLM: Optional OpenAI/Anthropic/Gemini integration
- Configurable UI: Change model and endpoint through the app
- Fallback Support: Graceful degradation when the LLM is unavailable
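The Rate Limiting item above describes a sliding-window algorithm that answers with HTTP 429. A minimal in-memory sketch of the sliding-window-log variant follows, assuming a per-client key such as the caller's IP address and a limit like `MAX_REQUESTS_PER_MINUTE`. The class `SlidingWindowLimiter` and its methods are hypothetical, not the project's actual implementation.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window-log limiter: at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        # Timestamps of accepted requests per client key, oldest first.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Evict timestamps that have slid out of the trailing window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: rejected requests are not recorded
        hits.append(now)
        return True

    def status_for(self, key: str) -> int:
        """HTTP status a handler could return: 200 if allowed, 429 Too Many Requests otherwise."""
        return 200 if self.allow(key) else 429


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = SlidingWindowLimiter(limit=2, window=60.0, clock=lambda: fake_now[0])
    print([limiter.status_for("203.0.113.7") for _ in range(3)])  # [200, 200, 429]
    fake_now[0] = 61.0
    print(limiter.status_for("203.0.113.7"))  # 200
```

Because the log lives in process memory, each worker process keeps its own counts and idle keys are never evicted; a multi-worker deployment would need a shared store such as Redis for the same behavior.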
```
┌─────────────────────────┐
│    Next.js Frontend     │
│  (React + TypeScript)   │
│                         │
│  • Context Wizard       │
│  • Dashboard            │
│  • API Key Manager      │
│  • Visualization        │
└────────────┬────────────┘
             │
             │ REST API (Port 8001)
             ▼
┌──────────────────────────────────────────────┐
│             Backend (Choose One)             │
├──────────────────┬───────────────────────────┤
│ Python (FastAPI) │ Go (Chi Router)           │
│                  │                           │
│ • Context Service│ • Context Service         │
│ • Question Gen   │ • Question Generator      │
│ • ML Matcher     │ • AI Semantic Matcher     │
│ • Rate Limiting  │ • Adaptive Learning       │
│ • Pandas Analysis│ • Pattern Learning        │
│                  │ • Confidence Calibration  │
└─────────┬────────┴──────────┬────────────────┘
          │                   │
          └─────────┬─────────┘
                    │
                    ├──► Ollama (Local LLM)
                    └──► OpenAI/Anthropic (Optional)
```
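Both backends call Ollama, with cloud providers optional, and the feature list promises graceful degradation when the LLM is unavailable. The sketch below shows one way to structure that fallback. It uses Ollama's non-streaming `/api/generate` endpoint; the prompt, the 0-100 scoring convention, and the function names (`ollama_generate`, `heuristic_score`, `score_with_fallback`) are hypothetical, and `difflib.SequenceMatcher` merely stands in for whatever non-LLM matcher the backends really use.

```python
import difflib
import json
import math
import urllib.request
from typing import Callable


def ollama_generate(prompt: str, base_url: str = "http://localhost:11434",
                    model: str = "llama3", timeout: float = 10.0) -> str:
    """Non-streaming call to Ollama's /api/generate; returns the generated text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


def heuristic_score(a: str, b: str) -> float:
    """LLM-free fallback: character-level name similarity scaled to 0-100."""
    return 100.0 * difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_with_fallback(a: str, b: str,
                        llm: Callable[[str], str] = ollama_generate) -> float:
    """Ask the LLM for a 0-100 match score; degrade to the heuristic on any failure."""
    prompt = (f"On a scale of 0 to 100, how likely is it that the CSV columns "
              f"'{a}' and '{b}' contain the same data? Reply with a number only.")
    try:
        score = float(llm(prompt).strip())
    except (OSError, KeyError, ValueError):
        # OSError covers urllib.error.URLError, refused connections and timeouts;
        # ValueError covers malformed JSON and non-numeric replies.
        return heuristic_score(a, b)
    if not math.isfinite(score):
        return heuristic_score(a, b)
    return max(0.0, min(100.0, score))  # clamp out-of-range replies
```

Injecting the LLM call as a plain function keeps the fallback path testable without a running Ollama server, and the same wrapper could sit in front of a cloud provider.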
Backend comparison:

| Feature | Python (FastAPI) | Go (Chi) |
|---|---|---|
| CSV Parsing | Pandas | Native Go |
| ML Matching | Sentence Transformers | Heuristic + LLM |
| Learning | Basic | Adaptive Weights, Pattern Learning |
| Performance | Good | Excellent |
| Memory | Higher | Lower |
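The comparison lists heuristic matching for the Go backend. As an illustration of what a name heuristic can look like (not the project's algorithm), this sketch splits snake_case and camelCase names into tokens, maps a hypothetical synonym table onto canonical tokens, and scores pairs by Jaccard overlap:

```python
import re
from typing import Dict, Optional, Set

# Hypothetical synonym table mapping variants onto one canonical token.
DEFAULT_SYNONYMS: Dict[str, str] = {
    "customer": "user", "client": "user", "qty": "quantity", "amt": "amount",
}


def name_tokens(name: str, synonyms: Optional[Dict[str, str]] = None) -> Set[str]:
    """Split snake_case, kebab-case, spaces and camelCase into canonical lowercase tokens."""
    table = DEFAULT_SYNONYMS if synonyms is None else synonyms
    # Insert a space at lower->upper boundaries ("customerId" -> "customer Id").
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    raw = re.split(r"[^A-Za-z0-9]+", spaced)
    return {table.get(t.lower(), t.lower()) for t in raw if t}


def name_similarity(a: str, b: str, synonyms: Optional[Dict[str, str]] = None) -> float:
    """Jaccard overlap of canonical tokens, in [0, 1]."""
    ta, tb = name_tokens(a, synonyms), name_tokens(b, synonyms)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

With the sample table, `user_id` and `CustomerID` both reduce to `{"user", "id"}` and score 1.0; without the synonym entry they would only share `id`. A real synonym list could plausibly be derived from the collected domain context, but that is an assumption here.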
- Python 3.10+
- Node.js 18+
- Go toolchain (only if you use the Go backend)
- Ollama (for the local LLM), available from https://ollama.ai/download
- Optional: OpenAI/Anthropic API Key (for cloud LLM)
Install Ollama and pull a model:

```bash
# Download from https://ollama.ai/download
# Then pull a model
ollama pull qwen3-vl:2b
# or any other model, e.g.
ollama pull llama3
ollama pull mistral
```

Python backend setup:

```bash
cd backend
# Create virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (macOS/Linux)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# (Optional) Create .env file
cp .env.template .env
# Edit .env with your API keys if using a cloud LLM
# Start backend
python main.py
```

Backend runs on http://localhost:8001
Go Backend Features: Adaptive weight learning, pattern learning, confidence calibration, AI semantic matching via Ollama.
Go backend setup (alternative to the Python backend):

```bash
cd backend-go
# Build
go build ./cmd/server/main.go
# Run
go run ./cmd/server/main.go
# or run the binary built above
./main.exe   # Windows
./main       # Linux/macOS
```

Backend runs on http://localhost:8001
Go Backend Endpoints:
| Endpoint | Method | Description |
|---|---|---|
| `/upload` | POST | Upload CSV files |
| `/column-similarity` | GET | Get column matches (add `?use_ai=true` for LLM matching) |
| `/correlation` | GET | Get numeric correlations |
| `/feedback/match` | POST | Submit match feedback (👍/👎) |
| `/feedback/stats` | GET | Get learning statistics |
| `/config/ollama` | GET/POST | Configure Ollama |
Frontend setup:

```bash
cd frontend
# Install dependencies
npm install
# (Optional) Create .env.local for a custom API URL
echo "NEXT_PUBLIC_API_URL=http://localhost:8001" > .env.local
# Start frontend
npm run dev
```

Frontend runs on http://localhost:3000
Navigate to http://localhost:3000 and start correlating!
- Upload Two CSV Files: Click "Upload" for File 1 and File 2 (or drag and drop).
- Add Context (Recommended): Click "Add Context & Generate" to open the wizard:
  - Step 1: Answer questions about File 1 (purpose, domain, entities)
  - Step 2: Answer questions about File 2
  - Step 3: Describe the relationship between the files
  - Step 4: Review and confirm
- View Correlation Results: An interactive flow diagram shows column relationships with confidence percentages.
- Export Mapping: Download the correlation results as JSON for use in ETL pipelines.
Custom mappings let you define specific column pairs that should map together:
- Example: `user_id` (File 1) → `customer_id` (File 2)
- Automatically assigned 95% confidence
Column exclusions remove columns from correlation:
- Temp columns, debug fields, metadata, etc.
- Reduces noise and improves accuracy
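A minimal sketch of how custom mappings and exclusions could be applied to a list of automatic candidates. The tuple layout, the rule that a custom pair supersedes any automatic guess involving either of its columns, and all names are assumptions; only the fixed 95% confidence comes from the description above.

```python
from typing import AbstractSet, Iterable, List, Tuple

# (File 1 column, File 2 column, confidence in [0, 1])
Match = Tuple[str, str, float]

CUSTOM_MAPPING_CONFIDENCE = 0.95  # fixed score for user-defined pairs


def apply_user_rules(candidates: Iterable[Match],
                     custom: Iterable[Tuple[str, str]],
                     excluded_a: AbstractSet[str] = frozenset(),
                     excluded_b: AbstractSet[str] = frozenset()) -> List[Match]:
    """Drop excluded columns, let custom pairs override automatic guesses, sort by confidence."""
    pairs = list(custom)
    pinned_a = {a for a, _ in pairs}
    pinned_b = {b for _, b in pairs}
    kept: List[Match] = [
        (a, b, conf) for a, b, conf in candidates
        if a not in excluded_a and b not in excluded_b  # user exclusions
        and a not in pinned_a and b not in pinned_b     # superseded by a custom pair
    ]
    kept.extend((a, b, CUSTOM_MAPPING_CONFIDENCE) for a, b in pairs)
    return sorted(kept, key=lambda m: m[2], reverse=True)
```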
Domain matching: when both files belong to the same business domain (e.g., "Sales"), similar column names receive a 10% confidence boost.

Entity overlap: files with overlapping key entities (e.g., "Customer", "Order") get up to a 20% confidence boost for related columns.
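The two context boosts can be sketched as a small scoring function. This assumes the percentages are added as percentage points on a 0-1 confidence scale, that the result is capped at 1.0, and that entity overlap is measured with Jaccard similarity so the full 20% is reached only when both files list the same entities. None of these details is confirmed by the project, and the function name is hypothetical.

```python
from typing import AbstractSet

DOMAIN_BOOST = 0.10      # same business domain + similar column names
MAX_ENTITY_BOOST = 0.20  # reached only when the key-entity sets are identical


def context_boost(base: float, domain_a: str, domain_b: str,
                  entities_a: AbstractSet[str], entities_b: AbstractSet[str],
                  related: bool) -> float:
    """Add context boosts (percentage points on a 0-1 scale) to a base confidence, capped at 1.0."""
    if not related:
        return base  # boosts only apply to columns already judged similar/related
    boost = 0.0
    da, db = domain_a.strip().lower(), domain_b.strip().lower()
    if da and da == db:
        boost += DOMAIN_BOOST
    ea = {e.strip().lower() for e in entities_a}
    eb = {e.strip().lower() for e in entities_b}
    if ea and eb:
        # Jaccard overlap (shared / total distinct entities) scales the entity boost.
        boost += MAX_ENTITY_BOOST * len(ea & eb) / len(ea | eb)
    return min(1.0, base + boost)
```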
Backend configuration (`backend/.env`):

```bash
# Environment
ENVIRONMENT=development # or production
# CORS
ALLOWED_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
ALLOWED_ORIGINS_PROD=https://yourdomain.com # Production only
# Ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen3-vl:2b
# Cloud LLM (Optional)
OPENAI_API_KEY=sk-your-key-here
# Rate Limiting
RATE_LIMIT_ENABLED=True
MAX_REQUESTS_PER_MINUTE=60
MAX_LLM_CALLS_PER_HOUR=100
# File Upload
MAX_FILE_SIZE=104857600 # 100MB
MAX_ROWS_FOR_ANALYSIS=1000000
```

Frontend configuration (`frontend/.env.local`):

```bash
# API URL
NEXT_PUBLIC_API_URL=http://localhost:8001
# Environment
NEXT_PUBLIC_ENVIRONMENT=development
```

You can configure Ollama directly in the app:
- Click the "API Keys" button in the dashboard
- Scroll to the "Ollama Local" section
- Set Base URL and Model Name
- Click "Save Ollama Config"
Changes take effect immediately without restarting the backend.
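For reference, the backend variables above could be read into a typed settings object at startup as in this sketch. The `Settings` class and `load_settings` function are hypothetical (the Python backend has its own `config.py`, whose contents are not shown here), the defaults mirror the sample values, and treating `ALLOWED_ORIGINS_PROD` as a replacement for `ALLOWED_ORIGINS` in production is an assumption. The sketch also expects values without inline `# ...` comments, so a dotenv loader that strips them would need to run first.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


def _flag(value: str) -> bool:
    """Interpret common truthy spellings ("True", "true", "1", "yes", "on")."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    environment: str
    allowed_origins: Tuple[str, ...]
    ollama_base_url: str
    ollama_model: str
    rate_limit_enabled: bool
    max_requests_per_minute: int
    max_llm_calls_per_hour: int
    max_file_size: int  # bytes
    max_rows_for_analysis: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to the sample values."""
    env = os.environ if env is None else env
    environment = env.get("ENVIRONMENT", "development").strip()
    # Assumption: ALLOWED_ORIGINS_PROD replaces ALLOWED_ORIGINS when ENVIRONMENT=production.
    origins_key = "ALLOWED_ORIGINS_PROD" if environment == "production" else "ALLOWED_ORIGINS"
    origins = tuple(o.strip() for o in env.get(origins_key, "").split(",") if o.strip())
    return Settings(
        environment=environment,
        allowed_origins=origins,
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "qwen3-vl:2b"),
        rate_limit_enabled=_flag(env.get("RATE_LIMIT_ENABLED", "True")),
        max_requests_per_minute=int(env.get("MAX_REQUESTS_PER_MINUTE", "60")),
        max_llm_calls_per_hour=int(env.get("MAX_LLM_CALLS_PER_HOUR", "100")),
        max_file_size=int(env.get("MAX_FILE_SIZE", str(100 * 1024 * 1024))),
        max_rows_for_analysis=int(env.get("MAX_ROWS_FOR_ANALYSIS", "1000000")),
    )
```

Because the object is frozen, runtime changes such as the in-app Ollama settings would have to live in separate mutable state; that split is also an assumption.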
```
project_euler/
├── backend/                        # Python Backend (FastAPI)
│   ├── app/
│   │   ├── routers/api.py          # API endpoints
│   │   ├── services/
│   │   │   ├── context_service.py  # Context management
│   │   │   ├── question_generator.py
│   │   │   ├── similarity.py
│   │   │   └── llm.py
│   │   └── utils/
│   │       └── config.py
│   ├── main.py
│   └── requirements.txt
│
├── backend-go/                     # Go Backend (Chi Router)
│   ├── cmd/server/main.go          # Entry point
│   ├── internal/
│   │   ├── api/handlers.go         # HTTP handlers
│   │   ├── service/
│   │   │   ├── context.go                 # Context management
│   │   │   ├── enhanced_similarity.go     # Column matching
│   │   │   ├── ai_matcher.go              # LLM-powered matching
│   │   │   ├── adaptive_learning.go       # Weight learning
│   │   │   ├── confidence_calibration.go
│   │   │   ├── pattern_learning.go
│   │   │   └── feedback_learning.go
│   │   ├── llm/service.go          # Ollama integration
│   │   └── state/state.go          # Global state
│   └── go.mod
│
├── frontend/                       # Next.js Frontend
│   ├── app/
│   ├── components/
│   │   ├── dashboard.tsx
│   │   ├── context-wizard.tsx
│   │   └── ui/
│   ├── lib/
│   │   ├── api-config.ts
│   │   └── crypto.ts
│   └── package.json
│
└── README.md
```
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT License - see LICENSE file for details
- Provide detailed context: More context = better accuracy
- Use consistent domains: Files from the same business area correlate better
- Define custom mappings: For known column pairs, set them explicitly
- Exclude irrelevant columns: Temp/debug columns add noise
- Review confidence scores: Values <50% may need manual verification
- Export mappings: Save results for reuse in ETL pipelines
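The confidence-review and export tips can be combined in a small post-processing step. The exported JSON schema is not documented here, so this sketch assumes a hypothetical list of objects with `source`, `target`, and a 0-100 `confidence`; adjust the keys to match the real export. It sets low-confidence pairs aside for manual review and renames File 1 headers using the accepted pairs.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical shape of one exported entry: {"source": ..., "target": ..., "confidence": 0-100}
MappingRow = Dict[str, Any]

REVIEW_THRESHOLD = 50.0  # percent, per the "<50% may need manual verification" tip


def load_mappings(path: str) -> List[MappingRow]:
    """Read an exported mapping file (schema assumed above)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def split_for_review(rows: List[MappingRow],
                     threshold: float = REVIEW_THRESHOLD) -> Tuple[List[MappingRow], List[MappingRow]]:
    """Separate confident mappings from those a human should check."""
    accepted = [r for r in rows if r["confidence"] >= threshold]
    needs_review = [r for r in rows if r["confidence"] < threshold]
    return accepted, needs_review


def rename_header(header: List[str], accepted: List[MappingRow]) -> List[str]:
    """Rename File 1 column names to their File 2 counterparts; unmapped columns pass through."""
    renames = {r["source"]: r["target"] for r in accepted}
    return [renames.get(col, col) for col in header]
```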
- Ollama - Local LLM runtime
- Next.js - React framework
- FastAPI - High-performance Python web framework
- Shadcn UI - Beautiful component library
- pandas - Data manipulation library
