Upload messy data. Get clean, structured results. Powered by GPT-4o-mini.
DataForge is a production-ready data processing pipeline that transforms unstructured CSV, JSON, and text files into clean, categorized, enriched data using AI — with a polished web dashboard and REST API.
- Drag & drop upload — CSV, JSON, TXT support up to 500 rows per job
- 4 built-in AI templates — plug in your data, choose a template, get results
- Real-time job queue — live progress tracking while AI processes your data
- Results preview — view structured output directly in the dashboard
- Download results — export as CSV or JSON
- Processing analytics — items processed per day, template distribution charts
- Graceful demo mode — dashboard works without API key (local fallback logic)
- REST API — integrate into any workflow with a clean HTTP API
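Because each job accepts at most 500 rows, larger CSV files have to be split before upload. The sketch below shows one way to do that with only the standard library. The helper names `split_csv_rows` and `_render` are hypothetical, and it assumes the header line does not count toward the 500-row limit; neither is confirmed by DataForge itself.

```python
import csv
import io
from typing import Iterator

MAX_ROWS_PER_JOB = 500  # limit stated in the feature list above


def _render(header, rows) -> str:
    """Serialize a header plus data rows back into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def split_csv_rows(csv_text: str, max_rows: int = MAX_ROWS_PER_JOB) -> Iterator[str]:
    """Yield CSV documents with at most `max_rows` data rows, each repeating the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield

    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == max_rows:
            yield _render(header, chunk)
            chunk = []
    if chunk:  # final partial chunk
        yield _render(header, chunk)
```

Each yielded string can be written to its own file and submitted as a separate job.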
| Template | Input | Output |
|---|---|---|
| Sentiment Analysis | Customer reviews, feedback | Sentiment, confidence score, topics |
| Product Attribute Extraction | Product descriptions | Name, price, color, material, size, brand |
| Entity Extraction | Documents, emails, contracts | People, organizations, locations, dates |
| Support Ticket Classification | Help desk messages | Urgency level, department, suggested action |
git clone https://github.com/xesta44/data-forge.git
cd data-forge
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your OpenAI API key
uvicorn app:app --reload --port 8000

Open http://localhost:8000 → Dashboard is live.
No API key? The dashboard runs in demo mode with local processing fallback — no OpenAI key needed to explore.
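To make the idea of a key check with a local fallback concrete, here is a minimal sketch. It is not DataForge's actual `processor.py`: the environment variable name `OPENAI_API_KEY`, the function names, and the keyword lists are all assumptions chosen for illustration.

```python
import os

# Tiny illustrative word lists; a real fallback would need far more coverage.
POSITIVE = {"love", "great", "excellent", "fast", "good"}
NEGATIVE = {"broken", "bad", "slow", "terrible", "refund"}


def demo_mode_enabled(env=os.environ) -> bool:
    """Treat a missing or blank key as demo mode (variable name is assumed)."""
    return not env.get("OPENAI_API_KEY", "").strip()


def local_sentiment(text: str) -> dict:
    """Keyword-count heuristic standing in for the AI call in demo mode."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        label = "positive"
    elif neg > pos:
        label = "negative"
    else:
        label = "neutral"
    return {"sentiment": label}
```

A heuristic like this keeps the dashboard explorable offline, at the cost of much cruder results than the AI-backed path.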
| Dashboard | Results |
|---|---|
| Upload area with drag & drop, job queue, live stats | Structured output table with sentiment badges, download buttons |
POST /api/process
Content-Type: multipart/form-data
file=@reviews.csv
template=sentiment
text_column=review_text (optional)

Response:
{
  "job_id": "abc123-...",
  "items": 120,
  "template": "sentiment"
}

GET /api/jobs?limit=50

GET /api/jobs/{job_id}

Response:
{
  "id": "abc123-...",
  "template": "sentiment",
  "status": "completed",
  "filename": "reviews.csv",
  "total_items": 120,
  "processed_items": 120,
  "created_at": "2025-03-24T12:00:00",
  "completed_at": "2025-03-24T12:00:45",
  "results": [
    {
      "input": "Absolutely love this product...",
      "output": {
        "sentiment": "positive",
        "confidence": 0.97,
        "topics": ["quality", "delivery"],
        "summary": "Customer loves the product quality and fast delivery."
      }
    }
  ]
}

GET /api/jobs/{job_id}/download?fmt=csv
GET /api/jobs/{job_id}/download?fmt=json

GET /api/stats

Response:
{
  "total_jobs": 47,
  "completed_jobs": 45,
  "failed_jobs": 2,
  "total_items": 8420,
  "daily": [
    {"date": "2025-03-24", "items_processed": 320, "jobs_completed": 8}
  ]
}

data-forge/
├── app.py # FastAPI app — routes, API, dashboard serving
├── processor.py # AI processing logic (OpenAI + local fallback)
├── database.py # SQLite models for job tracking
├── templates/
│ └── dashboard.html # Single-page dashboard (TailwindCSS + Chart.js)
├── sample_data/ # Example files to test with
│ ├── reviews.csv # Customer reviews → sentiment analysis
│ ├── products.csv # Product descriptions → attribute extraction
│ └── tickets.txt # Support tickets → classification
├── requirements.txt
├── .env.example
└── README.md
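The endpoints documented above can be driven from a script as well as from the dashboard. The following is a minimal client sketch using only the standard library. The helper names are hypothetical, `BASE_URL` assumes the local quick-start server, and treating `"failed"` as a terminal status is an inference from the `failed_jobs` counter; only `"completed"` appears in the documented responses.

```python
import json
import os
import time
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # address from the quick-start steps


def encode_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Encode form fields plus one part named `file` as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "", str(value)]
    lines += [f"--{boundary}",
              f'Content-Disposition: form-data; name="file"; filename="{filename}"',
              "Content-Type: application/octet-stream",
              ""]
    body = ("\r\n".join(lines).encode() + b"\r\n" + file_bytes
            + f"\r\n--{boundary}--\r\n".encode())
    return body, f"multipart/form-data; boundary={boundary}"


def submit_job(path: str, template: str, text_column=None) -> dict:
    """POST a file to /api/process and return the parsed JSON response."""
    fields = {"template": template}
    if text_column:
        fields["text_column"] = text_column
    with open(path, "rb") as fh:
        body, content_type = encode_multipart(fields, os.path.basename(path), fh.read())
    req = urllib.request.Request(f"{BASE_URL}/api/process", data=body,
                                 headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_job(job_id: str) -> dict:
    """GET /api/jobs/{job_id}."""
    with urllib.request.urlopen(f"{BASE_URL}/api/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=get_job, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job["status"] in ("completed", "failed"):  # "failed" is assumed
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        sleep(interval)


def download_results(job_id: str, fmt: str = "csv") -> bytes:
    """GET /api/jobs/{job_id}/download?fmt=csv|json and return the raw bytes."""
    with urllib.request.urlopen(f"{BASE_URL}/api/jobs/{job_id}/download?fmt={fmt}") as resp:
        return resp.read()
```

With the server running, a typical flow would be `job = submit_job("sample_data/reviews.csv", "sentiment")`, then `wait_for_job(job["job_id"])`, then `download_results(job["job_id"], "json")`. The `fetch` and `sleep` parameters exist so the polling loop can be tested without a live server.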
| Layer | Technology |
|---|---|
| Backend | Python 3.11+, FastAPI |
| AI | OpenAI GPT-4o-mini (with local fallback) |
| Data | Pandas |
| Storage | SQLite |
| Frontend | HTML/CSS/JS, Chart.js |
| Fonts | Inter (Google Fonts) |
- E-commerce — Bulk-enrich product catalogs with extracted attributes
- Customer Success — Auto-classify support tickets by urgency and route to the right team
- Market Research — Analyze survey responses and reviews at scale
- Sales Ops — Clean and normalize CRM data exports
- Legal/Finance — Extract entities and key data points from documents
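For use cases like review analysis, the `results` array returned by `GET /api/jobs/{job_id}` usually needs a quick aggregate before it is useful. The sketch below summarizes a sentiment job using the `sentiment`, `confidence`, and `topics` fields from the documented response; the function name `summarize_sentiment` and the shape of its return value are assumptions for illustration.

```python
from collections import Counter
from statistics import mean


def summarize_sentiment(job: dict, top_n: int = 3) -> dict:
    """Aggregate label counts, mean confidence, and most frequent topics."""
    outputs = [item["output"] for item in job.get("results", [])]
    counts = Counter(o["sentiment"] for o in outputs)
    topics = Counter(t for o in outputs for t in o.get("topics", []))
    return {
        "counts": dict(counts),
        # None when the job has no results, to avoid averaging an empty list.
        "mean_confidence": mean(o["confidence"] for o in outputs) if outputs else None,
        "top_topics": [topic for topic, _ in topics.most_common(top_n)],
    }
```

The input is the parsed JSON of a completed job, so this can be applied directly to the job response without downloading a separate export.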
MIT — free to use, modify, and deploy.
Built by Leo Voss · AI & Automation Engineering