🔄 DataForge — AI-Powered Data Processing Pipeline

Upload messy data. Get clean, structured results. Powered by GPT-4o-mini.

DataForge is a production-ready data processing pipeline that transforms unstructured CSV, JSON, and text files into clean, categorized, enriched data using AI — with a polished web dashboard and REST API.


✨ Features

  • Drag & drop upload — CSV, JSON, TXT support up to 500 rows per job
  • 4 built-in AI templates — plug in your data, choose a template, get results
  • Real-time job queue — live progress tracking while AI processes your data
  • Results preview — view structured output directly in the dashboard
  • Download results — export as CSV or JSON
  • Processing analytics — items processed per day, template distribution charts
  • Graceful demo mode — dashboard works without API key (local fallback logic)
  • REST API — integrate into any workflow with a clean HTTP API

🧠 Processing Templates

| Template | Input | Output |
|---|---|---|
| Sentiment Analysis | Customer reviews, feedback | Sentiment, confidence score, topics |
| Product Attribute Extraction | Product descriptions | Name, price, color, material, size, brand |
| Entity Extraction | Documents, emails, contracts | People, organizations, locations, dates |
| Support Ticket Classification | Help desk messages | Urgency level, department, suggested action |
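The table above can be mirrored in client code as a small schema map for sanity-checking results. This is a sketch: the `"sentiment"` template id and its output keys match the API examples below, but the other template ids and key spellings are assumptions and should be checked against the API.

```python
# Expected output fields per template, taken from the templates table.
# Only "sentiment" is confirmed by the API examples; the other ids and
# field names are assumptions.
TEMPLATE_OUTPUT_FIELDS = {
    "sentiment": ["sentiment", "confidence", "topics", "summary"],
    "product_attributes": ["name", "price", "color", "material", "size", "brand"],
    "entities": ["people", "organizations", "locations", "dates"],
    "ticket_classification": ["urgency", "department", "suggested_action"],
}

def missing_fields(template: str, output: dict) -> list[str]:
    """Return the expected fields absent from one result's output dict."""
    expected = TEMPLATE_OUTPUT_FIELDS.get(template, [])
    return [f for f in expected if f not in output]
```

A quick check like `missing_fields("sentiment", result["output"])` returning an empty list tells you a result carries every documented field.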

🚀 Quick Start

1. Clone & Install

git clone https://github.com/xesta44/data-forge.git
cd data-forge
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Configure

cp .env.example .env
# Edit .env and add your OpenAI API key
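A minimal `.env` sketch for step 2 (the variable name is an assumption based on the standard OpenAI convention; `.env.example` in the repo is the authoritative list of keys):

```
OPENAI_API_KEY=your-key-here
```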

3. Run

uvicorn app:app --reload --port 8000

Open http://localhost:8000 → Dashboard is live.

No API key? The dashboard runs in demo mode with a local processing fallback, so you can explore everything without an OpenAI key.


📸 Screenshots

| Dashboard | Results |
|---|---|
| Upload area with drag & drop, job queue, live stats | Structured output table with sentiment badges, download buttons |

🔌 REST API

Submit a Processing Job

POST /api/process
Content-Type: multipart/form-data

file=@reviews.csv
template=sentiment
text_column=review_text   (optional)

Response:

{
  "job_id": "abc123-...",
  "items": 120,
  "template": "sentiment"
}

List Jobs

GET /api/jobs?limit=50

Get Job Status & Results

GET /api/jobs/{job_id}

Response:

{
  "id": "abc123-...",
  "template": "sentiment",
  "status": "completed",
  "filename": "reviews.csv",
  "total_items": 120,
  "processed_items": 120,
  "created_at": "2025-03-24T12:00:00",
  "completed_at": "2025-03-24T12:00:45",
  "results": [
    {
      "input": "Absolutely love this product...",
      "output": {
        "sentiment": "positive",
        "confidence": 0.97,
        "topics": ["quality", "delivery"],
        "summary": "Customer loves the product quality and fast delivery."
      }
    }
  ]
}
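Because jobs run asynchronously, a typical client submits a job and then polls `GET /api/jobs/{job_id}` until the status is terminal. A minimal polling helper, sketched with the HTTP fetch injected so any client library can be used; `"completed"` comes from the response above, and `"failed"` is inferred from the stats endpoint's `failed_jobs` counter:

```python
import time

def wait_for_job(fetch_job, job_id, poll_interval=2.0, timeout=300):
    """Poll fetch_job(job_id) until the job reaches a terminal status.

    fetch_job should GET /api/jobs/{job_id} and return the parsed JSON dict.
    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        time.sleep(poll_interval)
```

Injecting `fetch_job` also makes the helper trivial to unit-test with a stub instead of a live server.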

Download Results

GET /api/jobs/{job_id}/download?fmt=csv
GET /api/jobs/{job_id}/download?fmt=json

Processing Stats

GET /api/stats

Response:

{
  "total_jobs": 47,
  "completed_jobs": 45,
  "failed_jobs": 2,
  "total_items": 8420,
  "daily": [
    {"date": "2025-03-24", "items_processed": 320, "jobs_completed": 8}
  ]
}
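As a small worked example, a success rate and an average items-per-completed-job figure can be derived directly from this payload; the numbers below use the sample response, and the field names are exactly as documented:

```python
# Sample /api/stats payload from the README.
stats = {
    "total_jobs": 47,
    "completed_jobs": 45,
    "failed_jobs": 2,
    "total_items": 8420,
}

# 45 of 47 jobs completed -> ~95.7% success rate.
success_rate = stats["completed_jobs"] / stats["total_jobs"]
# 8420 items over 45 completed jobs -> ~187 items per job.
items_per_job = stats["total_items"] / stats["completed_jobs"]
print(f"success rate: {success_rate:.1%}, avg items/job: {items_per_job:.0f}")
```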

🗂️ Project Structure

data-forge/
├── app.py              # FastAPI app — routes, API, dashboard serving
├── processor.py        # AI processing logic (OpenAI + local fallback)
├── database.py         # SQLite models for job tracking
├── templates/
│   └── dashboard.html  # Single-page dashboard (TailwindCSS + Chart.js)
├── sample_data/        # Example files to test with
│   ├── reviews.csv     # Customer reviews → sentiment analysis
│   ├── products.csv    # Product descriptions → attribute extraction
│   └── tickets.txt     # Support tickets → classification
├── requirements.txt
├── .env.example
└── README.md

🛠️ Tech Stack

| Layer | Technology |
|---|---|
| Backend | Python 3.11+, FastAPI |
| AI | OpenAI GPT-4o-mini (with local fallback) |
| Data | Pandas |
| Storage | SQLite |
| Frontend | HTML/CSS/JS, Chart.js |
| Fonts | Inter (Google Fonts) |

💡 Use Cases

  • E-commerce — Bulk-enrich product catalogs with extracted attributes
  • Customer Success — Auto-classify support tickets by urgency and route to the right team
  • Market Research — Analyze survey responses and reviews at scale
  • Sales Ops — Clean and normalize CRM data exports
  • Legal/Finance — Extract entities and key data points from documents

📄 License

MIT — free to use, modify, and deploy.


Built by Leo Voss · AI & Automation Engineering
