FastAPI reimplementation of invoice-reader-api, originally built with Flask.
REST API that extracts structured data from PDF invoices using FastAPI and OpenAI GPT-4o-mini. Returns validated JSON responses with automatic Swagger documentation.
- Upload any PDF invoice and extract structured data automatically
- Handles invoices of any format or layout using GPT-4o-mini
- Returns vendor, client, line items, totals, tx, and currency
- Automatic request/response validation with Pydantic
- Interactive API documentation at
/docs - Tested with real invoices accross NZD, AUD, and USD
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/health |
Health check |
| POST | /api/invoices/upload |
Upload and process a PDF invoice |
| GET | /api/invoices/<id> |
Get a processed invoice by ID |
| GET | /api/invoices |
List all processed invoices |
- Clone this repository
- Install dependencies:
pip install -r requirements.txt
- Create a
.envfile:
OPENAI_API_KEY=your_key_here
- Run the API:
uvicorn main:app --reload
- API will be available at
http://127.0.0.1:8000 - Interactive docs at
http://127.0.0.1:8000/docs
curl http://127.0.0.1:800/api/invoices/upload \
-F "file=@invoice.pdf"
curl http://127.0.0.1:8000/api/invoices
- Python 3
- FastAPI
- Uvicorn
- Pydantic
- pdfplumber
- OpenAI API (gpt-4o-mini)
- AI parsing over regex: Invoice formats vary too much for regex to be reliable. GPT-4o-mini handles any layout consistently.
- temperature=0: Deterministic output ensures the same invoice always returns the same result.
- Pydantic models: All request and response data is validated automatically — no manual type checking needed.
- In-memory storage: Intentional for simplicity. Production version would use SQLite or PostgreSQL.