Before diving deep into the project, let's discuss Codex, the OpenAI CLI tool for automated code generation using the latest o4-mini and o3 models.
For this project, I exclusively used the o4-mini model with `--full-auto`, and I planned the project idea in the `codex.md` file using ChatGPT (o4-mini again).
I can't compare it with Claude Code, which I haven't used, but my experience with Codex was excellent.
The tool and model handled almost everything in one shot, with only two errors, both caused by outdated documentation in the model's training data:
- OpenAI Python package (yes, you read that right)
- ChromaDB
I'm impressed by the model's speed, precision, and overall performance, and it's cost-effective too. Building this app, including fixing the two errors and adding two features (frontend and stats routes), cost less than $1.50 USD. Larger apps will naturally cost more, but the tool seems to optimize token usage to keep costs reasonable. I still need to try o3, but so far o4-mini has proven excellent and I'm eager to explore it further.
The main challenge remains the knowledge cutoff date. It should be moved up to February/March 2025, or the OpenAI API/model should have internet access to fetch the latest data, since outdated information is a real problem in a rapidly evolving tech landscape. To work around the cutoff, relevant documentation can be included in markdown files: for instance, `doc_openai.md` was added to document the current OpenAI Python package.
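The exact contents of that file aren't reproduced here, but a minimal version might simply demonstrate the post-1.0 `openai` SDK call pattern that replaced the older `openai.ChatCompletion.create` interface. This is a hypothetical excerpt, not the file from the repository:

```python
# Hypothetical excerpt from doc_openai.md: the openai>=1.0 call pattern
# that replaced the pre-1.0 openai.ChatCompletion.create interface.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[{"role": "user", "content": "Summarize this article ..."}],
)
print(response.choices[0].message.content)
```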
Web Article Retrieval-Augmented Generation (RAG) API with FastAPI, Docker, and OpenAI
This project provides an API-first system for ingesting web articles, structuring their content via OpenAI, indexing them in a vector database, and answering user queries using a retrieval-augmented generation (RAG) approach.
- Asynchronous Ingestion: Background workers fetch URLs, extract article content with `trafilatura`, organize it into structured JSON using OpenAI GPT-4.1-nano, and store documents in MongoDB (a rough sketch of this pipeline follows this list).
- Vector Indexing: Text chunks are embedded with a local `sentence-transformers` model (`all-MiniLM-L6-v2`) and stored in ChromaDB.
- RAG Querying: FastAPI endpoints embed user questions, retrieve top-k relevant chunks from ChromaDB, and generate answers via OpenAI GPT-4.1-nano.
- Dockerized: All components (API, worker, MongoDB, Redis, ChromaDB) run in Docker containers orchestrated by Docker Compose.
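To make the ingestion flow concrete, here is a rough sketch of what such a background task could look like. It is not the project's actual `tasks.py`; the function name, prompt, and MongoDB database/collection names are assumptions made for the example.

```python
# Sketch of an ingestion task: fetch a URL, extract the article text,
# ask the model for structured JSON, and store the result in MongoDB.
import httpx
import trafilatura
from openai import OpenAI
from pymongo import MongoClient

openai_client = OpenAI()
mongo = MongoClient("mongodb://mongodb:27017")  # database name below is assumed

def ingest_url(url: str) -> dict:
    html = httpx.get(url, follow_redirects=True, timeout=30).text
    text = trafilatura.extract(html)  # plain article text, None if extraction fails
    if not text:
        raise ValueError(f"No article content extracted from {url}")
    completion = openai_client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[
            {"role": "system", "content": "Return the article as JSON with title, summary and key_points."},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
    )
    document = {"url": url, "structured": completion.choices[0].message.content}
    mongo.neuraldocs.documents.insert_one(document)
    return document
```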
- Language & Framework: Python 3.11, FastAPI
- HTTP Client: `httpx`
- Article Extraction: `trafilatura`
- LLM API: OpenAI `gpt-4.1-nano` via the `openai` Python SDK
- Embeddings: `sentence-transformers` (`all-MiniLM-L6-v2`); see the indexing sketch after this list
- Vector Database: ChromaDB
- Metadata Store: MongoDB
- Task Queue: Redis + RQ
- Containerization: Docker, Docker Compose
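For the embedding and vector-store side, a minimal indexing sketch using `sentence-transformers` and the ChromaDB HTTP client might look like the following; the collection name and ID scheme are assumptions.

```python
# Sketch of the indexing step: embed text chunks locally, store them in ChromaDB.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
chroma = chromadb.HttpClient(host="chromadb", port=8000)
collection = chroma.get_or_create_collection("articles")  # collection name assumed

def index_chunks(url: str, chunks: list[str]) -> None:
    embeddings = embedder.encode(chunks).tolist()
    collection.add(
        ids=[f"{url}#{i}" for i in range(len(chunks))],  # one ID per chunk
        embeddings=embeddings,
        documents=chunks,
        metadatas=[{"source": url} for _ in chunks],
    )
```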
- Docker Engine (v20+)
- Docker Compose (v2+)
- OpenAI API key
- Clone the repository
```bash
git clone https://github.com/mxmarchal/neuraldocs.git
cd neuraldocs
```
- Configure environment variables
Copy the example and set your OpenAI key:
```bash
cp .env.example .env
# Edit .env and set OPENAI_API_KEY
```
- Start services

```bash
docker-compose up --build
```

This will build the images and start the following services:
- `api`: The main FastAPI application, accessible at `http://localhost:8000`.
- `worker`: The RQ worker processing background tasks (no direct access needed).
- `mongodb`: The MongoDB database, accessible on the host at `mongodb://localhost:27018` (maps to container port 27017).
- `redis`: The Redis server used for the task queue, accessible on the host at `redis://localhost:6379`.
- `chromadb`: The ChromaDB vector store, accessible on the host at `http://localhost:8001` (maps to container port 8000).
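Inside the compose network, the API and worker containers reach these services by their service names on the original ports (27017, 6379, 8000), not the remapped host ports. A `db.py`-style wiring sketch, with an assumed database and collection name, could look like this:

```python
# Sketch of a db.py-style module wiring up the backing services.
# Inside the compose network, containers use the service names and
# original container ports, not the remapped host ports.
import chromadb
import redis
from pymongo import MongoClient
from rq import Queue

mongo_client = MongoClient("mongodb://mongodb:27017")
db = mongo_client["neuraldocs"]  # database name assumed

redis_conn = redis.Redis(host="redis", port=6379)
task_queue = Queue("default", connection=redis_conn)

chroma_client = chromadb.HttpClient(host="chromadb", port=8000)
collection = chroma_client.get_or_create_collection("articles")  # name assumed
```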
- Endpoint: `POST /add-url`
- Body: `{ "url": "https://example.com/article" }`
- Response: `{ "message": "URL queued for processing", "task_id": "<rq_job_id>" }`
- Endpoint: `GET /tasks/{task_id}`
- Response: `{ "task_id": "<rq_job_id>", "status": "queued|started|finished|failed", "result": { /* output of processing or error */ } }`
- Endpoint: `POST /query`
- Body: `{ "question": "What is the main idea of the article?", "top_k": 5 /* optional, defaults to 5 */ }`
- Response: `{ "answer": "The article explains ...", "sources": [ "https://example.com/article", ... ] }`
- Endpoint: `GET /stats`
- Response: `{ "documents": 123 /* ingested documents in MongoDB */, "vectors": 456 /* stored vectors in ChromaDB */ }`
```bash
# 1. Add a URL
curl -X POST http://localhost:8000/add-url \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com/article"}'

# 2. Check ingestion status
curl http://localhost:8000/tasks/<task_id>

# 3. Ask a question
curl -X POST http://localhost:8000/query \
  -H 'Content-Type: application/json' \
  -d '{"question":"Key points from the article?"}'
```
```text
.
├── docker-compose.yml   # Multi-service orchestration
├── .env.example         # Environment variable template
├── .env                 # Local environment variables (ignored by git)
├── README.md            # Project overview & usage
└── app/
    ├── Dockerfile       # API & worker image definition
    ├── requirements.txt # Python dependencies
    ├── config.py        # Pydantic settings
    ├── db.py            # DB & vector client setup
    ├── tasks.py         # RQ tasks: ingestion pipeline
    └── main.py          # FastAPI application
```
Environment variables (in `.env`):

| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key | |
| `MONGO_HOST` | MongoDB hostname | `mongodb` |
| `MONGO_PORT` | MongoDB port | `27017` |
| `REDIS_HOST` | Redis hostname | `redis` |
| `REDIS_PORT` | Redis port | `6379` |
| `CHROMA_HOST` | ChromaDB hostname | `chromadb` |
| `CHROMA_PORT` | ChromaDB port | `8000` |
| `EMBEDDING_MODEL_NAME` | SentenceTransformer model name | `all-MiniLM-L6-v2` |
| `NANO_MODEL_NAME` | OpenAI model for structuring articles | `gpt-4.1-nano` |
| `RAG_MODEL_NAME` | OpenAI model for RAG answering | `gpt-4.1-nano` |
| `TOP_K` | Default number of retrieved chunks | `5` |
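A `config.py` reading these variables might look roughly like the sketch below. It assumes Pydantic v2 with the `pydantic-settings` package; the project may equally use Pydantic v1's `BaseSettings`.

```python
# Sketch of a pydantic-settings based config module mirroring the table above.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Env var names match the field names case-insensitively, e.g. OPENAI_API_KEY.
    openai_api_key: str
    mongo_host: str = "mongodb"
    mongo_port: int = 27017
    redis_host: str = "redis"
    redis_port: int = 6379
    chroma_host: str = "chromadb"
    chroma_port: int = 8000
    embedding_model_name: str = "all-MiniLM-L6-v2"
    nano_model_name: str = "gpt-4.1-nano"
    rag_model_name: str = "gpt-4.1-nano"
    top_k: int = 5

    model_config = SettingsConfigDict(env_file=".env")

settings = Settings()
```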