This repository implements a modular Retrieval-Augmented Generation (RAG) system with a complete ingestion workflow, semantic text chunking, embedding generation, vector search, and multi-turn conversational capabilities. The system is designed with extensibility and maintainability in mind, using FastAPI, SQLAlchemy, and OpenAI models.
The system is composed of independent but connected subsystems:
- Ingestion Pipeline — loads documents, extracts metadata, chunks text, and generates embeddings.
- Vector Search Engine — performs similarity search using stored embeddings.
- Chat Pipeline — orchestrates multi-turn conversations and optional retrieval.
- Knowledge Store — relational models for documents, chunks, embeddings, conversations, and messages.
- API Gateway — exposes `/ingest`, `/search`, and `/chat` endpoints.
A high-level architecture diagram (`docs/architecture-diagram.png`) and additional design notes are available in the `docs/` directory.
- Document ingestion with metadata and structured storage
- Semantic chunking optimized for embedding models
- Embedding generation using OpenAI embedding APIs
- Vector similarity search over chunked documents
- Retrieval-augmented multi-turn chat completion
- SQLAlchemy ORM modeling with UUID-based conversation sessions
- Modular services layer for easy extension or substitution
- RESTful API exposure via FastAPI
- Extensible codebase structured for testing and integration
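As an illustration of the semantic chunking feature, a greedy sentence-window chunker might look like the sketch below. The sentence splitter, size limit, and overlap policy here are simplifying assumptions for illustration, not the repository's actual implementation.

```python
import re

def chunk_text(text: str, max_chars: int = 500, overlap: int = 1) -> list[str]:
    """Greedy sentence-window chunker: pack sentences until max_chars,
    then start the next chunk `overlap` sentences back for continuity."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, i = [], 0
    while i < len(sentences):
        current, j = [], i
        # Pack sentences while the joined chunk stays within max_chars.
        while j < len(sentences) and sum(len(s) + 1 for s in current) + len(sentences[j]) <= max_chars:
            current.append(sentences[j])
            j += 1
        if not current:              # a single sentence longer than max_chars
            current, j = [sentences[i]], i + 1
        chunks.append(" ".join(current))
        i = max(j - overlap, i + 1)  # step back `overlap` sentences, always advance
    return chunks
```

Overlapping chunks help the retriever when an answer straddles a chunk boundary, at the cost of some storage duplication.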
The schema models the core elements of a RAG system.
**Documents** — stores metadata for each ingested source file.
Columns: `id`, `name`, `path`, `created_at`, `document_metadata` (JSON)

**Chunks** — semantic text chunks with associated embeddings.
Columns: `id`, `document_id`, `chunk_index`, `text`, `embedding` (JSON), `created_at`, `chunk_metadata` (JSON)

**Conversations** — represents a conversational session.
Columns: `id` (UUID), `knowledge_base_id`, `created_at`

**Messages** — linked to conversations; stores user and assistant messages.
Columns: `id`, `conversation_id`, `role`, `content`, `created_at`
Indexing is applied based on common retrieval patterns.
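A sketch of how these tables might look as SQLAlchemy declarative models; the class names, column types, and string-encoded UUID are assumptions inferred from the column lists above, not the repository's actual models.

```python
import datetime
import uuid
from sqlalchemy import Column, DateTime, ForeignKey, Integer, JSON, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    path = Column(String, nullable=False)
    created_at = Column(DateTime, default=datetime.datetime.utcnow)
    document_metadata = Column(JSON)
    chunks = relationship("Chunk", back_populates="document")

class Chunk(Base):
    __tablename__ = "chunks"
    id = Column(Integer, primary_key=True)
    document_id = Column(Integer, ForeignKey("documents.id"), index=True)
    chunk_index = Column(Integer)
    text = Column(Text)
    embedding = Column(JSON)          # embedding vector stored as a JSON array
    created_at = Column(DateTime, default=datetime.datetime.utcnow)
    chunk_metadata = Column(JSON)
    document = relationship("Document", back_populates="chunks")

class Conversation(Base):
    __tablename__ = "conversations"
    id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))  # UUID as string
    knowledge_base_id = Column(Integer)
    created_at = Column(DateTime, default=datetime.datetime.utcnow)

class Message(Base):
    __tablename__ = "messages"
    id = Column(Integer, primary_key=True)
    conversation_id = Column(String, ForeignKey("conversations.id"), index=True)
    role = Column(String)             # "user" or "assistant"
    content = Column(Text)
    created_at = Column(DateTime, default=datetime.datetime.utcnow)
```

Storing embeddings as JSON keeps the schema portable across databases; a dedicated vector store (see the roadmap) would replace the `embedding` column with a native vector type.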
`/ingest` — processes documents and populates the knowledge base.
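The ingestion flow (load, chunk, embed, store) can be sketched as below; the fixed-size chunker, the stubbed `embed` function standing in for the OpenAI embeddings call, and the in-memory `store` dict are all illustrative assumptions.

```python
import hashlib

def embed(text: str) -> list[float]:
    """Stand-in for the OpenAI embeddings call: a deterministic toy vector."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

def ingest_document(name: str, text: str, store: dict, chunk_size: int = 200) -> int:
    """Chunk the text, embed each chunk, and record rows in `store`."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    store[name] = [
        {"chunk_index": i, "text": c, "embedding": embed(c)}
        for i, c in enumerate(chunks)
    ]
    return len(chunks)
```

In the real pipeline the rows would go into the `Chunks` table described above rather than a dict.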
`/search` — performs semantic search over stored document embeddings.
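Under the hood, a search like this typically ranks stored chunk embeddings by cosine similarity to the query embedding. A minimal sketch, where the data layout is a hypothetical list of `(text, embedding)` pairs:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 for zero-norm input."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_emb: list[float], chunks: list[tuple[str, list[float]]], k: int = 5):
    """Rank (text, embedding) pairs by similarity to the query embedding."""
    scored = [(cosine_similarity(query_emb, emb), text) for text, emb in chunks]
    return sorted(scored, reverse=True)[:k]
```

This linear scan is fine for small corpora; the vector-database integration on the roadmap would replace it with an approximate nearest-neighbor index.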
Request

```json
{
  "query": "What does clause 7 describe?"
}
```

Response

```json
{
  "results": [...],
  "total_found": 5
}
```

`/chat` — generates a conversational response, optionally using retrieved context.
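One common way to wire retrieval into a chat turn is to fold the retrieved chunks into a system message ahead of the conversation history; the helper below is a hypothetical sketch (the actual prompt format and model call live in the services layer).

```python
def build_messages(query: str, history: list[dict], context_chunks: list[str]) -> list[dict]:
    """Assemble an OpenAI-style message list: a system prompt carrying the
    retrieved context, then prior turns, then the new user query."""
    context = "\n\n".join(context_chunks)
    system = {"role": "system",
              "content": f"Answer using the context below.\n\nContext:\n{context}"}
    return [system, *history, {"role": "user", "content": query}]
```

The resulting list is what would be passed to the chat-completions call, and the assistant's reply would be appended to the `Messages` table for the next turn.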
Request

```json
{
  "query": "Explain the confidentiality section",
  "conversation_id": "uuid"
}
```

Project structure:

```
app/
  api/                 # FastAPI route handlers
  core/                # Configurations and shared constants
  db/                  # SQLAlchemy models and database session
  ingest.py            # Ingestion workflow entry point
  logging_config.py    # Application-wide logging setup
  main.py              # FastAPI application bootstrap
  services/            # Embedding, chunking, retrieval, generation services
  utils/               # Common utilities
docs/
  architecture-diagram.png
  additional-design-docs.md
sample_data/
tests/
requirements.txt
```
Install dependencies:

```bash
pip install -r requirements.txt
```

Create `app/.env`:

```
OPENAI_API_KEY=your_api_key_here
```

Run the ingestion workflow:

```bash
python3 -m app.ingest
```

Start the API server:

```bash
uvicorn app.main:app --reload
```

Open the API documentation at http://localhost:8000/docs
Run the tests:

```bash
pytest
```

With coverage:

```bash
pytest --cov=app tests/
```

- Integrate a dedicated vector database (FAISS, Qdrant, Weaviate)
- Add hybrid retrieval (dense + sparse)
- Stream responses for chat completions
- Implement ingestion via REST endpoint
- Add web-based admin dashboard
- Enhance conversation summarization
MIT License
Towseef Altaf
Software Engineer – Distributed Systems, Developer Productivity, AI Engineering
