RAG Service

A small, production-shaped Retrieval-Augmented Generation API built with FastAPI. Ingest documents, then ask questions that are answered only from the indexed content, with inline citations and a hallucination guard.

The design is deliberately dependency-light and readable — every layer (chunking, embeddings, vector store, LLM client) is a single focused module you can swap out (e.g. FAISS or pgvector for the store, any OpenAI-compatible endpoint for the LLM) without touching the rest.
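One way to picture the "swap it out" idea is a small structural interface that the rest of the code depends on. The sketch below is an illustration only: `VectorStore`, `InMemoryStore`, `add`, `search`, and `retrieve` are hypothetical names, not necessarily the ones used in `app/rag/store.py`.

```python
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface; the project's real method names may differ."""

    def add(self, vector: list[float], text: str) -> None: ...

    def search(self, query: list[float], k: int) -> list[str]: ...


class InMemoryStore:
    """Toy store that ranks by dot product (inputs assumed roughly unit-length)."""

    def __init__(self) -> None:
        self._rows: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._rows.append((vector, text))

    def search(self, query: list[float], k: int) -> list[str]:
        def score(row: tuple[list[float], str]) -> float:
            return sum(a * b for a, b in zip(row[0], query))

        ranked = sorted(self._rows, key=score, reverse=True)
        return [text for _, text in ranked[:k]]


def retrieve(store: VectorStore, query: list[float], k: int = 1) -> list[str]:
    # Depends only on the protocol, so a FAISS- or pgvector-backed class
    # exposing the same two methods could be passed in unchanged.
    return store.search(query, k)


store = InMemoryStore()
store.add([1.0, 0.0], "about Paris")
store.add([0.0, 1.0], "about Rome")
top = retrieve(store, [0.9, 0.1])  # top == ["about Paris"]
```

Because `retrieve` only relies on the two protocol methods, replacing the backend would not require changes to callers.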

Architecture

                ┌─────────────┐   ingest    ┌──────────┐   ┌──────────────┐
  documents ───▶│  chunking   │────────────▶│ embedder │──▶│ vector store │
                └─────────────┘             └──────────┘   └──────────────┘
                                                                  │ cosine top-k
  question ──▶ embed ──▶ retrieve ───────────────────────────────┘
                              │
                              ▼
                     grounded prompt ──▶ LLM (Groq / OpenRouter / …) ──▶ answer + citations
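The "embed, then cosine top-k" path in the diagram can be sketched without any third-party packages. This is a simplified stand-in, not the project's code: `hash_embed` and `top_k` are hypothetical names, the 64-dimension size is arbitrary, and the real store uses numpy rather than plain lists.

```python
import hashlib
import math
import re


def hash_embed(text: str, dim: int = 64) -> list[float]:
    """Deterministic bag-of-words hashing embedding, L2-normalised.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and would not be reproducible across runs.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Rank chunks by cosine similarity (dot product of unit vectors)."""
    q = hash_embed(query)
    scored = [
        (sum(a * b for a, b in zip(q, hash_embed(chunk))), chunk)
        for chunk in chunks
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


hits = top_k(
    "Where is the Eiffel Tower?",
    ["The Eiffel Tower is in Paris, France.", "Bananas are rich in potassium."],
    k=1,
)
```

Normalising at embedding time means retrieval reduces to a dot product, which is the usual reason cosine stores keep unit-length vectors.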

app/rag/chunking.py — overlapping, boundary-aware text splitting.
app/rag/embeddings.py — sentence-transformers when available, with a deterministic hashing fallback so it always runs.
app/rag/store.py — numpy cosine-similarity store with JSON persistence.
app/llm/client.py — provider-agnostic OpenAI-compatible async client.
app/rag/pipeline.py — ties it together; grounded system prompt.
app/main.py — FastAPI endpoints.
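As a rough illustration of what "overlapping, boundary-aware" splitting can mean, the sketch below cuts on whitespace and carries a few characters of context into the next chunk. The function name `chunk_text`, its parameters, and the character-based sizing are assumptions for this example; `app/rag/chunking.py` may use different boundaries (sentences, paragraphs) and different defaults.

```python
def chunk_text(text: str, max_chars: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most max_chars that overlap by up to overlap chars.

    Cuts are moved back to the nearest space so words are not split.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    text = text.strip()
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to cut at the last space inside the window, as long as
            # the chunk stays longer than the overlap (guarantees progress).
            cut = text.rfind(" ", start + 1, end)
            if cut > start + overlap:
                end = cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end == len(text):
            break
        # Step back by the overlap, then snap forward to a word start
        # if that position landed in the middle of a word.
        start = end - overlap
        if not text[start - 1].isspace():
            nxt = text.find(" ", start, end)
            if nxt != -1:
                start = nxt + 1
    return chunks


parts = chunk_text("aaaa bbbb cccc dddd eeee", max_chars=10, overlap=4)
# parts == ["aaaa bbbb", "bbbb cccc", "cccc dddd", "dddd eeee"]
```

The overlap keeps a sentence that straddles a cut retrievable from either neighbouring chunk, at the cost of some duplicated text in the index.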

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | `/health` | Store size, embedder backend, model |
| POST | `/ingest` | Index a document (`{text, source}`) |
| POST | `/query` | Ask a grounded question (`{question}`) |

Interactive docs are available at /docs (Swagger UI) once the server is running.

Quick start

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env        # set LLM_API_KEY (free key at console.groq.com)
uvicorn app.main:app --reload

# index a document
curl -X POST localhost:8000/ingest -H 'content-type: application/json' \
  -d '{"text":"The Eiffel Tower is in Paris, France.","source":"facts"}'

# ask a grounded question
curl -X POST localhost:8000/query -H 'content-type: application/json' \
  -d '{"question":"Where is the Eiffel Tower?"}'
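
The two curl calls above can also be made from Python using only the standard library. This is a minimal sketch: `build_request` and `post_json` are helper names invented here, and the shape of the JSON responses is not documented in this README, so the result is returned as a plain dict.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `post_json("/ingest", {"text": "The Eiffel Tower is in Paris, France.", "source": "facts"})` followed by `post_json("/query", {"question": "Where is the Eiffel Tower?"})` mirrors the shell commands.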

Tests

pytest -q

Tests cover chunking edge cases, embedding normalisation, retrieval relevance, store round-trip persistence, and the full /ingest → /query flow with the LLM mocked (no API key needed).
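
For readers curious how an async LLM client can be mocked so that no API key is needed, here is a self-contained sketch using `unittest.mock.AsyncMock`. The `answer` function, the `complete` method, and the prompt wording are hypothetical and stand in for whatever the pipeline and client actually expose.

```python
import asyncio
from unittest.mock import AsyncMock


async def answer(question: str, contexts: list[str], llm) -> str:
    """Build a numbered-source prompt and delegate to the injected client."""
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(contexts, 1))
    prompt = (
        "Answer only from the sources below and cite them as [n].\n"
        f"{sources}\n\nQuestion: {question}"
    )
    return await llm.complete(prompt)


def test_answer_uses_mocked_llm() -> None:
    llm = AsyncMock()
    llm.complete.return_value = "Paris [1]"  # canned reply, no network call

    result = asyncio.run(
        answer(
            "Where is the Eiffel Tower?",
            ["The Eiffel Tower is in Paris, France."],
            llm,
        )
    )

    assert result == "Paris [1]"
    # The retrieved chunk must appear in the prompt the client received.
    sent_prompt = llm.complete.await_args.args[0]
    assert "[1] The Eiffel Tower is in Paris, France." in sent_prompt
```

Injecting the client as a parameter is what makes the substitution possible; the same idea applies when overriding a dependency in a FastAPI app.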

Tech

FastAPI · Pydantic v2 · httpx · numpy · sentence-transformers (optional) · Docker. LLM provider is pluggable via LLM_BASE_URL / LLM_MODEL.
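
A provider switch driven by `LLM_BASE_URL` / `LLM_MODEL` might be read roughly as follows. This is a sketch under assumptions: the `LLMSettings` class and `load_llm_settings` function are made up for illustration, the default values are placeholders rather than the project's defaults, and the only convention relied on is that OpenAI-compatible servers expose a `/chat/completions` route under the base URL.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str

    @property
    def chat_url(self) -> str:
        # OpenAI-compatible APIs serve chat completions under this path.
        return f"{self.base_url}/chat/completions"


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read provider settings from env; the defaults here are placeholders."""
    return LLMSettings(
        base_url=env.get("LLM_BASE_URL", "https://api.groq.com/openai/v1").rstrip("/"),
        model=env.get("LLM_MODEL", "example-model"),  # placeholder model name
        api_key=env.get("LLM_API_KEY", ""),
    )


settings = load_llm_settings(
    {"LLM_BASE_URL": "https://openrouter.ai/api/v1/", "LLM_MODEL": "example-model"}
)
```

Passing the environment as a mapping keeps the loader easy to exercise in tests without touching the real process environment.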
