Skip to content

wpalish/rag-service

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RAG Service

A small, production-shaped Retrieval-Augmented Generation API built with FastAPI. Ingest documents, then ask questions that are answered only from the indexed content, with inline citations and a hallucination guard.

The design is deliberately dependency-light and readable — every layer (chunking, embeddings, vector store, LLM client) is a single focused module you can swap out (e.g. FAISS or pgvector for the store, any OpenAI-compatible endpoint for the LLM) without touching the rest.

Architecture

                ┌─────────────┐   ingest    ┌──────────┐   ┌──────────────┐
  documents ───▶│  chunking   │────────────▶│ embedder │──▶│ vector store │
                └─────────────┘             └──────────┘   └──────────────┘
                                                                  │ cosine top-k
  question ──▶ embed ──▶ retrieve ───────────────────────────────┘
                              │
                              ▼
                     grounded prompt ──▶ LLM (Groq / OpenRouter / …) ──▶ answer + citations
  • app/rag/chunking.py — overlapping, boundary-aware text splitting.
  • app/rag/embeddings.pysentence-transformers when available, with a deterministic hashing fallback so it always runs.
  • app/rag/store.py — numpy cosine-similarity store with JSON persistence.
  • app/llm/client.py — provider-agnostic OpenAI-compatible async client.
  • app/rag/pipeline.py — ties it together; grounded system prompt.
  • app/main.py — FastAPI endpoints.

Endpoints

Method Path Description
GET /health Store size, embedder backend, model
POST /ingest Index a document ({text, source})
POST /query Ask a grounded question ({question})

Interactive docs at /docs (Swagger UI) once running.

Quick start

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env        # set LLM_API_KEY (free key at console.groq.com)
uvicorn app.main:app --reload
# index a document
curl -X POST localhost:8000/ingest -H 'content-type: application/json' \
  -d '{"text":"The Eiffel Tower is in Paris, France.","source":"facts"}'

# ask a grounded question
curl -X POST localhost:8000/query -H 'content-type: application/json' \
  -d '{"question":"Where is the Eiffel Tower?"}'

Tests

pytest -q

Tests cover chunking edge cases, embedding normalisation, retrieval relevance, store round-trip persistence, and the full /ingest/query flow with the LLM mocked (no API key needed).

Tech

FastAPI · Pydantic v2 · httpx · numpy · sentence-transformers (optional) · Docker. LLM provider is pluggable via LLM_BASE_URL / LLM_MODEL.

About

Retrieval-Augmented Generation API (FastAPI): ingest docs, ask grounded questions with citations. Pluggable LLM provider.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors