A conversational interface for a long thesis and the kind friends who would rather ask questions than read the whole PDF.
This project is a retrieval-augmented generation (RAG) app built around my PhD thesis at Utrecht University. It turns the thesis into something you can query through a browser UI or a simple command-line interface, using an in-memory ChromaDB collection, hybrid retrieval, optional cross-encoder reranking, and Gemini-generated answers.
The source material is my Utrecht University PhD thesis: thesis PDF.
This started after some friends basically told me, "I support you, but I am not reading all of that." Fair enough. So I gave the thesis a chat interface.
- The thesis is real.
- The PDF is long.
- `Ctrl+F` is helpful, but it has no patience for follow-up questions.
- FastAPI backend with a browser-based chat interface
- Hybrid retrieval that blends embedding search and keyword scoring
- Optional cross-encoder reranking for cleaner result ordering
- Citation lookup from bibliography entries when the query calls for references
- Conversation memory that keeps the last 10 turns
- CLI entrypoint for local interactive use, if the terminal is your preferred habitat
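The hybrid retrieval feature can be sketched as a weighted blend of two normalized score lists. This is an illustrative sketch, not the project's actual implementation: the function names and the min-max normalization are assumptions, and only `embedding_weight` comes from the README.

```python
def minmax(scores):
    """Scale scores into [0, 1] so the two signals are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(embedding_scores, keyword_scores, embedding_weight=0.7):
    """Blend per-chunk embedding similarity and keyword (BM25-style) scores."""
    emb = minmax(embedding_scores)
    kw = minmax(keyword_scores)
    return [embedding_weight * e + (1 - embedding_weight) * k
            for e, k in zip(emb, kw)]

# Rank chunks by the blended score, highest first.
blended = hybrid_scores([0.91, 0.40, 0.75], [2.1, 7.3, 0.5], embedding_weight=0.7)
ranking = sorted(range(len(blended)), key=lambda i: blended[i], reverse=True)
```

With `embedding_weight=0.7` the embedding signal dominates, which matches the default used in the request examples below; setting it to `0.0` or `1.0` degenerates to pure keyword or pure embedding search.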
- Python 3.13 or newer
- `uv`
- A `GEMINI_API_KEY`
- Internet access for Gemini API calls and, on first use, any model downloads triggered by dependencies
- Install dependencies:

  ```bash
  uv sync
  ```

- Create an environment file:

  ```bash
  cp .env.example .env
  ```

- Edit `.env` and set your Gemini key:

  ```
  GEMINI_API_KEY=your_api_key_here
  ```

- Start the web app:

  ```bash
  uv run python run_web.py
  ```

- Open http://localhost:8000.
Recommended:

```bash
uv run python run_web.py
```

Development with auto-reload:

```bash
uv run uvicorn app:app --app-dir src --reload --host 0.0.0.0 --port 8000
```

For the CLI:

```bash
uv run python src/main.py
```

The CLI prompts for:
- query text
- number of results
- embedding weight
- whether reranking is enabled
- which cross-encoder model to use
- `GET /`: serves the web interface
- `POST /query`: runs retrieval and answer generation
- `GET /health`: reports whether the app and database are initialized
- `GET /memory/status`: returns conversation-memory metadata
- `GET /memory/history`: returns the stored conversation turns
- `POST /memory/clear`: clears the current conversation memory
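The conversation memory behind the `/memory/*` endpoints can be sketched with a bounded deque. The class and method names here are illustrative assumptions, not the actual API in `src/models/memory.py`; only the 10-turn limit comes from the README.

```python
from collections import deque

class ConversationMemory:
    """Keeps only the most recent `max_turns` question/answer pairs."""

    def __init__(self, max_turns=10):
        # Oldest turns drop off automatically once the deque is full.
        self.turns = deque(maxlen=max_turns)

    def add_turn(self, query, answer):
        self.turns.append({"query": query, "answer": answer})

    def history(self):
        return list(self.turns)

    def clear(self):
        self.turns.clear()

memory = ConversationMemory()
for i in range(12):
    memory.add_turn(f"question {i}", f"answer {i}")
# After 12 turns, only the last 10 survive.
```

A `deque(maxlen=...)` gives the rolling-window behavior for free: appending to a full deque silently evicts the oldest entry, so no manual trimming is needed.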
Example `POST /query` request body:

```json
{
  "query": "What are decision maps?",
  "n_results": 5,
  "embedding_weight": 0.7,
  "use_reranking": true,
  "rerank_model": "cross-encoder/ms-marco-MiniLM-L-6-v2"
}
```

Equivalent `curl` call:

```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are decision maps?",
    "n_results": 5,
    "embedding_weight": 0.7,
    "use_reranking": true,
    "rerank_model": "cross-encoder/ms-marco-MiniLM-L-6-v2"
  }'
```

Example response:

```json
{
  "answer": "Generated answer text",
  "query": "What are decision maps?",
  "memory_turns": 1
}
```

- On startup, the app reads `data/chunks.txt` and loads the text chunks into a ChromaDB collection.
- For each query, it retrieves a candidate set using embedding similarity and BM25-style keyword scoring.
- If enabled, it reranks the retrieved chunks with a cross-encoder.
- It asks Gemini to produce the final answer from the retrieved chunks and current conversation context.
- If the query appears to need references, it extracts citation keys from retrieved LaTeX snippets and resolves them through `data/bib_entries.json`.
- The resulting turn is stored in a rolling 10-turn conversation memory.
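The citation-extraction step above can be sketched with a regular expression over the retrieved LaTeX. The pattern and helper name are assumptions for illustration, not the code in `src/generation/citation_handler.py`:

```python
import re

# Matches \citep{key1,key2} and \citeyear{key}, capturing the key list.
CITE_PATTERN = re.compile(r"\\cite(?:p|year)\{([^}]+)\}")

def extract_citation_keys(chunk_text):
    """Return the unique citation keys in a LaTeX chunk, in first-seen order."""
    keys = []
    for group in CITE_PATTERN.findall(chunk_text):
        for key in group.split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys

# The resulting keys can then be looked up in data/bib_entries.json.
keys = extract_citation_keys(
    r"Decision maps \citep{smith2020, jones2021} extend \citeyear{smith2020}."
)
```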
```
.
|- data/
|  |- bib_entries.json
|  `- chunks.txt
|- src/
|  |- app.py
|  |- main.py
|  |- generation/
|  |  |- answer_generator.py
|  |  `- citation_handler.py
|  |- models/
|  |  `- memory.py
|  |- search/
|  |  |- database.py
|  |  |- hybrid_search.py
|  |  `- reranking.py
|  `- utils/
|     `- config.py
|- static/
|  `- index.html
|- pyproject.toml
|- run_web.py
`- uv.lock
```
- `run_web.py`: small startup wrapper for the FastAPI app
- `src/app.py`: API routes, startup lifecycle, and frontend serving
- `src/main.py`: CLI entrypoint
- `src/search/database.py`: ChromaDB collection creation from thesis chunks
- `src/search/hybrid_search.py`: hybrid retrieval and reranking orchestration
- `src/generation/answer_generator.py`: Gemini answer generation
- `src/generation/citation_handler.py`: citation detection and bibliography lookup
- `static/index.html`: browser chat UI
- The Chroma collection is rebuilt on every startup and is not persisted between runs.
- The app raises an error during import if `GEMINI_API_KEY` is missing.
- `embedding_weight` must stay between `0.0` and `1.0`.
- Debug logging is currently enabled in `src/utils/config.py`.
- The current citation extraction logic looks for LaTeX `\citep{...}` and `\citeyear{...}` patterns in retrieved chunks.
- There is no automated test suite in the repo.
- No `.env` is committed, by design, so the app is not runnable until the environment variable is set.
- The web UI and API are functional, but this is still a research-style codebase rather than a production-hardened service.