Hierarchical retrieval for structured documents. FAISS cosine k-NN seeds a result set, then subtree_expand walks the chunk parent/child graph to pull every parent, sibling, and child of each hit. The flat encoder finds the right region of the document; the hierarchy walker fills in the surrounding context.
+154% Recall@5 on GDPR · +120% on CPython stdlib · <1 ms/query CPU · no GPU
Most RAG pipelines treat documents as flat bags of chunks. When the right answer lives in Article 15(1)(c) of a regulation, flat retrieval returns the chunk for 15(1)(c) — but loses the article header, the surrounding paragraphs, and the chapter context that make the answer interpretable.
HypRAG keeps that structure. Each chunk carries a `node_path` (e.g. `gdpr.ch3.art15.p1.pa`) and a `depth` tag. After the FAISS lookup, `subtree_expand` returns the parent, the siblings, and the children of every hit. Same recall as flat FAISS at the seed step, but a much higher hit rate after expansion — the answer arrives with its scaffolding intact.
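The dotted `node_path` makes tree relations cheap to compute. A minimal sketch, assuming parent and sibling are defined by shared path prefixes as the example path above suggests (illustrative string handling, not the library's internal representation):

```python
# Sketch: ancestry falls straight out of the dotted node_path scheme.
# (Illustrative only, not the library's internal representation.)
path = "gdpr.ch3.art15.p1.pa"

parent = path.rsplit(".", 1)[0]   # drop the last segment
depth = path.count(".")           # root is depth 0
def is_sibling(a, b):
    return a.rsplit(".", 1)[0] == b.rsplit(".", 1)[0]

print(parent)                                     # gdpr.ch3.art15.p1
print(depth)                                      # 4
print(is_sibling(path, "gdpr.ch3.art15.p1.pb"))   # True
```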
| Condition | Recall@5 | Precision | Latency |
|---|---|---|---|
| FAISS (flat) | 0.286 | 0.590 | 0.1 ms |
| FAISS + subtree_expand | 0.727 | 0.441 | 0.6 ms |
| Hybrid (BM25+FAISS) + expand | 0.683 | 0.408 | 1.8 ms |
Expansion lift: +154.2 %. BM25 hybrid hurts on regulatory text (uniform vocabulary).
| Chunker | Chunks | FAISS | FAISS+expand | Lift |
|---|---|---|---|---|
| GDPRChunker (domain-specific) | 821 | 0.221 | 0.549 | +148 % |
| HTMLChunker (generic, no domain) | 896 | 0.256 | 0.564 | +120 % |
The expansion lift is algorithm-driven, not chunker-biased. A source-agnostic chunker that only uses HTML heading levels and `<ol>`/`<ul>` nesting reaches essentially the same post-expansion recall as a hand-crafted GDPR parser.
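To make "heading levels as hierarchy signals" concrete, here is a minimal, self-contained sketch of deriving dotted node paths from `<h1>`–`<h6>` alone, using only the stdlib parser. It is illustrative of the idea, not the HTMLChunker implementation:

```python
# Sketch: derive dotted node paths from HTML heading levels alone,
# the hierarchy signal a generic chunker can rely on.
# (Illustrative code, not the library's implementation.)
from html.parser import HTMLParser

class HeadingPaths(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []     # [(level, slug), ...] of currently open headings
        self.paths = []     # dotted node_path per heading, in document order
        self._level = None  # level of the heading currently being read

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self._level = int(tag[1])

    def handle_data(self, data):
        if self._level is None or not data.strip():
            return
        slug = data.strip().lower().replace(" ", "_")
        # pop headings at the same or deeper level: they are now closed
        while self.stack and self.stack[-1][0] >= self._level:
            self.stack.pop()
        self.stack.append((self._level, slug))
        self.paths.append(".".join(s for _, s in self.stack))
        self._level = None

p = HeadingPaths()
p.feed("<h1>GDPR</h1><h2>Chapter 3</h2><h3>Article 15</h3><h3>Article 16</h3>")
print(p.paths)
# ['gdpr', 'gdpr.chapter_3', 'gdpr.chapter_3.article_15', 'gdpr.chapter_3.article_16']
```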
| Condition | Recall@5 |
|---|---|
| FAISS (flat) | 0.092 |
| FAISS + subtree_expand | 0.203 |
Expansion lift: +120 %.
Reproducing the GDPR numbers:

```shell
python -m benchmarks.run_legal_comparison --html-path gdpr_corpus.html
python -m benchmarks.compare_chunkers --html-path gdpr_corpus.html
```

Installation:

```shell
pip install hyprag          # core (faiss, sentence-transformers, numpy)
pip install hyprag[legal]   # adds beautifulsoup4 for HTML chunkers
pip install hyprag[api]     # adds fastapi + uvicorn for the HTTP server
pip install hyprag[dev]     # pytest, ruff, mypy
```

Indexing a codebase:

```python
from hyprag.retriever import HypragRetriever

r = HypragRetriever()        # default encoder: BAAI/bge-base-en-v1.5
r.index_path("./myproject")  # AST-based chunker, module → class → method
for chunk in r.query("how does the parser handle escape sequences?", k=5):
    print(chunk.depth, chunk.node_path, chunk.start_line)
```

Indexing the GDPR corpus:

```python
from hyprag.chunkers import GDPRChunker  # domain-specific, +154% lift
from hyprag.chunkers import HTMLChunker  # generic, +120% lift, zero domain knowledge
from hyprag.retriever import HypragRetriever

# Fetch the corpus once (per-article from gdpr-info.eu; takes ~5 min)
chunks = GDPRChunker().load()  # or .load(html_path=Path("..."))
r = HypragRetriever()
r.index_chunks(chunks)
for chunk in r.query("when must a data breach be reported?", k=5):
    print(chunk.depth, chunk.node_path)
    print(chunk.text[:200])
```

HTMLChunker works on any HTML document — Wikipedia, documentation, statutes — using only `<h1>`–`<h6>` levels and `<ol>`/`<ul>`/`<li>` nesting as hierarchy signals.
Running the HTTP server:

```shell
uvicorn api.main:app --reload
```

`POST /index/gdpr`, `POST /index/codebase`, and `POST /index/texts` build indexes; `POST /search` queries them. Each request is authenticated via `X-API-Key`; tiering (free / paid) caps vectors, queries/day, and TTL — see `api/auth.py`.
Every `IndexResponse` returns `depth_distribution` and `warnings`, so callers can verify the chunker recovered the hierarchy as expected without inspecting internals.
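One thing a caller can do with `depth_distribution` is a quick histogram sanity check. The field name comes from the API above; the check itself is a hypothetical client-side example, sketched with the stdlib `Counter`:

```python
# Sketch: a client-side sanity check on depth_distribution.
# A single-depth histogram means the chunker produced a flat list,
# i.e. the hierarchy was not recovered. (Illustrative, not API code.)
from collections import Counter

def summarize(depths):
    dist = dict(Counter(depths))
    flat = len(dist) == 1  # every chunk at one depth: suspicious
    return dist, flat

# e.g. depths reported for a well-parsed regulation
dist, flat = summarize([0, 1, 1, 2, 2, 2, 3, 3])
print(dist)   # {0: 1, 1: 2, 2: 3, 3: 2}
print(flat)   # False
```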
`subtree_expand(results, corpus)` is the core algorithm. Given any list of seed chunks and the full corpus, it returns the seeds plus every chunk that is:

- a parent — `chunk.node_path` matches a seed's `parent_path`
- a child — `chunk.parent_path` matches a seed's `node_path`
- a sibling — same `parent_path` as a seed

All three are toggleable; `max_expand` caps the result size. The walk is O(N) per query — cheap enough to run on every search.
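The walk described above fits in a few lines. This is a minimal reimplementation under the stated definitions (parent/child/sibling via `node_path` and `parent_path`), not the shipped function:

```python
# Sketch of the subtree_expand walk, assuming chunks carry node_path
# and parent_path attributes. (Illustrative, not the shipped code.)
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    node_path: str
    parent_path: str

def subtree_expand(seeds, corpus, parents=True, children=True,
                   siblings=True, max_expand=50):
    seed_paths = {c.node_path for c in seeds}
    parent_paths = {c.parent_path for c in seeds}
    out = list(seeds)
    for c in corpus:  # one O(N) pass over the corpus
        if c.node_path in seed_paths:
            continue  # already a seed
        hit = (parents and c.node_path in parent_paths        # parent of a seed
               or children and c.parent_path in seed_paths    # child of a seed
               or siblings and c.parent_path in parent_paths) # shares a parent
        if hit:
            out.append(c)
            if len(out) - len(seeds) >= max_expand:
                break
    return out

corpus = [
    Chunk("gdpr.ch3", "gdpr"),
    Chunk("gdpr.ch3.art15", "gdpr.ch3"),
    Chunk("gdpr.ch3.art15.p1", "gdpr.ch3.art15"),
    Chunk("gdpr.ch3.art16", "gdpr.ch3"),
]
hits = subtree_expand([corpus[1]], corpus)
print(sorted(c.node_path for c in hits))
# ['gdpr.ch3', 'gdpr.ch3.art15', 'gdpr.ch3.art15.p1', 'gdpr.ch3.art16']
```

Seeding with `gdpr.ch3.art15` pulls in its parent chapter, its paragraph, and its sibling article, which is exactly the scaffolding the results tables measure.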
- No geometry. Earlier versions used a Poincaré-ball backend for hyperbolic embeddings. Four experiments across two corpora produced numerically identical results to FAISS at up to 257× the latency. Removed in v0.5.0; the git history preserves the code.
- No LLM summaries. Tested; recall regressed. Not coming back.
- No cross-encoder reranking by default. `bge-reranker-base` hurt on code (Recall@5 0.349 → 0.080). Plug your own in if you have a domain-tuned one.
- No BM25 by default. Hurts on legal text (uniform vocabulary). Opt-in per-request via `HybridRetriever` for code corpora where identifiers carry signal.
v0.5.x. The algorithm is stable. The API is stable. The chunkers are tested against real corpora. What's missing is a hosted demo and packaging polish.
MIT.