19 changes: 19 additions & 0 deletions DEVELOPMENT_GUIDE.md
@@ -84,6 +84,25 @@ pnpm typecheck

Useful in CI as a fast gate before the slower `build`.

### Run the eval harness

```bash
pnpm eval                             # 182-doc corpus, 504 queries
pnpm eval -- --verbose                # per-query lines
pnpm eval -- --save baseline.json     # snapshot
pnpm eval -- --compare baseline.json  # diff vs snapshot
```

Reports NDCG@10 / MRR / Recall@10 — overall, per category, per
router-chosen strategy. The harness instantiates a default `Augur`,
indexes the corpus, runs every query, and grades the retrieved chunks
against the labeled relevant docs. Because the runner is a pure function
of an `Augur` instance, swap the embedder, adapter, router, or reranker
between runs to measure the real impact of any change.
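
For reference, this is the standard binary-relevance NDCG@k the reported
metrics are built on — a sketch of the grading math, not the harness's
literal source:

```ts
// NDCG@k with binary relevance labels (relevant / not relevant).
function ndcgAtK(rankedIds: string[], relevant: Set<string>, k = 10): number {
  const dcg = rankedIds
    .slice(0, k)
    .reduce((sum, id, i) => sum + (relevant.has(id) ? 1 / Math.log2(i + 2) : 0), 0);
  // Ideal DCG: all relevant docs stacked at the top of the ranking.
  let idcg = 0;
  for (let i = 0; i < Math.min(relevant.size, k); i++) idcg += 1 / Math.log2(i + 2);
  return idcg === 0 ? 0 : dcg / idcg;
}
```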

Corpus and queries live at `evaluations/{corpus,queries}.json` —
add domain-specific labeled pairs there to evaluate on your own data.
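
The exact schema is defined by those files; as a rough illustration, a
labeled pair looks something like this (field names here are assumptions —
mirror a real entry when adding your own):

```ts
// Hypothetical shapes — copy the actual records in evaluations/*.json.
const corpusDoc = {
  id: "pg-vacuum", // doc id referenced by queries
  topic: "postgres",
  title: "Routine Vacuuming",
  content: "VACUUM reclaims storage occupied by dead tuples...",
};

const labeledQuery = {
  query: "how do I reclaim disk space in postgres",
  category: "procedural", // one of the query archetypes
  relevant: ["pg-vacuum"], // labeled relevant doc ids
};
```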

### Format

```bash
193 changes: 193 additions & 0 deletions EXAMPLES.md
@@ -134,6 +134,199 @@ const augr = new Augur({

Three constructor args. No code changes elsewhere. The router automatically adapts to Pinecone's lack of keyword support (it stops picking the keyword strategy and lets the reranker carry precision).

### Picking a better default embedder (no API key needed)

The default `HashEmbedder` is a feature-hashed bag-of-tokens — useful as
a deterministic placeholder, but its vectors are not semantically
meaningful. Three offline upgrade paths:
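
To see why those vectors carry no semantics, here is the feature-hashing
idea in miniature — a sketch of the technique, not `HashEmbedder`'s source:

```ts
// Feature hashing: each token deterministically lands in one of `dim`
// buckets. Identical texts get identical vectors; synonyms do not.
function hashEmbed(text: string, dim = 256): number[] {
  const vec = new Array(dim).fill(0);
  for (const tok of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 0;
    for (let i = 0; i < tok.length; i++) h = (h * 31 + tok.charCodeAt(i)) | 0;
    vec[((h % dim) + dim) % dim] += 1; // same token always hits the same bucket
  }
  const norm = Math.hypot(...vec) || 1;
  return vec.map((v) => v / norm);
}
```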

**TF-IDF (no extra deps).** Feature-hashed TF-IDF with Porter stemming and
stopword removal — the classical IR baseline:

```ts
import { Augur, TfIdfEmbedder, MetadataChunker, SentenceChunker } from "@augur/core";

const augr = new Augur({
  embedder: new TfIdfEmbedder(),
  chunker: new MetadataChunker({ base: new SentenceChunker() }),
});
```

**Local sentence-transformer (recommended for production-grade local).**
`LocalEmbedder` runs a real sentence-transformer model entirely on-device
via `@huggingface/transformers` (ONNX Runtime). Default model is
`Xenova/all-MiniLM-L6-v2` (~22MB, 384d). First run downloads the model
to `~/.cache/huggingface/hub`; subsequent runs are instant.

```ts
import {
  Augur,
  LocalEmbedder,
  LocalReranker,
  MetadataChunker,
  SentenceChunker,
} from "@augur/core";

const augr = new Augur({
  embedder: new LocalEmbedder(), // 22MB, 384d
  reranker: new LocalReranker(), // 22MB cross-encoder
  chunker: new MetadataChunker({ base: new SentenceChunker() }),
});
```

You'll need to install the optional peer dep:

```bash
pnpm add @huggingface/transformers
```

For higher accuracy at a slightly larger size, swap the model and supply
the model's required prefixes:

```ts
// BGE-small: top of MTEB at this size; query prefix required.
new LocalEmbedder({
  model: "Xenova/bge-small-en-v1.5",
  queryPrefix: "Represent this sentence for searching relevant passages: ",
});

// E5-small: balanced; both prefixes required.
new LocalEmbedder({
  model: "Xenova/e5-small-v2",
  queryPrefix: "query: ",
  docPrefix: "passage: ",
});

// nomic-embed-text-v1.5: 768d, 137MB; instruction-tuned.
new LocalEmbedder({
  model: "nomic-ai/nomic-embed-text-v1.5",
  dimension: 768,
  queryPrefix: "search_query: ",
  docPrefix: "search_document: ",
});
```

`MetadataChunker` wraps any base chunker and prepends
`[doc-id | topic | title]` to each chunk before embedding — the
"Doc2Query lite" pattern.

**Measured impact on the bundled 504-query eval (no API keys):**

```
HashEmbedder (default)                                          NDCG@10 = 0.786
TfIdfEmbedder                                                   NDCG@10 = 0.825 (+0.039)
TfIdfEmbedder + MetadataChunker                                 NDCG@10 = 0.848 (+0.062)
LocalEmbedder (all-MiniLM-L6-v2)                                NDCG@10 = 0.845 (+0.059)
LocalEmbedder + LocalReranker                                   NDCG@10 = 0.877 (+0.091)
LocalEmbedder + LocalReranker + MetadataChunker                 NDCG@10 = 0.899 (+0.113)
LocalEmbedder + LocalReranker + MetadataChunker + stemmed BM25  NDCG@10 = 0.910 (+0.124)
```

Vector-strategy NDCG goes from 0.638 (HashEmbedder) to **0.922** with the
full local stack — the kind of jump you typically need a hosted API for.

### Doc2Query — synthetic-question expansion at index time

For each chunk, generate N questions the chunk could answer using a small
T5 model (`Xenova/LaMini-T5-61M`, ~24MB), then append them to the chunk's
content before embedding and BM25 indexing. The cost is paid once at index
time; **zero query-time latency**. Works particularly well on conversational
queries against reference-style content (and on non-English chunks
indexed alongside English questions).

```ts
import { Augur, Doc2QueryChunker, SentenceChunker, MetadataChunker } from "@augur/core";

const augr = new Augur({
  chunker: new Doc2QueryChunker({
    base: new MetadataChunker({ base: new SentenceChunker() }),
    numQueries: 3, // questions per chunk; more = better recall, slower/larger index
    model: "Xenova/LaMini-T5-61M",
  }),
});
```

Requires `@huggingface/transformers` (same dep as LocalEmbedder).

### Query-aware hybrid weights

`Augur.search()` now picks the BM25-vs-vector mix per query from the
router's signals — quoted phrases / specific tokens / very short queries
lean BM25 (0.3-0.4 vector weight), long natural-language questions lean
vector (0.7), default is 0.5. Most production hybrid systems do some
version of this; a fixed 0.5/0.5 mix under-weights whichever side is
wrong for the current query shape.

No configuration needed — it's automatic when strategy = "hybrid".
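
The heuristic amounts to something like this — an illustrative sketch only
(the real signals come from the router, and the function name is made up):

```ts
// Map query shape → vector weight for the hybrid mix (sketch, not source).
function vectorWeight(query: string): number {
  const tokens = query.trim().split(/\s+/);
  const quoted = /"[^"]+"/.test(query);
  if (quoted || tokens.length <= 2) return 0.35; // exact-ish intent: lean BM25
  if (tokens.length >= 8) return 0.7;            // long NL question: lean vector
  return 0.5;                                    // no strong signal: even split
}
```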

### Stemmed BM25 (`InMemoryAdapter({ useStemming: true })`)

Turns on Porter stemming + English stopword filtering for the keyword
path. Same pipeline Lucene/Elasticsearch use by default. On the bundled
eval this single flag adds ~+0.022 NDCG@10 to *any* config and is the
biggest cheap win on quoted / named-entity / short keyword queries
(running ↔ runs, connection ↔ connections all collapse to one stem).

```ts
import { Augur, InMemoryAdapter, LocalEmbedder, LocalReranker } from "@augur/core";

const augr = new Augur({
  adapter: new InMemoryAdapter({ useStemming: true }),
  embedder: new LocalEmbedder(),
  reranker: new LocalReranker(),
});
```

### MMR (Maximal Marginal Relevance) for diverse top-K

For ambiguous queries with multiple relevant docs, pure-relevance reranking
concentrates on near-duplicates. `MMRReranker` rebalances toward novelty:

```ts
import { CascadedReranker, LocalReranker, MMRReranker, Augur } from "@augur/core";

// Cross-encoder narrows by relevance; MMR diversifies the survivors.
const reranker = new CascadedReranker([
  [new LocalReranker(), 50],
  [new MMRReranker({ lambda: 0.7 }), 10],
]);

const augr = new Augur({ reranker, /* ... */ });
```

`λ = 1.0` is pure relevance; `λ = 0.7` is the standard "relevance with
diversity boost". Not enabled by default — on QA-style queries (one
relevant doc) MMR pushes hits out in favor of variety, which hurts. Reach
for it on multi-aspect queries, search-results pages, and RAG pipelines
where the LLM benefits from non-redundant context.
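
For reference, the textbook MMR selection rule the reranker is based on —
a sketch of the algorithm, not `MMRReranker`'s source:

```ts
// Greedy MMR: repeatedly pick the candidate maximizing
//   λ·relevance − (1−λ)·max-similarity-to-already-selected.
type Cand = { id: string; rel: number; vec: number[] };

const cosine = (a: number[], b: number[]) => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na * nb) || 1);
};

function mmrSelect(pool: Cand[], k: number, lambda = 0.7): Cand[] {
  const rest = [...pool];
  const picked: Cand[] = [];
  while (picked.length < k && rest.length > 0) {
    let best = 0, bestScore = -Infinity;
    for (let i = 0; i < rest.length; i++) {
      const redundancy = picked.length
        ? Math.max(...picked.map((p) => cosine(rest[i].vec, p.vec)))
        : 0;
      const score = lambda * rest[i].rel - (1 - lambda) * redundancy;
      if (score > bestScore) { bestScore = score; best = i; }
    }
    picked.push(rest.splice(best, 1)[0]); // move winner into the result set
  }
  return picked;
}
```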

### Picking a reranker

The `Reranker` interface has five ready implementations:

```ts
import {
  HeuristicReranker, // zero-dep, sub-ms; weak baseline
  LocalReranker,     // local ONNX cross-encoder (~22MB)
  CohereReranker,    // hosted cross-encoder, multilingual v3
  JinaReranker,      // hosted cross-encoder, multilingual v2
  CascadedReranker,  // chain rerankers: cheap-broad → expensive-narrow
} from "@augur/core";

// Cascaded rerank — heuristic narrows 100 → 50, cross-encoder narrows 50 → 10:
const cascade = new CascadedReranker([
  [new HeuristicReranker(), 50],
  [new LocalReranker(), 10],
]);
```

For any other cross-encoder hosted behind HTTP, implement the four-method
`Reranker` interface directly — it's about 30 lines. `HeuristicReranker`
is fine for a smoke test; cross-encoder rerankers are typically the
single biggest accuracy lever once embeddings are decent. Cascaded
reranking gives you both: cheap narrowing on a wide first pass,
expensive scoring only on the survivors.
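
A sketch of what that can look like — everything below (class shape, method
name, wire format) is an assumption, not the actual `Reranker` contract;
fill in the real four methods from the interface definition:

```ts
// Hypothetical HTTP-backed reranker. The request/response shapes mimic
// common hosted rerank APIs; adjust to your provider's.
class HttpReranker {
  constructor(private endpoint: string, private apiKey: string) {}

  async rerank(query: string, chunks: { id: string; content: string }[], topK: number) {
    const res = await fetch(this.endpoint, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${this.apiKey}`,
      },
      body: JSON.stringify({ query, documents: chunks.map((c) => c.content), top_n: topK }),
    });
    if (!res.ok) throw new Error(`rerank failed: ${res.status}`);
    // Assumed response: results sorted by relevance, indexing into `chunks`.
    const { results } = (await res.json()) as {
      results: { index: number; relevance_score: number }[];
    };
    return results.slice(0, topK).map((r) => ({ ...chunks[r.index], score: r.relevance_score }));
  }
}
```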

---

## 6. Postgres with pgvector
68 changes: 64 additions & 4 deletions README.md
@@ -103,12 +103,72 @@ Writing a new adapter is implementing five methods. See [`examples/custom-adapte

| You bring | Augur provides |
|-----------------------------------|--------------------------------------------------|
-| Documents                         | Chunking (3 strategies)                          |
-| (optional) An embedder + API key  | A default `HashEmbedder` that runs offline       |
-| (optional) A vector DB            | A default `InMemoryAdapter`                      |
-| (optional) A reranker             | A default `HeuristicReranker` + `CohereReranker` |
+| Documents                         | Chunking (3 strategies + `MetadataChunker`, `Doc2QueryChunker` wrappers) |
+| (optional) An embedder + API key  | Offline: `HashEmbedder`, `TfIdfEmbedder`, `LocalEmbedder` (ONNX, no network). Hosted: `OpenAIEmbedder`, `GeminiEmbedder` |
+| (optional) A vector DB            | A default `InMemoryAdapter` (BM25 + brute-force vector + RRF hybrid) |
+| (optional) A reranker             | Offline: `HeuristicReranker`, `LocalReranker` (cross-encoder ONNX), `MMRReranker` (diversity). Hosted: `CohereReranker`, `JinaReranker`. Plus `CascadedReranker` for staged pipelines. |
| Nothing | Routing, hybrid fusion, traces, dashboard, HTTP API |

## Evaluation

Augur ships a built-in eval harness (**182 docs, 504 labeled queries**
across 12 archetypes — factoid, procedural, definitional, code,
error_code, quoted, short_kw, named_entity, negation, non_english,
ambiguous, internal). The corpus covers Postgres, Kubernetes, Redis,
networking, ML/AI, security/compliance, code snippets, company-internal
runbooks/policies, and 12 foreign languages (es, ja, fr, de, zh, ko, pt,
ru, ar, hi, it, vi). Metrics: NDCG@10, MRR, Recall@10 — overall, per
category, per router-chosen strategy.

```bash
pnpm eval                                        # default config
pnpm eval -- --verbose                           # per-query lines
pnpm eval -- --save baseline.json                # snapshot metrics
pnpm eval -- --compare baseline.json             # diff vs snapshot
pnpm eval -- --embedder tfidf                    # swap to TfIdfEmbedder (offline, no deps)
pnpm eval -- --embedder local                    # offline ONNX (Xenova/all-MiniLM-L6-v2, ~22MB)
pnpm eval -- --embedder local --reranker local   # + cross-encoder reranker (~22MB)
pnpm eval -- --embedder local --reranker local --metadata-chunker             # full local stack
pnpm eval -- --embedder local --reranker local --metadata-chunker --bm25-stem # best (0.910 NDCG@10)
pnpm eval -- --embedder local --reranker local --mmr --mmr-lambda 0.7         # diversity-aware top-K
pnpm eval -- --embedder gemini --gemini-cache-dir .cache/gemini               # Gemini API w/ disk cache
```

### Reference numbers (no API keys, no network)

Measured on the bundled 504-query / 182-doc eval. **All numbers below are
real, locally reproducible runs** — no remote APIs touched.

| Config | NDCG@10 | MRR | Recall@10 |
| ----------------------------------------------------------------------------------------------- | ------: | -----: | --------: |
| `HashEmbedder` (default placeholder, not semantic) | 0.786 | 0.782 | 0.857 |
| `TfIdfEmbedder` | 0.825 | 0.816 | 0.906 |
| `TfIdfEmbedder` + `MetadataChunker` | 0.848 | 0.839 | 0.923 |
| `LocalEmbedder` (Xenova/all-MiniLM-L6-v2) | 0.845 | 0.835 | 0.924 |
| `LocalEmbedder` + `LocalReranker` (ms-marco-MiniLM cross-encoder) | 0.877 | 0.871 | 0.932 |
| `LocalEmbedder` + `LocalReranker` + `MetadataChunker` | 0.899 | 0.896 | 0.943 |
| `LocalEmbedder` + `LocalReranker` + `MetadataChunker` + stemmed BM25 (`useStemming`) | **0.910** | **0.907** | **0.956** |

The best row uses ~44MB of on-device ONNX models, no network at query
time, and beats the HashEmbedder default by **12.4 NDCG@10 points**, with
vector-strategy NDCG going from **0.638 → 0.922 (+28.4 points)** and keyword
from 0.874 → 0.925 (+5.1 points, almost entirely from Porter stemming).

Hosted production embedders (Cohere v3, OpenAI text-embedding-3, Voyage)
typically lift another 5-10% on top of all-MiniLM-L6-v2. The harness is
a pure function of the `Augur` instance, so swap the embedder, adapter,
router, or reranker between runs to measure the impact of any change.
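
In code, that looks roughly like this — the harness entry point, its import
path, and the metrics field names are assumptions; check the eval package
for the real exports:

```ts
import { Augur, TfIdfEmbedder, InMemoryAdapter } from "@augur/core";
// Hypothetical export — the actual harness entry point may differ.
import { runEval } from "./evaluations/run";

const metrics = await runEval(
  new Augur({
    embedder: new TfIdfEmbedder(),
    adapter: new InMemoryAdapter({ useStemming: true }),
  }),
);
console.log(metrics.ndcg10, metrics.mrr, metrics.recall10); // assumed field names
```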

### MMR for diverse top-K (opt-in)

`MMRReranker` implements Maximal Marginal Relevance — useful when queries
have multiple distinct relevant docs and you want the top-K to span them
rather than concentrate on near-duplicates. **Not on by default**: on the
bundled QA-style eval where most queries have 1 relevant doc, MMR pushes
hits out of top-10 in favor of diversity (NDCG drops ~0.04). Reach for it
on multi-aspect queries, recommendation feeds, and RAG pipelines where
the LLM benefits from non-redundant context. See [EXAMPLES §5](EXAMPLES.md#5-switching-to-openai--pinecone) for wiring.

## Status

This is a v0.1 MVP under active development. It is small enough to read end-to-end in an afternoon and useful enough to point at a real RAG project tomorrow. Issues, ideas, and PRs welcome — see [DEVELOPMENT_GUIDE.md](./DEVELOPMENT_GUIDE.md).
5 changes: 4 additions & 1 deletion apps/dashboard/app/page.tsx
@@ -87,8 +87,11 @@ export default function PlaygroundPage() {
Latency budget (ms)
<input
type="number"
min={0}
value={budget}
onChange={(e) => setBudget(e.target.value === "" ? "" : Number(e.target.value))}
onChange={(e) =>
setBudget(e.target.value === "" ? "" : Math.max(0, Number(e.target.value)))
}
className="mt-1 w-full bg-ink-800 border border-ink-700 rounded px-2 py-1.5"
/>
</label>