In-memory full-text search for Node.js — a fork of MiniSearch by Luca Ongaro, extended for production serving: smaller indexes, faster loads, and a read-only fast path.
Current release: 8.1.0 · install with npm install @yoch/minisearch
MiniSearch is excellent for building and querying an index in JavaScript. This fork keeps that API for mutable indexing, and adds FrozenMiniSearch for when the index is built once and queried many times:
| | Mutable MiniSearch | FrozenMiniSearch |
|---|---|---|
| Use when | Documents change (add, remove, discard) | Corpus is fixed, or you reload from disk |
| Memory | Maps and nested objects per posting | Flat Uint32Array / Uint8Array postings |
| On disk | toJSON / loadJSON | saveBinary / loadBinary (MSv4 / MSv3) |
| Typical search | Baseline | Often ~20–35% faster p50 on the same corpus (see benchmarks) |
Same BM25 scoring, prefix/fuzzy search, autoSuggest, and query combinators — frozen indexes aim for search ranking parity with addAll + freeze() when built with the same options. Term frequencies are stored as Uint8 (max 255 per document/field); extreme repetition can cause a small score drift versus the mutable index.
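To get a feel for how small that drift can be, here is a minimal sketch using the textbook BM25 term-frequency component. The parameter values (k = 1.2, b = 0.7), the saturating clamp, and the helper names are assumptions made for illustration; they are not taken from the library's source.

```javascript
// Assumed BM25 parameters, for illustration only.
const K = 1.2
const B = 0.7

// Saturating store: counts above 255 are clamped to 255.
// A plain Uint8Array write would wrap modulo 256, so the clamp is explicit.
function storeTermFrequency (count) {
  return Math.min(count, 255)
}

// Term-frequency component of textbook BM25 (the IDF factor is omitted).
function bm25TfComponent (tf, fieldLength, avgFieldLength) {
  const norm = 1 - B + B * (fieldLength / avgFieldLength)
  return (tf * (K + 1)) / (tf + K * norm)
}

// A term repeated 1000 times in a field of average length.
const exact = bm25TfComponent(1000, 2000, 2000)
const capped = bm25TfComponent(storeTermFrequency(1000), 2000, 2000)
const drift = exact - capped

console.log({ exact, capped, drift })
```

Because BM25 saturates quickly as the term frequency grows, the capped and uncapped components differ only slightly under these assumptions.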
npm install @yoch/minisearch
# pre-releases: npm install @yoch/minisearch@beta
One-shot frozen index (no mutable step):
import { FrozenMiniSearch } from '@yoch/minisearch'
const options = { fields: ['title', 'text'], storeFields: ['title'] }
const index = FrozenMiniSearch.fromDocuments(documents, options)
index.search('ishmael', { prefix: true })
index.autoSuggest('zen')
// Persist and reload
const buf = index.saveBinary()
const loaded = FrozenMiniSearch.loadBinary(buf, options)
Mutable index, then freeze (incremental build):
import MiniSearch, { FrozenMiniSearch } from '@yoch/minisearch'
const ms = new MiniSearch({ fields: ['title', 'text'] })
ms.addAll(documents)
const frozen = ms.freeze() // immutable snapshot
const buf = frozen.saveBinary()
// ESM
import MiniSearch, { FrozenMiniSearch, buildFrozenFromDocuments } from '@yoch/minisearch'
// CommonJS
const MiniSearch = require('@yoch/minisearch')
const { FrozenMiniSearch } = require('@yoch/minisearch')

| Goal | API |
|---|---|
| Live index that changes over time | MiniSearch → freeze() when you need read-only serving |
| Fixed corpus, build frozen directly | FrozenMiniSearch.fromDocuments(documents, options) |
| Build doc-by-doc (no documents[] buffer) | createFrozenIndexBuilder(options) → .add(doc) → freezeFrozenIndexBuilder(builder) |
| Async stream of documents | FrozenMiniSearch.fromAsyncIterable(iterable, options) |
| Load a snapshot from disk | FrozenMiniSearch.loadBinary(buffer, options) |
| Custom assembly pipeline | buildFrozenFromDocuments, assembleFrozen, freezeFromMiniSearch |
fromDocuments matches new MiniSearch(opts).addAll(docs).freeze() for search ranking on the same corpus and options (fields, tokenize, processTerm, …). Frozen indexes do not support add / remove.
External corpus (e.g. lookup by id after search): keep full rows in your own store (dataCache, DB, etc.) and use minimal storeFields (often ['id'] only) so the frozen index does not duplicate payload text:
import { createFrozenIndexBuilder, freezeFrozenIndexBuilder } from '@yoch/minisearch'
function buildFrozenIndexFromRows (rows, options) {
  const builder = createFrozenIndexBuilder(options, {
    estimatedDocumentCount: rows.length
  })
  for (let i = 0; i < rows.length; i++) {
    builder.add(buildIndexDocument(rows[i], i))
  }
  return freezeFrozenIndexBuilder(builder)
}
// After search: enrich from your store via frozen.getStoredFields(res.id) or dataCache[type][res.id]
Async stream (no intermediate array; documents are indexed as they arrive):
import { createReadStream } from 'node:fs'
import { parse } from 'csv-parse'
import { FrozenMiniSearch } from '@yoch/minisearch'
async function buildFromCsv (path, options) {
  async function * documents () {
    const parser = createReadStream(path).pipe(parse({ columns: true }))
    for await (const row of parser) {
      yield { id: row.cis, denomination: row.denomination, /* … */ }
    }
  }
  return FrozenMiniSearch.fromAsyncIterable(documents(), options)
}
For a sync iterable (for...of on an array or generator), use the builder directly:
import { createFrozenIndexBuilder, freezeFrozenIndexBuilder } from '@yoch/minisearch'
const builder = createFrozenIndexBuilder(options)
for (const doc of documentGenerator()) {
builder.add(doc)
}
const frozen = freezeFrozenIndexBuilder(builder)
estimatedDocumentCount in the second argument to createFrozenIndexBuilder pre-allocates per-document arrays when the final size is known; internal buffers are trimmed to the actual count on freeze if the hint was too large.
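The pre-allocate-then-trim behavior can be pictured with a small standalone sketch. The DocLengthColumn class below is hypothetical and is not the library's builder; it only illustrates how a size hint avoids repeated growth and how an oversized buffer can be trimmed when the structure is frozen.

```javascript
// Hypothetical per-document column, used only to illustrate the idea.
class DocLengthColumn {
  constructor (estimatedDocumentCount = 0) {
    // Allocate once up front when a hint is available (minimum of 16 slots).
    this.data = new Uint32Array(Math.max(estimatedDocumentCount, 16))
    this.count = 0
  }

  push (value) {
    if (this.count === this.data.length) {
      // No hint, or hint too small: double the capacity and copy.
      const grown = new Uint32Array(this.data.length * 2)
      grown.set(this.data)
      this.data = grown
    }
    this.data[this.count++] = value
  }

  freeze () {
    // slice() copies into a right-sized buffer, so an oversized
    // allocation does not outlive the build step.
    return this.data.slice(0, this.count)
  }
}

const column = new DocLengthColumn(1000) // hint was too large
for (let i = 0; i < 3; i++) column.push(10 + i)
const frozenLengths = column.freeze()
console.log(frozenLengths.length, frozenLengths.buffer.byteLength)
```

With an accurate hint the growth branch never runs; with an overly generous hint the cost is a single copy at freeze time.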
- freeze(): snapshot a mutable index into compact typed postings + a radix tree keyed by term index.
- fromDocuments(): build that structure in one pass (skips nested Map postings and radix cloning at freeze time).
- createFrozenIndexBuilder(): same output without a temporary documents[] array; finalize with freezeFrozenIndexBuilder(builder) (or assembleFrozen(builder.freezeParams()) for custom assembly).
- fromAsyncIterable(): async document stream (e.g. CSV parser) into a frozen index; equivalent to builder + for await + freezeFrozenIndexBuilder.
- saveBinary() / loadBinary(): MSv4 (sparse multi-field, Uint16 doc ids when possible) or MSv3 (single-field dense, Uint32 doc ids). MSv1/MSv2 are not supported; re-save older snapshots. Field names are stored in the snapshot; fields in loadBinary options is optional (if provided, it must match exactly). Custom tokenize / processTerm are not stored; pass the same functions at load time if you customized them. storeFields data is embedded in the snapshot.
- Term frequencies: stored as Uint8 (max 255 per doc/term); only affects scores for extreme term repetition.
- frozenMemoryBreakdown(): introspect postings, radix tree, and stored-field footprint (estimates only; not exact heap accounting).
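As a rough picture of what "compact typed postings" means, the sketch below flattens nested Map postings into three typed arrays. It is a simplified, single-field illustration: the function names are hypothetical, a sorted term array stands in for the radix tree, and the actual layout used by the library may differ.

```javascript
// Flatten Map<term, Map<docId, termFrequency>> into typed arrays.
function flattenPostings (postings) {
  const terms = [...postings.keys()].sort()
  let total = 0
  let maxDocId = 0
  for (const docs of postings.values()) {
    total += docs.size
    for (const docId of docs.keys()) {
      if (docId > maxDocId) maxDocId = docId
    }
  }

  // Use 16-bit doc ids when every id fits, 32-bit otherwise.
  const DocIdArray = maxDocId <= 0xFFFF ? Uint16Array : Uint32Array
  const offsets = new Uint32Array(terms.length + 1)
  const docIds = new DocIdArray(total)
  const freqs = new Uint8Array(total)

  let cursor = 0
  terms.forEach((term, termIndex) => {
    offsets[termIndex] = cursor
    const entries = [...postings.get(term)].sort((a, b) => a[0] - b[0])
    for (const [docId, tf] of entries) {
      docIds[cursor] = docId
      freqs[cursor] = Math.min(tf, 255) // Uint8 cap
      cursor++
    }
  })
  offsets[terms.length] = cursor
  return { terms, offsets, docIds, freqs }
}

// Zero-copy views over the postings of one term.
function postingsFor (flat, termIndex) {
  const start = flat.offsets[termIndex]
  const end = flat.offsets[termIndex + 1]
  return {
    docIds: flat.docIds.subarray(start, end),
    freqs: flat.freqs.subarray(start, end)
  }
}

const mutablePostings = new Map([
  ['zen', new Map([[0, 2], [3, 1]])],
  ['art', new Map([[3, 300]])]
])
const flat = flattenPostings(mutablePostings)
const zen = postingsFor(flat, flat.terms.indexOf('zen'))
console.log(Array.from(zen.docIds), Array.from(zen.freqs))
```

The design trade-off is the one the comparison table describes: the flat form drops per-posting objects and is cheap to serialize, but it cannot absorb additions or removals without being rebuilt.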
Mutable index → frozen: prefer a fixed corpus. If you used discard() on a MiniSearch index, run vacuum() before freeze() to shrink the snapshot; search parity is still expected without vacuum, but the binary may retain sparse slots.
Advanced API (assembleFrozen, freezeFromMiniSearch, FrozenIndexBuilder) is for custom pipelines — most apps should use fromDocuments, freeze(), or the builder helpers above.
Advanced exports:
import {
  FrozenMiniSearch,
  createFrozenIndexBuilder,
  freezeFrozenIndexBuilder,
  FrozenIndexBuilder,
  type FrozenIndexBuilderHints,
  buildFrozenFromDocuments,
  assembleFrozen,
  freezeFromMiniSearch,
  frozenMemoryBreakdown
} from '@yoch/minisearch'
Full upstream-style API: field boosts, fuzzy/prefix, nested queries, AND / OR / AND_NOT, filters, autoSuggest, vacuum after discard, etc.
import MiniSearch from '@yoch/minisearch'
const miniSearch = new MiniSearch({ fields: ['title', 'text'] })
miniSearch.addAll(documents)
miniSearch.search('zen art motorcycle')
TypeScript definitions: dist/es/index.d.ts.
| Area | Change | Effect |
|---|---|---|
| Format | MSv3 replaces MSv1/MSv2 (breaking) | CRC32 payload check; binary field names, ids, stored fields, term tree |
| Binary load | Structural validation in decodeFrozenSnapshot / validateFrozenSnapshot | Corrupt snapshots fail fast with Invalid frozen index: … |
| loadBinary | fields optional (embedded in snapshot); if provided, must match exactly | Simpler reload; no silent field subset |
| saveBinary | Single pre-allocated buffer | Lower peak memory while serializing |
| Search | Per-query cache for fieldTermDataFor(termIndex) | Fewer allocations on prefix/fuzzy queries |
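
For readers unfamiliar with payload checksums, this is a minimal sketch of a CRC32 check that lets a loader reject a corrupted buffer before decoding it. The trailing 4-byte little-endian layout, the function names, and the error messages are assumptions for illustration; the actual snapshot layout is not described here.

```javascript
// Standard CRC-32 (reflected polynomial 0xEDB88320), table-driven.
const CRC_TABLE = new Uint32Array(256)
for (let n = 0; n < 256; n++) {
  let c = n
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1)
  }
  CRC_TABLE[n] = c >>> 0
}

function crc32 (bytes) {
  let crc = 0xFFFFFFFF
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8)
  }
  return (crc ^ 0xFFFFFFFF) >>> 0
}

// Hypothetical layout: payload bytes followed by a 4-byte checksum.
function sealPayload (payload) {
  const out = new Uint8Array(payload.length + 4)
  out.set(payload)
  new DataView(out.buffer).setUint32(payload.length, crc32(payload), true)
  return out
}

// Verify the checksum first, then hand back a view of the payload.
function openPayload (sealed) {
  if (sealed.length < 4) throw new Error('snapshot too short')
  const payload = sealed.subarray(0, sealed.length - 4)
  const view = new DataView(sealed.buffer, sealed.byteOffset, sealed.byteLength)
  const expected = view.getUint32(sealed.length - 4, true)
  if (crc32(payload) !== expected) throw new Error('checksum mismatch')
  return payload
}

const sealed = sealPayload(new TextEncoder().encode('frozen index bytes'))
console.log(openPayload(sealed).length)
```

A checksum only detects accidental corruption; structural validation of the decoded content is a separate step.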
Measure regressions with benchmarks/ (freezeMs, saveBinary, loadBinary, search p50, heap frozen).
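
If you want a quick p50 figure outside the benchmark scripts, a nearest-rank percentile over timed calls is enough for a first look. This is a generic sketch, not the code under benchmarks/; the helper names and the warm-up count are arbitrary choices.

```javascript
// Nearest-rank percentile: p in (0, 100], samples is a non-empty list of numbers.
function percentile (samples, p) {
  const sorted = Float64Array.from(samples).sort() // numeric sort
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank, 1) - 1]
}

// Time one call per query, after a warm-up phase so the JIT has settled.
function measureLatencies (fn, queries, warmup = 100) {
  for (let i = 0; i < warmup; i++) fn(queries[i % queries.length])
  const samples = []
  for (const query of queries) {
    const start = performance.now()
    fn(query)
    samples.push(performance.now() - start)
  }
  return samples
}

// Stand-in workload; in practice pass something like q => index.search(q).
const samples = measureLatencies(q => q.split(' ').length, ['zen art', 'ishmael', 'whale'])
console.log('p50 ms:', percentile(samples, 50))
```

Run the same query set against the mutable and the frozen index in the same process and compare medians rather than single timings, since individual samples are noisy.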
| Priority | Topic | Idea | Trade-off |
|---|---|---|---|
| Format | Term dictionary | Drop runtime _terms[] duplicate at rest | Saves heap; more complex save path |
| API | loadBinaryAsync | Chunked/async load like loadJSONAsync | Better cold start on huge indexes |
| API | Input types | Accept Uint8Array as well as Buffer on loadBinary | Broader runtime support |
| Build | freeze / builder | One-pass posting flatten with size estimate | Faster freeze on very large corpora |
| Search | Wildcard | Iterate only active document slots after dense remap | Faster wildcard after many discards |
| Search | Hot path | Direct subarray posting access in aggregateTerm | Lower GC; invasive |
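
The "Input types" idea can be sketched with standard JavaScript alone, because a Node.js Buffer is already a Uint8Array subclass. The helpers below are hypothetical and show one possible way to normalize input; they do not describe how loadBinary currently behaves.

```javascript
// Normalize Buffer, any typed array, DataView, or ArrayBuffer into a
// Uint8Array view without copying.
function toByteView (input) {
  if (input instanceof ArrayBuffer) return new Uint8Array(input)
  if (ArrayBuffer.isView(input)) {
    // Respect byteOffset/byteLength: Buffers and subarrays often point
    // into the middle of a larger shared ArrayBuffer.
    return new Uint8Array(input.buffer, input.byteOffset, input.byteLength)
  }
  throw new TypeError('expected a Buffer, Uint8Array, or ArrayBuffer')
}

// Read a little-endian uint32 from any supported input type.
function readUint32LE (input, offset) {
  const bytes = toByteView(input)
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  return view.getUint32(offset, true)
}

const fromBuffer = readUint32LE(Buffer.from([1, 0, 0, 0, 2, 0, 0, 0]), 4)
const fromSubarray = readUint32LE(new Uint8Array([9, 9, 7, 0, 0, 0]).subarray(2), 0)
console.log(fromBuffer, fromSubarray)
```

Reading through a DataView keeps the decoder independent of Buffer-only methods, which is what would make the same code usable in runtimes without Buffer.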
Intentionally deferred: embedding tokenize / processTerm in the snapshot. Raising the Uint8 term-frequency cap needs a new postings encoding.
For contributor-oriented notes, see DESIGN_DOCUMENT.md — FrozenMiniSearch.
Reproducible comparisons (heap, load time, search latency) live under benchmarks/:
npm run benchmark:compare # terminal report
npm run benchmark:diff # vs versioned baseline
npm install
npm test
npm run build
Use npm run for scripts (Yarn 1.x on Node 22 prints url.parse deprecation noise when invoking yarn test / yarn build).
Publish stable (updates npm latest):
npm run release:stable
Publish a pre-release (dist-tag beta only):
npm run release:beta
Requirements: Node.js ES2018+. No browser UMD/CDN build in this fork (Node-only ESM + CJS).
See CHANGELOG.md.
- MiniSearch — Luca Ongaro (MIT)
- This fork — yoch/minisearch: FrozenMiniSearch, MSv4/MSv3 binary snapshots, shared scoring refactor
Upstream docs: MiniSearch site · intro article