Concepts
This page explains the ideas behind stream-json — the problem it solves, how it solves it, and the handful of concepts everything else builds on. If you would rather learn by reading code, Intro by examples shows the same ideas as runnable pipelines.
JSON.parse handles JSON in one line — as long as the whole document fits in memory. The cases stream-json is built for are the ones where it does not: multi-gigabyte API responses, database dumps with millions of records, log exports, anything generated continuously or simply larger than RAM. JSON.parse reads every byte and builds the entire value before you can touch any of it; on a 10 GB file that is 10 GB of heap (usually several times that), or an out-of-memory crash.
And usually you do not want the whole thing in memory at once anyway. There are two ways around that, and they combine: take only a piece of the document (say, the summary object at the end of a huge data dump), or read it piece by piece (one record at a time from a huge array). Either way you hold only what fits in memory.
You get a stream of your objects out of a JSON source too big to hold at once — one at a time, taking only the part you want. You write a short pipeline (a source, a parser(), an optional pick(), and a streamer), then run your own code on each object it hands you:
file -> parser() -> pick()? -> streamArray() / streamValues() -> your objects
You think about which parts of the document you want. The library reads the bytes, parses, assembles objects, and keeps memory flat. (Internally it works on a stream of tokens, but you rarely deal with tokens directly — see the token foundation.)
You assemble these stages with chain() — a small factory that builds a stream-like processing pipeline from a list of stages. It comes from stream-chain, the library stream-json is built on, so you already have it (import chain from 'stream-chain').
A streamer turns the parser's token stream into a stream of objects, one top-level item at a time, so nothing larger than a single item is ever in memory. Each item arrives as {key, value} — your code reads value, and key (the array index or property name) when you need it.
- StreamArray: the elements of one big top-level array. The everyday case: a JSON document is usually an envelope (some metadata plus the real payload under a property like data), so you pick that property and streamArray() its elements. By far the most common pipeline.
- StreamValues: a sequence of separate values, used mostly after a pick that matches several subobjects, to hand you each one as an isolated object. (It also reads JSON Streaming input when the parser runs in jsonStreaming mode.)
- StreamObject: the properties of one big top-level object. For the occasional document whose top level is a huge object rather than an array.
Streamers are built on the Assembler, which consumes the token stream and rebuilds a complete JavaScript value: a streaming JSON.parse. Use it directly when the whole value fits in memory or, more often, when you want to control how objects are built. In that case you drive the Assembler from your own stage that reads the token stream, and you assemble into types that JSON cannot represent natively (custom containers, class instances) rather than plain objects and arrays. It is advanced machinery, but it is the first advanced tool most users reach for, ahead of the filters. FlexAssembler is the ready-made version for custom containers (a Map or Set at chosen paths).
Every streamer takes an objectFilter — a predicate checked during assembly. Reject an object and it is dropped before it is finished assembling, saving that work. You decide from the part built so far (for example a type field near the top of each record) and can stay undecided until the field you need arrives. No tokens involved — a per-object performance lever; see StreamBase and Performance.
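The three-way decision (keep, drop, not decided yet) can be sketched without the library. Everything below is hypothetical: wantRecord receives a plain partially built object, and assembleWithFilter is a toy driver, whereas the real objectFilter's argument and call timing are documented under StreamBase. The sketch only shows why an early rejection saves work.

```javascript
// Hypothetical predicate: decide from the part of the record built so far.
// true = keep, false = drop now, undefined = undecided, ask again later.
function wantRecord(partial) {
  if (!('type' in partial)) return undefined; // the field has not arrived yet
  return partial.type === 'order';
}

// Toy driver (not the library): adds one property at a time and stops
// building as soon as the predicate rejects the record.
function assembleWithFilter(entries, predicate) {
  const partial = {};
  let decided; // stays undefined until the predicate commits
  for (const [key, value] of entries) {
    partial[key] = value;
    if (decided === undefined) {
      decided = predicate(partial);
      if (decided === false) {
        return {kept: false, built: Object.keys(partial).length};
      }
    }
  }
  // This toy drops records that were never decided; that rule is an
  // arbitrary choice made for the sketch.
  return {kept: decided === true, built: Object.keys(partial).length, value: partial};
}

const rejected = assembleWithFilter(
  [['id', 1], ['type', 'refund'], ['items', [1, 2, 3]], ['note', 'x']],
  wantRecord
);
const accepted = assembleWithFilter(
  [['id', 2], ['type', 'order'], ['items', [4]]],
  wantRecord
);

console.log(rejected); // { kept: false, built: 2 }
console.log(accepted.kept, accepted.built); // true 3
```

The rejected record stops after two of its four properties, which is the saving the paragraph above describes. Putting the deciding field near the top of each record makes that saving larger.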
Often the data you want is one member of a larger wrapper object. pick keeps a matching subobject and drops the rest, selecting by path:
chain([source, parser(), pick({filter: 'data'}), streamArray()]);
The other filters (Replace, Ignore, Filter) edit the document in more advanced ways; most pipelines only need pick.
In most cases the source is a file. The simplest path is parseFile, a file-edge component that reads and parses in one stage; or use fs.createReadStream(path) followed by parser(). Any stream works as a source — an HTTP response, a socket, your own generator — but a local file is the common case.
You never load the whole document. stream-json is built on stream-chain, which handles backpressure for you, so memory stays flat whether the file is 10 KB or 10 TB. Nothing to configure. This holds even with the file-edge components (parseFile / stringerToFile) at both ends and no stream in between: they read and write fixed-size blocks on demand, so the source never outruns the sink.
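Backpressure itself is handled by stream-chain and the underlying streams. The sketch below is only an analogy built from plain generators, not the library's mechanism, and its function names are made up for illustration. It shows the property that keeps memory flat: the source produces an item only when the sink asks for one.

```javascript
let produced = 0; // how many records the source has actually created

// Source: a generator runs only when the consumer asks for the next item.
function* records(limit) {
  for (let i = 0; i < limit; i++) {
    produced++;
    yield {id: i};
  }
}

// Stage: transforms one item at a time and holds nothing else.
function* withLabel(source) {
  for (const record of source) {
    yield {...record, label: `record-${record.id}`};
  }
}

// Sink: takes three items and stops. The source never runs ahead of it.
const taken = [];
for (const item of withLabel(records(1000000))) {
  taken.push(item);
  if (taken.length === 3) break;
}

console.log(produced); // 3
```

A million records were available, but only three were ever created, because nothing upstream runs unless the sink pulls.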
- JSON Streaming: several JSON values in one stream, separated only syntactically ([]{}1 2 3"z" is six values). parser({jsonStreaming: true}) + StreamValues reads it, JSONL included, since JSONL is a subset.
  - JSONL / NDJSON: one value per line, ~99% of real-world dumps. Because each line is a complete value, a dedicated splitter can JSON.parse lines natively, much faster than tokenizing, so for JSONL prefer that component, now in stream-chain.
- JSONC (JSON with Comments): a parser, stringer, and verifier that preserve comments and formatting for byte-faithful edits.
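The reason JSONL can skip tokenizing is easy to see in a few lines. The following is a minimal sketch of a line splitter, not the stream-chain component (whose API is not shown on this page); createJsonlSplitter is a hypothetical name. It accepts text in arbitrary chunks, carries any partial last line over to the next chunk, and hands each complete line to JSON.parse.

```javascript
// Returns a function that accepts text chunks and returns the values
// completed by each chunk. A partial last line is carried over.
function createJsonlSplitter() {
  let rest = '';
  return function push(chunk, isLast = false) {
    const lines = (rest + chunk).split('\n');
    // Unless this is the final chunk, the last piece may be incomplete.
    rest = isLast ? '' : lines.pop();
    return lines
      .filter(line => line.trim() !== '') // skip blank lines
      .map(line => JSON.parse(line));     // each line is a complete value
  };
}

// Chunk boundaries fall in the middle of lines on purpose.
const push = createJsonlSplitter();
const values = [
  ...push('{"id":1}\n{"id"'),
  ...push(':2}\n[3,'),
  ...push('4]\n"z"', true)
];

console.log(JSON.stringify(values)); // [{"id":1},{"id":2},[3,4],"z"]
```

Only one line is buffered at a time, so memory is bounded by the longest line rather than by the file.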
For browser bundles import from stream-json/web/...; for embedding without stream adapters, stream-json/core/.... (ESM; see Supported runtimes.)
Everything above rides on a layer you usually do not see: a token stream. The parser turns JSON text into a flat sequence of SAX-style events — startObject, keyValue, stringChunk, numberValue, and so on — and every component operates on that sequence. You reach this level only when you want more than the streamers give, and most pipelines never do; but it is worth knowing what is underneath, because it sets what the library can and cannot do.
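To make the token layer concrete, here is a hand-written token list for a small document and a minimal assembler that rebuilds the value from it. This is a sketch under assumptions: the {name, value} token shape, numbers carried as text, and the use of whole-value tokens only (no chunk tokens such as stringChunk) are simplifications, and the Parser page is the authority on what the parser actually emits. The assemble function is a toy, not the library's Assembler.

```javascript
// Hand-written tokens for {"data":[{"id":1},{"id":2}],"ok":true}
const tokens = [
  {name: 'startObject'},
  {name: 'keyValue', value: 'data'},
  {name: 'startArray'},
  {name: 'startObject'},
  {name: 'keyValue', value: 'id'},
  {name: 'numberValue', value: '1'},
  {name: 'endObject'},
  {name: 'startObject'},
  {name: 'keyValue', value: 'id'},
  {name: 'numberValue', value: '2'},
  {name: 'endObject'},
  {name: 'endArray'},
  {name: 'keyValue', value: 'ok'},
  {name: 'trueValue'},
  {name: 'endObject'}
];

// Minimal assembler: rebuilds one JavaScript value from balanced tokens.
function assemble(tokenList) {
  const stack = []; // open containers, innermost last: {container, key}
  let result;
  // Store a finished value in the innermost open container, if any.
  const save = value => {
    const top = stack[stack.length - 1];
    if (!top) result = value;
    else if (Array.isArray(top.container)) top.container.push(value);
    else top.container[top.key] = value;
  };
  for (const token of tokenList) {
    switch (token.name) {
      case 'startObject': stack.push({container: {}, key: null}); break;
      case 'startArray': stack.push({container: [], key: null}); break;
      case 'keyValue': stack[stack.length - 1].key = token.value; break;
      case 'endObject':
      case 'endArray': save(stack.pop().container); break;
      case 'stringValue': save(token.value); break;
      case 'numberValue': save(Number(token.value)); break;
      case 'nullValue': save(null); break;
      case 'trueValue': save(true); break;
      case 'falseValue': save(false); break;
      // any other token is ignored in this sketch
    }
  }
  return result;
}

console.log(JSON.stringify(assemble(tokens)));
// {"data":[{"id":1},{"id":2}],"ok":true}
```

Because start and end tokens are balanced, a stack is all the state needed to know where each value belongs.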
You cannot edit an object inside a JSON file, but you can edit the token stream. JSON is hierarchical and the parser emits balanced tokens, so every value is a self-contained run — a startObject … endObject and everything between. That structure is what lets the filters select by path: pick keeps a matching subobject and ignore drops one, by where it sits in the document. Ignoring a single property (a key and its value) is more involved but works the same way. pick and ignore cover most selection; for harder cases replace substitutes a matched value with one you compute, and filter selects by an arbitrary predicate while keeping the surrounding shape.
Selecting at the token level is how you avoid building. Assembling a value into a JavaScript object costs memory, and a value larger than RAM cannot be built at all. Because the filters drop sub-streams as tokens, the parts you reject are never assembled — you skip the cost entirely. (When you do assemble, objectFilter drops objects mid-build, as above.)
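Selecting before assembling can be sketched as a filter over tokens. The code below is a toy, not the library's pick: pickTopLevel is a hypothetical function that handles only a top-level property name, and it assumes {name, value} tokens with whole-value tokens only. It tracks nesting depth and passes through just the run of tokens that makes up the wanted value, so the rest of the document is never built.

```javascript
const t = (name, value) => (value === undefined ? {name} : {name, value});

// Tokens for {"meta":{"n":2},"data":[{"id":1},{"id":2}],"ok":true}
const tokens = [
  t('startObject'),
  t('keyValue', 'meta'), t('startObject'), t('keyValue', 'n'), t('numberValue', '2'), t('endObject'),
  t('keyValue', 'data'), t('startArray'),
  t('startObject'), t('keyValue', 'id'), t('numberValue', '1'), t('endObject'),
  t('startObject'), t('keyValue', 'id'), t('numberValue', '2'), t('endObject'),
  t('endArray'),
  t('keyValue', 'ok'), t('trueValue'),
  t('endObject')
];

// Yields only the tokens of the value stored under a top-level key.
function* pickTopLevel(tokenList, wantedKey) {
  let depth = 0;       // number of currently open containers
  let armed = false;   // true right after the wanted key was seen
  let picking = false; // true while inside the wanted container
  for (const token of tokenList) {
    const opens = token.name === 'startObject' || token.name === 'startArray';
    const closes = token.name === 'endObject' || token.name === 'endArray';
    if (picking) {
      yield token;
      if (opens) depth++;
      if (closes && --depth === 1) picking = false; // wanted value is complete
      continue;
    }
    if (token.name === 'keyValue') {
      if (depth === 1) armed = token.value === wantedKey;
      continue;
    }
    if (armed && depth === 1) {
      armed = false;
      yield token; // first (or only) token of the wanted value
      if (opens) { picking = true; depth++; }
      continue;
    }
    if (opens) depth++;
    if (closes) depth--;
  }
}

const picked = [...pickTopLevel(tokens, 'data')];
console.log(picked.map(token => token.name).join(' '));
```

The meta object and the ok flag are counted past but never turned into JavaScript values; only the ten tokens of the data array come out the other side, ready for a streamer.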
The catch: it is one forward pass, in bounded memory. Anything that needs the whole document at once is out. You cannot globally re-sort or re-arrange a file the way XSLT restructures a tree — that would mean holding it all. The token tools rewrite each value as it flows past; they do not move data across the stream.
Tokens flow both ways. The Disassembler turns a JavaScript value into a token stream, and the Stringer turns a token stream back into JSON text. With the parser and filters, that closes the loop: read a huge file, edit it as tokens, and write it back out — never holding more than a window of it.
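The round trip can be illustrated with toy versions of both directions and one edit stage in between. None of this is the library's code: disassemble, mask, and stringify are hypothetical stand-ins for the Disassembler, a filter, and the Stringer, working on assumed {name, value} tokens and JSON-compatible values only.

```javascript
// Value -> tokens (a toy Disassembler).
function* disassemble(value) {
  if (value === null) yield {name: 'nullValue'};
  else if (value === true) yield {name: 'trueValue'};
  else if (value === false) yield {name: 'falseValue'};
  else if (typeof value === 'number') yield {name: 'numberValue', value: String(value)};
  else if (typeof value === 'string') yield {name: 'stringValue', value};
  else if (Array.isArray(value)) {
    yield {name: 'startArray'};
    for (const item of value) yield* disassemble(item);
    yield {name: 'endArray'};
  } else {
    yield {name: 'startObject'};
    for (const [key, item] of Object.entries(value)) {
      yield {name: 'keyValue', value: key};
      yield* disassemble(item);
    }
    yield {name: 'endObject'};
  }
}

// Edit stage: mask any string stored directly under the given key,
// at any depth. Every other token passes through untouched.
function* mask(tokens, key) {
  let hit = false;
  for (const token of tokens) {
    if (hit && token.name === 'stringValue') yield {name: 'stringValue', value: '***'};
    else yield token;
    hit = token.name === 'keyValue' && token.value === key;
  }
}

// Tokens -> JSON text (a toy Stringer), one token at a time.
function stringify(tokens) {
  let text = '';
  let needComma = false; // does the next key or value need a leading comma?
  for (const {name, value} of tokens) {
    const closing = name === 'endObject' || name === 'endArray';
    if (needComma && !closing) text += ',';
    switch (name) {
      case 'startObject': text += '{'; break;
      case 'startArray': text += '['; break;
      case 'endObject': text += '}'; break;
      case 'endArray': text += ']'; break;
      case 'keyValue': text += JSON.stringify(value) + ':'; break;
      case 'stringValue': text += JSON.stringify(value); break;
      case 'numberValue': text += value; break;
      case 'nullValue': text += 'null'; break;
      case 'trueValue': text += 'true'; break;
      case 'falseValue': text += 'false'; break;
    }
    needComma = !(name === 'startObject' || name === 'startArray' || name === 'keyValue');
  }
  return text;
}

const doc = {user: 'ann', secret: 'hunter2', tags: [1, 'x', null], nested: {ok: true}};
console.log(stringify(mask(disassemble(doc), 'secret')));
// {"user":"ann","secret":"***","tags":[1,"x",null],"nested":{"ok":true}}
```

Each stage looks at one token at a time and keeps only a flag or two of state, which is why the same shape of pipeline works on input far larger than memory.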
That is the whole foundation: text becomes tokens; tokens are filtered, assembled, generated, and serialized; memory stays flat throughout. The streamers, the Assembler, the filters, the stringer — every part is an operation on one stream. You already have it; the Parser page has the full token vocabulary when you need it.
- Intro by examples — the pipelines above, runnable.
- Recipes — ready-made solutions to common tasks.
- Performance — getting the most out of it.
- Parser — the token layer in full.