Concepts
This page explains the ideas behind stream-json — the problem it solves, how it solves it, and the handful of concepts everything else builds on. If you would rather learn by reading code, Intro by examples shows the same ideas as runnable pipelines.
JSON.parse handles JSON in one line — as long as the whole document fits in memory. The cases stream-json is built for are the ones where it does not: multi-gigabyte API responses, database dumps with millions of records, log exports, anything generated continuously or simply larger than RAM. JSON.parse reads every byte and builds the entire value before you can touch any of it; on a 10 GB file that is 10 GB of heap (usually several times that), or an out-of-memory crash.
And usually you do not want the whole thing anyway. Often you want only part of it, read piece by piece: when reading a log, for example, you want the individual records, one at a time, in the order they appear in the document.
You get a stream of your objects out of a JSON source too big to hold at once — one at a time, taking only the part you want. You write a short pipeline (a source, a parser(), an optional pick(), and a streamer), then run your own code on each object it hands you:
file -> parser() -> pick()? -> streamArray() / streamValues() -> your objects
You think about which parts of the document you want. The library reads the bytes, parses, assembles objects, and keeps memory flat. (Internally it works on a stream of tokens, but you rarely deal with tokens directly — see Advanced: the token stream.)
You assemble these stages with chain() — a small factory that builds a stream-like processing pipeline from a list of stages. It comes from stream-chain, the library stream-json is built on, so you already have it (import chain from 'stream-chain').
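Conceptually, a chain is function composition over a flow of items. The sketch below is only a toy model of that idea: the name toyChain is hypothetical, and this is not how stream-chain is implemented. The real chain() produces an actual stream with backpressure and accepts more kinds of stages. Here each stage is a generator function that consumes an iterable and yields results:

```javascript
// Toy model of a processing pipeline (hypothetical; NOT stream-chain's code).
// Each stage is a generator function: it takes an iterable and yields results.
function toyChain(stages) {
  // Compose left to right: the output of one stage feeds the next.
  return (source) => stages.reduce((iterable, stage) => stage(iterable), source);
}

// Example stages: keep even numbers, then square them.
function* evens(items) {
  for (const x of items) if (x % 2 === 0) yield x;
}
function* squares(items) {
  for (const x of items) yield x * x;
}

const pipeline = toyChain([evens, squares]);
console.log([...pipeline([1, 2, 3, 4, 5, 6])]); // [ 4, 16, 36 ]
```

Because every stage is a generator, no stage materializes its whole input; each item flows through all stages before the next one is requested.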
A streamer turns the parser's token stream into a stream of objects, one top-level item at a time, so nothing larger than a single item is ever in memory. Each item arrives as {key, value}: your code reads value, and key (the array index or property name) when it needs it.
- StreamArray: the elements of one big top-level array. This is the everyday case: a JSON document is usually an envelope, some metadata plus the real payload under a property like data, so you pick that property and stream its elements with streamArray(). By far the most common pipeline.
- StreamValues: a sequence of separate values, used mostly after a pick that matches several subobjects, to hand you each one as an isolated object. (It also reads JSON Streaming input.)
- StreamObject: the properties of one big top-level object, for the occasional document whose top level is a huge object rather than an array.
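To make the {key, value} contract concrete, here is a toy streamer over hand-written tokens. It is a simplified model, not stream-json's implementation. It assumes only packed tokens shaped {name, value} (keyValue, stringValue, numberValue carrying the number as a string, nullValue, trueValue, falseValue), reads them from an in-memory array, and has no objectFilter. Keys come out as array indices for a top-level array (StreamArray-style) or as property names for a top-level object (StreamObject-style):

```javascript
// Toy streamer (hypothetical; NOT stream-json's implementation).
// Input: packed tokens shaped {name, value}. Output: one {key, value} per item of
// the top-level array (key = index) or top-level object (key = property name).
const SCALARS = {
  stringValue: (v) => v,
  numberValue: (v) => Number(v), // assumed: numbers arrive as strings
  nullValue: () => null,
  trueValue: () => true,
  falseValue: () => false,
};

function* toyStreamer(tokens) {
  let depth = 0;          // container nesting; items start at depth 1
  let topIsArray = true;  // kind of the top-level container
  let index = 0;          // next array index
  let topKey = null;      // current top-level property name
  const stack = [];       // containers being built inside the current item
  const keys = [];        // pending property name for each container on the stack

  // Attach a finished value to its parent, or return it as an item at the top level.
  function add(value) {
    if (stack.length === 0) return {key: topIsArray ? index++ : topKey, value};
    const parent = stack[stack.length - 1];
    if (Array.isArray(parent)) parent.push(value);
    else parent[keys[keys.length - 1]] = value;
    return null;
  }

  for (const {name, value} of tokens) {
    let item = null;
    if (name === 'startObject' || name === 'startArray') {
      if (depth++ === 0) {
        topIsArray = name === 'startArray';
        continue;
      }
      stack.push(name === 'startArray' ? [] : {});
      keys.push(null);
    } else if (name === 'endObject' || name === 'endArray') {
      if (--depth === 0) continue; // the top-level container is closed
      keys.pop();
      item = add(stack.pop());
    } else if (name === 'keyValue') {
      if (depth === 1) topKey = value;
      else keys[keys.length - 1] = value;
    } else if (Object.hasOwn(SCALARS, name)) {
      item = add(SCALARS[name](value));
    }
    if (item) yield item; // each item is handed out as soon as it is complete
  }
}

// Tokens for [{"id":1},{"id":2}] (hand-written, packed values only).
const arrayTokens = [
  {name: 'startArray'},
  {name: 'startObject'}, {name: 'keyValue', value: 'id'}, {name: 'numberValue', value: '1'}, {name: 'endObject'},
  {name: 'startObject'}, {name: 'keyValue', value: 'id'}, {name: 'numberValue', value: '2'}, {name: 'endObject'},
  {name: 'endArray'},
];
for (const {key, value} of toyStreamer(arrayTokens)) console.log(key, value);
// 0 { id: 1 }
// 1 { id: 2 }
```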
When the whole value does fit in memory, use the Assembler instead of a streamer: it rebuilds the complete JavaScript value from the stream, like a streaming JSON.parse. Lower-level, but a capable workhorse — you can control how values are built to restore richer types than plain objects and arrays (custom containers, class instances) for data JSON cannot represent natively. FlexAssembler is the ready-made version: it assembles into custom containers (a Map or Set at chosen paths).
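The idea of assembling into richer containers can be sketched with a toy assembler. toyAssemble and its makeObject(path) hook are hypothetical and are not the API of Assembler or FlexAssembler; the sketch assumes packed tokens only and simply lets the caller choose the container for each JSON object by path (here a Map at the path tags):

```javascript
// Toy assembler (hypothetical; NOT the Assembler or FlexAssembler API).
// Rebuilds one whole value from packed tokens; makeObject(path) chooses the
// container for each JSON object (a plain object by default, or e.g. a Map).
function toyAssemble(tokens, makeObject = () => ({})) {
  const stack = []; // open containers: {container, key, path}
  let result;

  // Store a finished value in the innermost open container (or as the result).
  function add(value) {
    if (stack.length === 0) {
      result = value;
      return;
    }
    const top = stack[stack.length - 1];
    if (Array.isArray(top.container)) top.container.push(value);
    else if (top.container instanceof Map) top.container.set(top.key, value);
    else top.container[top.key] = value;
  }

  // Path of the value that starts now: the parent's path plus a key or an index.
  function childPath() {
    if (stack.length === 0) return [];
    const top = stack[stack.length - 1];
    const step = Array.isArray(top.container) ? top.container.length : top.key;
    return [...top.path, step];
  }

  for (const {name, value} of tokens) {
    switch (name) {
      case 'startObject':
      case 'startArray': {
        const path = childPath();
        const container = name === 'startArray' ? [] : makeObject(path);
        stack.push({container, key: null, path});
        break;
      }
      case 'endObject':
      case 'endArray':
        add(stack.pop().container);
        break;
      case 'keyValue':
        stack[stack.length - 1].key = value;
        break;
      case 'stringValue': add(value); break;
      case 'numberValue': add(Number(value)); break; // assumed: numbers arrive as strings
      case 'nullValue': add(null); break;
      case 'trueValue': add(true); break;
      case 'falseValue': add(false); break;
    }
  }
  return result;
}

// Tokens for {"meta":{"n":1},"tags":{"x":true}} (hand-written, packed values only).
const docTokens = [
  {name: 'startObject'},
  {name: 'keyValue', value: 'meta'},
  {name: 'startObject'}, {name: 'keyValue', value: 'n'}, {name: 'numberValue', value: '1'}, {name: 'endObject'},
  {name: 'keyValue', value: 'tags'},
  {name: 'startObject'}, {name: 'keyValue', value: 'x'}, {name: 'trueValue', value: true}, {name: 'endObject'},
  {name: 'endObject'},
];
// Build the object at path "tags" as a Map; everything else stays plain.
const assembled = toyAssemble(docTokens, (path) => (path.join('.') === 'tags' ? new Map() : {}));
console.log(assembled.tags instanceof Map, assembled.tags.get('x'), assembled.meta.n); // true true 1
```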
Every streamer takes an objectFilter — a predicate checked during assembly. Reject an object and it is dropped before it is finished assembling, saving that work. You decide from the part built so far (for example a type field near the top of each record) and can stay undecided until the field you need arrives. No token knowledge required — a free, modest, and underused performance win (see StreamBase and Performance).
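A minimal sketch of early rejection, assuming a top-level array whose elements are flat objects with scalar fields only. toyFilteredArray is hypothetical, not StreamBase: the filter sees the partially built element after each field and returns true (keep), false (drop now), or undefined (undecided yet). This tri-state contract and the rule that a still-undecided element is kept are choices of the toy; check StreamBase for the library's exact semantics.

```javascript
// Toy streamArray with an early-rejecting filter (hypothetical; NOT stream-json's code).
// Assumes a top-level array of FLAT objects (scalar fields only) as packed tokens.
// filter(partial) returns true (keep), false (drop now) or undefined (undecided yet);
// in this toy an element that is still undecided when it completes is kept.
const SCALAR_OF = {
  stringValue: (v) => v,
  numberValue: (v) => Number(v), // assumed: numbers arrive as strings
  nullValue: () => null,
  trueValue: () => true,
  falseValue: () => false,
};

function* toyFilteredArray(tokens, filter, stats = {skipped: 0}) {
  let index = -1;  // index of the current element
  let item = null; // element under construction
  let key = null;  // pending property name
  let decision;    // true / false / undefined for the current element
  for (const {name, value} of tokens) {
    if (name === 'startObject') {      // a new element begins
      index++;
      item = {};
      decision = undefined;
    } else if (name === 'endObject') { // the element is complete
      if (decision !== false) yield {key: index, value: item};
      item = null;
      decision = undefined;
    } else if (decision === false) {
      stats.skipped++;                 // rejected: its tokens are not assembled
    } else if (name === 'keyValue') {
      key = value;
    } else if (item && Object.hasOwn(SCALAR_OF, name)) {
      item[key] = SCALAR_OF[name](value);
      // While undecided, ask again: the filter sees only the fields built so far.
      if (decision === undefined) decision = filter(item);
    }
  }
}

// Tokens for [{"type":"error","msg":"disk full"},{"type":"info","msg":"ok"}]
const logTokens = [
  {name: 'startArray'},
  {name: 'startObject'},
  {name: 'keyValue', value: 'type'}, {name: 'stringValue', value: 'error'},
  {name: 'keyValue', value: 'msg'}, {name: 'stringValue', value: 'disk full'},
  {name: 'endObject'},
  {name: 'startObject'},
  {name: 'keyValue', value: 'type'}, {name: 'stringValue', value: 'info'},
  {name: 'keyValue', value: 'msg'}, {name: 'stringValue', value: 'ok'},
  {name: 'endObject'},
  {name: 'endArray'},
];

// Decide as soon as `type` is known; stay undecided (undefined) until then.
const onlyErrors = (partial) => ('type' in partial ? partial.type === 'error' : undefined);

const stats = {skipped: 0};
for (const {key, value} of toyFilteredArray(logTokens, onlyErrors, stats)) console.log(key, value);
// 0 { type: 'error', msg: 'disk full' }
console.log(stats.skipped); // 2
```

The second record is rejected right after its type field, so its msg field is never built.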
Often the data you want is one member of a larger wrapper object. pick keeps a matching subobject and drops the rest, selecting by path:
chain([source, parser(), pick({filter: 'data'}), streamArray()]);
The other filters (Replace, Ignore, Filter) edit the document in more advanced ways; most pipelines only need pick.
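What pick does can be modeled as a token filter that tracks the current path and passes through only the tokens of matching values. The sketch below is a toy (toyPick is hypothetical, handles packed tokens only, and uses an exact match on a dot-separated path); it is not stream-json's Pick, whose matching rules and options are documented on its own page:

```javascript
// Toy pick (hypothetical; NOT stream-json's Pick): passes through only the tokens
// of values whose dot-separated path equals `filter` and drops everything else.
// Assumes packed tokens only ({name, value}; keyValue carries property names).
function* toyPick(tokens, filter) {
  const frames = []; // open containers: {isArray, key}; arrays count their indices
  let emitAt = -1;   // frames.length where the matched container began, or -1

  for (const token of tokens) {
    const {name} = token;
    const emitting = emitAt >= 0;

    if (name === 'keyValue') {
      frames[frames.length - 1].key = token.value;
      if (emitting) yield token;
    } else if (name === 'endObject' || name === 'endArray') {
      frames.pop();
      if (emitting) {
        yield token;
        if (frames.length === emitAt) emitAt = -1; // the matched value is complete
      }
    } else {
      // A value starts here: a container or a scalar.
      const top = frames[frames.length - 1];
      if (top && top.isArray) top.key++; // next array index
      const matches = !emitting && frames.map((f) => f.key).join('.') === filter;
      if (emitting || matches) yield token;
      if (name === 'startObject' || name === 'startArray') {
        if (matches) emitAt = frames.length;
        frames.push({isArray: name === 'startArray', key: name === 'startArray' ? -1 : null});
      }
    }
  }
}

// Tokens for {"meta":{"page":1},"data":[7,8]} (hand-written, packed values only).
const wrapperTokens = [
  {name: 'startObject'},
  {name: 'keyValue', value: 'meta'},
  {name: 'startObject'}, {name: 'keyValue', value: 'page'}, {name: 'numberValue', value: '1'}, {name: 'endObject'},
  {name: 'keyValue', value: 'data'},
  {name: 'startArray'}, {name: 'numberValue', value: '7'}, {name: 'numberValue', value: '8'}, {name: 'endArray'},
  {name: 'endObject'},
];
console.log([...toyPick(wrapperTokens, 'data')].map((t) => t.name).join(' '));
// startArray numberValue numberValue endArray
```

A streamer placed after such a filter sees only the array under data, as if it were the whole document.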
In most cases the source is a file. The simplest path is parseFile, a file-edge component that reads and parses in one stage; or use fs.createReadStream(path) followed by parser(). Any stream works as a source — an HTTP response, a socket, your own generator — but a local file is the common case.
You never load the whole document. stream-json is built on stream-chain, which handles backpressure for you, so memory stays flat whether the file is 10 KB or 10 TB. Nothing to configure. This holds even with the file-edge components (parseFile / stringerToFile) at both ends and no stream in between: they read and write fixed-size blocks on demand, so the source never outruns the sink.
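The pull-driven principle behind flat memory can be illustrated with synchronous generators, where a stage produces an item only when the next stage asks for it. This is an analogy, not stream-chain's mechanism: real Node streams buffer a bounded amount (set by highWaterMark) rather than exactly one item, but the source still cannot outrun the sink. The function names below are hypothetical:

```javascript
// Pull-driven flow with generators: an analogy for backpressure
// (NOT stream-chain's mechanism, which uses real streams and buffers).
function* countingSource(total, counters) {
  for (let i = 0; i < total; i++) {
    counters.produced++; // work happens only when the consumer pulls
    yield {id: i};
  }
}

function* markParsed(records) {
  for (const record of records) yield {...record, parsed: true};
}

const counters = {produced: 0, consumed: 0, maxInFlight: 0};
for (const record of markParsed(countingSource(1e6, counters))) {
  // Items produced but not yet consumed: the "buffer" between source and sink.
  counters.maxInFlight = Math.max(counters.maxInFlight, counters.produced - counters.consumed);
  counters.consumed++;
  if (record.id === 9) break; // stop early: the remaining records are never produced
}
console.log(counters); // { produced: 10, consumed: 10, maxInFlight: 1 }
```

The source nominally holds a million records, yet only ten are ever created, and never more than one is waiting between stages.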
To edit a document and write it out, run the stream through the Stringer, which turns tokens back into JSON text. To feed plain JavaScript values into such a pipeline, use the Disassembler, which turns a value into a token stream. Together they let you rewrite a huge file without ever loading it.
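A toy version of the write path shows why the full document is never needed: values become tokens, tokens can be edited one at a time, and tokens become text piece by piece. toyDisassemble, toyStringer, and the packed token shapes are assumptions of this sketch, not the Disassembler or Stringer API, and it handles JSON-compatible values only:

```javascript
// Toy Disassembler (hypothetical; NOT stream-json's Disassembler): walk a
// JSON-compatible value and yield packed tokens. Numbers are emitted as strings
// to mirror numberValue tokens (an assumption of this sketch).
function* toyDisassemble(value) {
  if (Array.isArray(value)) {
    yield {name: 'startArray'};
    for (const item of value) yield* toyDisassemble(item);
    yield {name: 'endArray'};
  } else if (value !== null && typeof value === 'object') {
    yield {name: 'startObject'};
    for (const [key, item] of Object.entries(value)) {
      yield {name: 'keyValue', value: key};
      yield* toyDisassemble(item);
    }
    yield {name: 'endObject'};
  } else if (typeof value === 'string') yield {name: 'stringValue', value};
  else if (typeof value === 'number') yield {name: 'numberValue', value: String(value)};
  else if (value === null) yield {name: 'nullValue', value: null};
  else yield {name: value ? 'trueValue' : 'falseValue', value};
}

// Toy Stringer (hypothetical; NOT stream-json's Stringer): packed tokens to JSON
// text, yielded in small pieces so the whole text never has to exist at once.
function* toyStringer(tokens) {
  const TEXT_OF = {stringValue: (v) => JSON.stringify(v), numberValue: (v) => v,
    nullValue: () => 'null', trueValue: () => 'true', falseValue: () => 'false'};
  const emitted = [false]; // per open container (plus the top level): written anything yet?
  let afterKey = false;    // true right after "key": so no comma goes before the value
  for (const {name, value} of tokens) {
    if (name === 'endObject' || name === 'endArray') {
      emitted.pop();
      yield name === 'endObject' ? '}' : ']';
      continue;
    }
    const last = emitted.length - 1;
    if (!afterKey && emitted[last]) yield ','; // separate siblings
    if (!afterKey) emitted[last] = true;
    afterKey = false;
    if (name === 'keyValue') {
      yield JSON.stringify(value) + ':';
      afterKey = true;
    } else if (name === 'startObject' || name === 'startArray') {
      yield name === 'startObject' ? '{' : '[';
      emitted.push(false);
    } else {
      yield TEXT_OF[name](value);
    }
  }
}

// Round trip with an edit in the middle: upper-case every string value.
const upper = function* (tokens) {
  for (const t of tokens) yield t.name === 'stringValue' ? {...t, value: t.value.toUpperCase()} : t;
};
const rewritten = [...toyStringer(upper(toyDisassemble({user: 'ann', tags: ['a', 'b'], age: 42, ok: true})))].join('');
console.log(rewritten); // {"user":"ANN","tags":["A","B"],"age":42,"ok":true}
```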
- JSON Streaming: several JSON values in one stream, separated only syntactically ([]{}1 2 3"z" is six values). parser({jsonStreaming: true}) + StreamValues reads it, JSONL included, since JSONL is a subset.
- JSONL / NDJSON: one value per line, ~99% of real-world dumps. Because each line is a complete value, a dedicated splitter can JSON.parse lines natively, much faster than tokenizing, so for JSONL prefer that component, now in stream-chain.
- JSONC (JSON with Comments): a parser, stringer, and verifier that preserve comments and formatting for byte-faithful edits.
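A dedicated JSONL splitter is fast because it never tokenizes: it finds line breaks and hands each complete line to JSON.parse. A minimal sketch of that technique, assuming chunks are already decoded strings (toyJsonl is hypothetical, not the stream-chain component):

```javascript
// Toy JSONL reader (hypothetical; NOT stream-chain's JSONL component).
// Chunks may split a line anywhere, so the unfinished tail of each chunk is kept
// and prefixed to the next one. Each complete line goes straight to JSON.parse.
function* toyJsonl(chunks) {
  let tail = '';
  for (const chunk of chunks) {
    const lines = (tail + chunk).split('\n');
    tail = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) if (line.trim()) yield JSON.parse(line); // skip blank lines
  }
  if (tail.trim()) yield JSON.parse(tail); // a final line without a trailing newline
}

// A record split across chunk boundaries still parses correctly.
const jsonlChunks = ['{"id":1}\n{"id"', ':2}\n', '\n{"id":3}'];
console.log([...toyJsonl(jsonlChunks)].map((r) => r.id)); // [ 1, 2, 3 ]
```

Windows line endings work too, since JSON.parse treats a trailing \r as whitespace. With byte chunks from a file, decode them first (for example with a TextDecoder in streaming mode) so a multi-byte character split across chunks is not corrupted.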
For browser bundles import from stream-json/web/...; for embedding without stream adapters, stream-json/core/.... (ESM, Node 22+.)
Under the streamers, stream-json works on a token stream: the parser emits a SAX-style event for each piece of the document (startObject, keyValue, stringChunk, numberValue, …), and filters and streamers operate on those tokens. You go to this level only deliberately, to:
- edit JSON at the syntax level: rewrite, reformat, or transform tokens directly;
- tune performance, e.g. drop the packed or streamed half of each value you do not use (streamValues: false / packValues: false);
- do something the streamers do not cover.
Most users never need any of this. The Parser page documents the token vocabulary and the options.
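For a feel of the token layer, here is a hand-written approximation of the tokens for {"a":"hi"} with both the streamed and the packed halves, plus two token-level transforms: dropping the streamed half (the parser avoids producing it in the first place when given streamValues: false) and renaming a key. The token list is an approximation and the helper names are hypothetical; the Parser page is the authority on the real vocabulary.

```javascript
// Hand-written approximation of the tokens for {"a":"hi"}: streamed pieces
// (startKey/stringChunk/endKey, startString/stringChunk/endString) plus packed
// values (keyValue, stringValue). See the Parser page for the real vocabulary.
const tinyTokens = [
  {name: 'startObject'},
  {name: 'startKey'}, {name: 'stringChunk', value: 'a'}, {name: 'endKey'},
  {name: 'keyValue', value: 'a'},
  {name: 'startString'}, {name: 'stringChunk', value: 'hi'}, {name: 'endString'},
  {name: 'stringValue', value: 'hi'},
  {name: 'endObject'},
];

// The streamed half of each value: its start/chunk/end pieces.
const STREAMED = new Set([
  'startKey', 'endKey', 'startString', 'endString',
  'startNumber', 'endNumber', 'stringChunk', 'numberChunk',
]);

// Keep only packed tokens (dropping them at the parser is cheaper).
function* packedOnly(tokens) {
  for (const token of tokens) if (!STREAMED.has(token.name)) yield token;
}

// A syntax-level edit: rename a key. It rewrites keyValue only, so run it on
// packed-only tokens; streamed key chunks would need rewriting as well.
function* renameKey(tokens, from, to) {
  for (const token of tokens) {
    yield token.name === 'keyValue' && token.value === from ? {...token, value: to} : token;
  }
}

const edited = [...renameKey(packedOnly(tinyTokens), 'a', 'greeting')];
console.log(edited.map((t) => t.value ?? t.name).join(' ')); // startObject greeting hi endObject
```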
- Intro by examples — the pipelines above, runnable.
- Recipes — ready-made solutions to common tasks.
- Performance — getting the most out of it.
- Parser — the token layer in full.