parseFile

(Since 3.3.0) parseFile() is a Node-only, file-based input-edge stage for stream-json pipelines. Put it at the head of a gen([…]) chain and drive the pipeline by passing the file path as the input value. parseFile opens the file with fs/promises.open, reads it block by block, decodes UTF-8 with StringDecoder, and emits the standard {name, value} token stream downstream.

Introduction

import {gen} from 'stream-chain/core';
import {parseFile} from 'stream-json/file/parser.js';
import {pick} from 'stream-json/filters/pick.js';
import {streamArray} from 'stream-json/streamers/stream-array.js';
import {pipe} from 'stream-chain/utils/pipe.js';
import {drain} from 'stream-chain/utils/drain.js';

const c = pipe(parseFile(), pick({filter: 'items'}), streamArray(), ({value}) => console.log(value));
await drain(c('data.json'));

JSONC variant (handles comments and trailing commas):

import {parseFile} from 'stream-json/file/jsonc/parser.js';

How it works

parseFile(options) returns gen(asyncBlockReader(options), jsonParser(options)) — an fList that drops transparently into any gen([…]) / chain([…]). The asyncBlockReader is an async generator function: when called with a path, it opens a FileHandle, reads readBlockSize-sized blocks, decodes each through StringDecoder('utf8'), and yields the decoded strings. exec.next iterates the generator (one await per block, not per token); each yielded chunk feeds the existing jsonParser flushable, whose many(tokens) output is expanded synchronously through downstream stages.
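
The block-reading half of that description can be sketched with Node's standard library alone. This is a minimal illustration under stated assumptions, not the library's source: the name readBlocks is hypothetical, and only readBlockSize and its 65536 default come from the text above.

```javascript
import {open} from 'node:fs/promises';
import {StringDecoder} from 'node:string_decoder';

// Hypothetical stand-in for the block reader described above (not stream-json's code).
async function* readBlocks(path, {readBlockSize = 65536} = {}) {
  const handle = await open(path, 'r');
  const decoder = new StringDecoder('utf8');
  const buffer = Buffer.allocUnsafe(readBlockSize);
  try {
    for (;;) {
      // One await per block: read up to readBlockSize bytes from the current file position.
      const {bytesRead} = await handle.read(buffer, 0, readBlockSize, null);
      if (bytesRead === 0) break; // end of file
      // StringDecoder holds back an incomplete multi-byte sequence until the next block arrives.
      const text = decoder.write(buffer.subarray(0, bytesRead));
      if (text) yield text;
    }
    // Flush whatever bytes the decoder is still holding.
    const tail = decoder.end();
    if (tail) yield tail;
  } finally {
    // Close the FileHandle even if the consumer stops early or a read fails.
    await handle.close();
  }
}
```

Consuming it with for await (const chunk of readBlocks('data.json')) yields decoded strings that a parser stage could take as input. Because decoding goes through StringDecoder, a multi-byte character split across two blocks is emitted whole.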

`parseFile(options?)`

The options object combines the read-block option with the parser options:

readBlockSize — read-block size in bytes. Default 65536 (64 KB). A memory/syscall knob — throughput-neutral in our benchmarks.
All parser() options pass through: packKeys, packStrings, packNumbers, streamKeys, streamStrings, streamNumbers, packValues, streamValues, jsonStreaming. See the Parser wiki for details.

Also exported as parser (matching the file name file/parser.js), so import {parser} from 'stream-json/file/parser.js' works too.

Why a starter, not a chain stage that streams chunks

A functional pipeline (gen(...)) doesn't auto-start — it's driven by an external loop or, equivalently, by passing an async generator as the input value: exec.next duck-types anything with .next and iterates it. parseFile() packages exactly that — an async block-yielding generator paired with the parser — into one composable factory. It mirrors how chain([createReadStream(path), parser()]) works on the Node-Duplex substrate, but stays in the pure-functional core: no Duplex boundaries in the middle of the pipe.
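
The duck-typing idea can be shown with a toy model. The sketch below is not stream-chain's implementation; feed, isIterator, and blocks are hypothetical names, and the only behavior it models is "anything with .next is iterated instead of being treated as a single value".

```javascript
// Toy model of the "starter" idea; this is not stream-chain's source.
const isIterator = value => value != null && typeof value.next === 'function';

// Feed `input` to `stage`. If the input is itself an iterator (sync or async),
// iterate it and feed each item; otherwise treat it as a single value.
async function* feed(input, stage) {
  if (!isIterator(input)) {
    yield stage(input);
    return;
  }
  for (;;) {
    // Awaiting works for both sync and async iterator results.
    const {done, value} = await input.next();
    if (done) return;
    yield* feed(value, stage);
  }
}

// A block-yielding generator standing in for the file reader.
async function* blocks() {
  yield '{"a":';
  yield '1}';
}

// Collect the stage output for every block: resolves to [5, 2] (the block lengths).
async function demo() {
  const lengths = [];
  for await (const n of feed(blocks(), chunk => chunk.length)) lengths.push(n);
  return lengths;
}
```

In this model the stage never knows whether it was driven by one value or by a generator of values, which is the property that lets a block reader sit at the head of a pipeline.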

Driving with `pipe` and `drain`

gen returns an async generator function shaped (value) → AsyncGenerator<Token>; to run it on a value you have to drive AND flush:

pipe(...stages) returns a single-shot driver that constructs a fresh gen, calls it with the value, then sends stream-chain's none value through to flush all flushable stages (notably stringerToFile's writer, which only closes the file on flush).
drain(asyncGen) consumes the resulting async generator and returns the last yielded value (or undefined if nothing was yielded). Perfect for sink-terminated pipelines.

import {pipe} from 'stream-chain/utils/pipe.js';
import {drain} from 'stream-chain/utils/drain.js';

await drain(pipe(parseFile() /* … */)('input.json'));
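
To make the drain contract concrete, here is a minimal sketch of a helper with the behavior described above: consume everything, return the last yielded value, or undefined if nothing was yielded. drainSketch and runningTotal are hypothetical names for illustration; the real drain may differ in details.

```javascript
// Hypothetical re-implementation of the behavior described for drain().
async function drainSketch(asyncGen) {
  let last; // stays undefined if the generator yields nothing
  for await (const value of asyncGen) last = value;
  return last;
}

// Stand-in for a sink-terminated pipeline: yields a running total per input.
async function* runningTotal(values) {
  let total = 0;
  for (const v of values) {
    total += v;
    yield total;
  }
}

// drainSketch(runningTotal([1, 2, 3])) resolves to 6.
// drainSketch(runningTotal([])) resolves to undefined.
```

Returning only the last value keeps memory flat for long token streams, which fits pipelines whose useful output is a side effect or a final summary.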

If you want to keep using the Node-stream substrate, the idiomatic chain([createReadStream(path), parser()]) still works — parseFile is an addition for the pure-functional path, not a replacement.

Performance

On the recorded fixture (Intel i3‑10110U, Node 26, ~100 KB JSON):

Realistic parse-with-work (bench/parse-count.js, counter stage inside the pipeline): pipe(parseFile(), counter) ≈ 9.4 ms vs. idiomatic chain([createReadStream, parser()]) + on('data', counter) ≈ 15.8 ms. The new code is ~68% faster in throughput terms (15.8 / 9.4 ≈ 1.68×), or roughly 40% less wall time. The chain-based baseline pays a per-token Node Duplex boundary on the external on('data'); the merged path keeps the sink inside the executor. chain([parseFile(), counter]) (the chain executor wrapping the same stages) is within noise of the gen-driven form: the executor choice barely matters once the sink lives inside the pipeline.
Round-trip (parse → … → write) (bench/file-roundtrip.js): pipe(parseFile(), …, stringerToFile()) is roughly 1.6× faster than the idiomatic chain([createReadStream, parser(), …, stringer()]).pipe(createWriteStream), because the merged write side avoids the extra Node Duplex boundary between the stringer and the file.
Stress test (unrealistic): pipe(parseFile())(path) drained per token by a for-await loop with no in-pipeline sink puts the gen async bridge on the hot path and runs ~3.7× slower than the chain-based baseline. Real pipelines rarely have this shape (there is normally downstream work on tokens), so the case is documented but is not the recommended path.
