Eugene Lazutkin edited this page Jun 7, 2026 · 2 revisions

(Since 3.3.0) parseFile() is a Node-only, file-based input-edge stage for stream-json pipelines. Place it at the head of a gen([…]) chain and drive the pipeline by passing the file path as the input value. parseFile opens the file with fs/promises.open, reads it block by block, decodes UTF-8 with StringDecoder, and emits the standard {name, value} token stream downstream.

Introduction

import {gen} from 'stream-chain/core';
import {parseFile} from 'stream-json/file/parser.js';
import {pick} from 'stream-json/filters/pick.js';
import {streamArray} from 'stream-json/streamers/stream-array.js';
import {pipe} from 'stream-chain/utils/pipe.js';
import {drain} from 'stream-chain/utils/drain.js';

const c = pipe(parseFile(), pick({filter: 'items'}), streamArray(), ({value}) => console.log(value));
await drain(c('data.json'));

JSONC variant (handles comments and trailing commas):

import {parseFile} from 'stream-json/file/jsonc/parser.js';

How it works

parseFile(options) returns gen(asyncBlockReader(options), jsonParser(options)) — an fList that drops transparently into any gen([…]) / chain([…]). The asyncBlockReader is an async generator function: when called with a path, it opens a FileHandle, reads readBlockSize-sized blocks, decodes each through StringDecoder('utf8'), and yields the decoded strings. exec.next iterates the generator (one await per block, not per token); each yielded chunk feeds the existing jsonParser flushable, whose many(tokens) output is expanded synchronously through downstream stages.
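The block-reading step described above can be approximated with Node's standard library alone. The following is a minimal sketch, not the library's source: readBlocksSketch is a hypothetical name, and the only option it models is readBlockSize.

```javascript
import {open} from 'node:fs/promises';
import {StringDecoder} from 'node:string_decoder';

// Hypothetical stand-in for the block reader described above; not the library's code.
async function* readBlocksSketch(path, {readBlockSize = 65536} = {}) {
  const handle = await open(path, 'r');
  const decoder = new StringDecoder('utf8');
  const buffer = Buffer.allocUnsafe(readBlockSize);
  try {
    for (;;) {
      // position = null reads from the current file position and advances it.
      const {bytesRead} = await handle.read(buffer, 0, readBlockSize, null);
      if (bytesRead === 0) break; // end of file
      // StringDecoder holds back an incomplete multi-byte sequence until the next block.
      const text = decoder.write(buffer.subarray(0, bytesRead));
      if (text) yield text;
    }
    const tail = decoder.end(); // flush any bytes still buffered in the decoder
    if (tail) yield tail;
  } finally {
    await handle.close(); // always release the file descriptor
  }
}
```

Because decoding goes through StringDecoder, a character split across two blocks is emitted whole in the later chunk, so concatenating the yielded strings reproduces the file's text regardless of the block size.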

parseFile(options?)

Combined parser and read-block options.

  • readBlockSize — read-block size in bytes. Default 65536 (64 KB). A memory/syscall knob — throughput-neutral in our benchmarks.
  • All parser() options pass through: packKeys, packStrings, packNumbers, streamKeys, streamStrings, streamNumbers, packValues, streamValues, jsonStreaming. See the Parser wiki for details.

Also exported as parser (matching the file name file/parser.js), so import {parser} from 'stream-json/file/parser.js' works too.

Why a starter, not a chain stage that streams chunks

A functional pipeline (gen(...)) doesn't auto-start — it's driven by an external loop or, equivalently, by passing an async generator as the input value: exec.next duck-types anything with .next and iterates it. parseFile() packages exactly that — an async block-yielding generator paired with the parser — into one composable factory. It mirrors how chain([createReadStream(path), parser()]) works on the Node-Duplex substrate, but stays in the pure-functional core: no Duplex boundaries in the middle of the pipe.
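To make the duck-typing idea concrete, here is a simplified sketch of a driver that iterates any input exposing .next and otherwise treats the input as a single value. The names isIteratorLike and feedSketch are hypothetical, and the real executor in stream-chain is more involved; this only illustrates the shape of the mechanism.

```javascript
// Hypothetical helper, not stream-chain's code:
// anything with a callable .next is treated as an iterator.
const isIteratorLike = value => value != null && typeof value.next === 'function';

// Feeds each item of the input to `stage` and re-yields whatever the stage produces.
// `stage` is assumed to return a sync or async iterable (for example, an array of tokens).
async function* feedSketch(input, stage) {
  if (isIteratorLike(input)) {
    // One await per item pulled from the input generator.
    for (let step = await input.next(); !step.done; step = await input.next()) {
      yield* stage(step.value);
    }
  } else {
    // A plain value is passed to the stage once.
    yield* stage(input);
  }
}
```

In this sketch the await happens once per pulled block, while the items a stage returns for that block are re-yielded without touching the input again.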

Driving with pipe and drain

gen returns an async generator function shaped (value) → AsyncGenerator<Token>; to run it on a value you have to drive AND flush:

  • pipe(...stages) returns a single-shot driver that constructs a fresh gen, calls it with the value, then sends none through to flush all flushable stages (notably stringerToFile's writer, which only closes the file on flush).
  • drain(asyncGen) consumes the resulting async generator and returns the last yielded value (or undefined if nothing was yielded). Perfect for sink-terminated pipelines.

import {pipe} from 'stream-chain/utils/pipe.js';
import {drain} from 'stream-chain/utils/drain.js';

await drain(pipe(parseFile() /* … */)('input.json'));
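The drain contract described above (consume everything, return the last yielded value or undefined) is small enough to sketch directly. drainSketch is a hypothetical re-implementation for illustration only; the real helper is the one imported from stream-chain/utils/drain.js.

```javascript
// Hypothetical illustration of the described contract; not the library's code.
async function drainSketch(asyncIterable) {
  let last; // stays undefined when nothing is yielded
  for await (const value of asyncIterable) last = value;
  return last;
}
```

Returning the last value suits pipelines that end in a sink or an accumulator, where only the final result matters to the caller.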

If you want to keep using the Node-stream substrate, the idiomatic chain([createReadStream(path), parser()]) still works — parseFile is an addition for the pure-functional path, not a replacement.

Performance

On the recorded fixture (Intel i3‑10110U, Node 26, ~100 KB JSON):

  • Realistic parse-with-work (bench/parse-count.js, counter stage inside the pipeline): pipe(parseFile(), counter) ≈ 9.4 ms vs idiomatic chain([createReadStream, parser()]) + on('data', counter) ≈ 15.8 ms — the new code is ~68% faster. The chain-base pays a per-token Node Duplex boundary on the external on('data'); the merged path keeps the sink inside the executor. chain([parseFile(), counter]) (chain executor wrapping the same stages) is within noise of the gen-driven form — the executor choice barely matters once the sink lives inside the pipeline.
  • Round-trip (parse → … → write) (bench/file-roundtrip.js): pipe(parseFile(), …, stringerToFile()) is roughly 1.6× faster than the idiomatic chain([createReadStream, parser(), …, stringer()]).pipe(createWriteStream), because the merged write side avoids the extra Node Duplex boundary between the stringer and the file.
  • Stress-test (unrealistic): pipe(parseFile())(path) drained per-token by a for-await with no in-pipeline sink puts the gen async-bridge on the hot path and runs ~3.7× slower than chain-base. Real pipelines don't have this shape (you always have downstream work on tokens), so the case is documented but not on the recommended path.
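For readers comparing the figures above, the "~68% faster" number is consistent with a ratio of the two timings. A small worked calculation using only the 9.4 ms and 15.8 ms values quoted above (compareTimings is a hypothetical helper, not part of the benchmark suite):

```javascript
// Hypothetical helper for reading benchmark pairs; inputs are wall times in milliseconds.
function compareTimings(baselineMs, candidateMs) {
  return {
    speedup: baselineMs / candidateMs,      // how many times faster the candidate is
    timeSaved: 1 - candidateMs / baselineMs // fraction of wall time removed
  };
}

const {speedup, timeSaved} = compareTimings(15.8, 9.4);
console.log(speedup.toFixed(2));           // 1.68, i.e. about 68% more throughput
console.log((timeSaved * 100).toFixed(1)); // 40.5, i.e. about 40% less wall time
```

Both readings describe the same measurement; the first is a throughput ratio, the second a reduction in elapsed time.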

Related

  • stringerToFile — symmetric output-edge sink.
  • verifyFile — file validator with the same starter shape.
  • Parser — the underlying token-producing flushable.
