
This is the workhorse of the package. It is a Transform stream, which consumes text and produces a stream of data items corresponding to high-level tokens. It is always the first stream in a pipe chain, fed directly with text from a file, a socket, the standard input, or any other text stream.

Its Writable part operates in buffer/text mode, while its Readable part operates in objectMode.

This Parser was modeled after stream-json's Parser. Its options and behavior are compatible.

Introduction

A simple example (streaming from a file):

const Parser = require('stream-csv-as-json/Parser');
const parser = new Parser();

const fs = require('fs');

let rowCounter = 0;
parser.on('data', data => data.name === 'startArray' && ++rowCounter);
parser.on('end', () => console.log(`Found ${rowCounter} rows.`));

fs.createReadStream('sample.csv').pipe(parser);

An alternative example (using the parser() factory function):

const {parser} = require('stream-csv-as-json/Parser');
const fs = require('fs');

const pipeline = fs.createReadStream('sample.csv').pipe(parser());

let rowCounter = 0;
pipeline.on('data', data => data.name === 'startArray' && ++rowCounter);
pipeline.on('end', () => console.log(`Found ${rowCounter} rows.`));

API

The module returns the constructor of Parser. Being a stream, Parser doesn't have any special interfaces. The only thing required is to configure it during construction.

Parser produces a rigid stream of tokens whose order is strictly defined. It is impossible to get an item out of sequence. All data items (strings, numbers, even object keys) are streamed in chunks, and they can potentially be of any size: gigabytes, terabytes, and so on.

In many real cases, even when files are huge, individual data items fit into memory. It is better to work with them as a whole, so they can be inspected. In that case, Parser can optionally pack items efficiently.

The details of the stream of tokens are described later.

constructor(options)

options is an optional object described in detail in Node.js' Stream documentation. Additionally, the following custom flags are recognized; each can be truthy or falsy:

  • Packing options control packing values. They have no default values.
    • packValues serves as the initial value for packing strings. It is here mostly for consistency with stream-json.
    • packStrings specifies whether to pack strings and send them out as stringValue tokens.
    • More details in the section below.
  • Streaming options control sending unpacked values. They have no default values.
    • streamValues serves as the initial value for streaming strings. It is here mostly for consistency with stream-json.
    • streamStrings specifies whether to send tokens related to unpacked strings (startString, stringChunk, endString).
    • More details in the section below.

By default, Parser follows the CSV format (RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files), streams all values in chunks, and sends individual (packed) values as well.
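
For example, the packing behavior can be selected at construction time. A minimal sketch using only the options listed above:

const Parser = require('stream-csv-as-json/Parser');

// default: values are streamed in chunks and sent as packed stringValue tokens
const defaultParser = new Parser();

// packing disabled: values are only streamed as startString/stringChunk/endString
const chunkedOnlyParser = new Parser({packValues: false});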

Stream of tokens

The stream of tokens is defined in stream-json's Parser documentation. The CSV parser uses a subset of those tokens:

  • startArray indicates that a row has started.
  • startString indicates that a value is about to start.
  • stringChunk carries a piece of a value as a string. 0 or more chunks are allowed.
  • endString indicates that a value has finished.
  • stringValue carries the whole value of the string that precedes it.
  • endArray indicates that a row has finished.

In short:

  • a CSV file may have 0 or more rows,
  • each row is represented as an array of string values,
  • each value is a string:
    • If streaming values is on, it is always startString, then 0 or more stringChunk tokens, then endString.
    • If packing is on, stringValue is sent out.
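
For illustration, parsing a one-row input with the content a,b under the default options produces a token sequence along these lines (a sketch; the exact splitting into stringChunk tokens may vary):

const Parser = require('stream-csv-as-json/Parser');

const parser = new Parser();
parser.on('data', token => console.log(token));

parser.end('a,b\n');

// Expected output (roughly):
// {name: 'startArray'}
// {name: 'startString'}
// {name: 'stringChunk', value: 'a'}
// {name: 'endString'}
// {name: 'stringValue', value: 'a'}
// {name: 'startString'}
// {name: 'stringChunk', value: 'b'}
// {name: 'endString'}
// {name: 'stringValue', value: 'b'}
// {name: 'endArray'}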

Some CSV files come with a header, which lists column names. In this case, it is possible to translate arrays into objects with the corresponding column names as keys. AsObjects is the filter that does it.
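
A minimal sketch of that translation, assuming the filter is exposed as stream-csv-as-json/AsObjects with an asObjects() factory, and data.csv is a placeholder file whose first row is a header:

const {chain}     = require('stream-chain');
const {parser}    = require('stream-csv-as-json/Parser');
const {asObjects} = require('stream-csv-as-json/AsObjects');

const fs = require('fs');

// the header row supplies the keys; every following row is re-emitted
// as an object keyed by the corresponding column names
const pipeline = chain([
  fs.createReadStream('data.csv'),
  parser(),
  asObjects()
]);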

Packing options

Parser can pack strings. Packing should be on only when we know that individual values fit into memory.

Internally, value packing is controlled by a flag:

  • By default, this flag is true.
  • If packValues is set, its value is assigned to the flag. This option is here for consistency with stream-json.
  • If packStrings is set, its value is assigned to the flag.

Examples:

Supplied options                           packStrings
{}                                         true
{packValues: false}                        false
{packValues: false, packStrings: true}     true
{packStrings: true, packValues: false}     true
{packStrings: false}                       false
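
The resolution above can be summarized with a small helper (an illustrative sketch, not the library's actual code):

// illustrative only: how the effective packStrings flag is derived
const resolvePackStrings = (options = {}) => {
  let packStrings = true;                                            // the default
  if ('packValues' in options) packStrings = !!options.packValues;   // initial value
  if ('packStrings' in options) packStrings = !!options.packStrings; // specific flag wins
  return packStrings;
};

console.log(resolvePackStrings({}));                                     // true
console.log(resolvePackStrings({packValues: false}));                    // false
console.log(resolvePackStrings({packValues: false, packStrings: true})); // true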

Streaming options

Parser can optionally skip streaming strings as an optimization, if the corresponding packing option is enabled. It means that only three configurations are supported for values (see the example after this list):

  • The default: startString, 0 or more stringChunk, endString, stringValue.
  • packStrings is false: startString, 0 or more stringChunk, endString.
  • packStrings is true, streamStrings is false: stringValue.
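
For example, the third configuration produces the most compact output, a single stringValue token per value (a minimal sketch; sample.csv is the same placeholder file as above):

const Parser = require('stream-csv-as-json/Parser');
const fs = require('fs');

// whole values only, no chunked string tokens
const parser = new Parser({packValues: true, streamValues: false});

let valueCounter = 0;
parser.on('data', token => token.name === 'stringValue' && ++valueCounter);
parser.on('end', () => console.log(`Found ${valueCounter} values.`));

fs.createReadStream('sample.csv').pipe(parser);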

Internally, value streaming is controlled by a flag:

  • By default, this flag is true.
  • If streamValues is set, its value is assigned to the flag. This option is here for consistency with stream-json.
  • If streamStrings is set, its value is assigned to the flag.
  • If packStrings is false, the flag is forced to true.

Examples:

Supplied options                                                 streamStrings
{}                                                               true
{packValues: true, streamValues: false}                          false
{packStrings: true, streamStrings: false}                        false
{packStrings: false, streamStrings: false}                       true
{packValues: true, streamValues: false, streamStrings: true}     true
{streamStrings: false}                                           false
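
The same resolution can be sketched as a helper (illustrative only, not the library's actual code); note how a disabled packStrings forces streaming back on:

// illustrative only: how the effective streamStrings flag is derived
const resolveStreamStrings = (options = {}) => {
  let streamStrings = true;                                                // the default
  if ('streamValues' in options) streamStrings = !!options.streamValues;   // initial value
  if ('streamStrings' in options) streamStrings = !!options.streamStrings; // specific flag wins
  // the effective packStrings flag, resolved as in the previous section
  const packStrings = 'packStrings' in options
    ? !!options.packStrings
    : ('packValues' in options ? !!options.packValues : true);
  if (!packStrings) streamStrings = true; // values must be observable one way or another
  return streamStrings;
};

console.log(resolveStreamStrings({}));                                         // true
console.log(resolveStreamStrings({packValues: true, streamValues: false}));    // false
console.log(resolveStreamStrings({packStrings: false, streamStrings: false})); // true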

Static methods and properties

parser(options) and make(options)

make() and parser() are two aliases of the factory function. It takes the options described above and returns a new instance of Parser. parser() helps to reduce boilerplate when creating data processing pipelines:

const {chain}  = require('stream-chain');
const {parser} = require('stream-csv-as-json/Parser');

const fs = require('fs');

const pipeline = chain([
  fs.createReadStream('sample.csv'),
  parser()
]);

let rowCounter = 0;
pipeline.on('data', data => data.name === 'startArray' && ++rowCounter);
pipeline.on('end', () => console.log(`Found ${rowCounter} rows.`));

make.Constructor

The Constructor property of make() (and parser()) is set to Parser. It can be used for creating parsers indirectly or for metaprogramming, if needed.
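
For example (a minimal sketch):

const {parser} = require('stream-csv-as-json/Parser');

// create a parser indirectly through the factory's Constructor property
const ParserClass = parser.Constructor;
const instance = new ParserClass({packValues: true});

console.log(instance instanceof ParserClass); // true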