Vitaly Tomilov edited this page Sep 28, 2022 · 59 revisions

Operator split offers a versatile way of partitioning an iterable object. It is especially useful when processing binary files as iterable buffers, because it splits any such buffer into blocks for further processing, efficiently and within a single iteration.


  • the operator takes a predicate that signals when values are to be split (default split logic);
  • it supports option toggle, so that one signal from the predicate starts a new block, and the next one ends it.

By default, the split values themselves are skipped. Options carryStart and carryEnd change that, specifying whether start or end split values are carried back (into the block that ends at the split) or forward (into the block that starts after it). Note that in standard split mode only carryEnd is used, while toggle mode uses both carryStart and carryEnd.
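To make the default split and carryEnd rules concrete, here is a minimal plain-JavaScript sketch. The generator splitSketch is a hypothetical helper written for illustration, not the iter-ops implementation. It assumes that a value carried back is appended to the block it closes, that a value carried forward starts the next block, and it always emits the last block, in the spirit of String.split.

```javascript
// Hypothetical sketch of the default (non-toggle) split logic.
// This is NOT the iter-ops source. carryEnd follows the wiki's convention:
//   -1 = carry the split value back (append it to the block it closes)
//    0 = skip the split value (default)
//    1 = carry the split value forward (start the next block with it)
function* splitSketch(iterable, predicate, carryEnd = 0) {
    let block = [];
    let index = 0; // position of the value in the whole sequence
    for (const value of iterable) {
        if (predicate(value, index++)) {
            if (carryEnd < 0) block.push(value);
            yield block;
            block = carryEnd > 0 ? [value] : [];
        } else {
            block.push(value);
        }
    }
    // Always emit the last block, even when empty, like String.split does.
    yield block;
}

const data = [0, 1, 2, 0, 0, 3, 4, 5];
const isZero = v => v === 0;

console.log([...splitSketch(data, isZero)]);     //=> [ [], [ 1, 2 ], [], [ 3, 4, 5 ] ]
console.log([...splitSketch(data, isZero, -1)]); //=> [ [ 0 ], [ 1, 2, 0 ], [ 0 ], [ 3, 4, 5 ] ]
console.log([...splitSketch(data, isZero, 1)]);  //=> [ [], [ 0, 1, 2 ], [ 0 ], [ 0, 3, 4, 5 ] ]
```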

Examples

Default Split

The example below uses the default split logic to split an array of numbers:

import {pipe, split} from 'iter-ops';

const data = [0, 1, 2, 0, 0, 3, 4, 5];

const i = pipe(
    data,
    split(v => v === 0) // split on value 0
);

console.log([...i]); //=> [ [], [ 1, 2 ], [], [ 3, 4, 5 ] ]

By design, the output is consistent with the logic of String.split, where gaps at the start, at the end, and in the middle produce empty elements. Since operator split always produces arrays of values, each gap becomes an empty array.
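For comparison, here is the same data written as a string, with '0' as the separator. It uses only the standard String.prototype.split, so it illustrates the analogy rather than any iter-ops behavior:

```javascript
// The numbers [0, 1, 2, 0, 0, 3, 4, 5] written as one string, split on '0'.
const parts = '01200345'.split('0');
console.log(parts); //=> [ '', '12', '', '345' ]

// Turning each part into an array of digits gives the same shape
// as the split() output above: gaps become empty arrays.
const blocks = parts.map(p => [...p].map(Number));
console.log(blocks); //=> [ [], [ 1, 2 ], [], [ 3, 4, 5 ] ]

// A trailing separator likewise yields an empty last element.
console.log('120'.split('0')); //=> [ '12', '' ]
```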

If you do not want any such gaps, you can simply filter them out by length, as shown below:

import {pipe, split, filter} from 'iter-ops';

const i = pipe(
    data, // same data as in the example above
    split(v => v === 0), // split on value 0
    filter(a => a.length > 0) // skip empty arrays
);

console.log([...i]); //=> [ [ 1, 2 ], [ 3, 4, 5 ] ]
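To see what split plus filter amount to, the following sketch folds both steps into one generator that simply never emits an empty block. splitNonEmpty is a hypothetical helper, not an iter-ops function; like the pipeline, it walks the data only once:

```javascript
// Hypothetical helper (not part of iter-ops): splits on values matching
// the predicate, skips the split values, and never yields an empty block.
function* splitNonEmpty(iterable, predicate) {
    let block = [];
    for (const value of iterable) {
        if (predicate(value)) {
            if (block.length) yield block; // drop gaps right away
            block = [];
        } else {
            block.push(value);
        }
    }
    if (block.length) yield block;
}

const data = [0, 1, 2, 0, 0, 3, 4, 5];
console.log([...splitNonEmpty(data, v => v === 0)]); //=> [ [ 1, 2 ], [ 3, 4, 5 ] ]
```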

Toggle Split

In the default split scenario, we know only one value by which to split the sequence. When we know two split values, start and end, we need toggle logic instead, whereby one return of true from the predicate marks the beginning of a block, and the next one marks its end.

Below, let's use 0 as the start of each block, and 1 as the end of each block:

const data = [0, 33, 22, 1, 77, 44, 0, 55, 88];

const i = pipe(
    data,
    split(v => v === 0 || v === 1, {toggle: true}), // toggle-split in blocks with border values 0 and 1
);

console.log([...i]); //=> [ [ 33, 22 ], [ 55, 88 ] ]

Above, values [77, 44] were skipped, as they lie outside any toggle block. The last block [55, 88] was included even though it was never closed, which is consistent with the general split logic.
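The toggle behavior described above can be sketched as a small generator. toggleSplitSketch is hypothetical and ignores carry options; it assumes each predicate hit simply flips between "outside a block" and "inside a block", which reproduces the result shown above:

```javascript
// Hypothetical sketch of toggle-mode splitting (not the iter-ops source).
function* toggleSplitSketch(iterable, predicate) {
    let inside = false; // are we currently collecting a block?
    let block = [];
    for (const value of iterable) {
        if (predicate(value)) {
            if (inside) {
                yield block; // second signal: close the current block
                block = [];
            }
            inside = !inside; // first signal opens, second closes
        } else if (inside) {
            block.push(value); // values outside a block are skipped
        }
    }
    if (inside) yield block; // an unclosed last block is still emitted
}

const data = [0, 33, 22, 1, 77, 44, 0, 55, 88];
console.log([...toggleSplitSketch(data, v => v === 0 || v === 1)]);
//=> [ [ 33, 22 ], [ 55, 88 ] ]
```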

Suppose we want both 0 and 1 included in their block, because they also represent valid block values:

const i = pipe(
    data,
    // toggle-split in blocks with border values 0 and 1,
    // plus carry each block start forward, and carry each block end back:
    split(v => v === 0 || v === 1, {toggle: true, carryStart: 1, carryEnd: -1}),
);

console.log([...i]); //=> [ [ 0, 33, 22, 1 ], [ 0, 55, 88 ] ]

See SplitValueCarry.
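Extending the toggle sketch with carry options gives the following. toggleSplitCarry is hypothetical, not the library's code, and it models only the values used in the example (carryStart: 1 and carryEnd: -1); any other carry value simply skips the split value here, which may differ from what iter-ops does:

```javascript
// Hypothetical sketch of toggle-mode splitting with carry options
// (not the iter-ops source). Only two cases are modelled:
//   carryStart === 1  -> the start value becomes the first item of its block
//   carryEnd === -1   -> the end value becomes the last item of its block
function* toggleSplitCarry(iterable, predicate, {carryStart = 0, carryEnd = 0} = {}) {
    let inside = false; // are we currently collecting a block?
    let block = [];
    for (const value of iterable) {
        if (!predicate(value)) {
            if (inside) block.push(value); // skip values outside blocks
            continue;
        }
        if (inside) {
            if (carryEnd === -1) block.push(value); // carry the end value back
            yield block;
            block = [];
        } else if (carryStart === 1) {
            block.push(value); // carry the start value forward
        }
        inside = !inside;
    }
    if (inside) yield block; // an unclosed last block is still emitted
}

const data = [0, 33, 22, 1, 77, 44, 0, 55, 88];
const border = v => v === 0 || v === 1;

console.log([...toggleSplitCarry(data, border, {carryStart: 1, carryEnd: -1})]);
//=> [ [ 0, 33, 22, 1 ], [ 0, 55, 88 ] ]
```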

Paging

As a further demonstration of its logic and flexibility, operator split can replicate operator page.

Splitting data into pages of a fixed size, by calling page(size), is the same as either of the following:

  • splitting when the list index (the value's index within the current block) reaches size, and carrying the split value forward:
    • split((_, index) => index.list >= size, {carryEnd: 1})
  • or, splitting when the list index reaches size - 1, and carrying the split value back:
    • split((_, index) => index.list >= size - 1, {carryEnd: -1})
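Below is a sketch of both equivalences next to a straightforward paging generator. Both helpers are hypothetical. splitByListIndex assumes that the list index counts a value carried forward as position 0 of the new block (the page equivalence depends on this), and it deliberately skips an empty last block, so that an input whose length is a multiple of size does not end with an empty page:

```javascript
// Hypothetical helper (not the iter-ops source): default split where the
// predicate receives the value's index within the current block.
function* splitByListIndex(iterable, predicate, carryEnd = 0) {
    let block = [];
    for (const value of iterable) {
        if (predicate(value, block.length)) {
            if (carryEnd < 0) block.push(value); // carry back into this page
            yield block;
            block = carryEnd > 0 ? [value] : []; // carry forward into next page
        } else {
            block.push(value);
        }
    }
    if (block.length) yield block; // paging-friendly: no empty last page
}

// Straightforward fixed-size paging, for comparison.
function* pageSketch(iterable, size) {
    let block = [];
    for (const value of iterable) {
        block.push(value);
        if (block.length === size) {
            yield block;
            block = [];
        }
    }
    if (block.length) yield block;
}

const data = [1, 2, 3, 4, 5];
const size = 2;

console.log([...pageSketch(data, size)]);                              //=> [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
console.log([...splitByListIndex(data, (_, i) => i >= size, 1)]);      //=> [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
console.log([...splitByListIndex(data, (_, i) => i >= size - 1, -1)]); //=> [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```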

Afterword

Although the general split logic works only with single-value splits, you can work around this and handle multi-value splits by using parameter state (the third parameter of the predicate, which holds the iteration session state) to buffer extra split values.
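Here is one possible way to apply that idea, splitting a byte sequence on the two-value separator CR LF. splitWithState is a hypothetical helper that passes a per-iteration state object as the third predicate argument; the predicate uses it to remember the previous value. Because the CR is already in the block by the time the LF signals the split, the sketch removes it afterwards. Treat this as an illustration under those assumptions, not as the author's method:

```javascript
// Hypothetical split helper (not the iter-ops source) whose predicate
// receives (value, index, state), with one state object per iteration.
function* splitWithState(iterable, predicate) {
    const state = {};
    let block = [];
    let index = 0;
    for (const value of iterable) {
        if (predicate(value, index++, state)) {
            yield block;
            block = [];
        } else {
            block.push(value);
        }
    }
    yield block;
}

const CR = 13, LF = 10;

// Signal a split on LF only when the previous value was CR (a CRLF pair).
const isCrLf = (value, _index, state) => {
    const hit = state.prev === CR && value === LF;
    state.prev = value; // remember this value for the next call
    return hit;
};

const bytes = [...'ab\r\ncd\r\nef'].map(c => c.charCodeAt(0));
const blocks = [...splitWithState(bytes, isCrLf)];

// Every block closed by a CRLF ends with that pair's CR; drop it from all
// blocks except the last one, which was not closed by a separator.
const lines = blocks.map((b, i) => (i < blocks.length - 1 ? b.slice(0, -1) : b));
console.log(lines); //=> [ [ 97, 98 ], [ 99, 100 ], [ 101, 102 ] ]
```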
