JSFive to support VBZ compression #26

Closed
MrTeej opened this issue Dec 20, 2021 · 10 comments

@MrTeej

MrTeej commented Dec 20, 2021

As part of our project at Oxford Nanopore Tech, we need to read fast5 files that are VBZ-compressed. We would like to extend the JSFive project to support VBZ compression.

Any support or guidance would be appreciated, thank you.

@bmaranville
Member

I think it would be straightforward to add a very lightweight plugin mechanism for filters, where users can add a new function with a corresponding filter_id to a "user_filters" object, which would then get called in btree.js when running chunks through the data filter pipeline. Then applications that use jsfive would import both the jsfive and VBZ libraries, and register the VBZ decoder with jsfive before opening VBZ-compressed files.

@bmaranville
Member

See the separate_filters branch - I moved all the filters to a Map that is exported, so you can add new filters to the list with hdf5.Filters.set(filter_id, my_filter), where my_filter(buf: ArrayBuffer): ArrayBuffer should act on the ArrayBuffer for each chunk.
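
For anyone following along, here is a minimal sketch of what registering a VBZ filter could look like with that API. It assumes jsfive is imported as hdf5, that a hypothetical vbzDecompress helper wraps the actual VBZ decoder, and that 32020 is VBZ's registered HDF5 filter id:

import * as hdf5 from 'jsfive';
import { vbzDecompress } from './vbz'; // hypothetical binding to a VBZ decoder

const VBZ_FILTER_ID = 32020; // assumed registered HDF5 filter id for VBZ

// Register the decoder before opening any VBZ-compressed files.
hdf5.Filters.set(VBZ_FILTER_ID, function vbz_filter(buf) {
  // buf is the raw ArrayBuffer for one chunk; return the decoded ArrayBuffer
  return vbzDecompress(buf);
});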

@MrTeej
Author

MrTeej commented Jan 19, 2022

Thanks for this, we're currently looking at integrating it into our code. We will let you know how we get on.

@MrTeej
Author

MrTeej commented Jan 20, 2022

Hi @bmaranville, could you please confirm how the chunking works inside JSFive? Would this pass an entire read to the filter as one chunk, or only parts of a read?
We ask because the VBZ decompression code we have checks for the output size in the header section of each chunk passed into the function.

@bmaranville
Member

bmaranville commented Jan 20, 2022

Sure - in HDF5 the filter pipeline operates on individual chunks, which are defined when the dataset is created and information about chunk size is encoded in the file for that dataset. In jsfive the filter functions are called with two arguments: the chunk ArrayBuffer, and the byte size of an individual data element (this is needed for the HDF5 predefined shuffle filter). The filters are applied sequentially to a chunk in the order they are listed in the filter pipeline, and later the data buffer is constructed by concatenating the filtered chunks.

The size of the chunk's buffer is not passed to the filter functions, but is available through inspection (buf.byteLength). The output buffer is usually not the same length as the input buffer.

Here is the typescript signature of a filter function:

type FilterFn = {
  (buf: ArrayBuffer, itemsize: number): ArrayBuffer;
};
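
To make that signature concrete, here is a small sketch (not part of jsfive) of a filter with that shape: it undoes a byte-shuffle, which is the kind of operation the itemsize argument exists for. It assumes the chunk length is a whole multiple of itemsize:

const unshuffle: FilterFn = (buf, itemsize) => {
  const src = new Uint8Array(buf);
  const out = new Uint8Array(src.length);
  const count = src.length / itemsize; // number of data elements in the chunk
  for (let i = 0; i < count; i++) {
    for (let b = 0; b < itemsize; b++) {
      // the shuffled layout stores byte b of every element contiguously
      out[i * itemsize + b] = src[b * count + i];
    }
  }
  return out.buffer;
};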

@MrTeej
Author

MrTeej commented Jan 27, 2022

Hi, thanks for this - we've managed to get it working with our VBZ decompression method, but with hardcoded compression options.

Is there a way we can view the compression options for a particular read so that we can feed them into our decompression method?

@bmaranville
Member

There is an array of "client_data" encoded in the filter settings; I think it needs to be passed to the filter functions. So far none of the filters I'd seen used it, but I think VBZ might...

I added it to the end of the function call, so if you pull the latest version from the separate_filters branch, your filter should now be called on each chunk as follows (where client_data is an array of integers):

      if (Filters.has(filter_id)) {
        buf = Filters.get(filter_id)(buf, itemsize, client_data);
      }
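
As a sketch of how a filter might consume that third argument, here is the earlier registration example extended to take client_data; the option names and the vbzDecompress helper are assumptions rather than anything defined by jsfive or the VBZ filter itself:

import * as hdf5 from 'jsfive';
import { vbzDecompress } from './vbz'; // hypothetical binding to a VBZ decoder

const VBZ_FILTER_ID = 32020; // assumed registered HDF5 filter id for VBZ

hdf5.Filters.set(VBZ_FILTER_ID, function vbz_filter(buf, itemsize, client_data) {
  // client_data is the array of integer filter options stored in the file's
  // filter pipeline, so compression options no longer need to be hardcoded
  return vbzDecompress(buf, { clientData: client_data, itemsize });
});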

@MrTeej
Author

MrTeej commented Jan 28, 2022

That's brilliant, thank you - we believe everything is working. Would it be possible to publish the latest changes to NPM when you're ready, please?

@bmaranville
Member

I will merge and publish a new version soon.

@bmaranville
Member

Version 0.3.8 was published just now, with support for user filters.
