JSFive to support VBZ compression #26

Closed
MrTeej opened this issue Dec 20, 2021 · 10 comments

@MrTeej

MrTeej commented Dec 20, 2021

As part of our project at Oxford Nanopore Tech, we need to read fast5 files that are VBZ-compressed. We would like to extend the JSFive project to support VBZ compression.

Any support or guidance would be appreciated, thank you.

@bmaranville
Member

I think it would be straightforward to add a very lightweight plugin mechanism for filters, where users can add a new function with a corresponding filter_id to a "user_filters" object, which would then get called in btree.js when running chunks through the data filter pipeline. Then applications that use jsfive would import both the jsfive and VBZ libraries, and register the VBZ decoder with jsfive before opening VBZ-compressed files.

@bmaranville
Member

See the separate_filters branch - I moved all the filters to a Map that is exported, so you can add new filters to the list with hdf5.Filters.set(filter_id, my_filter), where my_filter(buf: ArrayBuffer): ArrayBuffer should act on the ArrayBuffer for each chunk.
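
For anyone following along, here is a minimal sketch of what registering a VBZ filter could look like with that API. It assumes jsfive is imported as hdf5, that a hypothetical vbzDecompress helper wraps the actual VBZ decoder, and that 32020 is VBZ's registered HDF5 filter id:

import * as hdf5 from 'jsfive';
import { vbzDecompress } from './vbz'; // hypothetical binding to a VBZ decoder

const VBZ_FILTER_ID = 32020; // assumed registered HDF5 filter id for VBZ

// Register the decoder before opening any VBZ-compressed files.
hdf5.Filters.set(VBZ_FILTER_ID, function vbz_filter(buf) {
  // buf is the raw ArrayBuffer for one chunk; return the decoded ArrayBuffer
  return vbzDecompress(buf);
});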

@MrTeej
Author

MrTeej commented Jan 19, 2022

Thanks for this, we're currently looking at integrating it into our code. We will let you know how we get on.

@MrTeej
Author

MrTeej commented Jan 20, 2022

Hi @bmaranville, could you please confirm how the chunking works inside JSFive? Would this pass an entire read to the filter as one chunk, or only parts of a read?
We ask because the VBZ decompression code we have checks for the output size in the header section of each chunk passed into the function.

@bmaranville
Member

bmaranville commented Jan 20, 2022

Sure - in HDF5 the filter pipeline operates on individual chunks, which are defined when the dataset is created and information about chunk size is encoded in the file for that dataset. In jsfive the filter functions are called with two arguments: the chunk ArrayBuffer, and the byte size of an individual data element (this is needed for the HDF5 predefined shuffle filter). The filters are applied sequentially to a chunk in the order they are listed in the filter pipeline, and later the data buffer is constructed by concatenating the filtered chunks.

The size of the chunk's buffer is not passed to the filter functions, but is available through inspection (buf.byteLength). The output buffer is usually not the same length as the input buffer.

Here is the typescript signature of a filter function:

type FilterFn = {
  (buf: ArrayBuffer, itemsize: number): ArrayBuffer;
};
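
To make that signature concrete, here is a small sketch (not part of jsfive) of a filter with that shape: it undoes a byte-shuffle, which is the kind of operation the itemsize argument exists for. It assumes the chunk length is a whole multiple of itemsize:

const unshuffle: FilterFn = (buf, itemsize) => {
  const src = new Uint8Array(buf);
  const out = new Uint8Array(src.length);
  const count = src.length / itemsize; // number of data elements in the chunk
  for (let i = 0; i < count; i++) {
    for (let b = 0; b < itemsize; b++) {
      // the shuffled layout stores byte b of every element contiguously
      out[i * itemsize + b] = src[b * count + i];
    }
  }
  return out.buffer;
};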

@MrTeej
Author

MrTeej commented Jan 27, 2022

Hi, thanks for this - we've managed to get it working with our VBZ decompression method, but with hardcoded compression options.

Is there a way we can view the compression options for a particular read so that we can feed them into our decompression method?

@bmaranville
Member

There is an array of "client_data" encoded in the filter settings; I think it needs to be passed to the filter functions. So far none of the filters I'd seen used it, but I think VBZ might...

I added it to the end of the function call, so if you pull the latest version from the separate_filters branch, your filter should now be called on each chunk as follows (where client_data is an array of integers):

      if (Filters.has(filter_id)) {
        buf = Filters.get(filter_id)(buf, itemsize, client_data);
      }
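
As a sketch of how a filter might consume that third argument, here is the earlier registration example extended to take client_data; the option names and the vbzDecompress helper are assumptions rather than anything defined by jsfive or the VBZ filter itself:

import * as hdf5 from 'jsfive';
import { vbzDecompress } from './vbz'; // hypothetical binding to a VBZ decoder

const VBZ_FILTER_ID = 32020; // assumed registered HDF5 filter id for VBZ

hdf5.Filters.set(VBZ_FILTER_ID, function vbz_filter(buf, itemsize, client_data) {
  // client_data is the array of integer filter options stored in the file's
  // filter pipeline, so compression options no longer need to be hardcoded
  return vbzDecompress(buf, { clientData: client_data, itemsize });
});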

@MrTeej
Author

MrTeej commented Jan 28, 2022

That's brilliant, thank you - we believe everything is working. Would it be possible to publish the latest changes to NPM when you're ready, please?

@bmaranville
Member

I will merge and publish a new version soon.

@bmaranville
Member

Version 0.3.8 was published just now, with support for user filters.
