
Container does not open in h5wasm that opens in jsfive #6

Closed
alexpreynolds opened this issue Jan 22, 2022 · 5 comments

@alexpreynolds

I'm working on a React component that reads in data from an HDF5 container.

The container is readable with the jsfive library, but it does not open correctly with the h5wasm library, which reports "invalid file name" errors.

I am using yarn add to add both libraries. Yarn adds v0.3.6 of jsfive and v0.1.8 of h5wasm.

My test container is generated with the following script, which creates the container from a very rudimentary UMAP clustering result: https://gist.github.com/alexpreynolds/8e3a29c75f2ff86fa922b7a092f5e299

For convenience, the container is also available here: https://somebits.io/data.h5

The relevant code for loading the container via jsfive is:

import React from 'react';
import * as hdf5 from 'jsfive';

...

class App extends React.Component<Props, State> {
  ...
  async componentDidMount() {
    await fetch("https://somebits.io/data.h5")
      .then(function(response) { 
        return response.arrayBuffer() 
      })
      .then(function(buffer) {
        const f = new hdf5.File(buffer, "data.h5");
        console.log(`f.keys ${JSON.stringify(f.keys)}`);
        const data = f.get('data');
        console.log(`data.keys ${JSON.stringify(data.keys)}`);
        const dataGroup = data.get('tsg8n0ki');
        console.log(`dataGroup.attrs ${JSON.stringify(dataGroup.attrs)}`);
        const metadata = f.get('metadata');
        console.log(`metadata.keys ${JSON.stringify(metadata.keys)}`);
        const groups = metadata.get('groups');
        const group = groups.get('tsg8n0ki');
        console.log(`metadata.groups['tsg8n0ki'].attrs ${JSON.stringify(group.attrs)}`);
      })
      .catch(err => {
        console.log(`err ${err}`);
      });
  }
  ...
}

The browser console reports the following expected messages with jsfive:

f.keys ["data","metadata"]
data.keys ["tsg8n0ki"]
dataGroup.attrs {}
metadata.keys ["axes","groups","summary"]
metadata.groups['tsg8n0ki'].attrs {"name":"RGBa colorspace"}

When testing h5wasm, I use the following import:

import * as hdf5 from 'h5wasm';

The rest of the code is identical.

The first lines of console output (including warnings):

hdf5_util.js:9 HDF5-DIAG: Error detected in HDF5 (1.12.1) thread 0:
hdf5_util.js:9   #000: /home/brian/dev/h5wasm/hdf5-hdf5-1_12_1/src/H5F.c line 487 in H5Fcreate(): invalid file name
hdf5_util.js:9     major: Invalid arguments to routine
hdf5_util.js:9     minor: Bad value
f.keys undefined
...

I'm leaving out the rest of the console messages, which show values being undefined due to the container not loading correctly.

@bmaranville
Member

The jsfive and h5wasm packages have different approaches for opening files: jsfive is built around working with ArrayBuffer objects (including internally), while h5wasm is built on the HDF5 C API and uses a virtual filesystem (the native filesystem for nodejs, and a virtual MEMFS filesystem in the browser).

There are directions in the h5wasm README for loading an HDF5 file from an ArrayBuffer: you have to "save" it to the virtual filesystem first:

let response = await fetch("https://ncnr.nist.gov/pub/ncnrdata/vsans/202003/24845/data/sans59510.nxs.ngv");
let ab = await response.arrayBuffer();

hdf5.FS.writeFile("sans59510.nxs.ngv", new Uint8Array(ab));

// use mode "r" for reading.  All modes can be found in hdf5.ACCESS_MODES
let f = new hdf5.File("sans59510.nxs.ngv", "r");
// File {path: "/", file_id: 72057594037927936n, filename: "sans59510.nxs.ngv", mode: "r"}
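
Applied to your data.h5 file, that would look roughly like this (a quick sketch; depending on the h5wasm version you may also need to wait for the WASM module to finish initializing before using hdf5.FS):

import * as hdf5 from 'h5wasm';

async componentDidMount() {
  const response = await fetch("https://somebits.io/data.h5");
  const buffer = await response.arrayBuffer();

  // "save" the downloaded bytes to the virtual MEMFS filesystem,
  // then open that backing file in read-only mode
  hdf5.FS.writeFile("data.h5", new Uint8Array(buffer));
  const f = new hdf5.File("data.h5", "r");

  const dataset = f.get('data/tsg8n0ki');
  console.log(dataset.slice([[0, 1]]));  // read just the first row (Dataset.slice is discussed below)
}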

The constructor for h5wasm could be modified so that it automatically creates a backing file if an ArrayBuffer is passed as the first argument, and closes it when the File object is closed (and then deletes it?). Alternatively I could look into exposing the HDF5 API function H5LTopen_file_image for loading a file image directly from memory.

@alexpreynolds
Author

Thanks, and sorry for my confusion about the API.

Can I ask briefly if the entire container is loaded into memory before any processing can be done? I'd like to know if I can progressively load chunks or slices of a data matrix, if I have a larger container.

@bmaranville
Member

bmaranville commented Jan 24, 2022

The short answer is yes: you can load slices efficiently without reading the entire file into memory, but only if you use the nodejs version of h5wasm, which accesses the HDF5 file directly from the filesystem, combined with the Dataset.slice function.
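
For example, a rough sketch in nodejs (with a hypothetical local path, and the dataset name from this issue):

const hdf5 = require('h5wasm');

// in nodejs, h5wasm opens the HDF5 file directly on the local filesystem,
// so only the requested slice is read and decoded
const f = new hdf5.File("/path/to/data.h5", "r");  // hypothetical path
const dataset = f.get('data/tsg8n0ki');

// rows 0..99 along the first dimension
const rows = dataset.slice([[0, 100]]);
console.log(rows.length);

f.close();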

If you use the browser version, it necessarily loads the entire file into memory first, though you will still see performance benefits from using Dataset.slice(), since you don't have to decode the entire dataset before using parts of it. This can matter for datatypes that are expensive to decode, e.g. compound datatypes.

There is another ticket, #4, requesting random access to files over a network; this is not easy to implement, and may be done in the future if the next version of the emscripten filesystem supports it directly.

@alexpreynolds
Author

Thanks for your help.

I think I am having trouble opening compound data, both in the browser and in the nodejs version (v0.1.8):

$ node
Welcome to Node.js v16.13.1.
Type ".help" for more information.
> const hdf5 = require('h5wasm')
> let f = new hdf5.File("/Users/areynolds/Desktop/data.h5", "r")
> let t = f.get('data/tsg8n0ki')

When I print out the first row:

> t.slice([[0,1]])
[
  [
    Uint8Array(12) [
      231,  27, 202,  64, 195,
      182, 103,  64, 139,  50,
      161,  64
    ],
    0
  ]
]

In reality, this row is a compound of three 32-bit floats (np.float32) and one unsigned 32-bit integer (np.uint32). From the Python script used to make data.h5:

ds_dtype = [('xyz', np.float32, (3, )), ('label_idx', np.uint32)]

The size is right — twelve bytes gives three 32-bit floats — but I'm getting a raw array of those bytes and not the original floats.
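
In the meantime I can unpack those bytes by hand, assuming they are little-endian float32 values (which is what numpy wrote on this machine):

// decode the raw 'xyz' bytes returned by slice() into three floats
function decodeXYZ(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return [view.getFloat32(0, true), view.getFloat32(4, true), view.getFloat32(8, true)];
}

const [rawXYZ, labelIdx] = t.slice([[0, 1]])[0];
console.log(decodeXYZ(rawXYZ), labelIdx);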

Am I doing something wrong to access this data, or would it help to open a new issue? I should probably close this one up, as well.

@bmaranville
Member

I don't have good version notes at the moment, but decoding compound datasets should work in h5wasm >= 0.1.8 (this was a recent feature addition).
