
Dimension scales, NetCDF-4 support #60

Open
adamshaylor opened this issue Sep 1, 2023 · 15 comments

@adamshaylor

Do you have plans to add dimension scales to h5wasm? If not, would you be open to doing so?

I’ve been tinkering with h5wasm as a possible means of dynamically creating NetCDF-4 files in the browser. Given the similarities between NetCDF-4 and HDF5, h5wasm is the lightest-weight approach I’ve found. (By the way, I appreciate the work you’ve done on it! The TypeScript type definitions have made it very easy for me to learn the API quickly.) The one thing I’ve found to be missing for NetCDF-4 support (thus far, anyway) is dimension scales. Perhaps you’re aware of others, but this seems to be the most obvious one.

References:

@bmaranville
Member

Sure, I would be open to doing so. What kind of interface did you have in mind? Something like the h5py approach (https://docs.h5py.org/en/stable/high/dims.html)?

@adamshaylor
Author

@bmaranville, great! I read the h5py docs you linked to and shared them with a colleague who’s more familiar with Python than I am. I don’t think we’re too picky about the interface. Whatever you think suits the project best will probably be fine with us.

Our primary concern is getting data out of the browser’s JavaScript runtime and into a NetCDF-4 file our users can open in Panoply or QGIS. As a proof of concept, I’ve been testing this approach with a little script built on h5wasm that traverses all the paths and attributes of an input file, copies them to a new file, and generates a Blob-based link to download the output. Then we use ncdump or h5dump to compare the headers of the input and output files, and Panoply to try to create a map-based plot. Dimension scales are the only feature that stands out to us as obviously missing to reach parity. If there’s any more information I can provide that would inform your approach, e.g. sample data and/or access to the script I mentioned, let me know.
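For reference, here is a rough sketch of that copy-and-download flow. It is only illustrative: the helper names (copyAttrs, copyGroup), the file names, and the object-style create_dataset options are assumptions, and exact signatures may differ between h5wasm versions.

import h5wasm from "h5wasm";

// Illustrative sketch: copy every group, dataset, and attribute from an input
// HDF5/NetCDF-4 file into a new file, then offer the result as a download.
const { FS } = await h5wasm.ready;

function copyAttrs(source, target) {
  // source.attrs maps attribute names to { value, shape, dtype }
  for (const [name, attr] of Object.entries(source.attrs)) {
    target.create_attribute(name, attr.value, attr.shape, attr.dtype);
  }
}

function copyGroup(source, target) {
  copyAttrs(source, target);
  for (const name of source.keys()) {
    const child = source.get(name);
    if (child instanceof h5wasm.Group) {
      copyGroup(child, target.create_group(name));
    } else if (child instanceof h5wasm.Dataset) {
      const copy = target.create_dataset({
        name,
        data: child.value,
        shape: child.shape,
        dtype: child.dtype,
      });
      copyAttrs(child, copy);
    }
  }
}

const input = new h5wasm.File("input.nc", "r");
const output = new h5wasm.File("output.nc", "w");
copyGroup(input, output);
output.flush();
output.close();

// Read the written bytes out of the Emscripten filesystem and build a download link.
const bytes = FS.readFile("output.nc");
const url = URL.createObjectURL(new Blob([bytes], { type: "application/x-netcdf" }));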

@bergmorten

bergmorten commented Oct 11, 2023

We also need this option. I think the best approach is to mirror the HDF5 dimension scales API, i.e. set_scale, attach_scale, and detach_scale:

https://docs.hdfgroup.org/hdf5/develop/group___h5_d_s.html
Example: https://docs.hdfgroup.org/archive/support/HDF5/Tutor/h5dimscale.html

@bergmorten

@bmaranville Have you had a chance to look at this feature?

@bmaranville
Member

I looked at the underlying HDF5 libraries, and it seems straightforward. Exposing the minimal write/create/modify functions you listed from the HDF5 API would be quick. Would that be useful?

I don't have time to immediately implement a more complete solution (that would allow e.g. reading dimension scales or identifying attached dimension scales).

@bergmorten

bergmorten commented Nov 2, 2023 via email

@adamshaylor
Author

For the time being we’re mainly interested in generating NetCDF-4 files, so I think this should work for us too. Thanks.

@bmaranville
Member

Ok, write support is in v0.6.8; please give it a try. Documentation can be found in the CHANGELOG or the release notes.

I'll leave this open because basic read support is not yet implemented, and more convenient functions have not been added to the typescript API in hdf5_hl.ts

@bmaranville
Member

This is purely out of curiosity, but what projects are you working on that will involve writing NetCDF-4 files with h5wasm?

@bergmorten

Hi, I'll test the code tomorrow :-)

We use h5wasm to publish data to IOOS (https://ioos.us/); we do not want to use the old Node.js library for NetCDF, and we need browser support. They accept both HDF5 and NetCDF, but the files must have dimensions.

@adamshaylor
Author

Thank you very much, @bmaranville. I will find some time to test next week.

In our case, we (Lobelia) are responsible for EU web applications that allow scientists and policy makers to view and download climate data from the web (for example, the Copernicus Marine MyOcean Viewer). In some of these applications, the data the user wants to download is already present in the browser. Since our users tend to use QGIS, we need to export to NetCDF. Rather than rely on a web service as we do now, we are looking into how we can export from within the browser.

@bergmorten

set_scale and attach_scale worked for us :-) Thank you very much.

One suggestion: add these functions to the Dataset object so that you do not need to pass the file_id or dataset name.

@bmaranville
Member

bmaranville commented Nov 6, 2023

I was surprised that requests for write functionality came in before requests for read functionality! My reason for getting into the HDF5-in-the-browser game was to support browser-based visualization and inspection of web-hosted datasets. Just in case there is demand for visualizing NetCDF-4 files, here are the read functions that go along with the write functions (and the write functions have been added to the TypeScript API as requested above):

v0.6.9 2023-11-06

Fixed

  • added missing FileSystem API function mkdirTree to Emscripten typescript interface

Added

  • Functions for working with dimension scales in typescript interface:
// convert dataset to dimension scale:
Dataset.make_scale(scale_name: string)
// attach a dimension scale to the "index" dimension of this dataset:   
Dataset.attach_scale(index: number, scale_dset_path: string)
// detach a dimension scale from "index" dimension
Dataset.detach_scale(index: number, scale_dset_path: string)
// get full paths to all datasets that are attached as dimension scales
// to the specified dimension (at "index") of this dataset:
Dataset.get_attached_scales(index: number)
// if this dataset is a dimension scale, returns name as string
// (returns empty string if no name defined, but it is a dimension scale)
// else returns null if it is not set as a dimension scale:
Dataset.get_scale_name()
  • Functions for working with dimension labels (not related to dimension scales)
// label dimension at "index" of this dataset with string "label":
Dataset.set_dimension_label(index: number, label: string)
// fetch labels for all dimensions of this dataset (null if label not defined):
Dataset.get_dimension_labels()
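
As a usage sketch, the methods listed above can be combined to reproduce the NetCDF-4 coordinate-variable pattern. The file name, dataset names, data values, dtype, and the object-style create_dataset options below are illustrative assumptions, not a definitive recipe.

import h5wasm from "h5wasm";

// Illustrative only: create a coordinate variable ("time") that acts as a
// dimension scale for one data variable ("temperature").
await h5wasm.ready;

const f = new h5wasm.File("example.nc", "w");

// coordinate variable that will become the dimension scale
const time = f.create_dataset({ name: "time", data: [0, 1, 2, 3], shape: [4], dtype: "<f8" });

// data variable with a single dimension of length 4
const temperature = f.create_dataset({ name: "temperature", data: [280.1, 280.4, 281.0, 281.2], shape: [4], dtype: "<f8" });

// convert "time" into a dimension scale, then attach it to dimension 0 of "temperature"
time.make_scale("time");
temperature.attach_scale(0, "/time");

// optionally, label the dimension as well (labels are separate from scales)
temperature.set_dimension_label(0, "time");

// read side
temperature.get_attached_scales(0);  // ["/time"]
time.get_scale_name();               // "time"
temperature.get_dimension_labels();  // ["time"]

f.flush();
f.close();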

@bmaranville
Member

I have one further question: the HDF5 mapping spec for NetCDF-4 indicates that all groups should be created with link and attribute creation order preserved, and that all datasets should be created with attribute creation order preserved (see https://www.earthdata.nasa.gov/sites/default/files/imported/ESDS-RFC-022v1.pdf).

Currently h5wasm does not have a mechanism for doing this. Is it very important for your work?

@bergmorten

For me this is not an issue. The IOOS compliance checker approved the files generated with the new h5wasm. However, I think they accept both HDF5 and NetCDF as long as they have dimensions/scales.
