h5netcdf: Don't assume there are dimensions when there are chunks #10092

Closed



@rho-novatron rho-novatron commented Mar 3, 2025


welcome bot commented Mar 3, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@dcherian dcherian changed the title Don't assume there are dimensions when there are chunks h5netcdf: Don't assume there are dimensions when there are chunks Mar 3, 2025
@rho-novatron
Author

I added a somewhat artificial test to not have to include an hsds service in the CI chain.
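As an illustration of what such an artificial test could look like, here is a minimal sketch that fakes a variable with plain attributes instead of talking to an HSDS service. The helper `preferred_chunks` and both test names are hypothetical and are not taken from the xarray code base; only the attribute names `dimensions` and `chunks` come from the linked issue.

```python
from types import SimpleNamespace


def preferred_chunks(var):
    """Hypothetical helper: map dimension names to chunk sizes.

    Returns an empty dict when the variable has no chunks or no
    dimensions, instead of assuming the two always line up.
    """
    if not var.chunks or not var.dimensions:
        return {}
    return dict(zip(var.dimensions, var.chunks))


def test_chunks_without_dimensions():
    # Fake scalar variable as an HSDS-backed file might report it:
    # chunks are present although there are no dimensions.
    fake = SimpleNamespace(dimensions=(), chunks=(1,))
    assert preferred_chunks(fake) == {}


def test_chunks_with_dimensions():
    # Ordinary case: one chunk size per dimension.
    fake = SimpleNamespace(dimensions=("x", "y"), chunks=(10, 5))
    assert preferred_chunks(fake) == {"x": 10, "y": 5}
```

Because the fake object only carries the two attributes under test, the check runs in any CI job without extra services.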

@dcherian dcherian requested a review from kmuehlbauer March 6, 2025 19:53
Contributor

@kmuehlbauer kmuehlbauer left a comment


Changes look reasonable. @rho-novatron Is returning strings with chunks a peculiarity of h5pyd/hsds? I just want to make sure there is no other hidden issue that could be resolved upstream.

@rho-novatron
Author

That is a good question, and I don't know the answer. I'll check and get back.

@rho-novatron
Author

It really does look as if chunking isn't supported for scalar datasets, but they are stored with layout class H5D_CHUNKED and thus have chunk dimensions == (1,). There are several places, in both h5netcdf and h5pyd, that verify that scalar datasets cannot be created with chunks. However, I can't find anything in the HSDS documentation that says that a scalar dataset couldn't have a chunk.

Perhaps it would be better to check for scalar datasets in h5netcdf and return None for them? I think that might be the best balance: xarray stays generic, h5pyd stays slightly raw, and h5netcdf holds the adaptor logic. Do you have a better idea?
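The idea can be sketched as a thin adaptor that hides the chunk information of scalar datasets. The class `ScalarAwareVariable` is hypothetical and assumes that "return None" refers to the `chunks` attribute; it is not the actual h5netcdf implementation.

```python
from types import SimpleNamespace


class ScalarAwareVariable:
    """Hypothetical adaptor around a raw dataset-like object."""

    def __init__(self, raw):
        self._raw = raw

    @property
    def shape(self):
        return self._raw.shape

    @property
    def chunks(self):
        # A scalar dataset has an empty shape; report no chunking
        # for it, whatever the underlying driver says.
        if self._raw.shape == ():
            return None
        return self._raw.chunks


# Raw objects as a driver might expose them (stand-ins only).
raw_scalar = SimpleNamespace(shape=(), chunks=(1,))
raw_array = SimpleNamespace(shape=(100, 20), chunks=(10, 5))

scalar = ScalarAwareVariable(raw_scalar)
array = ScalarAwareVariable(raw_array)
```

With the check living in the adaptor layer, callers further up can keep treating `chunks is None` as "not chunked" without knowing about the driver's storage layout.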

@rho-novatron
Author

I added a draft PR that makes concrete what I meant: h5netcdf/h5netcdf#259

@kmuehlbauer
Contributor

Thanks for digging into this @rho-novatron.

I was about to suggest something along the lines of your PR h5netcdf/h5netcdf#259. I responded over there.

@rho-novatron rho-novatron deleted the rho/chunks-without-dimensions branch March 7, 2025 14:52
@kmuehlbauer
Contributor

Thanks @rho-novatron for this fast interaction, great experience!

Development

Successfully merging this pull request may close these issues.

With backend h5netcdf and driver h5pyd there are cases when len(var.chunks) > 0 but len(var.dimensions) == 0