open_datatree() keeps the hdf file open preventing writes #325

Open
KareemShalabi opened this issue Mar 30, 2024 · 1 comment

Comments

@KareemShalabi

Consider this analysis pipeline:
Multiple arrays for the same data variable are organized in a group hierarchy inside an HDF file according to some attributes. A datatree is a perfect container for that. I can read all the arrays as chunked dask datasets and map a function over the datatree, collecting the results along the way.

Because the final result of the function is far too large to fit in memory, I tried saving the intermediate results (the result of the computation in a single iteration) to the same file and group path, then reloading and returning the new chunked DataArray. An exception is thrown because the file is held open by the datatree object. This does not happen when I create the datatree object myself (from a dict of group paths and DataArray objects).
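
A minimal sketch of the workaround described above (building the tree from a dict instead of calling `open_datatree()`), not a fix for the underlying behavior. It assumes each group fits in memory on its own, since it gives up dask chunking for the inputs, and that the file is readable by one of xarray's netCDF4/HDF5 backends. The helper name `load_groups` and the group paths are hypothetical.

```python
import xarray as xr


def load_groups(path, group_paths, engine=None):
    """Read each group eagerly and close the file before returning.

    Returns a dict mapping group path -> in-memory xr.Dataset, which is the
    kind of mapping that DataTree.from_dict accepts.
    """
    datasets = {}
    for group in group_paths:
        # Leaving the `with` block closes the underlying file handle, so
        # nothing keeps the HDF file open once this loop has finished.
        with xr.open_dataset(path, group=group, engine=engine) as ds:
            # .load() pulls the data into memory, so the returned dataset
            # remains usable after the file has been closed.
            datasets[group] = ds.load()
    return datasets
```

The resulting dict can then be passed to `DataTree.from_dict(...)`, for example `DataTree.from_dict(load_groups("data.h5", ["/a", "/b"]))`, where `DataTree` comes from this package or, in recent xarray releases, from xarray itself. Because no handle is left open, writing intermediate results back to the same file should no longer collide with a reader.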

@TomNicholas
Collaborator

Thanks for raising this. I think this issue is a duplicate of #93. There was a PR opened to fix it but realistically given that we're currently integrating datatree into Xarray main, we'll probably prioritize fixing there instead of in this package.
