open_virtual_dataset with dmr++ #113
virtualizarr/dmrpp.py
chunk_num = (
    chunk_pos // chunks
)  # [0, 1023, 10235] // [1, 1023, 2047] -> [0, 1, 5]
chunk_key = ".".join(map(str, chunk_num))  # [0, 1, 5] -> "0.1.5"
I have a join function in virtualizarr.zarr for doing this.
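The chunk-key computation in the snippet above can be sketched as a standalone function. This is a minimal illustration, assuming chunk_pos is the array index of a chunk's first element and chunks is the chunk shape; chunk_key_from_position is a hypothetical name and is not the join helper in virtualizarr.zarr, whose signature is not shown here.

```python
import numpy as np


def chunk_key_from_position(chunk_pos, chunks, sep="."):
    """Hypothetical helper: convert the array index of a chunk's first
    element into a Zarr-style chunk key such as "0.1.5"."""
    # Floor division per dimension gives the chunk's grid index,
    # e.g. [0, 1023, 10235] // [1, 1023, 2047] -> [0, 1, 5]
    chunk_num = np.asarray(chunk_pos) // np.asarray(chunks)
    # Join the per-dimension indices with the separator
    return sep.join(str(int(n)) for n in chunk_num)


print(chunk_key_from_position([0, 1023, 10235], [1, 1023, 2047]))  # 0.1.5
```

Doing the division on whole arrays rather than per element keeps the computation vectorized, which is the same idea as the original snippet.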
chunk key parsing speedup
Thanks for taking a look and giving my suggested changes to the chunk key parsing a try, @ayushnag! Continuing the discussion on performance: I think the remaining bottlenecks (aside from, perhaps, your point about I/O in the cloud) now lie primarily outside the scope of this work, and I don't expect switching XML readers to make a significant improvement.
virtualizarr/xarray.py
group : str, default None
    Group path within the dataset to open, for example netCDF4 and HDF5 groups.
It would be nice to move the addition of this kwarg into a separate pull request and implement it for the existing HDF5 reader. Then this PR wouldn't need to change the API of open_virtual_dataset.
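To illustrate what a group path like the one in the docstring could resolve to in a DMR++ file, here is a minimal sketch that walks a slash-separated path through nested Group elements using xml.etree.ElementTree. The DAP4 namespace URI, the Group element name, the example group names, and the find_group helper are assumptions for illustration; this is not the reader in this PR.

```python
from xml.etree import ElementTree as ET

# Assumed DAP4 namespace used by DMR/DMR++ documents
DAP_NS = "http://xml.opendap.org/ns/DAP/4.0#"


def find_group(root, group=None):
    """Hypothetical helper: return the element for a slash-separated group
    path such as "gt1l/heights", or the root when group is None."""
    node = root
    if not group:
        return node
    for name in group.strip("/").split("/"):
        # Look only at direct Group children of the current node
        matches = [
            child
            for child in node.findall(f"{{{DAP_NS}}}Group")
            if child.get("name") == name
        ]
        if not matches:
            raise KeyError(f"group {name!r} not found")
        node = matches[0]
    return node


xml_text = (
    f'<Dataset xmlns="{DAP_NS}" name="example.h5">'
    '<Group name="gt1l"><Group name="heights"><Float64 name="h_ph"/></Group></Group>'
    "</Dataset>"
)
heights = find_group(ET.fromstring(xml_text), "gt1l/heights")
print([var.get("name") for var in heights])
```

Keeping the lookup in a small function like this would let the same group semantics be shared with other readers, which is in the spirit of the suggestion to add the kwarg for the HDF5 reader first.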
@@ -0,0 +1,331 @@
from typing import Optional
from xml.etree import ElementTree as ET
Is this the only extra import required? (And this is a built-in Python library module, right?)
Yes, this is the only extra import, and it is built into the Python standard library.
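Since xml.etree.ElementTree ships with Python, it is enough to pull per-chunk byte ranges out of a DMR++ document. The sketch below assumes a simplified DMR++-style layout (dmrpp:chunks and dmrpp:chunk elements with offset, nBytes and chunkPositionInArray attributes) and the namespace URIs shown; treat those names, the sample document, and the read_chunks helper as illustrative assumptions rather than the parser in virtualizarr/dmrpp.py.

```python
from xml.etree import ElementTree as ET

# Assumed namespace URIs for DAP4 and DMR++ elements
NS = {
    "dap": "http://xml.opendap.org/ns/DAP/4.0#",
    "dmrpp": "http://xml.opendap.org/dap/dmrpp/1.0.0#",
}

# Simplified, hypothetical DMR++-style document with one chunked variable
SAMPLE = """<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#"
         xmlns:dmrpp="http://xml.opendap.org/dap/dmrpp/1.0.0#" name="example.h5">
  <Float32 name="temp">
    <dmrpp:chunks>
      <dmrpp:chunkDimensionSizes>1 1023 2047</dmrpp:chunkDimensionSizes>
      <dmrpp:chunk offset="40762" nBytes="4096" chunkPositionInArray="[0,1023,10235]"/>
    </dmrpp:chunks>
  </Float32>
</Dataset>"""


def read_chunks(xml_text):
    """Hypothetical helper: map each chunked variable name to a list of
    (position, offset, nbytes) tuples."""
    root = ET.fromstring(xml_text)
    result = {}
    for var in root:
        chunks_el = var.find("dmrpp:chunks", NS)
        if chunks_el is None:
            continue  # variable has no chunk information
        entries = []
        for chunk in chunks_el.findall("dmrpp:chunk", NS):
            # "[0,1023,10235]" -> [0, 1023, 10235]
            pos_text = chunk.get("chunkPositionInArray", "[0]")
            pos = [int(p) for p in pos_text.strip("[]").split(",")]
            entries.append((pos, int(chunk.get("offset")), int(chunk.get("nBytes"))))
        result[var.get("name")] = entries
    return result


print(read_chunks(SAMPLE))
```

Python's int parsing is unbounded here, so large offsets only become a problem once they are packed into a fixed-width NumPy array.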
virtualizarr/manifests/manifest.py
The int32 to int64 change was necessary because I ran into byte offsets too large for int32 with the Atlas ICE-SAT dataset. Here is an example error:
OverflowError: Python integer 6751178683 out of bounds for int32
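A quick check of why int32 is not enough: the offset in the error above exceeds the largest int32 value (2**31 - 1) but fits comfortably in int64. This minimal NumPy sketch only compares against the dtype limits; it is not the manifest code itself.

```python
import numpy as np

offset = 6751178683  # byte offset from the OverflowError above

int32_max = np.iinfo(np.int32).max  # 2147483647
int64_max = np.iinfo(np.int64).max  # 9223372036854775807

print(offset > int32_max)   # True: too large for int32
print(offset <= int64_max)  # True: representable as int64

# Storing offsets as int64 keeps the value exact
offsets = np.array([0, offset], dtype=np.int64)
print(int(offsets[1]) == offset)  # True
```

Any file larger than about 2 GiB can produce offsets like this, so int64 is the safer default for byte offsets and lengths.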
Some questions about writing unit tests:
group=
param