Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds open_datatree and load_datatree to the tutorial module #10082

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

eni-awowale
Copy link
Collaborator

@eni-awowale eni-awowale commented Feb 27, 2025

Adds open_datatree and load_datatree to the tutorial module

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got two comments. Other than that, we'll probably have to refactor the way we call pooch since we now basically duplicated the code of open_dataset (doesn't have to be in this PR, though).

cache_dir = tmp_path / tutorial._default_cache_dir_name
ds = tutorial.open_dataset(self.testfile, cache_dir=cache_dir).load()
ds = tutorial.open_dataset(testfile, cache_dir=cache_dir).load()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's probably better to just hard-code the dataset name into the test, there's no point in parametrizing this (to be clear, this part of the test suite is pretty old):

Suggested change
ds = tutorial.open_dataset(testfile, cache_dir=cache_dir).load()
ds = tutorial.open_dataset("tiny", cache_dir=cache_dir).load()

url = external_urls[name]
else:
path = pathlib.Path(name)
if not path.suffix:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do the hdf5 file work with both netcdf4 and h5netcdf? Otherwise we might need to specialize, like we do with grib

Copy link
Collaborator Author

@eni-awowale eni-awowale Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, imerghh_730.HDF5 and imerghh_830.HDF5, works for both engines. I think if we wanted to add hdf5 files without named dimensions we would want to specify the engine as h5netcdf.

EDIT:
Since pydata/xarray-data#32 was merged, we do have to explicitly add the extension, e.g. xr.tutorial.open_datatree('imerghh_830.hdf5'), otherwise it defaults to .nc

from xarray import DataArray, tutorial
from xarray.tests import assert_identical, network
from xarray import DataArray, DataTree, tutorial
from xarray.testing import assert_identical
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this to use the xarray.testing module's assert_identical because xarray.tests didn't support DataTree objects.

@eni-awowale eni-awowale marked this pull request as ready for review February 27, 2025 22:03
@TomNicholas TomNicholas added topic-documentation topic-DataTree Related to the implementation of a DataTree class labels Feb 28, 2025
@eni-awowale
Copy link
Collaborator Author

FYI these checks look like they are failing in main 😬

@keewis
Copy link
Collaborator

keewis commented Feb 28, 2025

yep, that's a change to array-api-strict. If you want to fix them within the next four hours, I think it should be sufficient to cast the condition of duck_array_ops.where to a bool dtype (otherwise I'll send in a PR myself).

@eni-awowale
Copy link
Collaborator Author

Sure! Is there a issue for this yet? I can get a PR started and you can jump in when you're free.

@keewis
Copy link
Collaborator

keewis commented Feb 28, 2025

there's #10084, but nothing else. I think you can just open the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class topic-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants