open_virtual_dataset with dmr++ #113

Status: Open
ayushnag wants to merge 17 commits into main
Conversation

@ayushnag (Contributor) commented May 14, 2024

@TomNicholas added the labels "references generation" (Reading byte ranges from archival files) and "enhancement" (New feature or request) on May 14, 2024
@ayushnag changed the title from "basic dmr parsing functionality" to "open_dataset with dmr++" on May 14, 2024
@TomNicholas changed the title from "open_dataset with dmr++" to "open_virtual_dataset with dmr++" on May 14, 2024
virtualizarr/xarray.py: two resolved review threads (outdated)
chunk_num = chunk_pos // chunks  # [0,1023,10235] // [1, 1023, 2047] -> [0,1,5]
chunk_key = ".".join(map(str, chunk_num))  # [0,0,1] -> "0.0.1"
Collaborator:
I have a join function in virtualizarr.zarr for doing this.
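
For reference, a minimal runnable sketch of the chunk-key derivation quoted above (assuming chunk_pos and chunks are 1-D NumPy integer arrays; the example values come from the inline comments, not from real data):

import numpy as np

# Starting index of one chunk along each dimension, and the chunk shape.
chunk_pos = np.array([0, 1023, 10235])
chunks = np.array([1, 1023, 2047])

chunk_num = chunk_pos // chunks            # element-wise floor division -> [0, 1, 5]
chunk_key = ".".join(map(str, chunk_num))  # Zarr-style chunk key: "0.1.5"
print(chunk_key)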

virtualizarr/dmrpp.py: resolved review thread (outdated)
@agoodm commented May 15, 2024

Thanks for taking a look and giving my suggested changes to the chunk key parsing a try, @ayushnag!

Continuing the discussion on performance: I think the remaining bottlenecks (aside from your point about cloud I/O, perhaps) now lie primarily outside the scope of this work, and I don't expect changing XML readers to make a significant improvement.

Comment on lines 72 to 73
group : str, default None
    Group path within the dataset to open, e.g. a netCDF4 or HDF5 group.
Collaborator:
It would be nice to separate out the addition of this kwarg into a separate pull request, and implement it for the existing HDF5 reader. Then this PR wouldn't need to change the API of open_virtual_dataset.
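
For context, the kwarg under discussion would let callers do something like the following (a hypothetical sketch: the filename, the filetype value, and the group path are made up, and the final signature is whatever this PR or a follow-up settles on):

from virtualizarr import open_virtual_dataset

# Hypothetical usage of the proposed `group` kwarg: open only the variables
# under one netCDF4/HDF5-style group of the referenced granule.
vds = open_virtual_dataset(
    "ATL03_example.h5.dmrpp",  # made-up filename
    filetype="dmrpp",          # assumed filetype hook for this reader
    group="/gt1l/heights",     # made-up group path
)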

virtualizarr/readers/dmrpp.py: resolved review thread
@@ -0,0 +1,331 @@
from typing import Optional
from xml.etree import ElementTree as ET
Collaborator:
Is this the only extra import required? (And this is a built-in Python library module, right?)

Contributor Author (@ayushnag):
Yes, this is the only extra import, and it is built in.
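
For context, a minimal standalone sketch of the kind of parsing ElementTree enables here; the XML fragment is invented for illustration, and the element and attribute names (dmrpp:chunk, offset, nBytes, chunkPositionInArray) plus the namespace URI follow the published DMR++ convention rather than anything specific to this PR:

from xml.etree import ElementTree as ET

# Tiny invented DMR++ fragment: one chunks element holding two chunks.
dmrpp_xml = """
<Dataset xmlns:dmrpp="http://xml.opendap.org/dap/dmrpp/1.0.0#">
  <dmrpp:chunks>
    <dmrpp:chunk offset="4096" nBytes="131072" chunkPositionInArray="[0,0]"/>
    <dmrpp:chunk offset="135168" nBytes="131072" chunkPositionInArray="[0,2047]"/>
  </dmrpp:chunks>
</Dataset>
"""

ns = {"dmrpp": "http://xml.opendap.org/dap/dmrpp/1.0.0#"}
root = ET.fromstring(dmrpp_xml)
for chunk in root.iterfind(".//dmrpp:chunk", ns):
    # The byte offset and length are what end up in the virtual chunk manifest.
    print(chunk.get("offset"), chunk.get("nBytes"), chunk.get("chunkPositionInArray"))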

Contributor Author @ayushnag, Jun 27, 2024:
The int32 to int64 change had to be made because I ran into some large byte offsets with the ICESat-2 ATLAS dataset. Here is an example error: OverflowError: Python integer 6751178683 out of bounds for int32
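
A minimal sketch of that failure mode and the fix (the offset value is taken from the error message above; everything else is illustrative):

import numpy as np

offset = 6751178683  # larger than 2**31 - 1, so it cannot be an int32

# Recent NumPy versions raise OverflowError here, matching the error above;
# older versions may warn or wrap instead.
try:
    np.array([offset], dtype=np.int32)
except OverflowError as err:
    print(err)

# Storing manifest offsets and lengths as int64 (or uint64) avoids the problem.
offsets = np.array([offset], dtype=np.int64)
print(offsets)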

@ayushnag (Contributor Author) commented Jun 27, 2024

Some questions about writing unit tests:

  • How should test dmrpp files be loaded?
    • These files are available over HTTPS but require netrc login (NASA Earthdata authentication).
    • I will check how earthaccess gets credentials and handles this in its tests (see the sketch after this list).
  • What should I compare my result to?
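
One possible shape for such a test, hedged heavily: it assumes earthaccess.login() (which can read ~/.netrc or Earthdata environment variables) and earthaccess.get_requests_https_session() behave as in earthaccess's public API, skips when no credentials are available, and uses a placeholder URL rather than a real fixture:

import pytest

earthaccess = pytest.importorskip("earthaccess")

# Placeholder URL; a real test would point at a known dmrpp granule.
DMRPP_URL = "https://example.invalid/granule.h5.dmrpp"


@pytest.fixture(scope="session")
def earthdata_session():
    """Skip tests when NASA Earthdata credentials are not configured."""
    try:
        auth = earthaccess.login(strategy="netrc")
    except Exception:
        pytest.skip("no Earthdata credentials available")
    if not getattr(auth, "authenticated", False):
        pytest.skip("no Earthdata credentials available")
    return earthaccess.get_requests_https_session()


def test_fetch_dmrpp(earthdata_session):
    resp = earthdata_session.get(DMRPP_URL)
    resp.raise_for_status()
    assert resp.text.lstrip().startswith("<?xml")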

Labels: enhancement (New feature or request), references generation (Reading byte ranges from archival files)
Projects: none yet
Development: successfully merging this pull request may close the issue "Reading from dmrcp index files?"
3 participants