Ignore missing dims when mapping over tree #67

TomNicholas · 2022-03-03T20:56:00Z

This tree has a dimension present in some nodes and not others (the "people" dimension).

DataTree('root', parent=None)
│   Dimensions:  (people: 2)
│   Coordinates:
│     * people   (people) <U5 'alice' 'bob'
│       species  <U5 'human'
│   Data variables:
│       heights  (people) float64 1.57 1.82
└── DataTree('simulation')
    ├── DataTree('coarse')
    │   Dimensions:  (x: 2, y: 3)
    │   Coordinates:
    │     * x        (x) int64 10 20
    │   Dimensions without coordinates: y
    │   Data variables:
    │       foo      (x, y) float64 0.1242 -0.2324 0.2469 0.5168 0.8391 0.8686
    │       bar      (x) int64 1 2
    │       baz      float64 3.142
    └── DataTree('fine')
        Dimensions:  (x: 6, y: 3)
        Coordinates:
          * x        (x) int64 10 12 14 16 18 20
        Dimensions without coordinates: y
        Data variables:
            foo      (x, y) float64 0.1242 -0.2324 0.2469 ... 0.5168 0.8391 0.8686
            bar      (x) float64 1.0 1.2 1.4 1.6 1.8 2.0
            baz      float64 3.142

Calling dt.mean(dim='people') currently raises an error. That's because the .mean call is mapped over every group, and neither the 'coarse' group nor the 'fine' group has a dimension called 'people'.

However, the user might want to take the mean only in the groups where this makes sense, and ignore the rest.

I think the best solution is to have a missing_dims argument, like xarray's .isel already has. Then the user can do dt.mean(dim='people', missing_dims='ignore').
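As a rough sketch of what 'ignore' could mean for a reduction: each node would only act on the requested dimensions it actually has. The helper name resolve_dims and the node paths below are hypothetical and not part of xarray or datatree; only the three option names ('raise', 'warn', 'ignore') are taken from .isel.

```python
import warnings


def resolve_dims(requested, available, missing_dims="raise"):
    """Return the requested dims that exist in `available`.

    Hypothetical helper: the option names copy those accepted by the
    missing_dims argument of xarray's .isel, nothing more.
    """
    if missing_dims not in ("raise", "warn", "ignore"):
        raise ValueError(f"unrecognised missing_dims option: {missing_dims!r}")
    missing = [d for d in requested if d not in available]
    if missing and missing_dims == "raise":
        raise ValueError(f"dimensions {missing} not found in {tuple(available)}")
    if missing and missing_dims == "warn":
        warnings.warn(f"dimensions {missing} not found in {tuple(available)}")
    return [d for d in requested if d in available]


# Dimensions of the three data-bearing nodes in the tree above
# (the paths are illustrative).
node_dims = {
    "/": ("people",),
    "/simulation/coarse": ("x", "y"),
    "/simulation/fine": ("x", "y"),
}

dims_to_reduce = {
    path: resolve_dims(["people"], dims, missing_dims="ignore")
    for path, dims in node_dims.items()
}
```

Under this reading, a node that ends up with an empty list would simply be left unreduced rather than raising.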

I think implementing this only requires changes in xarray, not here, because those changes should propagate down to datatree. See pydata/xarray#5030.

abkfenris · 2023-01-09T15:44:05Z

Continuing from related discussion in https://discourse.pangeo.io/t/xarray-and-collections-of-forecasts/3054/6

It would also be helpful to have it on .sel for my usage.

I haven't dug around in the guts of datatree enough to understand how it maps functions to each group, but would it be possible to add a missing_dims kwarg at the mapping level? It could then be used to decide whether or not to catch KeyErrors from the underlying dataset methods.

If I'm understanding things right, Datatree uses a mixin (MappedDatasetMethodsMixin) to manage mapping methods to datasets. Could map_over_subtree pick missing_dims off the kwargs?

TomNicholas · 2023-01-09T16:01:03Z

Hi @abkfenris !

datatree.mapping is where the guts of the mapping live. The mixin just steals certain methods from xarray.Dataset and wraps them with map_over_subtree. The mapping code is basically just this:

def map_over_subtree(func, dt, *args, **kwargs):
    new_tree = ...  # construction of the result tree is elided
    for node in dt.subtree:
        # apply func to the dataset stored at this node
        result_ds = func(node.ds, *args, **kwargs)
        new_tree[node.path] = result_ds
    return new_tree

but generalised to potentially map over multiple trees simultaneously (e.g. for binary operations like __add__), with error checking, and usable as a decorator.

> would it be possible to add a missing_dims kwarg at the mapping level?

We could, but missing_dims wouldn't make sense for every function we might map, and that's the challenge here. That's why I suggested we might want to add something to map_over_subtree that allows you to ignore any KeyError. Another approach would be to modify .sel upstream.
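For illustration, the "ignore any KeyError" idea could look roughly like the sketch below. It uses a plain {path: dataset} dict as a stand-in for a tree, and the names map_over_nodes, skip_key_errors and mean_over are hypothetical, not datatree API. Passing a skipped node through unchanged is also just one possible choice.

```python
def mean_over(ds, dim):
    """Toy reduction: average the values stored along `dim`.

    Here a "dataset" is a dict mapping a dim name to a list of values,
    so a missing dimension surfaces as a KeyError.
    """
    values = ds[dim]
    return {dim: sum(values) / len(values)}


def map_over_nodes(func, tree, *args, skip_key_errors=False, **kwargs):
    """Apply func to every node of a {path: dataset} mapping.

    skip_key_errors is a hypothetical option: when True, a node whose
    call raises KeyError is copied through unchanged.
    """
    new_tree = {}
    for path, ds in tree.items():
        try:
            new_tree[path] = func(ds, *args, **kwargs)
        except KeyError:
            if not skip_key_errors:
                raise
            new_tree[path] = ds
    return new_tree


tree = {
    "/": {"people": [1.57, 1.82]},
    "/simulation/coarse": {"x": [10, 20]},
}
result = map_over_nodes(mean_over, tree, "people", skip_key_errors=True)
```

With skip_key_errors left at its default of False, the same call raises as soon as it reaches the node without a 'people' entry, which mirrors the current behaviour described at the top of this issue.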

abkfenris · 2023-01-09T21:16:06Z

I wonder if ignoring KeyError might be too broad and could catch more than intended (I'm thinking Dask or fsspec KeyErrors bubbling up). Might be worth exploring getting more tightly defined errors upstream.
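To make the concern concrete, here is a minimal sketch of what a more tightly defined error might buy. MissingDimensionError is a hypothetical exception class (no such class is assumed to exist in xarray or datatree), and flaky_backend stands in for an unrelated KeyError coming from a storage or compute layer.

```python
class MissingDimensionError(KeyError):
    """Hypothetical narrow error: this node lacks the requested dim."""


def mean_over(ds, dim):
    # Raise the narrow error instead of letting a bare KeyError escape.
    if dim not in ds:
        raise MissingDimensionError(dim)
    values = ds[dim]
    return {dim: sum(values) / len(values)}


def flaky_backend(ds, dim):
    # Stand-in for an unrelated KeyError from a lower layer.
    raise KeyError("chunk-0")


def map_over_nodes(func, tree, *args, **kwargs):
    """Apply func to each node, skipping only nodes that lack the dim."""
    new_tree = {}
    for path, ds in tree.items():
        try:
            new_tree[path] = func(ds, *args, **kwargs)
        except MissingDimensionError:
            # Only the narrow error is swallowed; the node passes through.
            new_tree[path] = ds
    return new_tree


tree = {
    "/": {"people": [1.57, 1.82]},
    "/simulation/coarse": {"x": [10, 20]},
}
skipped = map_over_nodes(mean_over, tree, "people")
```

Because the except clause names only the subclass, a plain KeyError raised by flaky_backend still propagates to the caller instead of being silently ignored.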

TomNicholas · 2023-01-09T21:20:38Z

Yes, that's a good point. I am not sure what the best solution is here.


TomNicholas · 2024-04-17T00:13:49Z

See pydata/xarray#8949 for a much more thought-out solution to this problem.

TomNicholas added the "enhancement" (New feature or request) label Mar 3, 2022

TomNicholas mentioned this issue Apr 22, 2022: Indexing tree should create new tree #77 (Open)

TomNicholas mentioned this issue Jul 7, 2023: Slicing/selecting datree nodes using coordinates similar to xarray.dataset.sel (?) #244 (Open)

TomNicholas mentioned this issue Oct 24, 2023: Error in datatree.DataTree.sel when attributes are set #262 (Closed)

TomNicholas mentioned this issue Feb 14, 2024: Datatree design discussions - weekly meeting pydata/xarray#8747 (Open, 11 tasks)

TomNicholas mentioned this issue Apr 16, 2024: Mapping DataTree methods over nodes with variables for which the args are invalid pydata/xarray#8949 (Open)
