improving the API for binned groupby #191

@keewis

I've been trying to use flox for multi-dimensional binning and found the API a bit tricky to understand.

For some context, I have two variables (depth(time) and temperature(time)), which I'd like to bin into time_bounds(time, bounds) and depth_bounds(time, bounds).

I can get this to work using

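import cf_xarray
import flox.xarray

# ds holds the data to bin; reference holds the target bins and their bounds
arr = ds.set_coords("depth")["temperature"]
coords = [reference[name] for name in ["depth", "time"]]
# convert each (n, 2) bounds array into n + 1 bin edges
vertices = [
    cf_xarray.bounds_to_vertices(reference[name], bounds_dim="bounds")
    for name in ["depth_bounds", "time_bounds"]
]
flox.xarray.xarray_reduce(
    arr,
    *coords,
    expected_groups=vertices,
    isbin=[True] * len(coords),  # one flag per entry in coords
    func="mean",
)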

but while getting this right I repeatedly hit the "Needs better message." error raised by

raise ValueError("Needs better message.")

which was not much help. Even setting that aside, it was difficult to make sense of how *by, expected_groups, and isbin combine, and I'm not confident I wouldn't go through the same cycle of trial and error if I came back to this in a few months.

Instead, I wonder if we could change the call to something like:

bins = [
    flox.Bin(along=name, labels=reference[name], bounds=reference[f"{name}_bounds"])
    for name in ["depth", "time"]
]
flox.xarray.xarray_reduce(arr, *bins, func="mean")

(leaving aside the question of which bounds convention(s) this Bin object should support)
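To make the idea a bit more concrete, here is a minimal sketch of what such an object could look like. Everything in it is an assumption for illustration: `Bin`, its `edges` method, and `unpack_bins` are hypothetical names and not part of flox, and it only handles contiguous 1D bounds of shape (n, 2).

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Bin:
    """Hypothetical container bundling everything needed to bin one variable."""

    along: str  # name of the variable to bin
    labels: np.ndarray  # one label per bin, shape (n,)
    bounds: np.ndarray  # lower and upper bound per bin, shape (n, 2)

    def edges(self) -> np.ndarray:
        """Collapse contiguous (n, 2) bounds into n + 1 bin edges."""
        bounds = np.asarray(self.bounds)
        if bounds.ndim != 2 or bounds.shape[1] != 2:
            raise ValueError(
                f"bounds for {self.along!r} must have shape (n, 2), got {bounds.shape}"
            )
        # each bin has to start where the previous one ends
        if not np.array_equal(bounds[1:, 0], bounds[:-1, 1]):
            raise ValueError(f"bounds for {self.along!r} are not contiguous")
        return np.append(bounds[:, 0], bounds[-1, 1])


def unpack_bins(*bins: Bin):
    """Build the three parallel lists used in the working call above."""
    by = [b.along for b in bins]
    expected_groups = [b.edges() for b in bins]
    isbin = [True] * len(bins)
    return by, expected_groups, isbin


depth = Bin(
    along="depth",
    labels=np.array([5.0, 15.0]),
    bounds=np.array([[0.0, 10.0], [10.0, 20.0]]),
)
by, expected_groups, isbin = unpack_bins(depth)
```

The point of the sketch is that the grouping variable, its bin edges, and the "this is a bin" flag travel together in one object, so they cannot get out of sync, and a malformed bounds array can fail with a message that names the offending variable.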

Another option might be to just use an interval index. Something like:

flox.xarray.xarray_reduce(arr, time=pd.IntervalIndex(...), depth=pd.IntervalIndex(...), func="mean")

That would be pretty close to the existing groupby interface. And we could even combine both:

flox.xarray.xarray_reduce(
    arr,
    time=flox.Bin(labels=reference["time"], bounds=reference["time_bounds"]),
    depth=flox.Bin(labels=reference["depth"], bounds=reference["depth_bounds"]),
    func="mean",
)
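For the interval-index option, the sketch below shows how a bounds array maps onto a pd.IntervalIndex, using plain pandas rather than flox. The values are made up, and `closed="left"` is an assumption about the bounds convention, which is exactly the question left open above.

```python
import numpy as np
import pandas as pd

# made-up depth bounds with shape (n, 2): one [lower, upper) pair per bin
depth_bounds = np.array([[0.0, 10.0], [10.0, 20.0], [20.0, 40.0]])

# the closed side has to be chosen explicitly; "left" is an assumption here
depth_bins = pd.IntervalIndex.from_arrays(
    depth_bounds[:, 0], depth_bounds[:, 1], closed="left"
)

# made-up observations; the last depth falls outside every bin
depth = pd.Series([2.5, 12.0, 10.0, 35.0, 50.0])
temperature = pd.Series([20.0, 15.0, 16.0, 8.0, 5.0])

# assign each observation to an interval, then average within each bin
assigned = pd.cut(depth, bins=depth_bins)
binned_mean = temperature.groupby(assigned, observed=False).mean()
```

Unlike a list of edges plus isbin, the interval index carries both bounds of every bin and the closed side, so there is nothing left to infer from a separate flag.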

xref pydata/xarray#6610, where we probably want to adopt whatever signature we figure out here. Also, do tell me if you'd prefer to have this discussion in that issue instead (but figuring this out here might allow for quicker iteration). And maybe I'm trying to get xarray_reduce to do something too similar to groupby?
