Something similar would be useful for Xarray, I reckon, although just like there's argmin and argmax next to min and max, having arg_nsmallest and arg_nlargest (or something) would be convenient as well.
It could match the existing method signatures, requiring an extra n argument:
The basic idea is to wrap numpy or bottleneck argpartition. I currently use this quick and dirty utility for a DataArray and a single dimension:
    import dask.array
    import numpy as np
    import xarray as xr


    def arg_nsmallest(da: xr.DataArray, dim: str, n: int):
        """
        Return the index or indices of the ``n`` smallest values along
        dimension ``dim``.

        Parameters
        ----------
        da: xr.DataArray
        dim: str
            Dimension over which to find the ``n`` smallest values.
        n: int
            The number of items to retrieve.

        Returns
        -------
        result: xr.DataArray
        """
        # Find the axis over which to apply the partition.
        axis = da.dims.index(dim)

        # Set up output coordinates.
        dim_index = np.arange(n)
        coords = da.coords.copy()
        coords[dim] = dim_index
        shape = list(da.shape)
        shape[axis] = n
        template = xr.DataArray(
            data=dask.array.zeros(shape, dtype=int),
            coords=coords,
            dims=da.dims,
        )

        def _nsmallest(da: xr.DataArray):
            # NOTE: numpy (arg)partition moves NaNs to the back;
            # bottleneck partition does not!
            # kth=n - 1 puts the n smallest values in the first n positions
            # and stays in bounds when n equals the size of the dimension.
            smallest = np.argpartition(da.to_numpy(), kth=n - 1, axis=axis)
            return template.copy(data=np.take(smallest, indices=np.arange(n), axis=axis))

        return xr.map_blocks(_nsmallest, da, template=template)
Describe alternatives you've considered
In principle, the same can be achieved using e.g. xarray's argsort, but this is much more costly when e.g. only the three highest or lowest values are required. Argsort doesn't support dimensions and isn't NaN-aware either; nsmallest is more straightforward since nlargest is obstructed by the NaNs moved to the end.
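To make the cost argument concrete, here is a minimal NumPy-only sketch. The helper name `arg_nsmallest_np` is hypothetical and not part of NumPy or xarray. It partitions first and then sorts only the `n` selected candidates, and the result is compared against slicing a full argsort. The input has no ties, so both approaches give the same indices.

```python
import numpy as np


def arg_nsmallest_np(a, n, axis=-1):
    """Indices of the n smallest values along axis, smallest first.

    Hypothetical helper for illustration only.
    """
    # Partition so the first n positions hold the n smallest values (unordered).
    part = np.argpartition(a, kth=n - 1, axis=axis)
    head = np.take(part, np.arange(n), axis=axis)
    # Sort only those n candidates, which is cheap when n is small.
    order = np.argsort(np.take_along_axis(a, head, axis=axis), axis=axis)
    return np.take_along_axis(head, order, axis=axis)


a = np.array([[5.0, 1.0, np.nan, 3.0, 2.0],
              [0.5, np.nan, 4.0, 0.1, 9.0]])

idx = arg_nsmallest_np(a, n=3, axis=1)

# The costlier alternative: sort everything, then keep the first three.
full = np.argsort(a, axis=1)[:, :3]
```

Both NumPy routines place NaNs last, so the NaNs stay out of the way for the smallest values; taking the trailing positions for the largest values would pick up the NaNs instead.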
Additional context
No response
Agree that a wrapper around partition/argpartition would be useful. Note that the Python array community calls this topk (data-apis/array-api#629), so we should probably follow that.
At least Dask provides topk and argtopk so we need not use map_blocks.
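As a rough illustration of the Dask route, the sketch below calls `dask.array.topk` and `dask.array.argtopk` directly on a chunked array. The sample data and chunking are made up for the example. A positive `k` asks for the largest values and a negative `k` for the smallest.

```python
import dask.array as da
import numpy as np

# Made-up sample data, chunked along both axes.
x = da.from_array(
    np.array([[5, 1, 7, 3, 2],
              [0, 8, 4, 6, 9]]),
    chunks=(1, 2),
)

# Two largest values per row, sorted from largest to smallest.
largest = da.topk(x, 2, axis=1).compute()

# Indices of the two smallest values per row (negative k),
# sorted from smallest to largest.
smallest_idx = da.argtopk(x, -2, axis=1).compute()
```

How these would be exposed on a DataArray (method names, a `dim` argument, coordinates on the result) is left open here.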
As an aside, that naming sadly clashes with finding the top-k most frequent elements in a stream (example) :/
Is your feature request related to a problem?
I find that I need the (index of) N largest or N smallest values along some dimension with some regularity.
Describe the solution you'd like
Pandas provides nsmallest and nlargest.