Clean-up indexing adapter classes #10355

Merged
merged 27 commits into from
Jul 7, 2025
Commits
bc94c6d
clean-up indexing.PandasIndexingAdapter typing
benbovy May 9, 2025
17ff7e9
streamline PandasIndexingAdapter indexing logic
benbovy May 9, 2025
2b25155
clean-up PandasIndexingAdapter dtype handling
benbovy May 9, 2025
9981078
more clean-up
benbovy May 9, 2025
29098ac
repr: prevent loading lazy variables into memory
benbovy May 26, 2025
5f09354
fix array (index) subsetting
benbovy May 26, 2025
c4a853e
Merge branch 'main' into cleanup-pandas-indexing-adapter
benbovy Jul 4, 2025
0e5154c
treat multi-index and coord-transform variables as lazy
benbovy Jul 4, 2025
4efb135
update whats new
benbovy Jul 4, 2025
ef73a7e
add benchmarks for pandas and xarray RangeIndex
benbovy Jul 7, 2025
28b661a
Merge branch 'main' into cleanup-pandas-indexing-adapter
benbovy Jul 7, 2025
a2ccb7d
fix benchmark numba import error (numpy 2.3)
benbovy Jul 7, 2025
07f6cdb
benchmark: pin numpy in conf + consistent conda env
benbovy Jul 7, 2025
a953b41
pyproject: bump setuptools(-scm)
benbovy Jul 7, 2025
2be275d
ci benchmarks: try fixing package install
benbovy Jul 7, 2025
825cdb1
next try
benbovy Jul 7, 2025
c890a69
next try
benbovy Jul 7, 2025
16fe98b
next try
benbovy Jul 7, 2025
8ae12f7
benchmarks: try disabling no build isolation
benbovy Jul 7, 2025
f40f38c
Revert "benchmarks: try disabling no build isolation"
benbovy Jul 7, 2025
0ecc214
Revert "next try"
benbovy Jul 7, 2025
74e993c
Revert "next try"
benbovy Jul 7, 2025
3420fc9
Revert "next try"
benbovy Jul 7, 2025
97579f5
Revert "ci benchmarks: try fixing package install"
benbovy Jul 7, 2025
86df720
Revert "pyproject: bump setuptools(-scm)"
benbovy Jul 7, 2025
0887a8e
I'm tired of Python packaging
benbovy Jul 7, 2025
8a76b46
Let's fix all this later
benbovy Jul 7, 2025
4 changes: 2 additions & 2 deletions .github/workflows/benchmarks.yml
@@ -15,7 +15,7 @@ jobs:
     runs-on: ubuntu-latest
     env:
       ASV_DIR: "./asv_bench"
-      CONDA_ENV_FILE: ci/requirements/environment.yml
+      CONDA_ENV_FILE: ci/requirements/environment-benchmark.yml

     steps:
       # We need the full repo to avoid this issue
@@ -29,7 +29,7 @@ jobs:
         with:
           micromamba-version: "1.5.10-0"
           environment-file: ${{env.CONDA_ENV_FILE}}
-          environment-name: xarray-tests
+          environment-name: xarray-benchmark
           cache-environment: true
           cache-environment-key: "${{runner.os}}-${{runner.arch}}-py${{env.PYTHON_VERSION}}-${{env.TODAY}}-${{hashFiles(env.CONDA_ENV_FILE)}}-benchmark"
           # add "build" because of https://github.com/airspeed-velocity/asv/issues/1385
2 changes: 1 addition & 1 deletion asv_bench/asv.conf.json
@@ -60,7 +60,7 @@
     // },
     "matrix": {
         "setuptools_scm": [""], // GH6609
-        "numpy": [""],
+        "numpy": ["2.2"],
         "pandas": [""],
         "netcdf4": [""],
         "scipy": [""],
28 changes: 28 additions & 0 deletions asv_bench/benchmarks/repr.py
@@ -57,3 +57,31 @@ def time_repr(self):

     def time_repr_html(self):
         self.da._repr_html_()
+
+
+class ReprPandasRangeIndex:
+    # display a memory-saving pandas.RangeIndex shouldn't trigger memory
+    # expensive conversion into a numpy array
+    def setup(self):
+        index = xr.indexes.PandasIndex(pd.RangeIndex(1_000_000), "x")
+        self.ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index))
+
+    def time_repr(self):
+        repr(self.ds.x)
+
+    def time_repr_html(self):
+        self.ds.x._repr_html_()
+
+
+class ReprXarrayRangeIndex:
+    # display an Xarray RangeIndex shouldn't trigger memory expensive conversion
+    # of its lazy coordinate into a numpy array
+    def setup(self):
+        index = xr.indexes.RangeIndex.arange(1_000_000, dim="x")
+        self.ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index))
+
+    def time_repr(self):
+        repr(self.ds.x)
+
+    def time_repr_html(self):
+        self.ds.x._repr_html_()
23 changes: 23 additions & 0 deletions ci/requirements/environment-benchmark.yml
@@ -0,0 +1,23 @@
+name: xarray-benchmark
+channels:
+  - conda-forge
+  - nodefaults
+dependencies:
+  - bottleneck
+  - cftime
+  - dask-core
+  - distributed
+  - flox
+  - netcdf4
+  - numba
+  - numbagg
+  - numexpr
+  - numpy>=2.2,<2.3 # https://github.com/numba/numba/issues/10105
+  - opt_einsum
+  - packaging
+  - pandas
+  - pyarrow # pandas raises a deprecation warning without this, breaking doctests
+  - sparse
+  - scipy
+  - toolz
+  - zarr
7 changes: 7 additions & 0 deletions doc/whats-new.rst
@@ -33,6 +33,13 @@ Documentation
 Internal Changes
 ~~~~~~~~~~~~~~~~

+- Refactored the ``PandasIndexingAdapter`` and
+  ``CoordinateTransformIndexingAdapter`` internal indexing classes. Coordinate
+  variables that wrap a :py:class:`pandas.RangeIndex`, a
+  :py:class:`pandas.MultiIndex` or a
+  :py:class:`xarray.indexes.CoordinateTransform` are now displayed as lazy variables
+  in the Xarray data reprs (:pull:`10355`).
+  By `Benoit Bovy <https://github.com/benbovy>`_.

 .. _whats-new.2025.07.0:

11 changes: 10 additions & 1 deletion xarray/core/formatting.py
@@ -20,7 +20,11 @@
 from xarray.core.datatree_render import RenderDataTree
 from xarray.core.duck_array_ops import array_all, array_any, array_equiv, astype, ravel
 from xarray.core.extension_array import PandasExtensionArray
-from xarray.core.indexing import MemoryCachedArray
+from xarray.core.indexing import (
+    BasicIndexer,
+    ExplicitlyIndexed,
+    MemoryCachedArray,
+)
 from xarray.core.options import OPTIONS, _get_boolean_with_default
 from xarray.core.treenode import group_subtrees
 from xarray.core.utils import is_duck_array
@@ -87,6 +91,8 @@ def first_n_items(array, n_desired):

     if n_desired < array.size:
         indexer = _get_indexer_at_least_n_items(array.shape, n_desired, from_end=False)
+        if isinstance(array, ExplicitlyIndexed):
+            indexer = BasicIndexer(indexer)
         array = array[indexer]

     # We pass variable objects in to handle indexing
@@ -111,6 +117,8 @@ def last_n_items(array, n_desired):

     if n_desired < array.size:
         indexer = _get_indexer_at_least_n_items(array.shape, n_desired, from_end=True)
+        if isinstance(array, ExplicitlyIndexed):
+            indexer = BasicIndexer(indexer)
         array = array[indexer]

     # We pass variable objects in to handle indexing
@@ -659,6 +667,7 @@ def short_array_repr(array):
 def short_data_repr(array):
     """Format "data" for DataArray and Variable."""
     internal_data = getattr(array, "variable", array)._data
+
     if isinstance(array, np.ndarray):
         return short_array_repr(array)
     elif is_duck_array(internal_data):
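The `formatting.py` change above wraps the indexer in `BasicIndexer` when the array is `ExplicitlyIndexed`, so that only the first or last few items are pulled out of a lazy array for display. The following is a minimal standard-library sketch of that idea, not xarray's implementation: `ExplicitKey`, `LazyRange`, and `short_repr` are hypothetical names invented for illustration, and the helper functions only loosely mirror `first_n_items` and `last_n_items`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExplicitKey:
    """Hypothetical stand-in for an explicit indexer wrapper around a tuple key."""

    key: tuple


class LazyRange:
    """Hypothetical 1-D lazy array backed by a Python range."""

    def __init__(self, rng):
        self._rng = rng

    @property
    def size(self):
        return len(self._rng)

    def __getitem__(self, indexer):
        # Like an explicitly indexed array, refuse bare slices or tuples.
        if not isinstance(indexer, ExplicitKey):
            raise TypeError("LazyRange only accepts ExplicitKey indexers")
        (sl,) = indexer.key
        # Slicing a range returns another range, so nothing is materialized.
        return LazyRange(self._rng[sl])

    def to_list(self):
        # The only place where values are actually realized in memory.
        return list(self._rng)


def first_n_items(array, n):
    """Return the first n items, indexing lazily before realizing values."""
    if n < array.size:
        array = array[ExplicitKey((slice(0, n),))]
    return array.to_list()


def last_n_items(array, n):
    """Return the last n items, indexing lazily before realizing values."""
    if n < array.size:
        array = array[ExplicitKey((slice(array.size - n, None),))]
    return array.to_list()


def short_repr(array, edge=3):
    """Format only the edges of the array, eliding the middle."""
    if array.size <= 2 * edge:
        return "[" + ", ".join(map(str, array.to_list())) + "]"
    head = ", ".join(map(str, first_n_items(array, edge)))
    tail = ", ".join(map(str, last_n_items(array, edge)))
    return f"[{head}, ..., {tail}]"


lazy = LazyRange(range(1_000_000))
text = short_repr(lazy)
print(text)  # [0, 1, 2, ..., 999997, 999998, 999999]
```

In this sketch only six values are ever turned into a list, regardless of the range length, which is the property the new `ReprPandasRangeIndex` and `ReprXarrayRangeIndex` benchmarks are meant to guard.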