dividing with unloaded data causes dimension to change order #10338

Open
5 tasks done
Pietervanhalem opened this issue May 20, 2025 · 1 comment

Pietervanhalem commented May 20, 2025

What happened?

When I open a dataset without loading it and perform operations on it, the data array gets corrupted. The dimensions seem to be in a different order than the coordinates, so the data array can no longer be used. If I load the dataset right after opening it, the issue no longer occurs.

What did you expect to happen?

I expect the data-array to keep the correct references to the correct coordinates when doing operations with it. I expect the same to happen as when I do load the data.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

coords = {
    "location": ["a", "b", "c"],
    "duration": [0.3, 0.25, 0.5, 1.0, 3.0],
    "dof": ["x", "y", "z", "rx", "ry", "rz"],
    "motion": ["dis", "vel"],
    "wave_tp": np.arange(3, 19, 1),
    "wave_dir": np.arange(0, 361, 15),
}
ds = xr.Dataset(
    {
        "X": (list(coords.keys()), np.random.rand(*[len(e) for e in coords.values()])),
    },
    coords=coords,
)

with open("tmp.nc", "wb") as fp:
    ds.to_netcdf(fp)

with open("tmp.nc", "rb") as fp:
    # If I perform a .load() here, the bug disappears
    ds = xr.open_dataset(fp) #.load()

a = ds["X"].sel(
    wave_dir=np.arange(0, 360, 30),
    dof="z",
    motion="vel",
)
b = 1 / a

# Here you can see that the dataset has the wrong coordinates. 
# It says location has 12 values, but it should have 3.
display(b)

b.sel(location='a')

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 37
     33 # Here you can see that the dataset has the wrong coordinates.
     34 # It says location has 12 values, but it should have 3.
     35 display(b)
---> 37 b.sel(location='a')

File c:\tools\python312\Lib\site-packages\xarray\core\dataarray.py:1683, in DataArray.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1567 def sel(
   1568     self,
   1569     indexers: Mapping[Any, Any] | None = None,
   (...)
   1573     **indexers_kwargs: Any,
   1574 ) -> Self:
   1575     """Return a new DataArray whose data is given by selecting index
   1576     labels along the specified dimension(s).
   1577
   (...)
   1681     Dimensions without coordinates: points
   1682     """
-> 1683     ds = self._to_temp_dataset().sel(
   1684         indexers=indexers,
   1685         drop=drop,
   1686         method=method,
   1687         tolerance=tolerance,
   1688         **indexers_kwargs,
   1689     )
   1690     return self._from_temp_dataset(ds)

File c:\tools\python312\Lib\site-packages\xarray\core\dataarray.py:598, in DataArray._to_temp_dataset(self)
    597 def _to_temp_dataset(self) -> Dataset:
--> 598     return self._to_dataset_whole(name=_THIS_ARRAY, shallow_copy=False)

File c:\tools\python312\Lib\site-packages\xarray\core\dataarray.py:665, in DataArray._to_dataset_whole(self, name, shallow_copy)
    662 indexes = self._indexes
    664 coord_names = set(self._coords)
--> 665 return Dataset._construct_direct(variables, coord_names, indexes=indexes)

File c:\tools\python312\Lib\site-packages\xarray\core\dataset.py:1133, in Dataset._construct_direct(cls, variables, coord_names, dims, attrs, indexes, encoding, close)
   1129 """Shortcut around __init__ for internal use when we want to skip
   1130 costly validation
   1131 """
   1132 if dims is None:
-> 1133     dims = calculate_dimensions(variables)
   1134 if indexes is None:
   1135     indexes = {}

File c:\tools\python312\Lib\site-packages\xarray\core\variable.py:3072, in calculate_dimensions(variables)
   3070             last_used[dim] = k
   3071         elif dims[dim] != size:
-> 3072             raise ValueError(
   3073                 f"conflicting sizes for dimension {dim!r}: "
   3074                 f"length {size} on {k!r} and length {dims[dim]} on {last_used!r}"
   3075             )
   3076 return dims

ValueError: conflicting sizes for dimension 'location': length 12 on <this-array> and length 3 on {'wave_dir': 'wave_dir', 'wave_tp': 'wave_tp', 'duration': 'duration', 'location': 'location'}

Anything else we need to know?

[Image attached in the original issue]

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.7 (tags/v3.12.7:0b05ead, Oct 1 2024, 03:06:41) [MSC v.1941 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 11
machine: AMD64
processor: Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United Kingdom', '1252')
libhdf5: None
libnetcdf: None

xarray: 2025.4.0
pandas: 2.2.3
numpy: 2.2.6
scipy: 1.15.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: 25.1.1
conda: None
pytest: None
mypy: None
IPython: 9.2.0
sphinx: None

Pietervanhalem added the bug and needs triage labels on May 20, 2025

Collaborator

keewis commented Jun 9, 2025

I think this has to do with the fact that you're closing the file before xarray has a chance to read the data: try replacing

with open("tmp.nc", mode="rb") as fp:
    ds = xr.open_dataset(fp)

with

ds = xr.open_dataset("tmp.nc")

and you'll see that the bug goes away.

Given that, the bug may be either in the way we interact with scipy or somewhere within scipy.io.
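
For readers who land here, the two workarounds mentioned in this thread (calling `.load()` while the file handle is still open, or passing a path instead of a file object) can be sketched roughly as below. This is only an illustration, not a fix for the underlying bug. It assumes the scipy backend is the one in use, as in the reported environment, so `engine="scipy"` is passed explicitly. The reduced set of coordinates and the temporary directory are choices made for this sketch, not part of the original report.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Smaller variant of the dataset from the report (assumption: fewer dims for brevity).
coords = {
    "location": ["a", "b", "c"],
    "dof": ["x", "y", "z"],
    "wave_dir": np.arange(0, 361, 15),
}
shape = [len(v) for v in coords.values()]
# Offset by 1.0 so that 1 / X never divides by zero.
ds = xr.Dataset({"X": (list(coords), np.random.rand(*shape) + 1.0)}, coords=coords)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tmp.nc")
    ds.to_netcdf(path, engine="scipy")

    # Workaround 1: read everything into memory while the handle is still open.
    with open(path, "rb") as fp:
        ds_loaded = xr.open_dataset(fp, engine="scipy").load()

    # Workaround 2: pass the path so xarray manages the file itself, and
    # finish the computation before the dataset is closed.
    with xr.open_dataset(path, engine="scipy") as ds_lazy:
        a_path = ds_lazy["X"].sel(wave_dir=np.arange(0, 360, 30), dof="z")
        b_path = (1 / a_path).load()

# The loaded dataset no longer depends on the (now closed) file handle.
a_loaded = ds_loaded["X"].sel(wave_dir=np.arange(0, 360, 30), dof="z")
b_loaded = 1 / a_loaded

print(dict(b_loaded.sizes))
print(b_loaded.sel(location="a").shape)
```

In both variants the values are read before the file is closed, which is the condition the comment above points to. Whether the silent shape mismatch in the original example comes from xarray or from scipy.io is left open in this thread.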

keewis removed the needs triage label on Jun 9, 2025