Description
What happened?
When holding a reference to an open, lazily-loaded Dataset ds, overwriting the file that backs ds causes ds to reference invalid memory.
Oddly, this only happens when the Dataset is created with "coords"; without them, ds keeps returning its original values.
I understand this access pattern is unusual, but I'm raising this bug in case it isn't expected behaviour.
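One possible reading of the coords difference, offered as an assumption rather than something verified against the backend code: in the example without explicit coords, x and y are 1-D variables named after their own dimensions, so xarray treats them as dimension coordinates rather than data variables. Dimension coordinates are normally materialised as in-memory indexes when a file is opened, which could explain why they survive the overwrite. The sketch below only inspects how the two Datasets from the example are classified; the names no_coords and with_coords are illustrative.

```python
import xarray as xr

# Same constructors as in the example below, without touching any file.
no_coords = xr.Dataset({"x": [1, 2], "y": [3, 4]})
with_coords = xr.Dataset(
    {"x": ("dim", [1, 2]), "y": ("dim", [3, 4])},
    coords={"dim": [0, 1]},
)

# Show which names end up as data variables and which as coordinates.
print("no_coords   data_vars:", sorted(no_coords.data_vars))
print("no_coords   coords:   ", sorted(no_coords.coords))
print("with_coords data_vars:", sorted(with_coords.data_vars))
print("with_coords coords:   ", sorted(with_coords.coords))
```

If this reading is right, the trigger is the presence of lazily loaded data variables rather than the coords argument itself.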
What did you expect to happen?
I expected ds to continue existing in memory, with its original values. Failing that, I expected a warning or error, either when the file backing ds was overwritten or, more realistically, when I went to access ds after it had been overwritten and was no longer referencing valid memory.
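For comparison, a minimal sketch of a workaround, assuming the only goal is to keep the original values: reading the file eagerly with xr.load_dataset before overwriting it. This is not a fix for the lazy case reported here, and the helper name snapshot_then_overwrite is hypothetical.

```python
import os
import tempfile

import xarray as xr


def snapshot_then_overwrite(path):
    """Write a dataset, read it eagerly, overwrite the file, return the snapshot."""
    original = xr.Dataset(
        {"x": ("dim", [1, 2]), "y": ("dim", [3, 4])},
        coords={"dim": [0, 1]},
    )
    original.to_netcdf(path)

    # load_dataset reads every variable into memory and closes the file,
    # so the returned object should no longer depend on the file's contents.
    snapshot = xr.load_dataset(path)

    # Overwrite the backing file, as in the report.
    xr.Dataset({"z": ("dim", [5, 6])}, coords={"dim": [0, 1]}).to_netcdf(path)
    return snapshot


with tempfile.TemporaryDirectory() as tmpdir:
    snapshot = snapshot_then_overwrite(os.path.join(tmpdir, "test2.nc"))
    y_values = snapshot["y"].values
    print(f"Expected y: [3, 4], Actual: {y_values}")
```

The trade-off is that eager loading gives up lazy access, so it only suits files that fit comfortably in memory.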
Minimal Complete Verifiable Example
import xarray as xr
# Test without coordinates (works as expected)
print("Testing WITHOUT coordinates:")
xr.Dataset({"x": [1, 2], "y": [3, 4]}).to_netcdf("test1.nc")
lazy1 = xr.open_dataset("test1.nc")
# overwrite the file
xr.Dataset({"z": [5, 6]}).to_netcdf("test1.nc")
# lazy1 still holds valid data
print(f"Expected y: [3, 4], Actual: {lazy1['y'].values}")
# Test with coordinates (invalid memory access)
print("\nTesting WITH coordinates:")
xr.Dataset({"x": ("dim", [1, 2]), "y": ("dim", [3, 4])}, coords={"dim": [0, 1]}).to_netcdf("test2.nc")
lazy2 = xr.open_dataset("test2.nc")
# overwrite the file
xr.Dataset({"z": ("dim", [5, 6])}, coords={"dim": [0, 1]}).to_netcdf("test2.nc")
# Accesses invalid memory
print(f"Expected y: [3, 4], Actual: {lazy2['y'].values}")
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Anything else we need to know?
No response
Environment
xarray: 2025.6.1
pandas: 2.3.0
numpy: 2.2.6
scipy: 1.15.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.10.3
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None