Writing to an open Dataset corrupts the Dataset's memory #10418

Open
@williamsnell

What happened?

When holding a reference to an open, lazily-loaded Dataset ds, overwriting the file that backs ds causes ds to reference invalid memory.

Oddly, this only happens when the Dataset is created with explicit "coords"; without them, ds keeps its original values.

I understand this access pattern is unusual, but I'm raising it as a bug in case this isn't expected behaviour.

What did you expect to happen?

I expected ds to continue existing in memory, with its original values. Failing that, I expected a warning or error, either when the file backing ds was overwritten or, more realistically, when I went to access ds after it had been overwritten and was no longer referencing valid memory.

Minimal Complete Verifiable Example

import xarray as xr

# Test without coordinates (works as expected)

print("Testing WITHOUT coordinates:")
xr.Dataset({"x": [1, 2], "y": [3, 4]}).to_netcdf("test1.nc")
lazy1 = xr.open_dataset("test1.nc")
# overwrite the file
xr.Dataset({"z": [5, 6]}).to_netcdf("test1.nc")

# lazy1 still holds valid data
print(f"Expected y: [3, 4], Actual: {lazy1['y'].values}")

# Test with coordinates (invalid memory access)
print("\nTesting WITH coordinates:")

xr.Dataset({"x": ("dim", [1, 2]), "y": ("dim", [3, 4])}, coords={"dim": [0, 1]}).to_netcdf("test2.nc")
lazy2 = xr.open_dataset("test2.nc")
# overwrite the file
xr.Dataset({"z": ("dim", [5, 6])}, coords={"dim": [0, 1]}).to_netcdf("test2.nc")

# Accesses invalid memory
print(f"Expected y: [3, 4], Actual: {lazy2['y'].values}")
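
For comparison, here is a minimal sketch of a possible workaround, not a fix for the underlying behaviour. It assumes the data fits in memory and swaps the lazy xr.open_dataset call for xr.load_dataset, which reads every variable into memory and closes the file before returning. The temporary directory and the names path, eager and y_after are only for illustration.

```python
import os
import tempfile

import xarray as xr

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test2.nc")

    # Same "with coordinates" dataset as in the example above.
    xr.Dataset(
        {"x": ("dim", [1, 2]), "y": ("dim", [3, 4])},
        coords={"dim": [0, 1]},
    ).to_netcdf(path)

    # load_dataset reads all variables into memory and closes the file,
    # so the returned Dataset no longer depends on the file on disk.
    eager = xr.load_dataset(path)

    # Overwrite the file, as in the failing case.
    xr.Dataset({"z": ("dim", [5, 6])}, coords={"dim": [0, 1]}).to_netcdf(path)

    y_after = eager["y"].values.tolist()

print(f"Expected y: [3, 4], Actual: {y_after}")
```

Calling .load() on an already opened Dataset before overwriting the file should behave similarly, though I have not checked that path as carefully.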

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.11 (main, Mar 17 2025, 21:02:09) [Clang 20.1.0 ]
python-bits: 64
OS: Linux
OS-release: 6.8.0-60-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_NZ.UTF-8
LOCALE: ('en_NZ', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.6.1
pandas: 2.3.0
numpy: 2.2.6
scipy: 1.15.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.10.3
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
