Description
What happened?
Hi,
I think I found a memory bug that occurs with xarray 2025.3.0 and later when dask (any version) is also present in the environment. The memory of the very first Dataset created is never released. Datasets created later are released as expected, so my workaround is to initialize a small Dataset at the beginning of the code.
What did you expect to happen?
A deleted Dataset should release its memory, as it does in xarray 2025.1.2 and older.
Minimal Complete Verifiable Example
import xarray as xr
import numpy as np
# Defining a tiny Dataset first mitigates the problem, because only the very first Dataset is never released from memory:
# xr.Dataset({}, coords={ "a": [1]})
def dummy(n):
    ds = xr.Dataset(
        {
            "A": (["x", "y"], np.random.randn(n, n))
        },
        coords={
            "x": range(n),
            "y": range(n),
        },
    )

dummy(25000)
input("Check your memory usage now... ~4GB is not released")
# Dockerfile to reproduce
# FROM python:3.11.5-slim-bullseye
# RUN pip install xarray==2025.1.2 dask==2025.5.1
# Using xarray==2025.1.2 or lower shows correct behavior
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Using docker stats to monitor memory usage (while waiting at the input prompt):
MEM USAGE: 4.728GiB
When using xarray version 2025.1.2:
MEM USAGE: 82.77MiB
Anything else we need to know?
Initializing a tiny Dataset at the top of the code mitigates the problem:
xr.Dataset({}, coords={ "a": [1]})
Funnily, even calling xr.show_versions() beforehand does.
It feels like the very first call to Dataset leaves a reference somewhere, so the object is not picked up by the garbage collector.
Might be related to: #9807
But here we have a much simpler minimal example.
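One way to test the "lingering reference" guess is to take a weak reference to the object, drop it, force a collection, and see whether anything still holds it. The sketch below is only an illustration of that check: `Payload` and `leaked_after_call` are hypothetical names, and `Payload` stands in for a Dataset so the snippet runs without xarray.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object (e.g. a Dataset)."""

    def __init__(self, n):
        self.data = bytearray(n)


def leaked_after_call(factory):
    """Create an object, drop it, and report whether it survives a GC pass.

    Returns (leaked, referrers). `leaked` is True if the object is still
    alive after gc.collect(); `referrers` lists the objects that still
    refer to it (empty when it was released).
    """
    obj = factory()
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    survivor = ref()
    if survivor is None:
        return False, []
    return True, gc.get_referrers(survivor)


# An object nobody else holds is released.
leaked, _ = leaked_after_call(lambda: Payload(1024))
print("released object leaked:", leaked)

# An object stashed in a module-level list is not.
keep = []


def leaky_factory():
    p = Payload(1024)
    keep.append(p)  # simulates a hidden reference kept somewhere
    return p


leaked, referrers = leaked_after_call(leaky_factory)
print("stashed object leaked:", leaked)
print("held by 'keep':", any(r is keep for r in referrers))
```

For this issue, the factory would be the first Dataset construction, e.g. `lambda: xr.Dataset({"A": (["x", "y"], np.random.randn(n, n))})`, assuming Dataset objects accept weak references. I have not confirmed that the referrers list points at the actual culprit; if the Dataset itself is released but the underlying array is not, the same check could be run on the array instead.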
Environment
xarray: 2025.4.0
pandas: 2.2.3
numpy: 2.2.6
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.5.1
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.5.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None