memory bug with Dataset starting in version 2025.3.0 in combination with dask #10350

Open
Krenciszek opened this issue May 23, 2025 · 2 comments
Labels: plan to close (May be closeable, needs more eyeballs)

Comments

@Krenciszek

What happened?

Hi,
I think I found a memory bug that occurs with xarray 2025.3.0 and newer whenever dask (any version) is also installed. The memory of the very first Dataset created is never released. All later Datasets behave correctly, so my workaround is to create a small throwaway Dataset at the start of the program.

What did you expect to happen?

A deleted Dataset should release its memory, as it does in xarray 2025.1.2 and older.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

# Creating a tiny Dataset first would mitigate the problem, since only the
# very first Dataset is never released from memory:
# xr.Dataset({}, coords={"a": [1]})

def dummy(n):
    ds = xr.Dataset(
        {"A": (["x", "y"], np.random.randn(n, n))},
        coords={"x": range(n), "y": range(n)},
    )
    # ds goes out of scope when the function returns, so its memory should be released


dummy(25000)
input("Check your memory usage now... ~4GB is not released")

# Dockerfile to reproduce:
# FROM python:3.11.5-slim-bullseye
# RUN pip install xarray==2025.4.0 dask==2025.5.1
# Using xarray==2025.1.2 or lower instead shows the correct behavior
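
To observe the leak from inside the script instead of via docker stats, here is a minimal sketch of a helper that reads the current resident set size from /proc/self/status (Linux only); rss_mib is a hypothetical name, not part of the example above:

def rss_mib():
    # Read the current resident set size of this process (Linux only)
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # VmRSS is reported in kB
    return None

print(f"RSS now: {rss_mib():.0f} MiB")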

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Using docker stats to monitor memory usage (while waiting at the input() prompt):
MEM USAGE: 4.728GiB

When using xarray version 2025.1.2:
MEM USAGE: 82.77MiB

Anything else we need to know?

Initializing a tiny Dataset at the top of the code mitigates the problem:
xr.Dataset({}, coords={"a": [1]})

Funnily enough, even calling xr.show_versions() does the trick.

It feels like the very first call to Dataset leaves a reference somewhere, so the object is never picked up by the garbage collector.
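
A minimal sketch of one way to hunt for such a lingering reference, assuming it is the underlying numpy array that stays alive (this diagnostic is not from the report above; it only uses standard weakref/gc calls):

import gc
import weakref

import numpy as np
import xarray as xr

arr = np.random.randn(2000, 2000)
ref = weakref.ref(arr)  # a weak reference does not keep arr alive by itself

ds = xr.Dataset({"A": (["x", "y"], arr)}, coords={"x": range(2000), "y": range(2000)})
del arr, ds
gc.collect()

if ref() is not None:
    # Something still references the array; print the types of its referrers
    for referrer in gc.get_referrers(ref()):
        print(type(referrer))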

This might be related to #9807, but here we have a much simpler minimal example.

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.5 (main, Sep 20 2023, 11:03:59) [GCC 10.2.1 20210110]
python-bits: 64
OS: Linux
OS-release: 5.15.167.4-microsoft-standard-WSL2
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2025.4.0
pandas: 2.2.3
numpy: 2.2.6
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.5.1
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.5.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

Krenciszek added the bug and needs triage labels on May 23, 2025
@dcherian
Contributor

This is probably just that the garbage collector hasn't run. Please use an explicit import gc; gc.collect() to verify that memory isn't leaked
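
For reference, a minimal sketch of that check (it simply repeats the MVCE above and adds an explicit collection before inspecting memory):

import gc

import numpy as np
import xarray as xr

def dummy(n):
    # Same function as in the MVCE above
    ds = xr.Dataset(
        {"A": (["x", "y"], np.random.randn(n, n))},
        coords={"x": range(n), "y": range(n)},
    )

dummy(25000)
gc.collect()  # force a full collection before checking memory usage
input("Check memory usage again after the explicit gc.collect()")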

dcherian added the plan to close label and removed the bug and needs triage labels on May 28, 2025
@Krenciszek
Author

I tested that gc.collect() has no effect here.
