Severe performance degradation with bundled netcdf library in wheel packages (Linux) #1393

Open
@xavierabellan

We have noticed a very significant degradation in read performance starting with version 1.7 of netcdf4-python, at least when reading certain files. The culprit seems to be the netcdf library bundled in the Python wheel.

As a reproducer, the International Best Track Archive for Climate Stewardship (IBTrACS) data can be used:

wget https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r01/access/netcdf/IBTrACS.ALL.v04r01.nc

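Here are two minimal examples. The first uses the netcdf library bundled in the wheel; the second preloads a system netcdf library instead: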

$ python3 -m timeit -s "from netCDF4 import Dataset; nc = Dataset('IBTrACS.ALL.v04r01.nc', 'r')"  "lat = nc.variables['lat'][-1]"
5 loops, best of 5: 59.8 msec per loop
$ LD_PRELOAD=/path/to/system/libnetcdf.so python3 -m timeit -s "from netCDF4 import Dataset; nc = Dataset('IBTrACS.ALL.v04r01.nc', 'r')"  "lat = nc.variables['lat'][-1]"
1000 loops, best of 5: 182 usec per loop
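
When comparing the two runs, it can help to confirm which libnetcdf shared object the process actually loaded, for instance to check that `LD_PRELOAD` took effect. The following is a minimal, Linux-only sketch that reads `/proc/self/maps`; the helper names `shared_objects_matching` and `loaded_netcdf_libraries` are hypothetical and not part of netcdf4-python.

```python
import os


def shared_objects_matching(maps_text, needle="libnetcdf"):
    """Return the sorted, unique file paths in /proc/<pid>/maps text
    whose file name contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if needle in os.path.basename(path):
                paths.add(path)
    return sorted(paths)


def loaded_netcdf_libraries():
    """Linux only: list libnetcdf shared objects mapped into this process."""
    with open("/proc/self/maps") as f:
        return shared_objects_matching(f.read())
```

Calling `loaded_netcdf_libraries()` after `import netCDF4` should list the path of the bundled library inside the wheel in the first case, and may additionally list the system library when it is preloaded.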

I could reproduce the above with versions 1.7.1 and 1.7.2 (latest as of today) of the netcdf4 binary wheels on Linux for Python 3.11, 3.12 and 3.13. Linux wheels for version 1.6.5 or below seem to be unaffected, as do the wheels for Mac arm64. I did not test on additional platforms.

When installing from source against an existing netcdf library or using the conda packages, performance is also at the expected level with no degradation.
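
To compare installations (wheel, source build, conda) side by side, a small script along these lines could print the library versions each build reports together with the read timing. This is only a sketch: `best_time` and `report` are hypothetical helper names, and the default path assumes the IBTrACS file downloaded above sits in the current directory.

```python
import timeit


def best_time(fn, repeat=5, number=10):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


def report(path="IBTrACS.ALL.v04r01.nc"):
    """Print the versions reported by netCDF4 and time reading lat[-1]."""
    import netCDF4  # imported here so best_time stays usable without it

    print("netCDF4-python:", netCDF4.__version__)
    print("netcdf C library:", netCDF4.__netcdf4libversion__)
    print("HDF5 library:", netCDF4.__hdf5libversion__)
    with netCDF4.Dataset(path, "r") as nc:
        lat = nc.variables["lat"]
        seconds = best_time(lambda: lat[-1])
    print(f"lat[-1]: {seconds * 1e3:.3f} ms per read")
```

Running `report()` in each environment gives one block of output per installation, which makes it easier to correlate the slowdown with a particular netcdf or HDF5 library version.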
