Description
Starting with version 1.7 of netcdf4-python, we have noticed a very significant degradation in read performance, at least when reading certain files. The culprit seems to be the netCDF-C library bundled in the Python wheel.
The International Best Track Archive for Climate Stewardship (IBTrACS) dataset can serve as a reproducer:
wget https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r01/access/netcdf/IBTrACS.ALL.v04r01.nc
Here are two minimal examples, the first using the library bundled in the wheel and the second preloading the system libnetcdf:
$ python3 -m timeit -s "from netCDF4 import Dataset; nc = Dataset('IBTrACS.ALL.v04r01.nc', 'r')" "lat = nc.variables['lat'][-1]"
5 loops, best of 5: 59.8 msec per loop
$ LD_PRELOAD=/path/to/system/libnetcdf.so python3 -m timeit -s "from netCDF4 import Dataset; nc = Dataset('IBTrACS.ALL.v04r01.nc', 'r')" "lat = nc.variables['lat'][-1]"
1000 loops, best of 5: 182 usec per loop
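To compare the two timings on the same scale, the `python -m timeit` result lines can be converted to seconds and divided. The helper below is only an illustrative sketch: `per_loop_seconds` and `slowdown` are hypothetical names, not part of netCDF4 or the `timeit` module, and the parser assumes the `N loops, best of R: T unit per loop` output format shown above.

```python
import re

# Unit multipliers for the suffixes printed by `python -m timeit`.
_UNITS = {"nsec": 1e-9, "usec": 1e-6, "msec": 1e-3, "sec": 1.0}

# Matches result lines such as "5 loops, best of 5: 59.8 msec per loop".
_TIMEIT_RE = re.compile(r"best of \d+: ([0-9.]+) (nsec|usec|msec|sec) per loop")


def per_loop_seconds(line: str) -> float:
    """Return the best per-loop time, in seconds, from one timeit result line."""
    match = _TIMEIT_RE.search(line)
    if match is None:
        raise ValueError(f"not a timeit result line: {line!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def slowdown(slow_line: str, fast_line: str) -> float:
    """Ratio of the slower per-loop time to the faster one."""
    return per_loop_seconds(slow_line) / per_loop_seconds(fast_line)


if __name__ == "__main__":
    bundled = "5 loops, best of 5: 59.8 msec per loop"    # wheel's bundled library
    system = "1000 loops, best of 5: 182 usec per loop"   # LD_PRELOAD'ed system library
    print(f"slowdown: {slowdown(bundled, system):.0f}x")
```

With the two result lines above, it prints a slowdown of roughly 329x.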
That is more than 300 times slower for the same read. I could reproduce this with the 1.7.1 and 1.7.2 (latest as of today) netCDF4 binary wheels on Linux for Python 3.11, 3.12 and 3.13. Linux wheels for version 1.6.5 and earlier seem to be unaffected, as do the macOS arm64 wheels. I did not test other platforms.
Performance is also at the expected level, with no degradation, when building from source against an existing netCDF library or when using the conda packages.
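To confirm which libnetcdf a given installation actually loads (the wheel's bundled copy, a preloaded system library, or the one from a conda or source build), one option on Linux is to list the shared objects mapped into the process after importing netCDF4. The sketch below assumes the Linux `/proc/self/maps` format; `loaded_libraries` is a hypothetical helper, while `netCDF4.__netcdf4libversion__` is the module attribute reporting the netCDF-C version. Note that with `LD_PRELOAD`, both the preloaded and the bundled copies may appear as mapped, since preloading changes symbol resolution rather than preventing the bundled file from being loaded.

```python
from pathlib import Path


def loaded_libraries(name_fragment: str, maps_path: str = "/proc/self/maps") -> list[str]:
    """Return distinct mapped file paths whose basename contains `name_fragment`.

    Assumes the Linux /proc maps format:
    address perms offset dev inode [pathname]
    """
    paths: list[str] = []
    for line in Path(maps_path).read_text().splitlines():
        fields = line.split(maxsplit=5)
        # Anonymous mappings have no pathname column; skip them.
        if len(fields) < 6:
            continue
        path = fields[5]
        if name_fragment in Path(path).name and path not in paths:
            paths.append(path)
    return paths


if __name__ == "__main__":
    try:
        import netCDF4  # importing loads the netCDF-C shared library

        print("netCDF-C version:", netCDF4.__netcdf4libversion__)
    except ImportError:
        print("netCDF4 is not importable; listing whatever is already mapped")
    if Path("/proc/self/maps").exists():  # Linux only
        for mapped in loaded_libraries("libnetcdf"):
            print("mapped:", mapped)
```

For wheels repaired with auditwheel, the bundled copy typically lives in a `netCDF4.libs` directory under a hashed file name, which makes it easy to tell apart from a system or conda library.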