Description
🐛 Bug Report
With Iris >= 3.11, and the introduction of LOAD_POLICY/COMBINE_POLICY, loading some pp files from the UM fails with memory issues. Note that the files load as expected with the 'legacy' LOAD_POLICY, but fail as described with the 'default' LOAD_POLICY.
I've tried running on our compute cluster with 64GB memory and still get the same issue. I can point someone to the files in the Met Office if required.
Script to reproduce:
import glob
import iris
policy = "legacy" # or "default"
iris.LOAD_POLICY.set(policy)
files = glob.glob("*.pp")
cubes = iris.load(files)
print(cubes)
default LOAD_POLICY
$ time python read.py
...
long traceback
...
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 15.6 GiB for an array with shape (71, 6, 1920, 2560, 2) and data type float32
real 38m48.660s
user 35m16.419s
sys 3m7.086s
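As a quick sanity check of the figure in the traceback (plain arithmetic, no Iris involved), a single real float32 array of the reported shape does indeed come to about 15.6 GiB; the trailing dimension of 2 is consistent with a bounds array of the derived coordinate discussed in the comments below.
import math
# Shape and dtype taken from the MemoryError message above (float32 = 4 bytes).
shape = (71, 6, 1920, 2560, 2)
nbytes = math.prod(shape) * 4
print(f"{nbytes / 2**30:.1f} GiB")  # -> 15.6 GiB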
legacy LOAD_POLICY
$ time python read.py
0: m01s03i318 / (unknown) (pseudo_level: 5; time: 13; latitude: 1920; longitude: 2560)
1: high_type_cloud_area_fraction / (1) (time: 37; latitude: 1920; longitude: 2560)
2: land_binary_mask / (1) (latitude: 1920; longitude: 2560)
3: land_binary_mask / (1) (latitude: 1920; longitude: 2560)
4: low_type_cloud_area_fraction / (1) (time: 37; latitude: 1920; longitude: 2560)
5: mass_fraction_of_cloud_ice_in_air / (kg kg-1) (time: 37; model_level_number: 71; latitude: 1920; longitude: 2560)
6: mass_fraction_of_cloud_liquid_water_in_air / (kg kg-1) (time: 37; model_level_number: 71; latitude: 1920; longitude: 2560)
7: medium_type_cloud_area_fraction / (1) (time: 37; latitude: 1920; longitude: 2560)
8: surface_altitude / (m) (time: 37; latitude: 1920; longitude: 2560)
9: surface_altitude / (m) (time: 37; latitude: 1920; longitude: 2560)
10: surface_snow_amount / (kg m-2) (time: 37; latitude: 1920; longitude: 2560)
11: x_wind / (m s-1) (time: 37; latitude: 1921; longitude: 2560)
12: y_wind / (m s-1) (time: 37; latitude: 1921; longitude: 2560)
real 12m57.740s
user 9m32.780s
sys 2m51.927s
How To Reproduce
Steps to reproduce the behaviour:
- Some UM pp files downloaded from the MASS archive
$ du -ch *pp | grep total
8.0G total
- Read the files
import glob
import iris
files = glob.glob("*.pp")
cubes = iris.load(files)
print(cubes)
- Script is killed with memory issues
$ time python read.py
...
long traceback
...
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 15.6 GiB for an array with shape (71, 6, 1920, 2560, 2) and data type float32
real 38m48.660s
user 35m16.419s
sys 3m7.086s
Expected behaviour
These are pretty standard pp files coming out of the UM global model forecast, so I would expect that the files could be loaded successfully with the default LOAD_POLICY/COMBINE_POLICY.
Environment
- OS & Version: RHEL7
- Iris Version: >=3.11
pp-mo commented on Apr 14, 2025
Status update 2025-04-14
I have recently investigated this, and found that the problem relates to multiple orography fields.
It is "cured" by adopting iris.LOAD_POLICY.set("legacy") (as you might expect -- i.e. this gives results equivalent to Iris 3.10).
Unfortunately I haven't (yet) managed to produce a standalone testcase, but am loading a single 500Mb file (for reference, prodm_op_gl-mn_20250327_06_012.pp) to reproduce the problem.
This (slightly) reduced testcase has 3 timesteps, and contains an orography field for each timestep, i.e. each with a different timepoint, though in fact the payload content is (unsurprisingly) identical.
In these cases, the "old" Iris simply ignores the varying orography, producing 3 cubes like this:
In these legacy loads, the main data cubes may have a time dimension, but they never have a time-dependent orography.
But the separate orography (i.e. "surface_altitude") cubes are time-dependent, if time-varying orography appears in the input.
(Aside: I'm surprised that it is not issuing a "multiple references" warning for this in all cases:
... /iris/fileformats/rules.py:44: IrisUserWarning: Multiple reference cubes for orography
I'm seeing that it does issue that warning if I load the whole original set of files, but not for my reduced single-file case. There's a distinction here that I don't understand.)
Summary: LOAD_POLICY.set("legacy") cures the problem in these cases.
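As an illustrative sketch (not part of the original comment): the legacy behaviour can also be scoped to a single load rather than set globally, assuming the context-manager form of LOAD_POLICY in Iris >= 3.11 accepts a settings name in the same way as set().
import glob
import iris
files = glob.glob("*.pp")
# Apply the "legacy" settings only for this load; LOAD_POLICY reverts to its
# previous settings when the context exits (assumption: context() takes a
# settings name, mirroring set("legacy") above).
with iris.LOAD_POLICY.context("legacy"):
    cubes = iris.load(files)
print(cubes)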
trexfeathers commented on May 13, 2025
@pp-mo has explained to me an opportunity for improving chunk awareness in hybrid height coordinates. Regardless of the problems with prodm_op_gl-mn_20250327_06_012.pp, this may become necessary when we start seeing more time-dependent hybrid heights, due to over-large chunk sizes.
My loose understanding of _optimum_chunksize:
- iris/lib/iris/_lazy_data.py, line 216 (at 1285efb)
- iris/lib/iris/aux_factory.py, lines 486 to 492 (at 1285efb)
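For anyone digging into the chunking, a rough way to inspect what the hybrid-height factory produces is sketched below (an assumption-laden example, not from the original comment: it supposes the file yields a cube carrying an "altitude" derived coordinate, and that the coordinate's points are still lazy).
import iris
# Load lazily and inspect the dask chunks of the payload and of the derived
# "altitude" (hybrid height) coordinate's points.
cube = iris.load_cube("prodm_op_gl-mn_20250327_06_012.pp",
                      "mass_fraction_of_cloud_ice_in_air")
print(cube.lazy_data().chunks)        # chunks of the main data payload
altitude = cube.coord("altitude")     # derived coordinate built by the factory
print(altitude.core_points().chunks)  # chunks of the derived points (if lazy)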
pp-mo commented on Jun 4, 2025
WIP Status update 2025-06-04
A fix prototype is currently up on branch "concatmem_prob": see changes here.
NOTE: although this problem was seen as potentially due to the change in loading policy introducing a "concatenate" step, in practice I don't now think this is the key difference: it is just that the chunks got bigger because the derived coordinate now varies on the time dimension as well - i.e. it is the time-varying nature that matters.
Meanwhile, I think the one-chunk nature of the derived coord has always been there, but it only just got bad enough to notice.
ASIDE: there is a lesson here about dask chunking in general.
E.g. we have y * x, where the operands broadcast together as (1, N) and (M, 1) and are chunked as (1, N) and (M, 1) -- both quite small individually (e.g. dim coords, single chunks).
But the result can then end up with chunks of (M, N), i.e. a size of M*N, which may be rather huge.
So we need to rechunk to get, say, (Q, N) for a manageable size: and this will require rechunking x from (M, 1) to (Q, 1).
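To make this concrete, here is a minimal dask-only sketch of the chunk blow-up and the rechunk that avoids it (M, N and Q are arbitrary illustrative sizes, not taken from the files above):
import dask.array as da
M, N, Q = 100_000, 50_000, 1_000
# Two small-looking operands, each held as a single chunk.
x = da.ones((M, 1), chunks=(M, 1))
y = da.ones((1, N), chunks=(1, N))
# Broadcasting combines the chunk shapes too: the result is one (M, N) chunk,
# i.e. M*N elements in a single task, which may be rather huge.
print((y * x).chunks)            # ((100000,), (50000,))
# Rechunking x first keeps the result chunks down to (Q, N).
x_small = x.rechunk((Q, 1))
print((y * x_small).chunks)      # ((1000, 1000, ...), (50000,))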