Merge remote-tracking branch 'upstream/main' into dark_theme_ready
* upstream/main:
  Updated environment lockfiles (SciTools#5332)
  Support netcdf variable emulation (SciTools#5212)
  Support netCDF load+save on dataset-like objects as well as filepaths. (SciTools#5214)
  minor refinement to release do-nothing script (SciTools#5326)
  update bibtex citation for v3.6.0 release (SciTools#5324)
  Whats new updates for v3.6.0 (SciTools#5323)
  Added whatsnew notes on netcdf delayed saving and Distributed support. (SciTools#5322)
  Updated environment lockfiles (SciTools#5320)
  Bugfix 4566 (SciTools#4569)
  Simpler/faster data aggregation code in `aggregated_by` (SciTools#4970)
tkknight committed May 21, 2023
2 parents 058a91e + 95cdea3 commit d6fc3e1
Showing 27 changed files with 880 additions and 412 deletions.
8 changes: 4 additions & 4 deletions docs/src/userguide/citation.rst
@@ -15,12 +15,12 @@ For example::

 @manual{Iris,
 author = {{Met Office}},
-title = {Iris: A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data },
-edition = {v3.5},
+title = {Iris: A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data},
+edition = {v3.6},
 year = {2010 - 2023},
-address = {Exeter, Devon },
+address = {Exeter, Devon},
 url = {http://scitools.org.uk/},
-doi = {10.5281/zenodo.7871017}
+doi = {10.5281/zenodo.7948293}
 }


71 changes: 47 additions & 24 deletions docs/src/whatsnew/3.6.rst
@@ -1,35 +1,58 @@
.. include:: ../common_links.inc

-v3.6 (03 May 2023) [release candidate]
-**************************************
+v3.6 (18 May 2023)
+******************

This document explains the changes made to Iris for this release
(:doc:`View all changes <index>`.)


.. dropdown:: v3.6 Release Highlights
-   :color: primary
-   :icon: info
-   :animate: fade-in
-   :open:
-
-   We're so excited about our recent support for **delayed saving of lazy data
-   to netCDF** (:pull:`5191`) that we're celebrating this important step change
-   in behavour with its very own dedicated release 🥳
-
-   We're super keen for the community to leverage the benefit of this new
-   feature within Iris that we've brought this release forward several months.
-   As a result, this minor release of Iris is intentionally light in content.
-   However, there are some other goodies available for you to enjoy, such as:
-
-   * Performing lazy arithmetic with an Iris :class:`~iris.cube.Cube` and a
-     :class:`dask.array.Array`, and
-   * Various improvements to our documentation resulting from adoption of
-     `sphinx-design`_ and `sphinx-apidoc`_.
-
-   As always, get in touch with us on :issue:`GitHub<new/choose>`, particularly
-   if you have any feedback with regards to delayed saving, or have any issues
-   or feature requests for improving Iris. Enjoy!
+    :color: primary
+    :icon: info
+    :animate: fade-in
+    :open:
+
+    We're so excited about our recent support for **delayed saving of lazy data
+    to netCDF** (:pull:`5191`) that we're celebrating this important step change
+    in behaviour with its very own dedicated release 🥳
+
+    By using ``iris.save(..., compute=False)`` you can now save to multiple NetCDF files
+    in parallel. See the new ``compute`` keyword in :func:`iris.fileformats.netcdf.save`.
+    This can share and re-use any common (lazy) result computations, and it makes much
+    better use of resources during any file-system waiting (i.e., it can use such periods
+    to progress the *other* saves).
+
+    Usage example::
+
+        # Create output files with delayed data saving.
+        delayeds = [
+            iris.save(cubes, filepath, compute=False)
+            for cubes, filepath in zip(output_cubesets, output_filepaths)
+        ]
+        # Complete saves in parallel.
+        dask.compute(*delayeds)
+
+    This advance also includes **another substantial benefit**, because NetCDF saves can
+    now use a
+    `Dask.distributed scheduler <https://docs.dask.org/en/stable/scheduling.html>`_.
+    With `Distributed <https://distributed.dask.org/en/stable/>`_ you can parallelise the
+    saves across a whole cluster. Whereas previously, the NetCDF saving *only* worked with
+    a "threaded" scheduler, limiting it to a single CPU.
+
+    We're so super keen for the community to leverage the benefit of this new
+    feature within Iris that we've brought this release forward several months.
+    As a result, this minor release of Iris is intentionally light in content.
+    However, there are some other goodies available for you to enjoy, such as:
+
+    * Performing lazy arithmetic with an Iris :class:`~iris.cube.Cube` and a
+      :class:`dask.array.Array`, and
+    * Various improvements to our documentation resulting from adoption of
+      `sphinx-design`_ and `sphinx-apidoc`_.
+
+    As always, get in touch with us on :issue:`GitHub<new/choose>`, particularly
+    if you have any feedback with regards to delayed saving, or have any issues
+    or feature requests for improving Iris. Enjoy!


📢 Announcements
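To make the Distributed support described in the highlight above concrete, here is a minimal sketch (not part of the commit): it assumes a local ``dask.distributed`` cluster, and re-uses the placeholder names ``output_cubesets`` / ``output_filepaths`` from the usage example::

    import dask
    from dask.distributed import Client
    import iris

    # Creating a Client makes the distributed scheduler the default for
    # subsequent dask.compute() calls, so the saves run on its workers.
    client = Client()

    # Delayed saves, exactly as in the usage example above.
    delayeds = [
        iris.save(cubes, filepath, compute=False)
        for cubes, filepath in zip(output_cubesets, output_filepaths)
    ]
    dask.compute(*delayeds)
    client.close()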
11 changes: 7 additions & 4 deletions docs/src/whatsnew/latest.rst
@@ -48,8 +48,8 @@ This document explains the changes made to Iris for this release
🚀 Performance Enhancements
===========================

-#. N/A
-
+#. `@rcomer`_ made :meth:`~iris.cube.Cube.aggregated_by` faster. (:pull:`4970`)
+#. `@rsdavies`_ modified the CF compliant standard name for m01s00i023 :issue:`4566`

🔥 Deprecations
===============
@@ -72,13 +72,16 @@ This document explains the changes made to Iris for this release
💼 Internal
===========

-#. N/A
+#. `@pp-mo`_ supported loading and saving netcdf :class:`netCDF4.Dataset` compatible
+   objects in place of file-paths, as hooks for a forthcoming
+   `"Xarray bridge" <https://github.com/SciTools/iris/issues/4994>`_ facility.
+   (:pull:`5214`)


 .. comment
     Whatsnew author names (@github name) in alphabetical order. Note that,
     core dev names are automatically included by the common_links.inc:
+.. _@rsdavies: https://github.com/rsdavies



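The :class:`netCDF4.Dataset` support mentioned in the Internal note above could look roughly like this in use. This is a speculative sketch only, since the commit shows the internal plumbing (the ``"data"`` URI scheme and ``iris.io.load_data_objects``) rather than a documented user-facing pattern, and the filename is a placeholder::

    import iris
    import netCDF4

    # Assumed usage: pass an already-open dataset (or a compatible mimic)
    # to iris.load instead of a file path.
    dataset = netCDF4.Dataset("air_temperature.nc", mode="r")
    cubes = iris.load(dataset)
    print(cubes)
    dataset.close()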
9 changes: 7 additions & 2 deletions lib/iris/__init__.py
@@ -89,12 +89,12 @@ def callback(cube, field, filename):
"""

from collections.abc import Iterable
import contextlib
import glob
import importlib
import itertools
import os.path
import pathlib
import threading

import iris._constraints
@@ -256,7 +256,8 @@ def context(self, **kwargs):

 def _generate_cubes(uris, callback, constraints):
     """Returns a generator of cubes given the URIs and a callback."""
-    if isinstance(uris, (str, pathlib.PurePath)):
+    if isinstance(uris, str) or not isinstance(uris, Iterable):
+        # Make a string, or other single item, into an iterable.
         uris = [uris]
 
     # Group collections of uris by their iris handler
@@ -273,6 +274,10 @@ def _generate_cubes(uris, callback, constraints):
urls = [":".join(x) for x in groups]
for cube in iris.io.load_http(urls, callback):
yield cube
elif scheme == "data":
data_objects = [x[1] for x in groups]
for cube in iris.io.load_data_objects(data_objects, callback):
yield cube
else:
raise ValueError("Iris cannot handle the URI scheme: %s" % scheme)

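For clarity, the effect of the revised check in ``_generate_cubes`` is that any single, non-iterable source (a string, a ``pathlib.Path``, or an open dataset-like object) is wrapped in a list before the URIs are grouped by scheme. A standalone sketch of that logic (``normalise_sources`` is an illustrative name, not an Iris function)::

    from collections.abc import Iterable
    from pathlib import Path

    def normalise_sources(uris):
        # Same test as the new code in _generate_cubes.
        if isinstance(uris, str) or not isinstance(uris, Iterable):
            uris = [uris]
        return list(uris)

    print(normalise_sources("air.nc"))          # ['air.nc']
    print(normalise_sources(Path("air.nc")))    # [PosixPath('air.nc')] on POSIX systems
    print(normalise_sources(["a.nc", "b.nc"]))  # ['a.nc', 'b.nc']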
135 changes: 51 additions & 84 deletions lib/iris/cube.py
@@ -4178,98 +4178,65 @@ def aggregated_by(
data_shape = list(self.shape + aggregator.aggregate_shape(**kwargs))
data_shape[dimension_to_groupby] = len(groupby)

# Aggregate the group-by data.
# Choose appropriate data and functions for data aggregation.
if aggregator.lazy_func is not None and self.has_lazy_data():
front_slice = (slice(None, None),) * dimension_to_groupby
back_slice = (slice(None, None),) * (
len(data_shape) - dimension_to_groupby - 1
)
stack = da.stack
input_data = self.lazy_data()
agg_method = aggregator.lazy_aggregate
else:
input_data = self.data
# Note numpy.stack does not preserve masks.
stack = ma.stack if ma.isMaskedArray(input_data) else np.stack
agg_method = aggregator.aggregate

# Create data and weights slices.
front_slice = (slice(None),) * dimension_to_groupby
back_slice = (slice(None),) * (
len(data_shape) - dimension_to_groupby - 1
)

# Create cube and weights slices
groupby_subcubes = map(
lambda groupby_slice: self[
groupby_subarrs = map(
lambda groupby_slice: iris.util._slice_data_with_keys(
input_data, front_slice + (groupby_slice,) + back_slice
)[1],
groupby.group(),
)

if weights is not None:
groupby_subweights = map(
lambda groupby_slice: weights[
front_slice + (groupby_slice,) + back_slice
].lazy_data(),
],
groupby.group(),
)
if weights is not None:
groupby_subweights = map(
lambda groupby_slice: weights[
front_slice + (groupby_slice,) + back_slice
],
groupby.group(),
)
else:
groupby_subweights = (None for _ in range(len(groupby)))
else:
groupby_subweights = (None for _ in range(len(groupby)))

agg = iris.analysis.create_weighted_aggregator_fn(
aggregator.lazy_aggregate, axis=dimension_to_groupby, **kwargs
# Aggregate data slices.
agg = iris.analysis.create_weighted_aggregator_fn(
agg_method, axis=dimension_to_groupby, **kwargs
)
result = list(map(agg, groupby_subarrs, groupby_subweights))

# If weights are returned, "result" is a list of tuples (each tuple
# contains two elements; the first is the aggregated data, the
# second is the aggregated weights). Convert these to two lists
# (one for the aggregated data and one for the aggregated weights)
# before combining the different slices.
if return_weights:
result, weights_result = list(zip(*result))
aggregateby_weights = stack(
weights_result, axis=dimension_to_groupby
)
result = list(map(agg, groupby_subcubes, groupby_subweights))

# If weights are returned, "result" is a list of tuples (each tuple
# contains two elements; the first is the aggregated data, the
# second is the aggregated weights). Convert these to two lists
# (one for the aggregated data and one for the aggregated weights)
# before combining the different slices.
if return_weights:
result, weights_result = list(zip(*result))
aggregateby_weights = da.stack(
weights_result, axis=dimension_to_groupby
)
else:
aggregateby_weights = None
aggregateby_data = da.stack(result, axis=dimension_to_groupby)
else:
cube_slice = [slice(None, None)] * len(data_shape)
for i, groupby_slice in enumerate(groupby.group()):
# Slice the cube with the group-by slice to create a group-by
# sub-cube.
cube_slice[dimension_to_groupby] = groupby_slice
groupby_sub_cube = self[tuple(cube_slice)]

# Slice the weights
if weights is not None:
groupby_sub_weights = weights[tuple(cube_slice)]
kwargs["weights"] = groupby_sub_weights

# Perform the aggregation over the group-by sub-cube and
# repatriate the aggregated data into the aggregate-by cube
# data. If weights are also returned, handle them separately.
result = aggregator.aggregate(
groupby_sub_cube.data, axis=dimension_to_groupby, **kwargs
)
if return_weights:
weights_result = result[1]
result = result[0]
else:
weights_result = None

# Determine aggregation result data type for the aggregate-by
# cube data on first pass.
if i == 0:
if ma.isMaskedArray(self.data):
aggregateby_data = ma.zeros(
data_shape, dtype=result.dtype
)
else:
aggregateby_data = np.zeros(
data_shape, dtype=result.dtype
)
if weights_result is not None:
aggregateby_weights = np.zeros(
data_shape, dtype=weights_result.dtype
)
else:
aggregateby_weights = None
cube_slice[dimension_to_groupby] = i
aggregateby_data[tuple(cube_slice)] = result
if weights_result is not None:
aggregateby_weights[tuple(cube_slice)] = weights_result

# Restore original weights.
if weights is not None:
kwargs["weights"] = weights
aggregateby_weights = None

aggregateby_data = stack(result, axis=dimension_to_groupby)
# Ensure plain ndarray is output if plain ndarray was input.
if ma.isMaskedArray(aggregateby_data) and not ma.isMaskedArray(
input_data
):
aggregateby_data = ma.getdata(aggregateby_data)

# Add the aggregation meta data to the aggregate-by cube.
aggregator.update_metadata(
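The refactor above sends lazy and real data down the same slice-and-stack path, preserving masks (and plain ``ndarray``-ness) in the result. A small usage sketch with synthetic data (coordinate names and values invented for illustration)::

    import numpy as np
    import iris.analysis
    from iris.coords import AuxCoord, DimCoord
    from iris.cube import Cube

    # A 1-D cube with a categorical coordinate to group over.
    data = np.ma.masked_array(np.arange(6, dtype=float), mask=[0, 0, 1, 0, 0, 0])
    cube = Cube(
        data,
        long_name="example",
        dim_coords_and_dims=[(DimCoord(np.arange(6), long_name="time"), 0)],
        aux_coords_and_dims=[(AuxCoord(["a", "a", "a", "b", "b", "b"], long_name="label"), 0)],
    )

    # Group over the 'label' coordinate; masked input stays masked,
    # plain ndarray input stays plain, and lazy input stays lazy.
    means = cube.aggregated_by("label", iris.analysis.MEAN)
    print(means.data)  # masked means for groups 'a' and 'b'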
37 changes: 28 additions & 9 deletions lib/iris/fileformats/__init__.py
@@ -9,6 +9,7 @@
"""

 from iris.io.format_picker import (
+    DataSourceObjectProtocol,
     FileExtension,
     FormatAgent,
     FormatSpecification,
@@ -125,16 +126,34 @@ def _load_grib(*args, **kwargs):
)


-_nc_dap = FormatSpecification(
-    "NetCDF OPeNDAP",
-    UriProtocol(),
-    lambda protocol: protocol in ["http", "https"],
-    netcdf.load_cubes,
-    priority=6,
-    constraint_aware_handler=True,
+FORMAT_AGENT.add_spec(
+    FormatSpecification(
+        "NetCDF OPeNDAP",
+        UriProtocol(),
+        lambda protocol: protocol in ["http", "https"],
+        netcdf.load_cubes,
+        priority=6,
+        constraint_aware_handler=True,
+    )
+)
+
+# NetCDF file presented as an open, readable netCDF4 dataset (or mimic).
+FORMAT_AGENT.add_spec(
+    FormatSpecification(
+        "NetCDF dataset",
+        DataSourceObjectProtocol(),
+        lambda object: all(
+            hasattr(object, x)
+            for x in ("variables", "dimensions", "groups", "ncattrs")
+        ),
+        # Note: this uses the same call as the above "NetCDF_v4" (and "NetCDF OPeNDAP")
+        # The handler itself needs to detect what is passed + handle it appropriately.
+        netcdf.load_cubes,
+        priority=4,
+        constraint_aware_handler=True,
+    )
 )
-FORMAT_AGENT.add_spec(_nc_dap)
-del _nc_dap


#
# UM Fieldsfiles.
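The new "NetCDF dataset" specification recognises dataset-like inputs purely by duck-typing. The acceptance test amounts to the following (``looks_like_netcdf_dataset`` is an illustrative name, not part of Iris)::

    import netCDF4

    def looks_like_netcdf_dataset(obj):
        # Same attribute probe as the FormatSpecification lambda above.
        return all(
            hasattr(obj, name)
            for name in ("variables", "dimensions", "groups", "ncattrs")
        )

    ds = netCDF4.Dataset("inmemory.nc", mode="w", diskless=True)  # no file written
    print(looks_like_netcdf_dataset(ds))             # True
    print(looks_like_netcdf_dataset("inmemory.nc"))  # False
    ds.close()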
