Merge remote-tracking branch 'upstream/main' into dark_theme_ready
* upstream/main:
  Updated environment lockfiles (SciTools#5332)
  Support netcdf variable emulation (SciTools#5212)
  Support netCDF load+save on dataset-like objects as well as filepaths. (SciTools#5214)
  minor refinement to release do-nothing script (SciTools#5326)
  update bibtex citation for v3.6.0 release (SciTools#5324)
  Whats new updates for v3.6.0 (SciTools#5323)
  Added whatsnew notes on netcdf delayed saving and Distributed support. (SciTools#5322)
  Updated environment lockfiles (SciTools#5320)
  Bugfix 4566 (SciTools#4569)
  Simpler/faster data aggregation code in `aggregated_by` (SciTools#4970)
tkknight committed May 21, 2023
2 parents 058a91e + 95cdea3 commit d6fc3e1
Showing 27 changed files with 880 additions and 412 deletions.
8 changes: 4 additions & 4 deletions docs/src/userguide/citation.rst
@@ -15,12 +15,12 @@ For example::

 @manual{Iris,
 author = {{Met Office}},
-title = {Iris: A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data },
-edition = {v3.5},
+title = {Iris: A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data},
+edition = {v3.6},
 year = {2010 - 2023},
-address = {Exeter, Devon },
+address = {Exeter, Devon},
 url = {http://scitools.org.uk/},
-doi = {10.5281/zenodo.7871017}
+doi = {10.5281/zenodo.7948293}
 }


71 changes: 47 additions & 24 deletions docs/src/whatsnew/3.6.rst
@@ -1,35 +1,58 @@
.. include:: ../common_links.inc

-v3.6 (03 May 2023) [release candidate]
-**************************************
+v3.6 (18 May 2023)
+******************

This document explains the changes made to Iris for this release
(:doc:`View all changes <index>`.)


.. dropdown:: v3.6 Release Highlights
-   :color: primary
-   :icon: info
-   :animate: fade-in
-   :open:
-
-   We're so excited about our recent support for **delayed saving of lazy data
-   to netCDF** (:pull:`5191`) that we're celebrating this important step change
-   in behavour with its very own dedicated release 🥳
-
-   We're super keen for the community to leverage the benefit of this new
-   feature within Iris that we've brought this release forward several months.
-   As a result, this minor release of Iris is intentionally light in content.
-   However, there are some other goodies available for you to enjoy, such as:
-
-   * Performing lazy arithmetic with an Iris :class:`~iris.cube.Cube` and a
-     :class:`dask.array.Array`, and
-   * Various improvements to our documentation resulting from adoption of
-     `sphinx-design`_ and `sphinx-apidoc`_.
-
-   As always, get in touch with us on :issue:`GitHub<new/choose>`, particularly
-   if you have any feedback with regards to delayed saving, or have any issues
-   or feature requests for improving Iris. Enjoy!
+    :color: primary
+    :icon: info
+    :animate: fade-in
+    :open:
+
+    We're so excited about our recent support for **delayed saving of lazy data
+    to netCDF** (:pull:`5191`) that we're celebrating this important step change
+    in behaviour with its very own dedicated release 🥳
+
+    By using ``iris.save(..., compute=False)`` you can now save to multiple NetCDF files
+    in parallel. See the new ``compute`` keyword in :func:`iris.fileformats.netcdf.save`.
+    This can share and re-use any common (lazy) result computations, and it makes much
+    better use of resources during any file-system waiting (i.e., it can use such periods
+    to progress the *other* saves).
+
+    Usage example::
+
+        # Create output files with delayed data saving.
+        delayeds = [
+            iris.save(cubes, filepath, compute=False)
+            for cubes, filepath in zip(output_cubesets, output_filepaths)
+        ]
+        # Complete saves in parallel.
+        dask.compute(*delayeds)
+
+    This advance also includes **another substantial benefit**, because NetCDF saves can
+    now use a
+    `Dask.distributed scheduler <https://docs.dask.org/en/stable/scheduling.html>`_.
+    With `Distributed <https://distributed.dask.org/en/stable/>`_ you can parallelise the
+    saves across a whole cluster. Whereas previously, the NetCDF saving *only* worked with
+    a "threaded" scheduler, limiting it to a single CPU.
+
+    We're so super keen for the community to leverage the benefit of this new
+    feature within Iris that we've brought this release forward several months.
+    As a result, this minor release of Iris is intentionally light in content.
+    However, there are some other goodies available for you to enjoy, such as:
+
+    * Performing lazy arithmetic with an Iris :class:`~iris.cube.Cube` and a
+      :class:`dask.array.Array`, and
+    * Various improvements to our documentation resulting from adoption of
+      `sphinx-design`_ and `sphinx-apidoc`_.
+
+    As always, get in touch with us on :issue:`GitHub<new/choose>`, particularly
+    if you have any feedback with regards to delayed saving, or have any issues
+    or feature requests for improving Iris. Enjoy!


📢 Announcements
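To make the Distributed support described in the highlight above concrete, here is a minimal sketch (not part of the commit): it assumes a local ``dask.distributed`` cluster, and re-uses the placeholder names ``output_cubesets`` / ``output_filepaths`` from the usage example::

    import dask
    from dask.distributed import Client
    import iris

    # Creating a Client makes the distributed scheduler the default for
    # subsequent dask.compute() calls, so the saves run on its workers.
    client = Client()

    # Delayed saves, exactly as in the usage example above.
    delayeds = [
        iris.save(cubes, filepath, compute=False)
        for cubes, filepath in zip(output_cubesets, output_filepaths)
    ]
    dask.compute(*delayeds)
    client.close()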
11 changes: 7 additions & 4 deletions docs/src/whatsnew/latest.rst
@@ -48,8 +48,8 @@ This document explains the changes made to Iris for this release
🚀 Performance Enhancements
===========================

-#. N/A
-
+#. `@rcomer`_ made :meth:`~iris.cube.Cube.aggregated_by` faster. (:pull:`4970`)
+#. `@rsdavies`_ modified the CF compliant standard name for m01s00i023 :issue:`4566`

🔥 Deprecations
===============
@@ -72,13 +72,16 @@ This document explains the changes made to Iris for this release
💼 Internal
===========

-#. N/A
+#. `@pp-mo`_ supported loading and saving netcdf :class:`netCDF4.Dataset` compatible
+   objects in place of file-paths, as hooks for a forthcoming
+   `"Xarray bridge" <https://github.com/SciTools/iris/issues/4994>`_ facility.
+   (:pull:`5214`)


 .. comment
     Whatsnew author names (@github name) in alphabetical order. Note that,
     core dev names are automatically included by the common_links.inc:
+.. _@rsdavies: https://github.com/rsdavies



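The :class:`netCDF4.Dataset` support mentioned in the Internal note above could look roughly like this in use. This is a speculative sketch only, since the commit shows the internal plumbing (the ``"data"`` URI scheme and ``iris.io.load_data_objects``) rather than a documented user-facing pattern, and the filename is a placeholder::

    import iris
    import netCDF4

    # Assumed usage: pass an already-open dataset (or a compatible mimic)
    # to iris.load instead of a file path.
    dataset = netCDF4.Dataset("air_temperature.nc", mode="r")
    cubes = iris.load(dataset)
    print(cubes)
    dataset.close()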
9 changes: 7 additions & 2 deletions lib/iris/__init__.py
@@ -89,12 +89,12 @@ def callback(cube, field, filename):
"""

from collections.abc import Iterable
import contextlib
import glob
import importlib
import itertools
import os.path
import pathlib
import threading

import iris._constraints
@@ -256,7 +256,8 @@ def context(self, **kwargs):

 def _generate_cubes(uris, callback, constraints):
     """Returns a generator of cubes given the URIs and a callback."""
-    if isinstance(uris, (str, pathlib.PurePath)):
+    if isinstance(uris, str) or not isinstance(uris, Iterable):
+        # Make a string, or other single item, into an iterable.
         uris = [uris]
 
     # Group collections of uris by their iris handler
@@ -273,6 +274,10 @@ def _generate_cubes(uris, callback, constraints):
urls = [":".join(x) for x in groups]
for cube in iris.io.load_http(urls, callback):
yield cube
elif scheme == "data":
data_objects = [x[1] for x in groups]
for cube in iris.io.load_data_objects(data_objects, callback):
yield cube
else:
raise ValueError("Iris cannot handle the URI scheme: %s" % scheme)

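For clarity, the effect of the revised check in ``_generate_cubes`` is that any single, non-iterable source (a string, a ``pathlib.Path``, or an open dataset-like object) is wrapped in a list before the URIs are grouped by scheme. A standalone sketch of that logic (``normalise_sources`` is an illustrative name, not an Iris function)::

    from collections.abc import Iterable
    from pathlib import Path

    def normalise_sources(uris):
        # Same test as the new code in _generate_cubes.
        if isinstance(uris, str) or not isinstance(uris, Iterable):
            uris = [uris]
        return list(uris)

    print(normalise_sources("air.nc"))          # ['air.nc']
    print(normalise_sources(Path("air.nc")))    # [PosixPath('air.nc')] on POSIX systems
    print(normalise_sources(["a.nc", "b.nc"]))  # ['a.nc', 'b.nc']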
135 changes: 51 additions & 84 deletions lib/iris/cube.py
@@ -4178,98 +4178,65 @@ def aggregated_by(
data_shape = list(self.shape + aggregator.aggregate_shape(**kwargs))
data_shape[dimension_to_groupby] = len(groupby)

# Aggregate the group-by data.
# Choose appropriate data and functions for data aggregation.
if aggregator.lazy_func is not None and self.has_lazy_data():
front_slice = (slice(None, None),) * dimension_to_groupby
back_slice = (slice(None, None),) * (
len(data_shape) - dimension_to_groupby - 1
)
stack = da.stack
input_data = self.lazy_data()
agg_method = aggregator.lazy_aggregate
else:
input_data = self.data
# Note numpy.stack does not preserve masks.
stack = ma.stack if ma.isMaskedArray(input_data) else np.stack
agg_method = aggregator.aggregate

# Create data and weights slices.
front_slice = (slice(None),) * dimension_to_groupby
back_slice = (slice(None),) * (
len(data_shape) - dimension_to_groupby - 1
)

# Create cube and weights slices
groupby_subcubes = map(
lambda groupby_slice: self[
groupby_subarrs = map(
lambda groupby_slice: iris.util._slice_data_with_keys(
input_data, front_slice + (groupby_slice,) + back_slice
)[1],
groupby.group(),
)

if weights is not None:
groupby_subweights = map(
lambda groupby_slice: weights[
front_slice + (groupby_slice,) + back_slice
].lazy_data(),
],
groupby.group(),
)
if weights is not None:
groupby_subweights = map(
lambda groupby_slice: weights[
front_slice + (groupby_slice,) + back_slice
],
groupby.group(),
)
else:
groupby_subweights = (None for _ in range(len(groupby)))
else:
groupby_subweights = (None for _ in range(len(groupby)))

agg = iris.analysis.create_weighted_aggregator_fn(
aggregator.lazy_aggregate, axis=dimension_to_groupby, **kwargs
# Aggregate data slices.
agg = iris.analysis.create_weighted_aggregator_fn(
agg_method, axis=dimension_to_groupby, **kwargs
)
result = list(map(agg, groupby_subarrs, groupby_subweights))

# If weights are returned, "result" is a list of tuples (each tuple
# contains two elements; the first is the aggregated data, the
# second is the aggregated weights). Convert these to two lists
# (one for the aggregated data and one for the aggregated weights)
# before combining the different slices.
if return_weights:
result, weights_result = list(zip(*result))
aggregateby_weights = stack(
weights_result, axis=dimension_to_groupby
)
result = list(map(agg, groupby_subcubes, groupby_subweights))

# If weights are returned, "result" is a list of tuples (each tuple
# contains two elements; the first is the aggregated data, the
# second is the aggregated weights). Convert these to two lists
# (one for the aggregated data and one for the aggregated weights)
# before combining the different slices.
if return_weights:
result, weights_result = list(zip(*result))
aggregateby_weights = da.stack(
weights_result, axis=dimension_to_groupby
)
else:
aggregateby_weights = None
aggregateby_data = da.stack(result, axis=dimension_to_groupby)
else:
cube_slice = [slice(None, None)] * len(data_shape)
for i, groupby_slice in enumerate(groupby.group()):
# Slice the cube with the group-by slice to create a group-by
# sub-cube.
cube_slice[dimension_to_groupby] = groupby_slice
groupby_sub_cube = self[tuple(cube_slice)]

# Slice the weights
if weights is not None:
groupby_sub_weights = weights[tuple(cube_slice)]
kwargs["weights"] = groupby_sub_weights

# Perform the aggregation over the group-by sub-cube and
# repatriate the aggregated data into the aggregate-by cube
# data. If weights are also returned, handle them separately.
result = aggregator.aggregate(
groupby_sub_cube.data, axis=dimension_to_groupby, **kwargs
)
if return_weights:
weights_result = result[1]
result = result[0]
else:
weights_result = None

# Determine aggregation result data type for the aggregate-by
# cube data on first pass.
if i == 0:
if ma.isMaskedArray(self.data):
aggregateby_data = ma.zeros(
data_shape, dtype=result.dtype
)
else:
aggregateby_data = np.zeros(
data_shape, dtype=result.dtype
)
if weights_result is not None:
aggregateby_weights = np.zeros(
data_shape, dtype=weights_result.dtype
)
else:
aggregateby_weights = None
cube_slice[dimension_to_groupby] = i
aggregateby_data[tuple(cube_slice)] = result
if weights_result is not None:
aggregateby_weights[tuple(cube_slice)] = weights_result

# Restore original weights.
if weights is not None:
kwargs["weights"] = weights
aggregateby_weights = None

aggregateby_data = stack(result, axis=dimension_to_groupby)
# Ensure plain ndarray is output if plain ndarray was input.
if ma.isMaskedArray(aggregateby_data) and not ma.isMaskedArray(
input_data
):
aggregateby_data = ma.getdata(aggregateby_data)

# Add the aggregation meta data to the aggregate-by cube.
aggregator.update_metadata(
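The refactor above sends lazy and real data down the same slice-and-stack path, preserving masks (and plain ``ndarray``-ness) in the result. A small usage sketch with synthetic data (coordinate names and values invented for illustration)::

    import numpy as np
    import iris.analysis
    from iris.coords import AuxCoord, DimCoord
    from iris.cube import Cube

    # A 1-D cube with a categorical coordinate to group over.
    data = np.ma.masked_array(np.arange(6, dtype=float), mask=[0, 0, 1, 0, 0, 0])
    cube = Cube(
        data,
        long_name="example",
        dim_coords_and_dims=[(DimCoord(np.arange(6), long_name="time"), 0)],
        aux_coords_and_dims=[(AuxCoord(["a", "a", "a", "b", "b", "b"], long_name="label"), 0)],
    )

    # Group over the 'label' coordinate; masked input stays masked,
    # plain ndarray input stays plain, and lazy input stays lazy.
    means = cube.aggregated_by("label", iris.analysis.MEAN)
    print(means.data)  # masked means for groups 'a' and 'b'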
37 changes: 28 additions & 9 deletions lib/iris/fileformats/__init__.py
@@ -9,6 +9,7 @@
"""

 from iris.io.format_picker import (
+    DataSourceObjectProtocol,
     FileExtension,
     FormatAgent,
     FormatSpecification,
@@ -125,16 +126,34 @@ def _load_grib(*args, **kwargs):
)


-_nc_dap = FormatSpecification(
-    "NetCDF OPeNDAP",
-    UriProtocol(),
-    lambda protocol: protocol in ["http", "https"],
-    netcdf.load_cubes,
-    priority=6,
-    constraint_aware_handler=True,
+FORMAT_AGENT.add_spec(
+    FormatSpecification(
+        "NetCDF OPeNDAP",
+        UriProtocol(),
+        lambda protocol: protocol in ["http", "https"],
+        netcdf.load_cubes,
+        priority=6,
+        constraint_aware_handler=True,
+    )
+)
+
+# NetCDF file presented as an open, readable netCDF4 dataset (or mimic).
+FORMAT_AGENT.add_spec(
+    FormatSpecification(
+        "NetCDF dataset",
+        DataSourceObjectProtocol(),
+        lambda object: all(
+            hasattr(object, x)
+            for x in ("variables", "dimensions", "groups", "ncattrs")
+        ),
+        # Note: this uses the same call as the above "NetCDF_v4" (and "NetCDF OPeNDAP")
+        # The handler itself needs to detect what is passed + handle it appropriately.
+        netcdf.load_cubes,
+        priority=4,
+        constraint_aware_handler=True,
+    )
 )
-FORMAT_AGENT.add_spec(_nc_dap)
-del _nc_dap


#
# UM Fieldsfiles.
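The new "NetCDF dataset" specification recognises dataset-like inputs purely by duck-typing. The acceptance test amounts to the following (``looks_like_netcdf_dataset`` is an illustrative name, not part of Iris)::

    import netCDF4

    def looks_like_netcdf_dataset(obj):
        # Same attribute probe as the FormatSpecification lambda above.
        return all(
            hasattr(obj, name)
            for name in ("variables", "dimensions", "groups", "ncattrs")
        )

    ds = netCDF4.Dataset("inmemory.nc", mode="w", diskless=True)  # no file written
    print(looks_like_netcdf_dataset(ds))             # True
    print(looks_like_netcdf_dataset("inmemory.nc"))  # False
    ds.close()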
