Run batches of simulations (#115)
* minor refactor

* add batch_dim argument

* add get_results method to driver classes

* wip implement run batch (sequential)

* wip add batch in zarr store and runtime context

* maybe resize zarr dataset for non-clock variables

* rename ibatch -> batch and istep -> step

* docstring minor change

* split variable info and cache

Variable info is common to all simulations in the batch, while variable
cache is specific to each simulation in the batch.

* update release notes

* black

* remove print

* fix data assign of index variables (xarray error)

* update release notes (more details)

* update and fix tests

Fix shape (must be shared between batch simulations)
Fix scalar output variables
Fix > 1 attempts to create the same zarr dataset
Fix clock incrementers (one per batch)

* black again

* refactor: stick model state to Model object

Added public methods and properties to Model for accessing or updating
or validating its state (i.e., active simulation data).

This redesign allows much cleaner code for drivers, given that one
instance of a driver (and store) may now handle multiple
simulations (batches).

This also provides better (public) API for running models without
using the xarray extension.

It assumes that model cloning is a relatively cheap operation.

* black

* clean-up and add tests

* zarr: don't create batch dim for index vars

* fixes that still need some tests

* doc: run batches section + lots of improvements

* black

* doc tweaks

* more tests
benbovy committed Apr 1, 2020
1 parent 24ec282 commit 83bbdeb
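
The refactor described in the commit message lets one driver handle several simulations, each with its own state, and assumes that cloning a model is cheap. A minimal sketch of that idea is given below. ``ToyModel``, its methods and ``run_batch`` are hypothetical names made up for illustration only; this is not the xarray-simlab implementation or API.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for a model that owns its simulation state."""

    def __init__(self, rate):
        # Information shared by every simulation in the batch.
        self.rate = rate
        # State specific to one simulation (the "active simulation data").
        self.state = {}

    def clone(self):
        # Assumed to be cheap: shallow-copy the definition, reset the state.
        new = copy.copy(self)
        new.state = {}
        return new

    def set_inputs(self, inputs):
        # Store input values in this simulation's own state.
        self.state.update(inputs)

    def run_step(self):
        # Toy update rule: increment "u" by a constant rate.
        self.state["u"] += self.rate


def run_batch(model, batch_inputs, nsteps):
    """Run one simulation per entry of ``batch_inputs``, sequentially."""
    results = []
    for inputs in batch_inputs:
        sim = model.clone()  # each simulation gets its own state
        sim.set_inputs(inputs)
        for _ in range(nsteps):
            sim.run_step()
        results.append(dict(sim.state))
    return results


model = ToyModel(rate=2)
results = run_batch(model, [{"u": 0}, {"u": 10}], nsteps=3)
print(results)
```

Because every simulation runs on a clone, the original ``model`` object keeps an empty state and can be reused for another batch.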
Showing 25 changed files with 950 additions and 492 deletions.
13 changes: 8 additions & 5 deletions doc/api.rst
Expand Up @@ -103,16 +103,19 @@ process names and values are objects of ``Process`` subclasses
Running a model
---------------

In most cases, the methods listed below should not be used directly.
For running simulations, it is preferable to use the
``Dataset.xsimlab`` accessor instead. These methods might be useful
though, e.g., for debugging or for using ``Model`` objects with other
interfaces.
In most cases, the methods and properties listed below should not be used
directly. For running simulations, it is preferable to use the
``Dataset.xsimlab`` accessor instead. These methods might be useful though,
e.g., for debugging or for using ``Model`` objects with other interfaces.

.. autosummary::
:toctree: _api_generated/

Model.state
Model.cache_state
Model.set_inputs
Model.execute
Model.validate

Process
=======
Expand Down
48 changes: 27 additions & 21 deletions doc/create_model.rst
Expand Up @@ -3,7 +3,7 @@
Create and Modify Models
========================

Like the previous :doc:`framework` section, this section is useful
Like the previous Section :doc:`framework`, this section is useful
mostly for users who want to create new models from scratch or
customize existing models. Users who only want to run simulations from
existing models may skip this section.
Expand Down Expand Up @@ -82,7 +82,7 @@ computes a value for these two variables.
Note also ``static=True`` set for ``spacing``, ``length``, ``loc`` and
``scale``. This is to prevent providing time varying values as model inputs for
those parameters. By default, it is possible to change the value of a variable
during a simulation (external forcing), see section :ref:`time_varying_inputs`
during a simulation (external forcing), see Section :ref:`time_varying_inputs`
for an example. This is not always desirable, though.

Process "runtime" methods
Expand Down Expand Up @@ -142,10 +142,10 @@ to include in the model, e.g., with only the process created above:
:lines: 37

That's it! Now we have different tools already available to inspect
the model (see section :doc:`inspect_model`). We can also use that
the model (see Section :doc:`inspect_model`). We can also use that
model with the xarray extension provided by xarray-simlab to create
new setups, run the model, take snapshots for one or more variables on
a given frequency, etc. (see section :doc:`run_model`).
a given frequency, etc. (see Section :doc:`run_model`).

Fine-grained process refactoring
--------------------------------
Expand Down Expand Up @@ -176,8 +176,13 @@ x-coordinate values.
.. literalinclude:: scripts/advection_model.py
:lines: 40-49

Grid x-coordinate values only need to be set once at the beginning of
the simulation ; there is no need to implement ``.run_step()`` here.
All grid variables are static, i.e., their values must be time-invariant. The
``x`` variable is declared using :func:`~xsimlab.index`. This is a specific kind
of variable intended for storing coordinate labels, here useful for indexing any
data on the grid. ``x`` values must be set somewhere in the process runtime
methods and they should also be time-invariant (i.e., all index variables imply
``intent='out'`` and ``static=True``). Those values are set here once at the
beginning of the simulation ; there is no need to implement ``.run_step()``.

**ProfileU**

Expand Down Expand Up @@ -226,20 +231,21 @@ We now have all the building blocks to create a more flexible model:
.. literalinclude:: scripts/advection_model.py
:lines: 104-111

The order in which processes are given doesn't matter (it is a
dictionary). A computationally consistent order, as well as model
inputs among all declared variables, are both automatically figured
out when creating the Model instance.
The order in which the processes are given in the dictionary doesn't matter.
When creating a new instance of :class:`~xsimlab.Model`, the xarray-simlab
modeling framework automatically sorts the given processes into a
computationally consistent order and retrieves the model inputs among all
declared variables in all processes.
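
Sorting processes into a computationally consistent order is a topological sort of a dependency graph. The sketch below shows the general technique with Python's standard ``graphlib`` module (Python 3.9+). The dependency mapping is a hypothetical example written by hand, and this is not how xarray-simlab builds or sorts its graph internally.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each process name maps to the set of
# processes that must run before it. The dictionary is deliberately
# written in a "wrong" order to show that input order doesn't matter.
dependencies = {
    "profile": {"init", "advect"},
    "advect": {"grid"},
    "init": {"grid"},
    "grid": set(),
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several valid orders may exist (here ``init`` and ``advect`` can be swapped); any of them is computationally consistent as long as each process runs after the processes it depends on.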

In terms of computation and inputs, ``model2`` is equivalent to the
``model1`` instance created above ; it is just organized
In terms of computation and inputs, ``advect_model`` is equivalent to the
``advect_model_raw`` instance created above ; it is just organized
differently.

Update existing models
----------------------

Between the two Model instances created so far, the advantage of
``model2`` over ``model1`` is that we can easily update the model --
``advect_model`` over ``advect_model_raw`` is that we can easily update the model --
change its behavior and/or add many new features -- without
sacrificing readability or losing the ability to get back to the
original, simple version.
Expand Down Expand Up @@ -275,9 +281,9 @@ Using one command, we can then update the model with these new
features:

.. literalinclude:: scripts/advection_model.py
:lines: 155
:lines: 155-157

Compared to ``model2``, this new ``model3`` have a new process named
Compared to ``advect_model``, this new ``advect_model_src`` have a new process named
'source' and a replaced process 'init'.

**Removing one or more processes**
Expand All @@ -286,7 +292,7 @@ It is also possible to create new models by removing one or more
processes from existing Model instances, e.g.,

.. literalinclude:: scripts/advection_model.py
:lines: 158
:lines: 160

In this latter case, users will have to provide initial values of
:math:`u` along the grid directly as an input array.
Expand All @@ -304,34 +310,34 @@ Customize existing processes
Sometimes we only want to update an existing model with very minor
changes.

As an example, let's update ``model2`` by using a fixed grid (i.e.,
As an example, let's update ``advect_model`` by using a fixed grid (i.e.,
with hard-coded values for grid spacing and length). One way to
achieve this is to create a small new process class that sets
the values of ``spacing`` and ``length``:

.. literalinclude:: scripts/advection_model.py
:lines: 161-168
:lines: 163-170

However, one drawback of this "additive" approach is that the number
of processes in a model might become unnecessarily high:

.. literalinclude:: scripts/advection_model.py
:lines: 171
:lines: 173-175

Alternatively, it is possible to write a process class that inherits
from ``UniformGrid1D``, in which we can re-declare variables *and/or*
re-define "runtime" methods:

.. literalinclude:: scripts/advection_model.py
:lines: 174-182
:lines: 178-186

We can here directly update the model and replace the original process
``UniformGrid1D`` by the inherited class ``FixedGrid``. Foreign
variables that refer to ``UniformGrid1D`` will still correctly point
to the ``grid`` process in the updated model:

.. literalinclude:: scripts/advection_model.py
:lines: 185
:lines: 189

.. warning::

Expand Down
4 changes: 2 additions & 2 deletions doc/develop.rst
Expand Up @@ -170,8 +170,8 @@ change submission.
Release notes
~~~~~~~~~~~~~

Every significative code contribution should be listed in the
:doc:`whats_new` section of this documentation under the corresponding version.
Every significative code contribution should be listed in Section
:doc:`whats_new` of this documentation under the corresponding version.

Contributing to documentation
-----------------------------
Expand Down
2 changes: 1 addition & 1 deletion doc/faq.rst
Expand Up @@ -67,7 +67,7 @@ Three levels of parallelism are possible:

Note that the notion of process used above is different from
multiprocessing: a process here corresponds to a component of a model
(see :ref:`framework` section).
(see Section :ref:`framework`).

The first level "inter-model" is an embarrassingly parallel problem.
Next versions of xarray-simlab will allow to very easily run
Expand Down
8 changes: 8 additions & 0 deletions doc/framework.rst
Expand Up @@ -169,6 +169,14 @@ returns their value. They have always ``intent='out'``.

On-demand variables are useful, e.g., for optional model diagnostics.

Index variables
~~~~~~~~~~~~~~~

Index variables are intended for indexing data of other variables in a model
like, e.g., coordinate labels of grid nodes. They are declared using
:func:`~xsimlab.index`. They have always ``intent='out'`` although their values
could be computed from other input variables.

Simulation workflow
-------------------

Expand Down
5 changes: 2 additions & 3 deletions doc/index.rst
Expand Up @@ -104,6 +104,5 @@ group of the GFZ Helmholtz Centre Potsdam.
Citation
--------

If you use xarray-simlab and would like to cite it in a scientific
publication, we would certainly appreciate it (see :doc:`citation`
section).
If you use xarray-simlab and would like to cite it in a scientific publication,
we would certainly appreciate it (see Section :doc:`citation`).
65 changes: 37 additions & 28 deletions doc/inspect_model.rst
Expand Up @@ -3,17 +3,19 @@
Inspect Models
==============

We can inspect xarray-simlab's :class:`~xsimlab.Model` objects in
different ways. As an example we'll use here the object ``model2``
which has been created in the previous section :doc:`create_model` of
this user guide.
Models may be complex and built from many processes and variables. To better
explore those models, xarray-simlab provides many convenient ways to inspect and
auto-document :class:`~xsimlab.Model` objects.

As an example we'll use here the object ``advect_model`` which has been created
in the previous Section :doc:`create_model` of this user guide.

.. ipython:: python
:suppress:
import sys
sys.path.append('scripts')
from advection_model import model2, ProfileU
from advection_model import advect_model, ProfileU
.. ipython:: python
Expand All @@ -27,7 +29,7 @@ processes and their variables that need an input value (if any):

.. ipython:: python
model2
advect_model
For each input, a one-line summary is shown with the intent (either
'in' or 'inout') as well as the dimension labels for inputs that don't
Expand All @@ -40,14 +42,14 @@ variable names, respectively.

.. ipython:: python
model2.input_vars
advect_model.input_vars
:attr:`~xsimlab.Model.input_vars_dict` returns all inputs grouped by
process, as a dictionary:

.. ipython:: python
model2.input_vars_dict
advect_model.input_vars_dict
Inspect processes and variables
-------------------------------
Expand All @@ -57,8 +59,8 @@ attribute-like access to their processes, e.g.,

.. ipython:: python
model2['advect']
model2.grid
advect_model['advect']
advect_model.grid
As shown here above, process *repr* includes:

Expand All @@ -83,14 +85,21 @@ variable level:
.. ipython:: python
xs.variable_info(ProfileU, 'u')
xs.variable_info(model2.profile, 'u_vars')
xs.variable_info(advect_model.profile, 'u_vars')
Alternatively, you can look at the auto-generated docstrings of a process class
(configurable via the ``autodoc`` parameter of :func:`~xsimlab.process`):

.. ipython:: python
ProfileU?
Alternatively, we can look at the docstrings of auto-generated
properties for each variable, e.g.,
As well as the auto-generated docstrings for each variable (only accessible from
Model objects), e.g.,

.. ipython:: python
ProfileU.u?
advect_model.profile.u?
Like :attr:`~xsimlab.Model.input_vars` and
:attr:`~xsimlab.Model.input_vars_dict`, Model properties
Expand All @@ -105,46 +114,46 @@ Visualize models as graphs
:suppress:
from xsimlab.dot import dot_graph
dot_graph(model2, filename='savefig/model2_simple.png')
dot_graph(model2, show_inputs=True, filename='savefig/model2_inputs.png')
dot_graph(model2, show_inputs=True, show_variables=True,
filename='savefig/model2_variables.png')
dot_graph(advect_model, filename='savefig/advect_model_simple.png')
dot_graph(advect_model, show_inputs=True, filename='savefig/advect_model_inputs.png')
dot_graph(advect_model, show_inputs=True, show_variables=True,
filename='savefig/advect_model_variables.png')
.. ipython:: python
:suppress:
dot_graph(model2, show_only_variable=('profile', 'u'),
filename='savefig/model2_var_u.png')
dot_graph(advect_model, show_only_variable=('profile', 'u'),
filename='savefig/advect_model_var_u.png')
It is possible to visualize a model and its processes as a directed
graph (note: this requires installing Graphviz and its Python
bindings, which both can be found on conda-forge):

.. ipython:: python
model2.visualize();
advect_model.visualize();
.. image:: savefig/model2_simple.png
.. image:: savefig/advect_model_simple.png
:width: 40%

``show_inputs`` option allows to show model input variables as yellow
square nodes linked to their corresponding processes:

.. ipython:: python
model2.visualize(show_inputs=True);
advect_model.visualize(show_inputs=True);
.. image:: savefig/model2_inputs.png
.. image:: savefig/advect_model_inputs.png
:width: 60%

``show_variables`` option allows to show the other variables as white
square nodes:

.. ipython:: python
model2.visualize(show_inputs=True, show_variables=True);
advect_model.visualize(show_inputs=True, show_variables=True);
.. image:: savefig/model2_variables.png
.. image:: savefig/advect_model_variables.png
:width: 60%

Nodes with solid border correspond to regular variables while nodes
Expand All @@ -158,9 +167,9 @@ variable and all its references in other processes, e.g.,

.. ipython:: python
model2.visualize(show_only_variable=('profile', 'u'));
advect_model.visualize(show_only_variable=('profile', 'u'));
.. image:: savefig/model2_var_u.png
.. image:: savefig/advect_model_var_u.png
:width: 40%

Note that there is another function ``dot_graph`` available in module
Expand Down
