Run batches of simulations (#115)

* minor refactor
* add batch_dim argument
* add get_results method to driver classes
* wip implement run batch (sequential)
* wip add batch in zarr store and runtime context
* maybe resize zarr dataset for non-clock variables
* rename ibatch -> batch and istep -> step
* docstring minor change
* split variable info and cache

  Variable info is common to all simulations in the batch, while variable
  cache is specific to each simulation in the batch.
* update release notes
* black
* remove print
* fix data assign of index variables (xarray error)
* update release notes (more details)
* update and fix tests

  Fix shape (must be shared between batch simulations)
  Fix scalar output variables
  Fix > 1 attempts to create the same zarr dataset
  Fix clock incrementers (one per batch)
* black again
* refactor: stick model state to Model object

  Added public methods and properties to Model for accessing or updating or
  validating its state (i.e., active simulation data). This redesign allows
  much cleaner code for drivers, given that one instance of a driver (and
  store) may now handle multiple simulations (batches). This also provides
  better (public) API for running models without using the xarray extension.
  It assumes that model cloning is a relatively cheap operation.
* black
* clean-up and add tests
* zarr: don't create batch dim for index vars
* fixes that still need some tests
* doc: run batches section + lots of improvements
* black
* doc tweaks
* more tests
xarray-contrib · Apr 1, 2020 · 83bbdeb
1 parent 24ec282
commit 83bbdeb

Showing 25 changed files with 950 additions and 492 deletions.
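
Note on the design described in the commit message: the idea of attaching the simulation state to the model object, and cloning the model once per simulation in a batch, can be illustrated with a small stand-alone sketch. This is not the xarray-simlab implementation. ``ToyModel``, its ``rate`` parameter, its ``run_step`` method and the ``run_batch`` helper are hypothetical names chosen for illustration, and the sketch only assumes, as the commit message does, that cloning a model is cheap.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for a model that owns its simulation state."""

    def __init__(self, rate):
        self.rate = rate
        # Active simulation data, keyed by variable name.
        self.state = {}

    def clone(self):
        # Assumed cheap: copy the model definition, start with an empty state.
        return ToyModel(self.rate)

    def set_inputs(self, inputs):
        # Copy the inputs so that simulations never share mutable data.
        self.state.update(copy.deepcopy(inputs))

    def run_step(self, dt):
        self.state["u"] += self.rate * dt


def run_batch(model, batch_inputs, nsteps, dt):
    """Run one simulation per batch element, sequentially."""
    results = []
    for inputs in batch_inputs:
        sim = model.clone()  # one independent state per simulation
        sim.set_inputs(inputs)
        for _ in range(nsteps):
            sim.run_step(dt)
        results.append(sim.state["u"])
    return results


template = ToyModel(rate=2.0)
batch_results = run_batch(template, [{"u": 0.0}, {"u": 10.0}], nsteps=3, dt=0.5)
```

Because every simulation runs on its own clone, the template model's state stays empty and a single driver loop can serve the whole batch.
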
diff --git a/doc/api.rst b/doc/api.rst
@@ -103,16 +103,19 @@ process names and values are objects of ``Process`` subclasses
 Running a model
 ---------------
 
-In most cases, the methods listed below should not be used directly.
-For running simulations, it is preferable to use the
-``Dataset.xsimlab`` accessor instead. These methods might be useful
-though, e.g., for debugging or for using ``Model`` objects with other
-interfaces.
+In most cases, the methods and properties listed below should not be used
+directly. For running simulations, it is preferable to use the
+``Dataset.xsimlab`` accessor instead. These methods might be useful though,
+e.g., for debugging or for using ``Model`` objects with other interfaces.
 
 .. autosummary::
    :toctree: _api_generated/
 
+   Model.state
+   Model.cache_state
+   Model.set_inputs
    Model.execute
+   Model.validate
 
 Process
 =======

diff --git a/doc/create_model.rst b/doc/create_model.rst
@@ -3,7 +3,7 @@
 Create and Modify Models
 ========================
 
-Like the previous :doc:`framework` section, this section is useful
+Like the previous Section :doc:`framework`, this section is useful
 mostly for users who want to create new models from scratch or
 customize existing models. Users who only want to run simulations from
 existing models may skip this section.
@@ -82,7 +82,7 @@ computes a value for these two variables.
 Note also ``static=True`` set for ``spacing``, ``length``, ``loc`` and
 ``scale``. This is to prevent providing time varying values as model inputs for
 those parameters. By default, it is possible to change the value of a variable
-during a simulation (external forcing), see section :ref:`time_varying_inputs`
+during a simulation (external forcing), see Section :ref:`time_varying_inputs`
 for an example. This is not always desirable, though.
 
 Process "runtime" methods
@@ -142,10 +142,10 @@ to include in the model, e.g., with only the process created above:
    :lines: 37
 
 That's it! Now we have different tools already available to inspect
-the model (see section :doc:`inspect_model`). We can also use that
+the model (see Section :doc:`inspect_model`). We can also use that
 model with the xarray extension provided by xarray-simlab to create
 new setups, run the model, take snapshots for one or more variables on
-a given frequency, etc. (see section :doc:`run_model`).
+a given frequency, etc. (see Section :doc:`run_model`).
 
 Fine-grained process refactoring
 --------------------------------
@@ -176,8 +176,13 @@ x-coordinate values.
 .. literalinclude:: scripts/advection_model.py
    :lines: 40-49
 
-Grid x-coordinate values only need to be set once at the beginning of
-the simulation ; there is no need to implement ``.run_step()`` here.
+All grid variables are static, i.e., their values must be time-invariant. The
+``x`` variable is declared using :func:`~xsimlab.index`. This is a specific kind
+of variable intended for storing coordinate labels, here useful for indexing any
+data on the grid. ``x`` values must be set somewhere in the process runtime
+methods and they should also be time-invariant (i.e., all index variables imply
+``intent='out'`` and ``static=True``). Those values are set here once at the
+beginning of the simulation ; there is no need to implement ``.run_step()``.
 
 **ProfileU**
 
@@ -226,20 +231,21 @@ We now have all the building blocks to create a more flexible model:
 .. literalinclude:: scripts/advection_model.py
    :lines: 104-111
 
-The order in which processes are given doesn't matter (it is a
-dictionary). A computationally consistent order, as well as model
-inputs among all declared variables, are both automatically figured
-out when creating the Model instance.
+The order in which the processes are given in the dictionary doesn't matter.
+When creating a new instance of :class:`~xsimlab.Model`, the xarray-simlab
+modeling framework automatically sorts the given processes into a
+computationally consistent order and retrieves the model inputs among all
+declared variables in all processes.
 
-In terms of computation and inputs, ``model2`` is equivalent to the
-``model1`` instance created above ; it is just organized
+In terms of computation and inputs, ``advect_model`` is equivalent to the
+``advect_model_raw`` instance created above ; it is just organized
 differently.
 
 Update existing models
 ----------------------
 
 Between the two Model instances created so far, the advantage of
-``model2`` over ``model1`` is that we can easily update the model --
+``advect_model`` over ``advect_model_raw`` is that we can easily update the model --
 change its behavior and/or add many new features -- without
 sacrificing readability or losing the ability to get back to the
 original, simple version.
@@ -275,9 +281,9 @@ Using one command, we can then update the model with these new
 features:
 
 .. literalinclude:: scripts/advection_model.py
-   :lines: 155
+   :lines: 155-157
 
-Compared to ``model2``, this new ``model3`` have a new process named
+Compared to ``advect_model``, this new ``advect_model_src`` have a new process named
 'source' and a replaced process 'init'.
 
 **Removing one or more processes**
@@ -286,7 +292,7 @@ It is also possible to create new models by removing one or more
 processes from existing Model instances, e.g.,
 
 .. literalinclude:: scripts/advection_model.py
-   :lines: 158
+   :lines: 160
 
 In this latter case, users will have to provide initial values of
 :math:`u` along the grid directly as an input array.
@@ -304,34 +310,34 @@ Customize existing processes
 Sometimes we only want to update an existing model with very minor
 changes.
 
-As an example, let's update ``model2`` by using a fixed grid (i.e.,
+As an example, let's update ``advect_model`` by using a fixed grid (i.e.,
 with hard-coded values for grid spacing and length). One way to
 achieve this is to create a small new process class that sets
 the values of ``spacing`` and ``length``:
 
 .. literalinclude:: scripts/advection_model.py
-   :lines: 161-168
+   :lines: 163-170
 
 However, one drawback of this "additive" approach is that the number
 of processes in a model might become unnecessarily high:
 
 .. literalinclude:: scripts/advection_model.py
-   :lines: 171
+   :lines: 173-175
 
 Alternatively, it is possible to write a process class that inherits
 from ``UniformGrid1D``, in which we can re-declare variables *and/or*
 re-define "runtime" methods:
 
 .. literalinclude:: scripts/advection_model.py
-   :lines: 174-182
+   :lines: 178-186
 
 We can here directly update the model and replace the original process
 ``UniformGrid1D`` by the inherited class ``FixedGrid``. Foreign
 variables that refer to ``UniformGrid1D`` will still correctly point
 to the ``grid`` process in the updated model:
 
 .. literalinclude:: scripts/advection_model.py
-   :lines: 185
+   :lines: 189
 
 .. warning::
 

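Note on the ``create_model.rst`` hunk above, which says that processes are automatically sorted into a computationally consistent order regardless of their order in the dictionary: this kind of ordering is a topological sort of the process dependency graph. The sketch below shows the general technique with Python's standard ``graphlib`` module (Python 3.9+). It is not how xarray-simlab is implemented, and the dependency map is a hypothetical simplification of the advection model.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: process name -> processes whose outputs it needs.
# The insertion order is deliberately shuffled to show that it does not matter.
dependencies = {
    "profile": {"advect"},
    "advect": {"grid", "init"},
    "grid": set(),
    "init": {"grid"},
}

# static_order() yields each process only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
```

With this map the dependencies form a single chain, so the resulting order is unique. A cycle among the processes would raise ``graphlib.CycleError`` instead of producing an order.
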
diff --git a/doc/develop.rst b/doc/develop.rst
@@ -170,8 +170,8 @@ change submission.
 Release notes
 ~~~~~~~~~~~~~
 
-Every significative code contribution should be listed in the
-:doc:`whats_new` section of this documentation under the corresponding version.
+Every significative code contribution should be listed in Section
+:doc:`whats_new` of this documentation under the corresponding version.
 
 Contributing to documentation
 -----------------------------

diff --git a/doc/faq.rst b/doc/faq.rst
@@ -67,7 +67,7 @@ Three levels of parallelism are possible:
 
 Note that the notion of process used above is different from
 multiprocessing: a process here corresponds to a component of a model
-(see :ref:`framework` section).
+(see Section :ref:`framework`).
 
 The first level "inter-model" is an embarrassingly parallel problem.
 Next versions of xarray-simlab will allow to very easily run

diff --git a/doc/framework.rst b/doc/framework.rst
@@ -169,6 +169,14 @@ returns their value. They have always ``intent='out'``.
 
 On-demand variables are useful, e.g., for optional model diagnostics.
 
+Index variables
+~~~~~~~~~~~~~~~
+
+Index variables are intended for indexing data of other variables in a model
+like, e.g., coordinate labels of grid nodes. They are declared using
+:func:`~xsimlab.index`. They have always ``intent='out'`` although their values
+could be computed from other input variables.
+
 Simulation workflow
 -------------------
 

diff --git a/doc/index.rst b/doc/index.rst
@@ -104,6 +104,5 @@ group of the GFZ Helmholtz Centre Potsdam.
 Citation
 --------
 
-If you use xarray-simlab and would like to cite it in a scientific
-publication, we would certainly appreciate it (see :doc:`citation`
-section).
+If you use xarray-simlab and would like to cite it in a scientific publication,
+we would certainly appreciate it (see Section :doc:`citation`).
diff --git a/doc/inspect_model.rst b/doc/inspect_model.rst
@@ -3,17 +3,19 @@
 Inspect Models
 ==============
 
-We can inspect xarray-simlab's :class:`~xsimlab.Model` objects in
-different ways. As an example we'll use here the object ``model2``
-which has been created in the previous section :doc:`create_model` of
-this user guide.
+Models may be complex and built from many processes and variables. To better
+explore those models, xarray-simlab provides many convenient ways to inspect and
+auto-document :class:`~xsimlab.Model` objects.
+
+As an example we'll use here the object ``advect_model`` which has been created
+in the previous Section :doc:`create_model` of this user guide.
 
 .. ipython:: python
    :suppress:
 
     import sys
     sys.path.append('scripts')
-    from advection_model import model2, ProfileU
+    from advection_model import advect_model, ProfileU
 
 .. ipython:: python
 
@@ -27,7 +29,7 @@ processes and their variables that need an input value (if any):
 
 .. ipython:: python
 
-    model2
+    advect_model
 
 For each input, a one-line summary is shown with the intent (either
 'in' or 'inout') as well as the dimension labels for inputs that don't
@@ -40,14 +42,14 @@ variable names, respectively.
 
 .. ipython:: python
 
-    model2.input_vars
+    advect_model.input_vars
 
 :attr:`~xsimlab.Model.input_vars_dict` returns all inputs grouped by
 process, as a dictionary:
 
 .. ipython:: python
 
-    model2.input_vars_dict
+    advect_model.input_vars_dict
 
 Inspect processes and variables
 -------------------------------
@@ -57,8 +59,8 @@ attribute-like access to their processes, e.g.,
 
 .. ipython:: python
 
-    model2['advect']
-    model2.grid
+    advect_model['advect']
+    advect_model.grid
 
 As shown here above, process *repr* includes:
 
@@ -83,14 +85,21 @@ variable level:
 .. ipython:: python
 
     xs.variable_info(ProfileU, 'u')
-    xs.variable_info(model2.profile, 'u_vars')
+    xs.variable_info(advect_model.profile, 'u_vars')
+
+Alternatively, you can look at the auto-generated docstrings of a process class
+(configurable via the ``autodoc`` parameter of :func:`~xsimlab.process`):
+
+.. ipython:: python
+
+   ProfileU?
 
-Alternatively, we can look at the docstrings of auto-generated
-properties for each variable, e.g.,
+As well as the auto-generated docstrings for each variable (only accessible from
+Model objects), e.g.,
 
 .. ipython:: python
 
-    ProfileU.u?
+    advect_model.profile.u?
 
 Like :attr:`~xsimlab.Model.input_vars` and
 :attr:`~xsimlab.Model.input_vars_dict`, Model properties
@@ -105,46 +114,46 @@ Visualize models as graphs
    :suppress:
 
     from xsimlab.dot import dot_graph
-    dot_graph(model2, filename='savefig/model2_simple.png')
-    dot_graph(model2, show_inputs=True, filename='savefig/model2_inputs.png')
-    dot_graph(model2, show_inputs=True, show_variables=True,
-              filename='savefig/model2_variables.png')
+    dot_graph(advect_model, filename='savefig/advect_model_simple.png')
+    dot_graph(advect_model, show_inputs=True, filename='savefig/advect_model_inputs.png')
+    dot_graph(advect_model, show_inputs=True, show_variables=True,
+              filename='savefig/advect_model_variables.png')
 
 .. ipython:: python
    :suppress:
 
-    dot_graph(model2, show_only_variable=('profile', 'u'),
-              filename='savefig/model2_var_u.png')
+    dot_graph(advect_model, show_only_variable=('profile', 'u'),
+              filename='savefig/advect_model_var_u.png')
 
 It is possible to visualize a model and its processes as a directed
 graph (note: this requires installing Graphviz and its Python
 bindings, which both can be found on conda-forge):
 
 .. ipython:: python
 
-    model2.visualize();
+    advect_model.visualize();
 
-.. image:: savefig/model2_simple.png
+.. image:: savefig/advect_model_simple.png
    :width: 40%
 
 ``show_inputs`` option allows to show model input variables as yellow
 square nodes linked to their corresponding processes:
 
 .. ipython:: python
 
-    model2.visualize(show_inputs=True);
+    advect_model.visualize(show_inputs=True);
 
-.. image:: savefig/model2_inputs.png
+.. image:: savefig/advect_model_inputs.png
    :width: 60%
 
 ``show_variables`` option allows to show the other variables as white
 square nodes:
 
 .. ipython:: python
 
-    model2.visualize(show_inputs=True, show_variables=True);
+    advect_model.visualize(show_inputs=True, show_variables=True);
 
-.. image:: savefig/model2_variables.png
+.. image:: savefig/advect_model_variables.png
    :width: 60%
 
 Nodes with solid border correspond to regular variables while nodes
@@ -158,9 +167,9 @@ variable and all its references in other processes, e.g.,
 
 .. ipython:: python
 
-    model2.visualize(show_only_variable=('profile', 'u'));
+    advect_model.visualize(show_only_variable=('profile', 'u'));
 
-.. image:: savefig/model2_var_u.png
+.. image:: savefig/advect_model_var_u.png
    :width: 40%
 
 Note that there is another function ``dot_graph`` available in module