.. Commit ba23ec5 (parent ea291d6) by benbovy, Jul 28, 2017:
   "wip documentation (framework section)", changes to doc/framework.rst.

Modeling Framework
==================

This section explains the design of xarray-simlab's modeling
framework. It is mostly useful for users who want to create new models
from scratch or customize existing ones. Users who only want to run
simulations from existing models may skip this section.

For more practical details on how to use the API to create, inspect and
run models, see the relevant sections of this user guide.

Main concepts
-------------

xarray-simlab's framework is built on only a few concepts that
allow great flexibility for model developers:

- **models**
- **processes**
- **variables**

Models
------

Models are instances of the :class:`~xsimlab.Model` class. They
consist of ordered and immutable collections of processes. The
ordering is inferred automatically from the given processes (see below).

The Model class also implements specific methods for:

- introspection,
- running simulations,
- easy creation of new Model objects from existing ones by dropping,
  adding or updating one or more processes.
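
As a rough illustration of what "ordered and immutable" means here, the sketch below stores processes in a collection that cannot be modified in place. Dropping or updating processes always returns a new object. ``ToyModel``, ``drop_processes``, ``update_processes`` and the dummy process classes are hypothetical names used only in this sketch. They do not describe the actual :class:`~xsimlab.Model` API. Unlike the real framework, the sketch keeps insertion order instead of inferring the order from process dependencies.

.. code-block:: python

    from types import MappingProxyType


    class ToyModel:
        """Hypothetical, simplified stand-in for a model: an ordered,
        read-only collection of named processes."""

        def __init__(self, processes):
            # dict keeps insertion order; MappingProxyType makes it read-only.
            self._processes = MappingProxyType(dict(processes))

        @property
        def processes(self):
            return self._processes

        def drop_processes(self, names):
            # Build a new model without the given processes.
            return ToyModel({name: proc for name, proc in self._processes.items()
                             if name not in names})

        def update_processes(self, processes):
            # Build a new model with processes added or replaced.
            return ToyModel({**self._processes, **processes})


    class Grid:
        """Dummy process, for illustration only."""


    class Diffusion:
        """Dummy process, for illustration only."""


    class Advection:
        """Dummy process, for illustration only."""


    model = ToyModel({"grid": Grid, "diffusion": Diffusion})
    new_model = model.update_processes({"advection": Advection}).drop_processes({"diffusion"})

    print(list(model.processes))      # ['grid', 'diffusion'] (unchanged)
    print(list(new_model.processes))  # ['grid', 'advection']

Returning new objects means that a model already used for a simulation is never silently altered when a variant of it is derived.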

Processes
---------

Processes are defined as custom Python classes that inherit from the
base class :class:`~xsimlab.Process`. The role of a process is twofold:

- declare a given subset of the variables used in a model,
- define a specific set of instructions that use, initialize or update
  the values of these variables during a model run.

Conceptually, a process is a logical component of a computational
model. It may for example represent a particular physical mechanism
that is described in terms of one or more state variables (e.g.,
scalar or vector fields) and one or more operations -- with or without
parameters -- that modify those state variables through time. Note
that some processes may be time-independent.
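
The sketch below shows the two roles of a process in plain Python. Variables are declared as class attributes, and a method holds the instructions that update their values during a time step. The ``Variable`` and ``Process`` classes here are simplified, hypothetical stand-ins, not :class:`~xsimlab.Variable` and :class:`~xsimlab.Process`. The ``Decay`` process and its ``run_step`` method are made up for illustration.

.. code-block:: python

    class Variable:
        """Hypothetical, minimal variable declaration."""

        def __init__(self, description="", default=None):
            self.description = description
            self.default = default


    class Process:
        """Hypothetical base class for processes."""

        @classmethod
        def declared_variables(cls):
            # Variables are collected from the class attributes.
            return {name: attr for name, attr in vars(cls).items()
                    if isinstance(attr, Variable)}


    class Decay(Process):
        """Toy time-dependent process: exponential decay of a quantity."""

        # (1) Declare the variables used by this process.
        quantity = Variable(description="decaying quantity", default=1.0)
        decay_constant = Variable(description="decay constant (1/time)", default=0.5)

        def __init__(self):
            # Runtime values, initialized from the declared defaults.
            self.values = {name: var.default
                           for name, var in self.declared_variables().items()}

        # (2) Define instructions that update the values through time.
        def run_step(self, dt):
            q = self.values["quantity"]
            k = self.values["decay_constant"]
            self.values["quantity"] = q - k * q * dt   # explicit Euler step


    process = Decay()
    process.run_step(dt=0.1)
    print(sorted(Decay.declared_variables()))  # ['decay_constant', 'quantity']
    print(process.values["quantity"])          # approximately 0.95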

.. note::

   xarray-simlab does not provide any built-in logic for tasks like
   generating computational meshes or setting boundary conditions,
   which should rather be implemented in 3rd-party libraries as
   time-independent processes. Even those tasks may be too specialized
   to justify including them in this framework, which aims to be as
   generic as possible.

Variables
---------

Variables are the most basic elements of a model. They consist of
:class:`~xsimlab.Variable` [*]_ objects that are declared in processes as class
attributes. They have the following properties:

- data values (state, rate or change -- see below),
- validators, i.e., callables for checking supplied data values,
- labeled dimensions (or no dimension for scalars),
- predefined attributes like description or default value,
- custom attributes (e.g., units, math symbol).
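
A variable bundling the properties listed above (a data value, validators, labeled dimensions, and predefined and custom attributes) might look roughly as follows. The constructor arguments, the ``non_negative`` validator and the ``attrs`` dictionary are assumptions made for this example. They do not reproduce the signature of :class:`~xsimlab.Variable`.

.. code-block:: python

    class Variable:
        """Hypothetical variable declaration with the properties listed above."""

        def __init__(self, dims=(), description="", default=None,
                     validators=(), attrs=None):
            self.dims = tuple(dims)             # labeled dimensions; () for a scalar
            self.description = description     # predefined attributes
            self.default = default
            self.validators = list(validators)  # callables checking supplied values
            self.attrs = dict(attrs or {})      # custom attributes (units, symbol...)
            self._state = default               # data value

        @property
        def state(self):
            return self._state

        @state.setter
        def state(self, value):
            # Every validator must accept the value before it is stored.
            for validate in self.validators:
                validate(value)
            self._state = value


    def non_negative(value):
        """Hypothetical validator: reject negative values."""
        if value < 0:
            raise ValueError(f"expected a non-negative value, got {value}")


    diffusivity = Variable(description="thermal diffusivity", default=1.0,
                           validators=[non_negative],
                           attrs={"units": "m**2/s", "symbol": "kappa"})
    elevation = Variable(dims=("x", "y"), description="surface elevation")

    diffusivity.state = 2.5       # accepted
    try:
        diffusivity.state = -1.0  # rejected by the validator
    except ValueError as err:
        print(err)                # expected a non-negative value, got -1.0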

.. note::

   xarray-simlab does not distinguish between, e.g., model parameters
   and state variables. Both are declared as Variable objects.

.. [*] Variables usually consist of objects of derived classes such
   as ``FloatVariable`` or ``IntegerVariable``, depending on their
   expected value type.

Foreign variables
-----------------

Just as different physical mechanisms may involve common state
variables (e.g., temperature or pressure), different processes may
operate on common variables.

In xarray-simlab, a variable is declared at a unique place, i.e.,
within one and only one process. Variables are shared between
processes by also declaring foreign variables:
:class:`~xsimlab.ForeignVariable` objects are references to variables
that are declared in other processes. They allow getting or setting
values just as if these references were the original variables.

The great advantage of declaring variables at unique places is that
all their metadata are defined once. A downside of this approach,
however, is that foreign variables may add many hard-coded (one-way)
links between processes, which makes it harder to reuse these
processes independently of each other.
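
The reference mechanism can be sketched with a small proxy object that forwards reads and writes to the variable declared in another process. The ``Variable``, ``ForeignVariable``, ``Grid`` and ``Diffusion`` classes below are simplified, hypothetical versions written for this example. The real :class:`~xsimlab.ForeignVariable` is not constructed this way.

.. code-block:: python

    class Variable:
        """Hypothetical minimal variable holding a single value."""

        def __init__(self, value=None):
            self.value = value


    class ForeignVariable:
        """Hypothetical reference to a variable declared in another process;
        getting or setting ``value`` acts on the original variable."""

        def __init__(self, processes, process_name, var_name):
            self._processes = processes      # mapping of all the model processes
            self.process_name = process_name
            self.var_name = var_name

        def _target(self):
            # Resolved lazily, so the other process may be added later.
            return getattr(self._processes[self.process_name], self.var_name)

        @property
        def value(self):
            return self._target().value

        @value.setter
        def value(self, new_value):
            self._target().value = new_value


    class Grid:
        def __init__(self):
            self.spacing = Variable(10.0)    # declared here, and only here


    class Diffusion:
        def __init__(self, processes):
            # A reference to Grid's variable, not a copy of it.
            self.spacing = ForeignVariable(processes, "grid", "spacing")


    processes = {}
    processes["grid"] = Grid()
    processes["diffusion"] = Diffusion(processes)

    processes["diffusion"].spacing.value = 5.0
    print(processes["grid"].spacing.value)   # 5.0

Because ``Diffusion`` names ``"grid"`` explicitly, it cannot be reused in a model that lacks a process registered under that name. This is the kind of hard-coded link described above.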

Variable groups
---------------

In some cases, variable groups may provide an elegant alternative to
hard-coded links between processes.

Variable objects belong to the same group if they have the same name
set for their ``group`` attribute. In a process, this group name can
be used to declare a :class:`~xsimlab.VariableGroup` object. When a
new Model object is created, this is automatically turned into an
iterable of ForeignVariable objects pointing to each of the variables
of the group. We thus avoid making explicit references to the
processes in which these variables are declared.

Variable groups are particularly useful for combining different
processes that act on the same variable. In landscape evolution
modeling, for example, they can combine the effects of different
erosion processes on the evolution of surface elevation. Erosion
processes can then easily be added to or removed from a model without
breaking links between processes.
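
The landscape evolution example above could be sketched as follows. Each erosion process tags its variable with the same group name, and the topography process only receives the collection of grouped variables, without naming the processes that declare them. The ``resolve_group`` helper is a hypothetical stand-in for what happens to a :class:`~xsimlab.VariableGroup` at model creation. For simplicity, it returns the variables themselves rather than foreign variables pointing to them. All class names are illustrative.

.. code-block:: python

    class Variable:
        """Hypothetical minimal variable with an optional group name."""

        def __init__(self, value=0.0, group=None):
            self.value = value
            self.group = group


    def resolve_group(processes, group):
        """Hypothetical helper: collect every variable of the given group
        declared in any of the processes."""
        return [var
                for proc in processes.values()
                for var in vars(proc).values()
                if isinstance(var, Variable) and var.group == group]


    class RiverErosion:
        def __init__(self):
            self.erosion = Variable(0.3, group="erosion")


    class HillslopeErosion:
        def __init__(self):
            self.erosion = Variable(0.1, group="erosion")


    class Topography:
        def __init__(self):
            self.elevation = Variable(100.0)

        def run_step(self, erosion_vars):
            # Sum the contributions of all erosion processes, whatever they are.
            self.elevation.value -= sum(var.value for var in erosion_vars)


    processes = {"river": RiverErosion(), "hillslope": HillslopeErosion(),
                 "topography": Topography()}
    processes["topography"].run_step(resolve_group(processes, "erosion"))
    print(processes["topography"].elevation.value)   # approximately 99.6

    # Removing an erosion process does not break any link.
    del processes["hillslope"]
    print(len(resolve_group(processes, "erosion")))  # 1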

Variable state, rate and change
-------------------------------

A single variable may accept up to 3 different values:

- a state, i.e., the value of the variable at a given time,
- a rate, i.e., the value of the time-derivative at a given time,
- a change, i.e., the value of the time-derivative integrated for a given
  time step.

These are accessible as properties of Variable and ForeignVariable
objects, respectively named ``state``, ``rate`` and ``change``. An
additional property ``value`` is defined as an alias of ``state``.
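
For a state variable, the three values and the ``value`` alias could be represented as in the hypothetical sketch below. The ``Variable`` class is a simplified stand-in rather than the xarray-simlab implementation, and the numbers are arbitrary. The comments follow the practice of computing ``rate`` or ``change`` during the "run step" stage and updating ``state`` during the "finalize step" stage.

.. code-block:: python

    class Variable:
        """Hypothetical state variable carrying a state, a rate and a change."""

        def __init__(self, state=None):
            self.state = state
            self.rate = None     # time-derivative at a given time
            self.change = None   # time-derivative integrated over one time step

        @property
        def value(self):
            # ``value`` is an alias of ``state``.
            return self.state

        @value.setter
        def value(self, new_value):
            self.state = new_value


    height = Variable(state=10.0)
    dt = 0.5

    # e.g. during the "run step" stage: compute the rate and the change
    height.rate = -2.0
    height.change = height.rate * dt

    # e.g. during the "finalize step" stage: update the state
    height.state += height.change

    print(height.value)   # 9.0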

.. note::

   These properties are for convenience only: they avoid duplicating
   Variable objects that represent state variables. Their names and
   descriptions serve only as conventions, i.e., there is no
   restriction for model developers in using any of these properties
   anywhere in a process. It is good practice to follow these
   conventions, though.

.. note::

   The ``rate`` and ``change`` properties should never be used for
   variables other than state variables. Moreover, for such variables
   it is preferable to use the ``value`` property instead of ``state``,
   as the latter is meaningless in this case.

.. todo_move_this_elsewhere
   For state variables, a common practice is to compute ``rate`` or
   ``change`` values during the "run step" stage and update ``state``
   values during the "finalize step" stage.

Simulation workflow
-------------------

A model run is divided into four successive stages:

1. initialization
2. run step
3. finalize step
4. finalization

During a simulation, stages 1 and 4 are run only once while stages 2
and 3 are repeated for a given number of (time) steps.

Each process provides its own computation instructions for those
stages. Note that this is optional, except for time-dependent
processes that must provide some instructions at least for stage 2
(run step). For time-independent processes stages 2 and 3 are ignored.
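
The sequence of stages can be sketched with a small driver loop. This is not xarray-simlab's actual runner: the ``run_model`` function, the stage method names (``initialize``, ``run_step``, ``finalize_step``, ``finalize``) and the toy processes are assumptions made for illustration. Processes are assumed to be given in a valid order. A process that does not define a stage method is skipped at that stage, which is how the time-independent ``Setup`` process ignores stages 2 and 3.

.. code-block:: python

    def run_model(processes, n_steps, dt):
        """Hypothetical driver: stages 1 and 4 run once, stages 2 and 3
        run at every time step."""

        def run_stage(stage, *args):
            for process in processes:          # assumed to be already ordered
                method = getattr(process, stage, None)
                if method is not None:         # stages are optional
                    method(*args)

        run_stage("initialize")                # 1. initialization
        for _ in range(n_steps):
            run_stage("run_step", dt)          # 2. run step
            run_stage("finalize_step")         # 3. finalize step
        run_stage("finalize")                  # 4. finalization


    class Setup:
        """Toy time-independent process: only defines initialization."""

        def __init__(self):
            self.initialized = False

        def initialize(self):
            self.initialized = True


    class Recorder:
        """Toy time-dependent process recording the stages it goes through."""

        def __init__(self):
            self.log = []

        def initialize(self):
            self.log.append("initialize")

        def run_step(self, dt):
            self.log.append("run_step")

        def finalize_step(self):
            self.log.append("finalize_step")

        def finalize(self):
            self.log.append("finalize")


    setup, recorder = Setup(), Recorder()
    run_model([setup, recorder], n_steps=2, dt=1.0)
    print(recorder.log)
    # ['initialize', 'run_step', 'finalize_step',
    #  'run_step', 'finalize_step', 'finalize']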

Process dependencies and ordering
---------------------------------

The order in which processes are executed during a simulation is
critical. For example, if the role of a process is to provide a value
for a given variable, then the execution of this process must happen
before the execution of all other processes that use the same variable
in their computation.

This role is defined using the ``provided`` attribute of Variable and
ForeignVariable objects, which is set to either True or False. Note
that a process may still update a variable value even if ``provided``
is set to False (see the Model inputs section below).

In a model, the processes and their dependencies together form the
nodes and the edges of a Directed Acyclic Graph (DAG). The graph
topology is fully determined by the role set for each variable or
foreign variable declared in each process. An ordering that is
computationally consistent can then be obtained using topological
sorting. This is done at Model object creation. The same ordering
is used at every stage of a model run.
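
The ordering step can be illustrated with Python's standard ``graphlib`` module (Python 3.9+). In this hypothetical sketch, each process is summarized by the names of the variables it provides (``provided=True``) and of those it only uses. Each process depends on the processes that provide the variables it uses, and a topological sort of the resulting graph gives a consistent execution order. The process and variable names and the ``process_order`` function are made up for this example, and xarray-simlab's own implementation may differ.

.. code-block:: python

    from graphlib import TopologicalSorter

    # Hypothetical summary of the variables declared in each process.
    processes = {
        "topography": {"provides": set(), "uses": {"uplift_rate", "erosion_rate"}},
        "erosion": {"provides": {"erosion_rate"}, "uses": {"spacing"}},
        "uplift": {"provides": {"uplift_rate"}, "uses": set()},
        "grid": {"provides": {"spacing"}, "uses": set()},
    }


    def process_order(processes):
        """Return the process names sorted so that each providing process
        comes before the processes using its variables."""
        provider = {var: name
                    for name, decl in processes.items()
                    for var in decl["provides"]}
        # Map each process to the set of processes it depends on.
        graph = {name: {provider[var] for var in decl["uses"] if var in provider}
                 for name, decl in processes.items()}
        # Raises graphlib.CycleError if the dependencies are not acyclic.
        return list(TopologicalSorter(graph).static_order())


    order = process_order(processes)
    # Any valid order puts grid before erosion,
    # and both uplift and erosion before topography.
    print(order)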

.. todo_
   In theory, the DAG structure would also allow running the processes
   in parallel at every stage of a model run. This is not yet
   implemented, though.

Model inputs
------------

.. todo::

   Fill this section.

.. move_this_foreign_variable