# Rocoto Tool

The `uwtools` API's `rocoto` module provides functions to build and validate Rocoto workflows. For more information on the UW YAML language beyond what is covered here, see the <a href="https://uwtools.readthedocs.io/en/main/sections/user_guide/yaml/rocoto.html">Defining a Rocoto Workflow</a> page. For more on Rocoto XML documents, see the <a href="https://christopherwharrop.github.io/rocoto/">Rocoto Documentation</a>.

<div class="alert alert-warning"><b>Note: </b>This notebook was tested using <code>uwtools</code> version 2.5.0. </div>
<div class="alert alert-info">For more information, please see the <a href="https://uwtools.readthedocs.io/en/2.5.0/sections/user_guide/api/rocoto.html">uwtools.api.rocoto</a> Read the Docs page.</div>

## Table of Contents
* [Building Rocoto Workflows with UW YAML](#Building-Rocoto-Workflows-with-UW-YAML)
  * [Entities and Cyclestrings](#Entities-and-Cyclestrings)
  * [Tasks and Dependencies](#Tasks-and-Dependencies)
  * [Metatasks](#Metatasks)
* [Validating Workflows](#Validating-Workflows)
<!--cell 0-->

In [1]:
from pathlib import Path
from uwtools.api import rocoto
from uwtools.api.logging import use_uwtools_logger

use_uwtools_logger()

## Building Rocoto Workflows with UW YAML

The `rocoto.realize()` function translates a workflow defined in the UW YAML language into a Rocoto workflow in XML format.
<!--cell 2-->

In [2]:
help(rocoto.realize)

Help on function realize in module uwtools.api.rocoto:

realize(config: Union[uwtools.config.formats.yaml.YAMLConfig, pathlib.Path, str, NoneType], output_file: Union[str, pathlib.Path, NoneType] = None, stdin_ok: bool = False) -> bool
    Realize the Rocoto workflow defined in the given YAML as XML.

    If no input file is specified, ``stdin`` is read. A ``YAMLConfig`` object may also be provided
    as input. If no output file is specified, ``stdout`` is written to. Both the input config and
    output Rocoto XML will be validated against appropriate schemas.

    :param config: YAML input file or ``YAMLConfig`` object (``None`` => read ``stdin``).
    :param output_file: XML output file path (``None`` => write to ``stdout``).
    :param stdin_ok: OK to read from ``stdin``?
    :return: ``True``.



The following is an example of a simple workflow written in the UW YAML language. It uses a top-level `workflow:` block that contains all other blocks in the workflow.

The workflow's global attributes are set within an `attrs:` block, and each workflow has two required attributes: `realtime` and `scheduler`. The `realtime` key indicates whether the workflow will be run in realtime or in retrospective mode, where a value of `true` means that the workflow will be run in realtime mode. The `scheduler` key tells Rocoto which batch system to use when submitting and monitoring jobs.

Each workflow must contain a `cycledef:` block that defines one or more sets of cycles the workflow will iterate over. A set of cycles must be given using the `spec` key. This key may define a set of cycles using either the "start stop step" method or the "crontab-like" method. The "start stop step" method is used below.

A `log:` block is required to define the path where Rocoto logs are written. At least one task must be defined in the `tasks:` block, which is discussed in the [Tasks and Dependencies](#Tasks-and-Dependencies) section of this notebook.
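
To make the "start stop step" form concrete, the sketch below enumerates the cycle times that such a spec describes. It is only an illustration and not part of `uwtools` or Rocoto: the helper name `expand_cycledef` is hypothetical, the stop time is assumed to be inclusive, and the step is assumed to be given as `HH:MM:SS`.

```python
from datetime import datetime, timedelta


def expand_cycledef(spec: str) -> list[datetime]:
    """Enumerate the cycles in a "start stop step" spec (hypothetical helper)."""
    start_str, stop_str, step_str = spec.split()
    start = datetime.strptime(start_str, "%Y%m%d%H%M")
    stop = datetime.strptime(stop_str, "%Y%m%d%H%M")
    # Assumption: the step is written as HH:MM:SS, as in this notebook's examples.
    hours, minutes, seconds = (int(part) for part in step_str.split(":"))
    step = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    cycles = []
    current = start
    # Assumption: the stop time itself counts as a cycle.
    while current <= stop:
        cycles.append(current)
        current += step
    return cycles


cycles = expand_cycledef("202410290000 202410300000 06:00:00")
for cycle in cycles:
    print(cycle.strftime("%Y-%m-%d %H:%M"))
```

Under these assumptions, the spec used throughout this notebook covers five cycles, six hours apart, from 2024-10-29 00:00 through 2024-10-30 00:00.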

The simple workflow below contains a minimal set of keys. For more on the UW YAML language, see the <a href="https://uwtools.readthedocs.io/en/main/sections/user_guide/yaml/rocoto.html">Defining a Rocoto Workflow</a> page.
<!--cell 4-->

In [3]:
%%bash
cat fixtures/rocoto/simple-workflow.yaml

workflow:
  attrs:
    realtime: false
    scheduler: slurm
  cycledef:
    - spec: 202410290000 202410300000 06:00:00
  log: logs/test.log
  tasks:
    task_greet:
      command: echo Hello, World!
      cores: 1
      walltime: 00:00:10


Using `rocoto.realize()`, the UW YAML from above is translated to Rocoto XML. A `config` may be given as a string path, <a href="https://docs.python.org/3/library/pathlib.html#pathlib.Path">Path</a> object, or `YAMLConfig` object. Likewise, the path to the XML output file may be defined by providing `output_file` with a string path or <a href="https://docs.python.org/3/library/pathlib.html#pathlib.Path">Path</a> object. If `output_file` is omitted or set to `None`, the XML will be written to `stdout`. Both the input config and the output Rocoto XML are validated against appropriate schemas. The number of schema-validation errors, as well as details on the errors (if any), are reported.

The `stdin_ok` argument can be used to permit configs to be read from `stdin` when `config` is omitted or set to `None`, but this is a rare use case beyond the scope of this notebook.
<!--cell 6-->

In [4]:
rocoto.realize(
    config=Path('fixtures/rocoto/simple-workflow.yaml'),
    output_file='tmp/simple-workflow.xml'
)

[2024-11-19T23:15:43]     INFO 0 UW schema-validation errors found in Rocoto config
[2024-11-19T23:15:43]     INFO 0 Rocoto XML validation errors found


True

The resulting Rocoto XML file is shown below. An XML header is automatically added without the need to explicitly define it in the UW YAML. Note how blocks from the UW YAML have been transformed into XML tags along with their attributes and values. For example, attributes defined by the `attrs:` block in the UW YAML have become attributes of the `<workflow>` tag in the XML.

For more information on Rocoto workflows, including tags like the ones shown here and their attributes, see the <a href="https://christopherwharrop.github.io/rocoto/">Rocoto Documentation</a>.
<!--cell 8-->

In [5]:
%%bash
cat tmp/simple-workflow.xml

<?xml version='1.0' encoding='utf-8'?>
<workflow realtime="False" scheduler="slurm">
  <cycledef>202410290000 202410300000 06:00:00</cycledef>
  <log>logs/test.log</log>
  <task name="greet">
    <cores>1</cores>
    <walltime>00:00:10</walltime>
    <command>echo Hello, World!</command>
    <jobname>greet</jobname>
  </task>
</workflow>
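
One way to confirm how the YAML keys map onto XML is to read the document back with Python's standard `xml.etree.ElementTree` module. The sketch below is only an illustration: it embeds a copy of the XML shown above as a string (without the XML header) rather than reading `tmp/simple-workflow.xml`, so that it runs on its own.

```python
import xml.etree.ElementTree as ET

# A copy of the generated XML above, minus the XML header, embedded for illustration.
xml_text = """<workflow realtime="False" scheduler="slurm">
  <cycledef>202410290000 202410300000 06:00:00</cycledef>
  <log>logs/test.log</log>
  <task name="greet">
    <cores>1</cores>
    <walltime>00:00:10</walltime>
    <command>echo Hello, World!</command>
    <jobname>greet</jobname>
  </task>
</workflow>"""

root = ET.fromstring(xml_text)

# The keys from the attrs: block appear as attributes of <workflow>.
print(root.attrib)

# Each task's settings appear as child tags of its <task> element.
for task in root.iter("task"):
    settings = {child.tag: child.text for child in task}
    print(task.get("name"), settings)
```

Note that the YAML boolean `false` arrives in the XML as the string `"False"`, and that the `task_` prefix is dropped from the task's name.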


The following workflow is missing required components: `workflow` doesn't contain a `realtime` attribute, a `log:` block isn't included, and `task_greet` doesn't include a `command`.
<!--cell 10-->

In [6]:
%%bash
cat fixtures/rocoto/err-workflow.yaml

workflow:
  attrs:
    scheduler: slurm
  cycledef:
    - spec: 202410290000 202410300000 06:00:00
  tasks:
    task_greet:
      cores: 1
      walltime: 00:00:10


When validation errors occur, `realize()` raises an exception indicating what type of error occurred. Here, the YAML validation errors cause a `UWConfigError` to be raised. The number of validation errors present and their locations within the workflow structure are also shown.
<!--cell 12-->

In [7]:
try:
    rocoto.realize(
        config=Path('fixtures/rocoto/err-workflow.yaml'),
        output_file='tmp/err-workflow.xml'
    )
except Exception as e:
    print(e, type(e))

[2024-11-19T23:15:43]    ERROR 3 UW schema-validation errors found in Rocoto config
[2024-11-19T23:15:43]    ERROR Error at workflow -> attrs:
[2024-11-19T23:15:43]    ERROR   'realtime' is a required property
[2024-11-19T23:15:43]    ERROR Error at workflow -> tasks -> task_greet:
[2024-11-19T23:15:43]    ERROR   'command' is a required property
[2024-11-19T23:15:43]    ERROR Error at workflow:
[2024-11-19T23:15:43]    ERROR   'log' is a required property


YAML validation errors <class 'uwtools.exceptions.UWConfigError'>
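
The three errors reported above all concern required keys. The following sketch shows the idea behind such a check using plain Python dictionaries. It is a hand-written illustration that covers only the required keys named in this notebook; it is not the schema that `uwtools` applies, and the function name `missing_keys` is hypothetical.

```python
# The same structure as err-workflow.yaml, written as a dict so that the
# sketch needs no YAML parser.
config = {
    "workflow": {
        "attrs": {"scheduler": "slurm"},
        "cycledef": [{"spec": "202410290000 202410300000 06:00:00"}],
        "tasks": {"task_greet": {"cores": 1, "walltime": "00:00:10"}},
    }
}


def missing_keys(config: dict) -> list[str]:
    """Report required keys that are absent (hypothetical, simplified check)."""
    workflow = config.get("workflow", {})
    missing = []
    # Required workflow attributes.
    for key in ("realtime", "scheduler"):
        if key not in workflow.get("attrs", {}):
            missing.append(f"workflow -> attrs: {key}")
    # Required top-level blocks.
    for key in ("cycledef", "log", "tasks"):
        if key not in workflow:
            missing.append(f"workflow: {key}")
    # Every task needs a command.
    for name, task in workflow.get("tasks", {}).items():
        if name.startswith("task_") and "command" not in task:
            missing.append(f"workflow -> tasks -> {name}: command")
    return missing


problems = missing_keys(config)
for problem in problems:
    print(problem)
```

The real validation in `realize()` is schema-based and checks far more than this, so a sketch like this is useful only for understanding the error report, not as a replacement for it.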


### Entities and Cyclestrings

Constants called entities may be defined so that their values can be referenced throughout the rest of the Rocoto XML. These are defined in an `entities:` block, with their names and values given as keys and values in the YAML. Below, an entity named `LOG` is defined with a string value. This value is referred to elsewhere in the Rocoto XML with the syntax `&ENTITY_NAME;`. In this case, note the `&LOG;` entity within the `log:` block.
<!--cell 14-->

In [8]:
%%bash
cat fixtures/rocoto/ent-workflow.yaml

workflow:
  attrs:
    realtime: false
    scheduler: slurm
  cycledef:
    - spec: 202410290000 202410300000 06:00:00
  entities:
    LOG: "2024-10-29/test06:00:00.log"
  log: logs/&LOG;
  tasks:
    task_greet:
      command: echo Hello, World!
      cores: 1
      walltime: 00:00:10


Cycle strings contain flags that stand for components of the cycle time and are rendered when Rocoto runs the XML. Here, the `LOG` entity contains the `@Y`, `@m`, `@d`, and `@X` flags, which represent the year, month, day, and time of a cycle defined by the `cycledef:` entry. For more information on these flags, see the <a href="https://christopherwharrop.github.io/rocoto/">Rocoto Documentation</a>. A `cyclestr:` block marks a string containing cycle string flags for rendering when Rocoto runs, and the string itself is given in a `value` key. Since the `LOG` entity contains these flags, the `log:` block below uses a `cyclestr:` block whose `value` references `&LOG;`.
<!--cell 16-->

In [9]:
%%bash
cat fixtures/rocoto/ent-cs-workflow.yaml

workflow:
  attrs:
    realtime: false
    scheduler: slurm
  cycledef:
    - spec: 202410290000 202410300000 06:00:00
  entities:
    LOG: "@Y-@m-@d/test@X.log"
  log: 
    cyclestr:
      value: logs/&LOG;
  tasks:
    task_greet:
      command: echo Hello, World!
      cores: 1
      walltime: 00:00:10
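
To illustrate what rendering a cycle string means, the sketch below substitutes the four flags used here for a single cycle time. It is not Rocoto's implementation: the helper name `render_cyclestr` is hypothetical, only these four flags are handled, and `@X` is assumed to render as `HH:MM:SS`, as the `test06:00:00.log` name in the earlier entity example suggests.

```python
from datetime import datetime

# Assumed meanings of the flags used in this notebook, as strftime formats.
FLAG_FORMATS = {"@Y": "%Y", "@m": "%m", "@d": "%d", "@X": "%H:%M:%S"}


def render_cyclestr(template: str, cycle: datetime) -> str:
    """Replace each supported flag with the matching part of the cycle time."""
    text = template
    for flag, fmt in FLAG_FORMATS.items():
        text = text.replace(flag, cycle.strftime(fmt))
    return text


# The log path after the &LOG; entity has been expanded, rendered for the 06Z cycle.
rendered = render_cyclestr("logs/@Y-@m-@d/test@X.log", datetime(2024, 10, 29, 6))
print(rendered)  # logs/2024-10-29/test06:00:00.log
```

Because the flags are rendered per cycle, each cycle in the `cycledef` gets its own log path.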


As before, the `realize()` function transforms the UW YAML into Rocoto XML.
<!--cell 18-->

In [10]:
rocoto.realize(
    config='fixtures/rocoto/ent-cs-workflow.yaml',
    output_file='tmp/ent-cs-workflow.xml'
)

[2024-11-19T23:15:43]     INFO 0 UW schema-validation errors found in Rocoto config
[2024-11-19T23:15:43]     INFO 0 Rocoto XML validation errors found


True

Here we see the Rocoto XML with the addition of an entity and a `<cyclestr>` tag. The entity is defined in the header of the XML document, and the `<cyclestr>` tag is added within the `<log>` tag.
<!--cell 20-->

In [11]:
%%bash
cat tmp/ent-cs-workflow.xml

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE workflow [
  <!ENTITY LOG "@Y-@m-@d/test@X.log">
]>
<workflow realtime="False" scheduler="slurm">
  <cycledef>202410290000 202410300000 06:00:00</cycledef>
  <log>
    <cyclestr>logs/&LOG;</cyclestr>
  </log>
  <task name="greet">
    <cores>1</cores>
    <walltime>00:00:10</walltime>
    <command>echo Hello, World!</command>
    <jobname>greet</jobname>
  </task>
</workflow>
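
Entity references such as `&LOG;` are expanded by the XML parser that reads the document, before the cycle string flags are rendered. The sketch below demonstrates this with Python's standard `xml.etree.ElementTree` module, which expands entities declared in the document's own `<!DOCTYPE>` block. It uses a trimmed, embedded copy of the XML above and says nothing about how Rocoto itself parses the file.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the generated XML, keeping the entity declaration and <log>.
xml_text = """<!DOCTYPE workflow [
  <!ENTITY LOG "@Y-@m-@d/test@X.log">
]>
<workflow realtime="False" scheduler="slurm">
  <cycledef>202410290000 202410300000 06:00:00</cycledef>
  <log>
    <cyclestr>logs/&LOG;</cyclestr>
  </log>
</workflow>"""

root = ET.fromstring(xml_text)

# After parsing, the entity reference has been replaced by the entity's value.
log_template = root.find("log/cyclestr").text
print(log_template)
```

The parsed text still contains the `@` flags, which is why the `<cyclestr>` tag is needed to mark the string for rendering.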


### Tasks and Dependencies

A `tasks:` block defines all tasks in a Rocoto workflow. Each task is contained within its own block, where the key is `task_` followed by the name of the task. There are two tasks in the example below, `task_bacon` and `task_eggs`. In the Rocoto XML, two separate `<task>` tags will be created with their `name` attributes set to "bacon" and "eggs" respectively. Each task must contain a command to execute indicated by the `command` key and an amount of time to request when submitting the task for execution indicated by the `walltime` key. Each task must also contain either a `cores`, `nodes`, or `native` key to request a given number of nodes/cores used to execute the task. The `task_bacon:` block below requests 1 core, while the `task_eggs:` block requests 4 cores on 1 node.
<!--cell 22-->

In [12]:
%%bash
cat fixtures/rocoto/tasks-workflow.yaml

workflow:
  attrs:
    realtime: false
    scheduler: slurm
  cycledef:
    - spec: 202410290000 202410300000 06:00:00
  log: logs/test.log
  tasks:
    task_bacon:
      command: "echo Cooking bacon..."
      cores: 1
      walltime: 00:00:10
    task_eggs:
      command: "echo Cooking eggs..."
      nodes: 1:ppn=4
      walltime: 00:00:10


Each task may optionally have one or more dependencies that must be satisfied before the task runs. These are specified using a `dependency:` block within the `task_*` block that the dependencies apply to. Dependencies are structured as boolean expressions using a variety of keys that may define specific types of dependencies like task or data dependencies. They may also group dependencies together using boolean operator keys like `and` or `or`. For a full list of possible tags, see the <a href="https://christopherwharrop.github.io/rocoto/">Rocoto Documentation</a>.

Below, the `task_eggs:` block includes one data dependency indicated by the `datadep` key, plus a `value` key that identifies the required data. The `task_serve:` block includes two task dependencies for the bacon and eggs tasks. Since there are multiple dependencies here, they need to be contained within a boolean operator block that describes how to deal with the group of dependencies, which may not all have the same level of completion. Here the `and:` block indicates that all of the tasks referenced within it (e.g. `eggs`) need to be completed. The two task dependencies must have unique keys since they exist at the same level, and they are differentiated here by the suffixes that follow `taskdep_`. The suffix serves only to make the key unique: the `task` attribute, not the suffix, determines which task is referenced. To prevent circular dependencies, task dependencies must have a `task` attribute that indicates the name of a task that is already defined above it. Similar to the `workflow:` block, an `attrs:` block is used here to add attributes to `taskdep`, and the `task` key specifies the value of the task attribute.
<!--cell 24-->

In [13]:
%%bash
cat fixtures/rocoto/tasks-deps-workflow.yaml

workflow:
  attrs:
    realtime: false
    scheduler: slurm
  cycledef:
    - spec: 202410290000 202410300000 06:00:00
  log: logs/test.log
  tasks:
    task_bacon:
      command: "echo Cooking bacon..."
      cores: 1
      walltime: 00:00:10
    task_eggs:
      command: "echo Cooking eggs..."
      nodes: 1:ppn=4
      walltime: 00:00:10
      dependency:
        datadep:
          value: eggs_recipe.txt
    task_serve:
      command: "echo Serving breakfast..."
      cores: 2
      walltime: 00:00:01
      dependency:
        and:
          taskdep_eggs:
            attrs:
              task: bacon
          taskdep_bacon:
            attrs:
              task: eggs


Here, the `realize()` function transforms this UW YAML into Rocoto XML.
<!--cell 26-->

In [14]:
rocoto.realize(
    config='fixtures/rocoto/tasks-deps-workflow.yaml',
    output_file='tmp/tasks-deps-workflow.xml'
)

[2024-11-19T23:15:43]     INFO 0 UW schema-validation errors found in Rocoto config
[2024-11-19T23:15:43]     INFO 0 Rocoto XML validation errors found


True

Note how each task has its own tag in the Rocoto XML document, with name attributes that came from the unique suffixes of the `task_` keys. While the bacon task contains no `<dependency>` tag, the eggs and serve tasks do. Within the serve task's dependencies, the `<and>` tag describes the need for both of the two task dependencies to be fulfilled. Each `<taskdep>` task dependency uses the `task` attribute to point to a previously named task. 
<!--cell 28-->

In [15]:
%%bash
cat tmp/tasks-deps-workflow.xml

<?xml version='1.0' encoding='utf-8'?>
<workflow realtime="False" scheduler="slurm">
  <cycledef>202410290000 202410300000 06:00:00</cycledef>
  <log>logs/test.log</log>
  <task name="bacon">
    <cores>1</cores>
    <walltime>00:00:10</walltime>
    <command>echo Cooking bacon...</command>
    <jobname>bacon</jobname>
  </task>
  <task name="eggs">
    <nodes>1:ppn=4</nodes>
    <walltime>00:00:10</walltime>
    <command>echo Cooking eggs...</command>
    <jobname>eggs</jobname>
    <dependency>
      <datadep>eggs_recipe.txt</datadep>
    </dependency>
  </task>
  <task name="serve">
    <cores>2</cores>
    <walltime>00:00:01</walltime>
    <command>echo Serving breakfast...</command>
    <jobname>serve</jobname>
    <dependency>
      <and>
        <taskdep task="bacon"/>
        <taskdep task="eggs"/>
      </and>
    </dependency>
  </task>
</workflow>
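
The `<taskdep>` tags define a dependency graph among the tasks. As an illustration, the sketch below reads a trimmed copy of the XML above and derives an order in which the tasks could run, using the standard `graphlib` module (Python 3.9 or later). It is a simplification under stated assumptions: every `<taskdep>` is treated as required, so operators such as `<or>` are not modeled, and data dependencies are ignored. Rocoto's own scheduling logic is not shown here.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# A trimmed copy of the generated XML, keeping only task names and dependencies.
xml_text = """<workflow realtime="False" scheduler="slurm">
  <task name="bacon"/>
  <task name="eggs">
    <dependency>
      <datadep>eggs_recipe.txt</datadep>
    </dependency>
  </task>
  <task name="serve">
    <dependency>
      <and>
        <taskdep task="bacon"/>
        <taskdep task="eggs"/>
      </and>
    </dependency>
  </task>
</workflow>"""

root = ET.fromstring(xml_text)

# Map each task name to the set of task names it depends on.
graph = {
    task.get("name"): {dep.get("task") for dep in task.iter("taskdep")}
    for task in root.iter("task")
}

# A topological order lists every task after the tasks it depends on.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The bacon and eggs tasks do not depend on each other, so either may come first, but the serve task always comes last.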


### Metatasks

Metatasks define one or more tasks that are similar to one another using a substitution of values. Like tasks, metatask block keys use a suffix after an underscore to name a particular metatask. The metatask in the example below will have a `name="breakfast"` attribute in its `<metatask>` tag in the XML document. The values to substitute are defined in a `var:` block, and this block contains one or more keys, each naming a list of values. The values in a list are separated by spaces. Every list defined in a metatask must have the same number of values, and that number is the number of tasks the metatask defines. In the example below, two lists named `food` and `prepare` contain three values each, so three tasks are defined by this metatask. The values are referenced using the name of the list bracketed by pound signs, as seen in the `task_#food#` key and in the following `command` string.
<!--cell 30-->

In [16]:
%%bash
cat fixtures/rocoto/meta-workflow.yaml

workflow:
  attrs:
    realtime: false
    scheduler: slurm
  cycledef:
    - spec: 202410290000 202410300000 06:00:00
  log: logs/test.log
  tasks:
    metatask_breakfast:
      var:
        food: biscuits OJ hashbrowns
        prepare: bake pour fry
      task_#food#:
        command: "echo It's time for breakfast, #prepare# the #food#!"
        cores: 1
        walltime: 00:00:03


Similar to previous examples, `realize()` transforms the metatask workflow to Rocoto XML.
<!--cell 32-->

In [17]:
rocoto.realize(
    config='fixtures/rocoto/meta-workflow.yaml',
    output_file='tmp/meta-workflow.xml'
)

[2024-11-19T23:15:43]     INFO 0 UW schema-validation errors found in Rocoto config
[2024-11-19T23:15:43]     INFO 0 Rocoto XML validation errors found


True

The XML document below shows how the `<metatask>` tag and each of its child tags efficiently define multiple similar tasks. Like previous examples, name attributes for task-related tags are created here from the suffixes of their keys in the UW YAML. Note that `<var>` names were derived from full key names in the `var:` block. The `<task>`, `<command>`, and `<jobname>` tags each contain strings that will receive substitute values wherever the placeholders `#food#` or `#prepare#` appear.
<!--cell 34-->

In [18]:
%%bash
cat tmp/meta-workflow.xml

<?xml version='1.0' encoding='utf-8'?>
<workflow realtime="False" scheduler="slurm">
  <cycledef>202410290000 202410300000 06:00:00</cycledef>
  <log>logs/test.log</log>
  <metatask name="breakfast">
    <var name="food">biscuits OJ hashbrowns</var>
    <var name="prepare">bake pour fry</var>
    <task name="#food#">
      <cores>1</cores>
      <walltime>00:00:03</walltime>
      <command>echo It's time for breakfast, #prepare# the #food#!</command>
      <jobname>#food#</jobname>
    </task>
  </metatask>
</workflow>
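
The substitution that a metatask describes can be sketched in a few lines of Python. This is an illustration of the idea only, not Rocoto's implementation, and the helper name `expand_metatask` is hypothetical. It pairs up the values of each `var` list by position and replaces the `#name#` placeholders in a template string.

```python
def expand_metatask(variables: dict[str, str], template: str) -> list[str]:
    """Expand a template once per position in the var lists (hypothetical helper)."""
    names = list(variables)
    value_lists = [variables[name].split() for name in names]
    # Every list in a metatask must have the same number of values.
    if len({len(values) for values in value_lists}) != 1:
        raise ValueError("every var list must have the same number of values")
    expanded = []
    # zip pairs the first values together, then the second values, and so on.
    for values in zip(*value_lists):
        text = template
        for name, value in zip(names, values):
            text = text.replace(f"#{name}#", value)
        expanded.append(text)
    return expanded


variables = {"food": "biscuits OJ hashbrowns", "prepare": "bake pour fry"}
commands = expand_metatask(
    variables, "echo It's time for breakfast, #prepare# the #food#!"
)
for command in commands:
    print(command)
```

With the values from the example above, this yields three commands: bake the biscuits, pour the OJ, and fry the hashbrowns.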


Metatasks may be nested to create tasks using combinatorial lists of variables. This will create sets of tasks where each `var` value in a parent metatask applies to every child metatask. In the example below, a parent metatask contains a `var` named `process` with values `bake`, `cool`, and `store`. Its child metatask contains a `var` named `food` with values `cookies` and `cakes`. Tasks will be created to bake, cool, and store both cookies and cakes. Note that `var:` blocks at different levels do not necessarily contain the same number of values. 
<!--cell 36-->

In [19]:
%%bash
cat fixtures/rocoto/meta-nested-workflow.yaml

workflow:
  attrs:
    realtime: false
    scheduler: slurm
  cycledef:
    - spec: 202410290000 202410300000 06:00:00
  log: logs/test.log
  tasks:
    metatask_process:
      var:
        process: bake cool store
      metatask_process_food:
        var:
          food: cookies cakes
        task_#process#_#food#:
          command: "echo It's time to #process# the #food#."
          nodes: 1:ppn=4
          walltime: 00:00:30
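
The combinatorial effect of nesting can be sketched with `itertools.product`, which pairs every value of the parent `var` with every value of the child `var`. This is an illustration of the expected expansion under that assumption, not output produced by Rocoto or `uwtools`.

```python
from itertools import product

# The var lists from the parent and child metatasks above.
process_values = "bake cool store".split()
food_values = "cookies cakes".split()

# Every parent value is combined with every child value.
task_names = [
    "#process#_#food#".replace("#process#", process).replace("#food#", food)
    for process, food in product(process_values, food_values)
]
for name in task_names:
    print(name)
```

Three `process` values combined with two `food` values give six tasks in total.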


## Validating Workflows

The `rocoto.validate()` function checks the content of a Rocoto XML file against its schema, detecting and reporting any errors.
<!--cell 38-->

In [20]:
help(rocoto.validate)

Help on function validate in module uwtools.api.rocoto:

validate(xml_file: Union[str, pathlib.Path, NoneType] = None, stdin_ok: bool = False) -> bool
    Validate purported Rocoto XML file against its schema.

    :param xml_file: Path to XML file (``None`` or unspecified => read ``stdin``).
    :param stdin_ok: OK to read from ``stdin``?
    :return: ``True`` if the XML conforms to the schema, ``False`` otherwise.



The following Rocoto XML is identical to that generated in the [Building Rocoto Workflows with UW YAML](#Building-Rocoto-Workflows-with-UW-YAML) section above.
<!--cell 40-->

In [21]:
%%bash
cat fixtures/rocoto/simple-workflow.xml

<?xml version='1.0' encoding='utf-8'?>
<workflow realtime="False" scheduler="slurm">
  <cycledef>202410290000 202410300000 06:00:00</cycledef>
  <log>logs/test.log</log>
  <task name="greet">
    <cores>1</cores>
    <walltime>00:00:10</walltime>
    <command>echo Hello, World!</command>
    <jobname>greet</jobname>
  </task>
</workflow>


`validate()` accepts <a href="https://docs.python.org/3/library/pathlib.html#pathlib.Path">Path</a> objects or string paths passed via the `xml_file` parameter. (If `xml_file` is omitted or `None`, and `stdin_ok` is `True`, XML will be read from `stdin`, but this is a rare use case that won't be covered here.) The function returns `True` if the XML is validated without any errors, and `False` otherwise. The number of schema-validation errors, as well as details on the errors (if any), are reported.
<!--cell 42-->

In [22]:
rocoto.validate(
    xml_file="fixtures/rocoto/simple-workflow.xml"
)

[2024-11-19T23:15:43]     INFO 0 Rocoto XML validation errors found


True

The following Rocoto XML is missing two required components: `<workflow>`'s `scheduler` attribute and a `<cycledef>` tag.
<!--cell 44-->

In [23]:
%%bash
cat fixtures/rocoto/err-workflow.xml

<?xml version='1.0' encoding='utf-8'?>
<workflow realtime="False">
  <log>logs/test.log</log>
  <task name="greet">
    <cores>1</cores>
    <walltime>00:00:10</walltime>
    <command>echo Hello, World!</command>
    <jobname>greet</jobname>
  </task>
</workflow>


When Rocoto validation errors are found, `validate()` returns `False`. Details are reported regarding the types of errors and number of errors found. For more information on required Rocoto XML components, see the <a href="https://christopherwharrop.github.io/rocoto/">Rocoto Documentation</a>.
<!--cell 46-->

In [24]:
rocoto.validate(
    xml_file=Path("fixtures/rocoto/err-workflow.xml")
)

[2024-11-19T23:15:43]    ERROR 4 Rocoto XML validation errors found
[2024-11-19T23:15:43]    ERROR <string>:2:0:ERROR:RELAXNGV:RELAXNG_ERR_ATTRVALID: Element workflow failed to validate attributes
[2024-11-19T23:15:43]    ERROR <string>:2:0:ERROR:RELAXNGV:RELAXNG_ERR_NOELEM: Expecting an element cycledef, got nothing
[2024-11-19T23:15:43]    ERROR <string>:2:0:ERROR:RELAXNGV:RELAXNG_ERR_INTERSEQ: Invalid sequence in interleave
[2024-11-19T23:15:43]    ERROR <string>:2:0:ERROR:RELAXNGV:RELAXNG_ERR_CONTENTVALID: Element workflow failed to validate content
[2024-11-19T23:15:43]    ERROR Invalid Rocoto XML:
[2024-11-19T23:15:43]    ERROR  1 <?xml version='1.0' encoding='utf-8'?>
[2024-11-19T23:15:43]    ERROR  2 <workflow realtime="False">
[2024-11-19T23:15:43]    ERROR  3   <log>logs/test.log</log>
[2024-11-19T23:15:43]    ERROR  4   <task name="greet">
[2024-11-19T23:15:43]    ERROR  5     <cores>1</cores>
[2024-11-19T23:15:43]    ERROR  6     <walltime>00:00:10</walltime>
[2024-11-19T23

False
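
The schema errors above can be terse, so it may help to see the two underlying problems spelled out. The sketch below is a hand-written check using Python's standard `xml.etree.ElementTree` module, covering only the required components named in this notebook. It is an illustration, not a substitute for the schema validation that `validate()` performs, and the function name `quick_check` is hypothetical.

```python
import xml.etree.ElementTree as ET

# A copy of err-workflow.xml, minus the XML header, embedded for illustration.
xml_text = """<workflow realtime="False">
  <log>logs/test.log</log>
  <task name="greet">
    <cores>1</cores>
    <walltime>00:00:10</walltime>
    <command>echo Hello, World!</command>
    <jobname>greet</jobname>
  </task>
</workflow>"""


def quick_check(xml_text: str) -> list[str]:
    """List missing required pieces of a workflow (hypothetical, simplified)."""
    root = ET.fromstring(xml_text)
    problems = []
    # Required attributes of the <workflow> tag.
    for attr in ("realtime", "scheduler"):
        if attr not in root.attrib:
            problems.append(f"<workflow> is missing the {attr} attribute")
    # Required child tags of <workflow>.
    for tag in ("cycledef", "log"):
        if root.find(tag) is None:
            problems.append(f"<workflow> has no <{tag}> child")
    return problems


problems = quick_check(xml_text)
for problem in problems:
    print(problem)
```

For this document the check reports the missing `scheduler` attribute and the missing `<cycledef>` tag, which correspond to the attribute and element errors in the log above.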