# Using other SoS actions to control the execution of steps

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * The usual `break`, `continue`, and `return` statements cannot be used in the implicit loops of substeps
  * Action `warn_if` gives a warning under specified conditions
  * Action `fail_if` raises an exception that terminates the substep and therefore the entire workflow if a condition is met
  * Action `done_if` assumes that the substep is completed and ignores the rest of the statements
  * Action `skip_if` skips the substep and removes `_output` even if the `_output` has been generated

## Control structures of substeps 

In [1]:
# create a few input files for examples
!touch a_0.txt a_1.txt a_2.txt a_3.txt

SoS allows the use of arbitrary Python statements in step processes. For example, if you are processing a number of input files and some of them contain errors and have to be ignored, you can write a workflow step as follows:

In [2]:
infiles = [f'a_{i}.txt' for i in range(4)]
outfiles = []
for idx, infile in enumerate(infiles):
    if idx == 2:  # problematic step
        continue
    out = f'a_{idx}.out'
    sh(f'echo generating {out}\ntouch {out}')
    outfiles.append(out)

generating a_0.out


generating a_1.out


generating a_3.out


However, as we have discussed in tutorials [How to include scripts in different languages in SoS workflows](doc/user_guide/scripts_in_sos.html) and [How to specify input and output files and process input files in groups](doc/user_guide/input_substeps.html), steps written with loops and function calls like `sh()` are not very readable because the scripts are not clearly presented and users have to follow the logic of the code. Also, the input files are not processed in parallel, so the step is not executed efficiently.

A more idiomatic SoS way to implement the step is to use `input` and `output` statements and the script format of function calls, as follows:

In [3]:
input: [f'a_{i}.txt' for i in range(4)], group_by=1
output: _input.with_suffix('.out')

sh: expand=True
    echo generating {_output}
    touch {_output}


generating a_0.out


generating a_1.out


generating a_2.out


generating a_3.out


The problem is that substeps are processed concurrently and we do not yet have a way to treat them differently and introduce the logic of

```
    if idx == 2:  # problematic step
        continue
```
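One way to see why a bare `continue` has no place in a substep: the step body is not written inside an explicit `for` loop, and Python itself rejects `continue`, `break`, and `return` when there is no enclosing loop or function. The snippet below is plain Python rather than SoS, and the helper `compile_error` is a hypothetical name used only for this illustration. It shows the language rule, not how SoS reports the problem.

```python
# Plain Python, not SoS: control statements need an enclosing loop or function.
def compile_error(source):
    """Return the SyntaxError message for `source`, or None if it compiles."""
    try:
        compile(source, "<substep>", "exec")
    except SyntaxError as err:
        return err.msg
    return None


# Each bare statement is rejected at compile time.
for stmt in ("continue", "break", "return"):
    print(f"{stmt!r}: {compile_error(stmt)}")

# The same statement compiles once an explicit loop surrounds it.
print(compile_error("for i in range(4):\n    continue"))
```

Because the loop over substeps is implicit, the actions described in the following sections take the place of these statements.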

##  Action `skip_if`

<div class="bs-callout bs-callout-primary" role="alert">
    <h4>Action <code>skip_if(expr, msg)</code></h4>
    <p>Action <code>skip_if(expr, msg)</code> skips the execution of the substep if condition <code>expr</code> is met. It also assumes that the substep generates no output and sets <code>_output</code> to empty. The usage pattern of <code>skip_if</code> is</p>
    <pre>
    output: ...
    skip_if(...)
    statements to produce _output
    </pre>
</div>

The `skip_if` action lets you skip substeps that meet a given condition. The condition can involve a (mostly) hidden variable `_index`, which is the index of the substep. For example, the aforementioned step can be written as

In [4]:
input: [f'a_{i}.txt' for i in range(4)], group_by=1
output: _input.with_suffix('.out')

skip_if(_index == 2, 'input 2 has some problem')

sh: expand=True
    echo generating {_output}
    touch {_output}


generating a_0.out


generating a_1.out


generating a_3.out


It is important to remember that `skip_if` assumes that the substep output is not generated and adjusts `_output` accordingly. For example, if you pass the output of the step to another step, you will notice that the output of substep `2` is empty.

In [5]:
%run -v0
[10]
input: [f'a_{i}.txt' for i in range(4)], group_by=1
output: _input.with_suffix('.out')

skip_if(_index == 2, 'input 2 has some problem')

sh: expand=True
    echo generating {_output}
    touch {_output}

[20]
print(f'Input of {_index} is {_input}')

2 steps processed (7 jobs completed)


## Action `done_if`

<div class="bs-callout bs-callout-primary" role="alert">
    <h4>Action <code>done_if(expr, msg)</code></h4>
    <p>Action <code>done_if(expr, msg)</code> ignores the rest of the step process, assuming that the substep has been completed with output generated. The usage pattern of <code>done_if</code> is</p>
    <pre>
    output: ...
    statements to produce _output
    done_if(...)
    additional statements
    </pre>
</div>

A similar action is `done_if`, which also ignores the rest of the step process but assumes that the output has already been generated. Consequently, this action does not adjust `_output`. For example, if some additional work applies only to a subset of substeps, you can use `done_if` to run the additional code for only the selected substeps.

In [6]:
%run -v0
[10]
input: [f'a_{i}.txt' for i in range(4)], group_by=1
output: _input.with_suffix('.out')

sh: expand=True
    echo generating {_output}
    touch {_output}

done_if(_index != 2, 'input 2 need to be fixed')

sh: expand=True
    echo "Fixing {_output}"

[20]
print(f'Input of {_index} is {_input}')

2 steps processed (5 jobs completed)


##  Action `warn_if`

<div class="bs-callout bs-callout-primary" role="alert">
    <h4>Action <code>warn_if(expr, msg)</code></h4>
    <p>Action <code>warn_if(expr, msg)</code> gives a warning if a specified condition is met.</p>
</div>

Action `warn_if` is very easy to use. It just produces a warning message if something suspicious is detected.

In [7]:
input: [f'a_{i}.txt' for i in range(4)], group_by=1
output: _input.with_suffix('.out')

sh: expand=True
    echo generating {_output}
    touch {_output}

warn_if(_index == 2, 'input 2 might be problematic')

generating a_0.out


generating a_1.out


generating a_2.out


generating a_3.out


##  Action `fail_if`

<div class="bs-callout bs-callout-primary" role="alert">
    <h4>Action <code>fail_if(expr, msg)</code></h4>
    <p>Action <code>fail_if(expr, msg)</code> terminates the execution of workflow if a condition is met.</p>
</div>

Action `fail_if` terminates the execution of the workflow under certain conditions. It kills all other processes (e.g. working substeps or nested workflows), so it should be used with caution if it is unsafe to terminate the workflow abruptly.

For example, if we decide to terminate the entire workflow if we detect something wrong with an input file, we can do

In [8]:
input: [f'a_{i}.txt' for i in range(4)], group_by=1
output: _input.with_suffix('.out')

sh: expand=True
    echo generating {_output}
    touch {_output}

fail_if(_index == 2, 'input 2 might be problematic')

generating a_0.out


generating a_1.out


generating a_3.out


[(id=6373461632411795663, index=2)]: input 2 might be problematic
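The four actions differ mainly in what happens to the rest of the substep and to `_output`. The plain-Python model below is a rough sketch of those differences as described in this tutorial. The exception classes and `run_substeps` are hypothetical names introduced for illustration, and the loop is sequential, whereas SoS runs substeps concurrently. It is not SoS's actual implementation.

```python
import warnings


class SkipSubstep(Exception):
    """Hypothetical signal: stop the substep and discard its output."""


class DoneSubstep(Exception):
    """Hypothetical signal: stop the substep and keep its output."""


def skip_if(expr, msg=""):
    if expr:
        raise SkipSubstep(msg)


def done_if(expr, msg=""):
    if expr:
        raise DoneSubstep(msg)


def warn_if(expr, msg=""):
    if expr:
        warnings.warn(msg)  # execution continues after the warning


def fail_if(expr, msg=""):
    if expr:
        raise RuntimeError(msg)  # not caught below, so the whole run stops


def run_substeps(inputs, body):
    """Call body(index, infile, outfile) per input and collect the outputs."""
    outputs = []
    for index, infile in enumerate(inputs):
        outfile = infile.rsplit(".", 1)[0] + ".out"
        try:
            body(index, infile, outfile)
        except SkipSubstep:
            outfile = None  # skipped: treated as having no output
        except DoneSubstep:
            pass  # done early: the output is kept
        outputs.append(outfile)
    return outputs


inputs = [f"a_{i}.txt" for i in range(4)]
skipped = run_substeps(
    inputs, lambda i, src, dst: skip_if(i == 2, "input 2 has some problem"))
finished = run_substeps(
    inputs, lambda i, src, dst: done_if(i != 2, "nothing more to do"))
print(skipped)
print(finished)
```

In this model only `skip_if` changes what a downstream step would receive, which mirrors the empty `_output` described in the `skip_if` section.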


## Further reading

* [How to include scripts in different languages in SoS workflows](doc/user_guide/scripts_in_sos.html)
* [How to specify input and output files and process input files in groups](doc/user_guide/input_substeps.html)