# How to avoid or force the re-execution of executed steps

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * Runtime signatures avoid repeated execution of steps
  * Option `-s` controls the behavior of signatures
  

## Runtime signature

One of the most annoying problems with the development and execution of workflows is that they can take a very long time to execute. What makes things worse is that we frequently need to re-run a workflow with different parameters or even different tools. Re-executing the whole workflow repeatedly is time-consuming, but re-running only selected steps by hand is error-prone.

SoS addresses this problem by using **runtime signatures to keep track of execution units**, namely the input, output, and dependent targets, and related SoS variables of a piece of workflow. SoS tracks the execution of statements at the step level for each [substep](input_statement.html) and saves runtime signatures in a folder called `.sos` under the project directory.
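
The idea of a runtime signature can be illustrated with a small Python sketch. This is a conceptual illustration only, not SoS's actual implementation: the function names `file_fingerprint` and `step_signature`, the choice of SHA-256, and the JSON record layout are all assumptions made for this example.

```python
import hashlib
import json
import os
import tempfile


def file_fingerprint(path):
    """Return (size in bytes, SHA-256 hex digest) for one file."""
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return os.path.getsize(path), digest


def step_signature(script, inputs, outputs, variables):
    """Hash everything the step depends on into a single hex string.

    Hypothetical helper: the record layout is an assumption of this sketch.
    """
    record = {
        "script": script,
        "inputs": [file_fingerprint(p) for p in inputs],
        "outputs": [file_fingerprint(p) for p in outputs],
        "variables": variables,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Demo: the same step yields the same signature; a new parameter value does not.
with tempfile.TemporaryDirectory() as tmp:
    result = os.path.join(tmp, "result.txt")
    with open(result, "wb") as fh:
        fh.write(os.urandom(512))
    script = "dd if=/dev/urandom of={_output} count={size}"
    first = step_signature(script, [], [result], {"size": 1000})
    again = step_signature(script, [], [result], {"size": 1000})
    other = step_signature(script, [], [result], {"size": 2000})

print(first == again, first == other)
```

A step whose recomputed signature equals the saved one can be skipped safely, while any change in the script, targets, or variables produces a different signature and triggers re-execution.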

Before running any examples, let us clear all runtime signatures of workflows executed under the current directory.

In [1]:
!sos remove -s

INFO: Signatures from 245 substeps are removed.


### `ignore` mode

SoS workflows can be executed in batch mode and in interactive mode using the SoS kernel in Jupyter notebook or qtconsole. Because the SoS kernel is mostly used to execute short statements in SoS and other kernels, runtime signatures are by default set to `ignore` in interactive mode (and to `default` in batch mode).

A consequence of this setting is that **scratch steps will always be executed**.

In [2]:
output:  "temp/result.txt"
sh: expand=True
    dd if=/dev/urandom of={_output} count=2000

2000+0 records in


2000+0 records out


1024000 bytes transferred in 0.092426 secs (11079155 bytes/sec)


In [3]:
output:  "temp/result.txt"
sh: expand=True
    dd if=/dev/urandom of={_output} count=2000

2000+0 records in


2000+0 records out


1024000 bytes transferred in 0.056830 secs (18018599 bytes/sec)


### `default` mode

When you execute workflows with magics `%run` and `%sosrun`, you are running workflows in separate processes and the default mode is `default`. In this mode, signatures are created and validated, and executed steps will not be re-executed.

Let us create a workflow that saves two files `temp/result.txt` and `temp/size.txt`, with content of the file controlled by parameter `size`.

In [4]:
%save test_signature.sos -f

import os
parameter: size=1000
[10]
output:  "temp/result.txt"
sh: expand=True
    dd if=/dev/urandom of={_output} count={size}

[20]
output:  'temp/size.txt'
with open(_output[0], 'w') as sz:
    sz.write(f"{_input}: {os.path.getsize(_input[0])}\n")

When the workflow is first executed, both steps will be executed:

In [5]:
%runfile test_signature

1000+0 records in


1000+0 records out


512000 bytes transferred in 0.027222 secs (18808373 bytes/sec)


Now, if we re-run the script, nothing changes: both steps are skipped because their signatures are still valid, so the run takes only a little time.

In [6]:
%runfile test_signature

However, if you use a different parameter (not the default `size=1000`), the steps will be re-run:

In [7]:
%runfile test_signature --size 2000

2000+0 records in


2000+0 records out


1024000 bytes transferred in 0.048703 secs (21025418 bytes/sec)


Signatures are saved at the step level, so if you change the second step of the script, the first step is still skipped. Note that a step's signature is independent of the script that contains it, so a step is skipped even if its signature was saved by the execution of another workflow. Signatures also tolerate minor changes such as added spaces and comments.

In [8]:
%run --size 2000
parameter: size=1000
import os

[10]
output:  "temp/result.txt"
# added comment
sh: expand=True
    dd if=/dev/urandom of={_output} count={size}

[20]
output:  'temp/size.txt'
with open(_output[0], 'w') as sz:
    sz.write(f"Modified {_input}: {os.path.getsize(_input[0])}\n")
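
One way a signature could tolerate added spaces and comments is to hash the parsed structure of the code rather than its raw text. The sketch below does this for plain Python code with the standard `ast` module. It is an assumption about how such tolerance might be achieved, not a description of what SoS does internally, and `code_digest` is a hypothetical helper.

```python
import ast
import hashlib


def code_digest(source):
    """Hash the syntax tree of Python code, ignoring comments and spacing."""
    tree = ast.parse(source)
    # ast.dump omits line and column numbers by default, so layout changes
    # do not affect the dumped text.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


original = "total = sum(range(10))\nprint(total)\n"
reformatted = "# added comment\ntotal = sum( range(10) )\n\nprint(total)\n"
changed = "total = sum(range(20))\nprint(total)\n"

same_after_reformat = code_digest(original) == code_digest(reformatted)
same_after_change = code_digest(original) == code_digest(changed)
print(same_after_reformat, same_after_change)
```

Cosmetic edits leave the digest unchanged, whereas a change that alters what the code does (here, `range(20)` instead of `range(10)`) produces a different digest.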

### `assert` mode

The `assert` mode is used to detect whether anything has changed after the execution of a workflow. For example, let us execute the workflow without parameters,

In [9]:
%runfile test_signature -v1

1000+0 records in


1000+0 records out


512000 bytes transferred in 0.026079 secs (19632700 bytes/sec)


and the signature check would succeed

In [10]:
%runfile test_signature -s assert -v1

If we execute the workflow with another parameter

In [11]:
%runfile test_signature --size 3000 -v1

3000+0 records in


3000+0 records out


1536000 bytes transferred in 0.081370 secs (18876709 bytes/sec)


signature checking would fail because the last signature was saved with option `--size 3000`.

In [12]:
%runfile test_signature -s assert

[91mERROR[0m: [91m[10]: [10]: Signature mismatch: Target temp/result.txt does not exist or does not match saved signature (1577120766.9563751, 512000, '3241862808b12cf4')


[default]: Exits with 1 pending step (20)[0m


and signature checking succeeds when the same parameter is given.

In [13]:
%runfile test_signature --size 3000 -s assert

Now if you change one of the output files, SoS fails with an error message because `temp/result.txt` no longer matches its saved signature.

In [14]:
!echo "aaa" >> temp/result.txt
%runfile test_signature --size 3000 -s assert

[91mERROR[0m: [91m[10]: [10]: Signature mismatch: Target temp/result.txt does not exist or does not match saved signature (1577120774.9189792, 1536000, 'a67c2c34d70a3daf')


[default]: Exits with 1 pending step (20)[0m
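
The tuple in the error message appears to hold a timestamp, a file size, and a short content hash. Under that assumption, the check performed in `assert` mode can be sketched in Python as follows. The helpers `target_record` and `assert_target` are hypothetical, and the use of a truncated SHA-256 digest is a choice made for this example, not necessarily the hash SoS uses.

```python
import hashlib
import os
import tempfile


def target_record(path):
    """Return (mtime, size, short content hash) for an existing file."""
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()[:16]
    return os.path.getmtime(path), os.path.getsize(path), digest


def assert_target(path, saved):
    """Raise RuntimeError unless `path` exists and matches `saved`."""
    if not os.path.isfile(path):
        raise RuntimeError(f"Target {path} does not exist")
    current = target_record(path)
    # Compare size and content hash only: a timestamp can change without
    # the content changing (a simplification assumed by this sketch).
    if current[1:] != tuple(saved[1:]):
        raise RuntimeError(
            f"Target {path} does not match saved signature {saved}"
        )


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "result.txt")
        with open(path, "wb") as fh:
            fh.write(os.urandom(1024))
        saved = target_record(path)
        assert_target(path, saved)       # unchanged file: passes silently
        with open(path, "a") as fh:      # mimics `echo "aaa" >> result.txt`
            fh.write("aaa\n")
        try:
            assert_target(path, saved)
        except RuntimeError:
            return "mismatch detected"
        return "no mismatch"


outcome = demo()
print(outcome)
```

Appending even a few bytes changes both the size and the hash, which is why the modified `temp/result.txt` is rejected.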


### `force` mode

The `force` signature mode ignores existing signatures, re-runs the workflow, and saves new signatures. This is needed when you would like to forcefully re-run all steps to generate another set of outputs because the outcome of some steps is random, or to re-run the workflow because of changes that are not tracked by SoS, for example after you have installed a new version of a program.

In [15]:
%runfile test_signature --size 2000 -s force

2000+0 records in


2000+0 records out


1024000 bytes transferred in 0.046258 secs (22136725 bytes/sec)


### `build` mode

The `build` mode is somewhat the opposite of the `force` mode in that it creates signatures (or overwrites existing ones) from existing output files. It is useful, for example, if you are adding a step to a workflow that you have tested outside of SoS (without signature) but do not want to rerun it, or if for some reason you have lost your signature files and would like to reconstruct them from existing outputs.

In [16]:
%runfile test_signature --size 2000 -s build -v2

This mode can introduce erroneous files into the signatures because it does not check the validity of the incorporated files. For example, SoS would not complain if you change a parameter and replace `temp/result.txt` with something else.
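
The five signature modes described in this tutorial can be summarized as a small decision function. The sketch below is a simplified model written for this summary, not SoS code: `plan_step` is a hypothetical name, and two behaviors are assumptions rather than statements from the text above, namely that `ignore` mode saves no signature and that `build` mode runs a step whose outputs are missing.

```python
def plan_step(mode, signature_valid, outputs_exist):
    """Return (action, save_signature) for one step.

    action is "run", "skip", or "fail"; save_signature says whether a
    new signature would be written afterwards.
    """
    if mode == "ignore":
        # Always execute; assumed not to record anything.
        return ("run", False)
    if mode == "force":
        # Disregard saved signatures, execute, and record new ones.
        return ("run", True)
    if mode == "default":
        return ("skip", False) if signature_valid else ("run", True)
    if mode == "assert":
        return ("skip", False) if signature_valid else ("fail", False)
    if mode == "build":
        # Record signatures from files already on disk without running.
        # Assumption: with no outputs there is nothing to build from.
        return ("skip", True) if outputs_exist else ("run", True)
    raise ValueError(f"unknown signature mode: {mode}")


# Outcome for a step whose saved signature no longer matches
# but whose output files are still on disk.
for mode in ("ignore", "default", "assert", "force", "build"):
    print(mode, plan_step(mode, signature_valid=False, outputs_exist=True))
```

Reading the modes side by side makes the risk of `build` visible: it is the only mode that records a signature for files it neither produced nor verified.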

## Further reading

* [SoS workflows](sos_workflows.html)