# Title

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * a
  

## Runtime signature

One of the most annoying problems with the development and execution of workflows is that they can take a very long time to execute. What makes things worse is that we frequently need to re-run a workflow with different parameters and even different tools. Re-executing the whole workflow repeatedly can be very time-consuming, but repeating selected steps by hand is also very error-prone.

SoS addresses this problem by using <font color='red'>runtime signatures</font> to keep track of <font color='red'>execution units</font>, namely the input, output, and dependent targets, and the related SoS variables of a piece of a workflow. SoS tracks the execution of statements at the step level for each [input group](../documentation/SoS_Step.html) and saves runtime signatures in a folder called `.sos` under the project directory. The runtime signatures are used to

1. Avoid repeated execution of identical units, and
2. Keep track of workflow related files for project management
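To make the idea of an execution unit concrete, the following Python sketch shows one way a step-level signature could be computed. It is a simplified illustration, not SoS's actual implementation: the helper names `file_digest` and `step_signature`, and the choice to hash the step script, the parameter values, and the contents of input, output, and dependent files with SHA-256, are assumptions made for this example.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents (hypothetical helper)."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        # read in chunks so large output files do not need to fit in memory
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()


def step_signature(script, variables, inputs=(), outputs=(), depends=()):
    """Combine a step's script, variables and target files into one digest.

    Hypothetical sketch: any change to the script text, to a variable value,
    or to the content of a tracked file produces a different signature.
    """
    h = hashlib.sha256()
    h.update(script.encode())
    for name in sorted(variables):  # sort so that dict order does not matter
        h.update(f'{name}={variables[name]!r};'.encode())
    for label, group in (('in', inputs), ('out', outputs), ('dep', depends)):
        for path in group:
            h.update(f'{label}:{os.path.abspath(path)}:{file_digest(path)};'.encode())
    return h.hexdigest()
```

Because the digest covers both parameter values and file contents, running with a different `size`, or editing an output file, produces a different signature.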

This tutorial focuses on the first usage. The second one is described in detail in [Project Management](Project_Management.html).

### `ignore` mode

SoS workflows can be executed in batch mode, or in interactive mode using the SoS kernel in Jupyter notebook or qtconsole. Because the SoS kernel is mostly used to execute short statements in SoS and other kernels, runtime signatures are set to `ignore` by default in interactive mode (and to `default` in batch mode).

Let us create a temporary directory and run a workflow that takes a bit of time to execute. This is done in the default `ignore` signature mode of the Jupyter notebook.

```
%sandbox --dir tmp

!rm -rf .sos/.runtime
![ -d temp ] || mkdir temp
```

```
%sandbox --dir tmp
%run
parameter: size=1000
[10]
output:  "temp/result.txt"
sh: expand=True
    dd if=/dev/urandom of={_output} count={size}

[20]
output:  'temp/size.txt'
with open(_output[0], 'w') as sz:
    sz.write(f"{_input}: {os.path.getsize(_input[0])}\n")
```

```
1000+0 records in
1000+0 records out
512000 bytes transferred in 0.041920 secs (12213687 bytes/sec)
```


Now, if we re-run the last script, nothing changes and it takes a bit of time to execute the script.

```
%sandbox --dir tmp
%rerun
```

```
INFO: default_10 (index=0) is ignored due to saved signature
INFO: default_20 (index=0) is ignored due to saved signature
```


### `default` mode

Now let us switch to the `default` signature mode by running the script with option `-s default`. When you run the script for the first time, it executes normally and saves the runtime signatures of the steps.
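The skip-or-run decision of `default` mode can be sketched as follows. This is a hypothetical model, not SoS code: a plain dictionary `saved` stands in for the signature files under `.sos`, each step is identified by a key built from the step name and its parameter values, and only the digests of output files are compared.

```python
import hashlib
import os


def digest(path):
    """SHA-256 hex digest of a file's contents."""
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()


def run_default(key, outputs, saved, action):
    """Run `action` unless the signature saved under `key` still matches.

    `saved` is a dict standing in for the on-disk signature store (an
    assumption of this sketch). Returns True if the step was executed.
    """
    if key in saved and all(os.path.exists(p) for p in outputs):
        if saved[key] == {p: digest(p) for p in outputs}:
            return False  # matching signature: the step is ignored
    action()  # no signature yet, or outputs changed: execute the step
    saved[key] = {p: digest(p) for p in outputs}  # save the new signature
    return True
```

With keys such as `('default_10', 'size=1000')`, a second run with the same parameter is skipped, while `size=2000` yields a new key, so the step runs again.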

```
%sandbox --dir tmp
%rerun -s default
```

```
INFO: default_10 (index=0) is ignored due to saved signature
INFO: default_20 (index=0) is ignored due to saved signature
```


When the script is run again, both steps are ignored. Here we use `-v2` to show the `ignored` messages. This time we use magic `%set` to make option `-s default` persistent so that we do not have to specify it each time.

```
%sandbox --dir tmp
%set -s default
%rerun -v2
```

```
Set sos options to "-s default"

INFO: default_10 (index=0) is ignored due to saved signature
INFO: default_20 (index=0) is ignored due to saved signature
```


However, if you use a different parameter value (not the default `size=1000`), the steps are executed again:

```
%sandbox --dir tmp
%rerun -v2 --size 2000
```

```
2000+0 records in
2000+0 records out
1024000 bytes transferred in 0.085565 secs (11967497 bytes/sec)
```


Signatures are saved at the step level, so if you change only the second step of the script, the first step is still skipped. Signatures are also independent of the script being executed, so a step is skipped even if its signature was saved during the execution of another workflow. Finally, signatures tolerate minor changes such as added spaces and comments.
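One way to make a signature insensitive to cosmetic edits is to normalize the step text before hashing it. The sketch below is an assumption about how such tolerance could work, not a description of SoS internals: it drops blank and comment-only lines and collapses runs of whitespace, so adding a comment or extra spaces leaves the digest unchanged, while any change to the code itself produces a different digest. The names `normalize_script` and `script_signature` are hypothetical.

```python
import hashlib
import re


def normalize_script(script):
    """Reduce a step's text to a canonical form before hashing.

    Blank lines and comment-only lines are dropped and runs of whitespace
    are collapsed. Trailing comments after code are kept in this simple version.
    """
    kept = []
    for line in script.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # cosmetic lines do not affect the signature
        kept.append(re.sub(r'\s+', ' ', stripped))  # collapse spaces and tabs
    return '\n'.join(kept)


def script_signature(script):
    """SHA-256 digest of the normalized step text."""
    return hashlib.sha256(normalize_script(script).encode()).hexdigest()
```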

```
%sandbox --dir tmp
%run --size 2000 -v2
parameter: size=1000
[10]
output:  "temp/result.txt"
# added comment
sh: expand=True
    dd if=/dev/urandom of={_output} count={size}

[20]
output:  'temp/size.txt'
with open(_output[0], 'w') as sz:
    sz.write(f"Modified {_input}: {os.path.getsize(_input[0])}\n")
```

```
INFO: default_10 (index=0) is ignored due to saved signature
```


### `assert` mode

The `assert` mode is used to detect whether anything has changed since the execution of a workflow. For example,

```
%sandbox --dir tmp
%set -s assert
%rerun --size 2000 -v2
```

```
Set sos options to "-s assert"

INFO: Step default_10 (index=0) is ignored with matching signature
INFO: Step default_20 (index=0) is ignored with matching signature
```


Now if you change one of the output files, SoS fails with an error message.

```
%sandbox --expect-error --dir tmp
!echo "aaa" >> temp/result.txt
%rerun --size 2000 -v2
```

```
Failed to process step output: "temp/result.txt" (Signature mismatch: File has changed temp/result.txt)
```
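The check performed in `assert` mode can be sketched as comparing the current content of each output file with the digest recorded when the step last ran. The names `check_assert` and `SignatureMismatch`, and the use of SHA-256 digests, are hypothetical choices for this illustration; the message only imitates the one shown above.

```python
import hashlib
import os


class SignatureMismatch(Exception):
    """Raised when a tracked file no longer matches its saved signature."""


def digest(path):
    """SHA-256 hex digest of a file's contents."""
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()


def check_assert(saved, outputs):
    """Verify outputs against saved digests without re-running anything.

    `saved` maps each output path to the digest recorded at the last run.
    """
    for path in outputs:
        if not os.path.exists(path) or digest(path) != saved.get(path):
            raise SignatureMismatch(f'File has changed {path}')
```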


### `force` mode

The `force` signature mode ignores existing signatures and re-runs the workflow. This is needed when you would like to re-run all the steps regardless of saved signatures, for example to generate another set of output when the outcome of some steps is random, or to account for changes that are not tracked by SoS, such as the installation of a new version of a program.
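The difference between `force` and the modes introduced earlier can be summarized as a small decision function that answers two questions for each step: should it be executed, and should its signature be saved? This is a simplified model written for illustration (`plan_step` is a hypothetical name), and it assumes that `ignore` mode neither reads nor writes signatures.

```python
def plan_step(mode, signature_matches):
    """Return (execute, save_signature) for one step.

    `signature_matches` tells whether a saved signature exists and still
    matches the current inputs, outputs and variables.
    """
    if mode == 'ignore':
        return True, False  # assumption: always run, never use or save signatures
    if mode == 'default':
        stale = not signature_matches
        return stale, stale  # run and re-save only when out of date
    if mode == 'force':
        return True, True  # always run and overwrite any saved signature
    raise ValueError(f'mode not covered by this sketch: {mode}')
```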

```
%sandbox --dir tmp
%set
%rerun --size 2000 -s force
```

```
Reset sos options from "-s assert" to ""
2000+0 records in
2000+0 records out
1024000 bytes transferred in 0.088850 secs (11525039 bytes/sec)
```


### `build` mode

The `build` mode is somewhat the opposite of the `force` mode: instead of executing the steps, it creates signatures from existing output files (overwriting existing signatures, if any). It is useful, for example, if you are adding a step to a workflow that you have tested outside of SoS (without signatures) but do not want to re-run it, or if for some reason you have lost your signature files and would like to reconstruct them from existing outputs.
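In the same hypothetical model, `build` mode records a signature from whatever output files are present without executing the step. The sketch below (`build_signature` is an illustrative name, and `saved` is a dict standing in for the `.sos` store) also shows why this mode needs care: it accepts any file content as valid. Refusing to build from missing files is an assumption of this sketch.

```python
import hashlib
import os


def digest(path):
    """SHA-256 hex digest of a file's contents."""
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()


def build_signature(key, outputs, saved):
    """Create (or overwrite) the signature for `key` from existing outputs.

    The step is not executed and the file contents are not validated.
    """
    missing = [p for p in outputs if not os.path.exists(p)]
    if missing:
        # assumption of this sketch: nothing to record if an output is absent
        raise FileNotFoundError(f'cannot build signature, missing outputs: {missing}')
    saved[key] = {p: digest(p) for p in outputs}
```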

```
%sandbox --dir tmp
%rerun --size 2000 -s build -v2
```

```
INFO: Step default_10 (index=0) is ignored with signature constructed
INFO: Step default_20 (index=0) is ignored with signature constructed
```


This mode can introduce erroneous files into the signatures because it does not check the validity of the incorporated files. For example, SoS would not complain if you change the parameter and replace `temp/result.txt` with something else.

```
%sandbox --dir tmp
!echo "something else" > temp/result.txt
%rerun -s build -v2
```

```
INFO: Step default_10 (index=0) is ignored with signature constructed
INFO: Step default_20 (index=0) is ignored with signature constructed
```


```
# cleanup
!rm -rf tmp
```

## Further reading

* 