## Auxiliary steps and makefile-style workflows

Auxiliary steps are special steps that are executed to provide [targets](Understanding_Targets.html) required by other steps.

For example, when the following step is executed with an input file `bamfile` (with extension `.bam`), it checks that the input file (`bamfile`) exists. It also checks for a dependent index file (with extension `.bam.bai`).

```sos
[100 (call variant)]
input:   bamfile
depends: bamfile + '.bai'
run:
    # commands to call variants from 
    # input bam file
```

If the index file exists, SoS goes ahead and executes the step. It does not matter whether the file was generated by another step or outside of SoS. Otherwise, SoS looks in the script for a step that provides such a target, for example

```sos
[index_bam : provides='{sample}.bam.bai']
input: '${sample}.bam'
run:
     samtools index ${input}
```

Such a step is defined by the **`provides`** option (or a **`shared`** option that will be discussed later) and is called an auxiliary step. In this particular case, if `bamfile="AS123.bam"`, the requested file would be `AS123.bam.bai`. This file name matches the `provides` pattern `{sample}.bam.bai`, so the `index_bam` step would be executed with variable `sample="AS123"` and `output=["AS123.bam.bai"]`.
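To make the matching mechanism concrete, here is a minimal Python sketch of how a `provides` pattern could be matched against a requested file name. It assumes that each `{name}` placeholder matches any non-empty text and that everything else must match literally. The helper `match_pattern` is hypothetical and is not SoS's actual implementation.

```python
import re

def match_pattern(pattern, target):
    """Match `target` against a provides-style `pattern` (hypothetical helper).

    Each `{name}` placeholder becomes a named capture group; all other
    characters are matched literally. Returns a dict of captured variables,
    or None if the target does not match the pattern.
    """
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        # escape the literal text between placeholders (e.g. ".bam.bai")
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None

print(match_pattern("{sample}.bam.bai", "AS123.bam.bai"))  # {'sample': 'AS123'}
print(match_pattern("{sample}.bam.bai", "AS123.bam"))      # None
```

The `AS123.bam.bai` request yields `sample="AS123"`. A step could then use this variable to build its `input`, as `index_bam` does with `'${sample}.bam'`.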

An auxiliary step can trigger other auxiliary steps, and together they form a DAG (Directed Acyclic Graph). In fact, you can write workflows in makefile style using only auxiliary steps, and execute them by specifying targets. If you are familiar with make or makefile-style tools such as [snakemake](https://bitbucket.org/johanneskoester/snakemake), you may find it natural to implement your workflow in this style. The advantage of SoS is that **you can use forward-style steps, makefile-style steps, or both to define your workflow**, and so take advantage of both approaches. For example, makefile-style workflow systems frequently require fake targets to trigger steps that do not produce any target. This is not needed in SoS, because steps defined in forward style are always executed.

## Step option `provides`

An auxiliary step can be defined in the format of

```sos
[step_name : provides=pattern]
```

where `pattern` can be

* A file pattern such as `"{sample}.bam.idx"`
* Other types of targets such as `executable("ms")`
* A list (sequence) of one or more file patterns and targets.

### File pattern

A file pattern is a file name that may contain variable names enclosed in `{ }`. SoS matches requested file names against the pattern. If the match succeeds, SoS assigns the matched parts of the name to these variables.

The following example first removes all local `*.bam` and `*.bam.bai` files. It then executes three workflows, each defined by a target passed to `sos_run`. If the script is saved as `myscript.sos`, we could execute one of these workflows from the command line with
```
sos run myscript --target TS1.bam
```
or from a Jupyter notebook with
```
%run --target TS1.bam
```
Using action `sos_run`, however, allows us to execute multiple workflows as nested workflows.

```sos
!rm -f *.bam *.bam.bai

[compress: provides = '{filename}.bam']
print("> ${step_name} input to ${output}")
sh:
    touch ${output}

[index: provides = '{filename}.bam.bai']
input: "${filename}.bam"
print("> ${step_name} ${input} to ${output}")
sh:
    touch ${output}

[default]
print('Generating target TS1.bam')
sos_run(targets='TS1.bam')
print('\nGenerating target TS1.bam.bai')
sos_run(targets='TS1.bam.bai')
print('\nGenerating target TS2.bam.bai')
sos_run(targets='TS2.bam.bai')
```

```
Generating target TS1.bam
> compress input to TS1.bam

Generating target TS1.bam.bai
> index TS1.bam to TS1.bam.bai

Generating target TS2.bam.bai
> compress input to TS2.bam
> index TS2.bam to TS2.bam.bai
```

As you can see from the output, the first workflow is executed with target `TS1.bam`, and step `compress` is executed to produce it. The second workflow is executed with target `TS1.bam.bai`. Step `index` runs using the `TS1.bam` generated by the first run, so `compress` is not executed again. In the last run, `TS2.bam` does not exist yet, so step `compress` is executed to generate `TS2.bam`, and then step `index` is executed to generate `TS2.bam.bai`.
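The order in which the steps ran can be reproduced with a small Python sketch. It resolves a target recursively and skips targets that already exist. The rule table and the `build` function are hypothetical. They hand-translate the `compress` and `index` steps into regular expressions and "run" a step by recording it in a log. This sketch is a depth-first simplification: it does not build an explicit DAG or detect cycles the way a real workflow engine would.

```python
import re

# Hypothetical rule table mirroring the `compress` and `index` steps above:
# (step name, regex equivalent of the `provides` pattern, input template)
RULES = [
    ("compress", r"(?P<filename>.+)\.bam", None),
    ("index", r"(?P<filename>.+)\.bam\.bai", "{filename}.bam"),
]

def build(target, existing, log):
    """Make `target` available, running auxiliary steps as needed."""
    if target in existing:
        return  # already generated by an earlier run or outside the workflow
    for name, regex, input_template in RULES:
        m = re.fullmatch(regex, target)
        if m is None:
            continue
        if input_template is not None:
            # the step's input may itself be provided by another auxiliary step
            build(input_template.format(**m.groupdict()), existing, log)
        log.append(f"{name} -> {target}")  # stand-in for executing the step
        existing.add(target)
        return
    raise ValueError(f"no auxiliary step provides {target}")

existing, log = set(), []
for t in ["TS1.bam", "TS1.bam.bai", "TS2.bam.bai"]:
    build(t, existing, log)
print(log)
# ['compress -> TS1.bam', 'index -> TS1.bam.bai', 'compress -> TS2.bam', 'index -> TS2.bam.bai']
```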

```sos
# clean up
!rm -f *.bam *.bam.bai
```

## Executing workflows with auxiliary steps

You can execute a forward-style workflow by specifying its name (which can be `default`) on the command line. The workflow can trigger auxiliary steps to generate targets that are not yet available. With this style you generally think in terms of "how to process a given input file".

You can execute a makefile-style workflow by specifying one or more targets with option `-t` (target). SoS collects all auxiliary steps in the script and builds DAGs to generate these targets. Any forward-style workflows defined in the script are ignored.

You can also specify both a forward-style workflow and a `-t` option. In this case, SoS creates a DAG from the forward-style workflow and the steps needed to produce the specified targets. Before execution, this DAG is trimmed to the sub-DAG that produces the specified targets. Note that in this case a target can be any output produced by the forward-style workflow; it does not have to be generated by an auxiliary step.
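The trimming step can be illustrated with a Python sketch. The DAG is hypothetical and loosely modeled on the example that follows. Forward-style steps `10`, `20` and `30` run in order, and auxiliary step `txt` provides `step20.txt`, which step `20` depends on. The sketch keeps only the steps that produce the requested targets, together with everything upstream of them, and orders them with the standard library's `graphlib` (Python 3.9+). This shows the idea only and does not reflect how SoS represents its DAG internally.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each step maps to the steps it depends on.
DEPENDS = {"10": [], "txt": [], "20": ["10", "txt"], "30": ["20"]}
# Hypothetical outputs of the steps that produce files
OUTPUTS = {"txt": ["step20.txt"], "30": ["step30.txt"]}

def trim(targets):
    """Return, in execution order, only the steps needed to produce `targets`."""
    needed = set()
    # start from the steps whose outputs include a requested target
    stack = [s for s, outs in OUTPUTS.items() if set(outs) & set(targets)]
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(DEPENDS[step])  # walk upstream
    sub_dag = {s: [d for d in DEPENDS[s] if d in needed] for s in needed}
    return list(TopologicalSorter(sub_dag).static_order())

print(trim(["step20.txt"]))  # ['txt']
print(trim(["step30.txt"]))  # all four steps, with '30' last
```

Requesting `step20.txt` runs only the auxiliary step. Requesting `step30.txt` pulls in the whole forward-style chain.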

```sos
!rm -rf *.txt *.ttt *.txt.gz 
%run default -t step.ttt.gz -f

[compress: provides='{name}.gz']
input: "${name}"
print("Running step ${step_name} to generate ${output} from ${input}")
run:
    touch ${output}

[txt: provides='{name}.txt']
print("Running step ${step_name} to generate ${output} from ${input}")
run:
    touch ${output}

[ttt: provides='{name}.ttt']
print("Running step ${step_name} to generate ${output} from ${input}")
run:
    touch ${output}

[10]
print("Running step ${step_name} to generate ${output} from ${input}")

[20]
depends: "step20.txt"
print("Running step ${step_name} to generate ${output} from ${input}")

[30]
output: "step30.txt"
print("Running step ${step_name} to generate ${output} from ${input}")
run:
    touch ${output}
```