# How to include output from another step in a SoS step

* **Difficulty level**: intermediate
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * Function `output_from(step)` refers to output from another `step`
  * `output_from(step)[name]` can be used to refer to named output from `step`
  

## Referring to output from another step

As shown in the tutorial [How to use named output in data-flow style workflows](doc/user_guide/named_output.html), the function `named_output` can be used to refer to the named output of another step:

In [1]:
!rm -f data/DEG.csv
%run plot

[global]
excel_file = 'data/DEG.xlsx'
csv_file = 'data/DEG.csv'
figure_file = 'output.pdf'

[convert]
input: excel_file
output: csv = _input.with_suffix('.csv')

run: expand=True
    xlsx2csv {_input} > {_output}

[plot]
input: named_output('csv')
output: figure_file

R: expand=True
    data <- read.csv('{_input}')
    pdf('{_output}')
    plot(data$log2FoldChange, data$stat)
    dev.off()



xlsx2csv data/DEG.xlsx > data/DEG.csv



One obvious limitation of `named_output()` is that the name has to be unique in the workflow. In the following workflow, another step `test_csv` also names its output `csv`, so the workflow fails because `named_output('csv')` is ambiguous. This is usually not a concern in small workflows, but as workflows grow more complex it is sometimes desirable to anchor a named output more precisely.

In [2]:
!rm -f data/DEG.csv
%run plot

[global]
excel_file = 'data/DEG.xlsx'
csv_file = 'data/DEG.csv'
figure_file = 'output.pdf'

[convert]
input: excel_file
output: csv = _input.with_suffix('.csv')

run: expand=True
    xlsx2csv {_input} > {_output}

[test_csv]
input: excel_file
output: csv = f'{_input:n}_test.csv'

run: expand=True
    xlsx2csv {_input} | head -10 > {_output}
    
[plot]
input: named_output('csv')
output: figure_file

R: expand=True
    data <- read.csv('{_input}')
    pdf('{_output}')
    plot(data$log2FoldChange, data$stat)
    dev.off()

ERROR: Multiple steps convert, test_csv to generate target named_output("csv")
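
The ambiguity behind this error can be pictured with a small plain-Python model. This is only a sketch under stated assumptions, not SoS's implementation: the helpers `build_name_index` and `resolve_named_output` are hypothetical, and the output path given for `test_csv` is illustrative. A lookup by name alone has no way to choose between two steps that declare the same name.

```python
# Toy model only: this is not how SoS is implemented.
from collections import defaultdict


def build_name_index(step_outputs):
    """Map each output name to the list of steps that declare it."""
    index = defaultdict(list)
    for step, outputs in step_outputs.items():
        for name in outputs:
            index[name].append(step)
    return index


def resolve_named_output(index, name):
    """Return the single step providing `name`, or raise if it is not unique."""
    providers = index.get(name, [])
    if len(providers) != 1:
        raise ValueError(f"Cannot resolve {name!r}: provided by {providers}")
    return providers[0]


# Two steps declare the same output name, as in the workflow above.
step_outputs = {
    "convert": {"csv": "data/DEG.csv"},
    "test_csv": {"csv": "data/DEG_test.csv"},  # illustrative path
}
index = build_name_index(step_outputs)

try:
    resolve_named_output(index, "csv")
except ValueError as err:
    print(err)
```

Naming the step as well as the output removes the ambiguity, which is what `output_from` provides.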


## Function  `output_from` <a id="output_from"></a>

 <div class="bs-callout bs-callout-primary" role="alert">
    <h4>Function <code>output_from(steps, group_by, ...)</code></h4>
    <p>Function <code>output_from</code> refers to the output of <code>step</code>. The returned object is the complete output from <code>step</code>, with its own sources and groups. Therefore,</p>
    <ul>
        <li>More than one step can be specified as a list of step names</li>
        <li>Option <code>group_by</code> can be used to regroup the returned files</li>
        <li><code>output_from(step)[name]</code> refers to all output with source <code>name</code></li>
    </ul>
 </div>

Function `output_from` imports the output from one or more other steps. For example, in the following workflow, `output_from(['step_10', 'step_20'])` takes the output from steps `step_10` and `step_20` as input. The `sources` of these inputs are `step_10` and `step_20`, respectively. In a process-oriented workflow, `output_from(['step_10', 'step_20'])` can be simplified to `output_from([10, 20])`, using integer step indexes.

In [5]:
%run -v0
[step_10]
output: 'a.txt'
_output.touch()

[step_20]
output: 'b.txt'
print(f'input of step {step_name} is {step_input}')
_output.touch()

[step_30]
input:  output_from(['step_10', 'step_20'])
print(f'input of step {step_name} is {step_input} with sources {step_input.sources}')
print(f'Output of step_20 is {step_input["step_20"]}')



input of step step_30 is a.txt b.txt with sources ['step_10', 'step_20']
Output of step_20 is b.txt
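
As a rough mental model of the example above, the input can be pictured as a flat list of files with a parallel list of source labels. The following plain-Python sketch is not SoS code: the helpers `resolve_step` and `collect` are hypothetical, and the rule that an integer `n` expands to `<workflow>_<n>` is an assumption based on the `step_10` and `10` equivalence described above.

```python
# Toy model only; SoS's real data structures differ.


def resolve_step(step, workflow="step"):
    """Expand an integer index such as 10 to a full step name such as 'step_10'."""
    return f"{workflow}_{step}" if isinstance(step, int) else step


def collect(step_outputs, steps):
    """Concatenate the outputs of `steps`, labelling each file with its step name."""
    files, sources = [], []
    for step in steps:
        name = resolve_step(step)
        for path in step_outputs[name]:
            files.append(path)
            sources.append(name)
    return files, sources


step_outputs = {"step_10": ["a.txt"], "step_20": ["b.txt"]}

# An integer and a full step name can be mixed in this model.
files, sources = collect(step_outputs, [10, "step_20"])
print(files, sources)

# Select the files that came from step_20, similar to step_input["step_20"].
from_step_20 = [f for f, s in zip(files, sources) if s == "step_20"]
print(from_step_20)
```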


You can override the `sources` of input files with keyword arguments:

In [7]:
%run -v0
[step_10]
output: 'a.txt'
print(f'input of step {step_name} is {step_input}')
_output.touch()

[step_20]
output: 'b.txt'
print(f'input of step {step_name} is {step_input}')
_output.touch()

[step_30]
input:  output_from(10), s20=output_from(20)
print(f'input of step {step_name} is {step_input} with sources {step_input.sources}')




input of step step_30 is a.txt b.txt with sources ['step_10', 's20']


### `source` of outputs returned from `output_from`

Outputs from other steps can also have their own sources. In this case, the `sources` of the outputs are carried over.

In [16]:
%run -v0
[step_10]
output: output='out.txt', summary='summary.txt'
_output.touch()

[step_30]
input:  output_from(10)
print(f'input of step {step_name} is {step_input} with sources {step_input.sources}')



input of step step_30 is out.txt summary.txt with sources ['output', 'summary']


Now, if you are only interested in the `summary` part of the output of `step_10`, you can use `['summary']` to get a subset of the output from `output_from(10)`:

In [18]:
%run -v0
[step_10]
output: output='out.txt', summary='summary.txt'
_output.touch()

[step_30]
input:  output_from(10)['summary']
print(f'input of step {step_name} is {step_input} with sources {step_input.sources}')



input of step step_30 is summary.txt with sources ['summary']


When you use keyword arguments to specify all or part of the outputs, the `sources` are overridden:

In [20]:
%run -v0
[step_10]
output: a='a.txt', b='b.txt'
_output.touch()

[step_20]
output: c='c.txt', d='d.txt'
_output.touch()

[step_30]
input:  s10=output_from(10), c=output_from(20)["c"]
print(f'input of step {step_name} is {step_input} with sources {step_input.sources}')



input of step step_30 is a.txt b.txt c.txt with sources ['s10', 's10', 'c']


Note that both sources `a` and `b` from `output_from(10)` are overridden by `s10`, so you can no longer differentiate between them.
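
The subsetting and overriding rules in this section can be summarized with a short plain-Python sketch. It is a toy model under stated assumptions, not SoS's implementation: a list of `(source, path)` pairs stands in for the step output, and the helpers `select` and `relabel` are hypothetical.

```python
# Toy model only: a list of (source, path) pairs stands in for SoS targets.


def select(labelled, name):
    """Keep only the entries whose source equals `name`."""
    return [(src, path) for src, path in labelled if src == name]


def relabel(labelled, new_source):
    """Replace every source with `new_source`, as a keyword argument would."""
    return [(new_source, path) for _, path in labelled]


step_10 = [("a", "a.txt"), ("b", "b.txt")]
step_20 = [("c", "c.txt"), ("d", "d.txt")]

# Mirrors: input: s10=output_from(10), c=output_from(20)["c"]
step_input = relabel(step_10, "s10") + relabel(select(step_20, "c"), "c")

print([path for _, path in step_input])
print([src for src, _ in step_input])
```

In this model the original labels `a` and `b` are discarded by `relabel`, which matches the loss of information noted above.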

### groups of outputs returned from `output_from`

Similar to `named_output`, the object returned from `output_from()` keeps its original groups. For example,

In [23]:
%run B -v0
[A]
input: for_each=dict(i=range(4))
output: f'a_{i}.txt'
_output.touch()

[B]
input: output_from('A')
output: _input.with_suffix('.bak')
print(f'Converting {_input} to {_output}')
_output.touch()

Converting a_0.txt to a_0.bak
Converting a_1.txt to a_1.bak
Converting a_2.txt to a_2.bak
Converting a_3.txt to a_3.bak
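
The grouping behavior above, and the `group_by` option mentioned earlier, can be pictured with a plain-Python sketch. This is a toy model, not SoS code: the helper `regroup` is hypothetical, and treating an integer `group_by` as "chunks of that many files" is an assumption made for illustration.

```python
# Toy model only: a list of lists stands in for grouped step output.


def regroup(groups, size):
    """Flatten `groups` and split the files into chunks of `size` files."""
    flat = [path for group in groups for path in group]
    return [flat[i:i + size] for i in range(0, len(flat), size)]


# Step A ran four substeps, each producing one group with one file.
groups_from_a = [[f"a_{i}.txt"] for i in range(4)]

# With the original groups kept, the downstream step runs once per group.
for group in groups_from_a:
    print("substep input:", group)

# Assumed analogue of regrouping with group_by=2: two files per substep.
pairs = regroup(groups_from_a, 2)
print(pairs)
```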


## Using `output_from` in place of `named_output`

Going back to our `convert` and `plot` example: when another step is added with the same named output, it is no longer possible to use `named_output(name)`. In this case you can explicitly specify the step in which the named output is defined, and use

```
output_from(step)[name]
```
instead of
```
named_output(name)
```
as shown in the following example:

In [11]:
!rm -f data/DEG.csv
%run plot 

[global]
excel_file = 'data/DEG.xlsx'
csv_file = 'data/DEG.csv'
figure_file = 'output.pdf'

[convert]
input: excel_file
output: csv = _input.with_suffix('.csv')

run: expand=True
    xlsx2csv {_input} > {_output}

[test_csv]
input: excel_file
output: csv = f'{_input:n}_test.csv'

run: expand=True
    xlsx2csv {_input} | head -10 > {_output}
    
[plot]
input: output_from('convert')['csv']
output: figure_file

R: expand=True
    data <- read.csv('{_input}')
    pdf('{_output}')
    plot(data$log2FoldChange, data$stat)
    dev.off()



xlsx2csv data/DEG.xlsx > data/DEG.csv



Note that `output_from` is more precise than `named_output` because it can refer to a specific step, but this same coupling to step names makes the workflow more difficult to maintain. We generally recommend `named_output` for its simplicity.

## Further reading
* [How to use named output in data-flow style workflows](doc/user_guide/named_output.html)