# How to group input and output targets by names

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * `_input` or `_output` can be grouped by the **sources** of input and output targets
  * Use keyword arguments to specify the sources of input or output targets
  * `_input[name]` and `_output[name]` return the subset of `_input` or `_output` with source `name`
  * Outputs returned from `output_from` and `named_output` can have their own sources  

## Sources of `sos_targets`

Every input target of a SoS step has a **source**, which specifies where the target comes from. Variables of type `sos_targets` (e.g. `_input`, `_output`, and `step_input`) have a `sources` attribute that lists the source of each target.

 <div class="bs-callout bs-callout-primary" role="alert">
    <h4>The <code>sources</code> of <code>sos_targets</code></h4>
    <p>Each element in a <code>sos_targets</code> object (e.g. <code>_input</code>) has a <em>source</em> attribute.</p>
    <ul>
        <li>The sources of <code>sos_targets</code> can be retrieved by attribute <code>.sources</code></li>
        <li>A slice of the <code>sos_targets</code>, namely all elements having a specified source, can be obtained by <code>[name]</code>. Groups of the <code>sos_targets</code> will also be sliced</li>
        <li>The default source of an input or output file is the step in which it is specified or generated</li>
        <li>Keyword arguments (e.g. <code>summary='summary.html'</code>) override the default sources</li>
    </ul>
 </div>

By default, the source of a directly specified target is the name of the step in which it is specified.

In [2]:
# create a few input files to satisfy the input of sample workflows
!touch a.txt b.txt ref.txt

In [3]:
in_files = ['a.txt', 'b.txt']
input: in_files, 'c.txt', 'd.txt'

print(f'step_input is {step_input} with sources {step_input.sources}')

step_input is a.txt b.txt c.txt d.txt with sources ['scratch_0', 'scratch_0', 'scratch_0', 'scratch_0']


You can specify the sources of inputs with keyword arguments. For example, in the following step, the first two files are given the name `grp1` and the last two are given the name `grp2`.

In [4]:
input: grp1 = ['a.txt', 'b.txt'], grp2=['c.txt', 'd.txt']

print(f'step_input is {step_input} with sources {step_input.sources}')

step_input is a.txt b.txt c.txt d.txt with sources ['grp1', 'grp1', 'grp2', 'grp2']


The **sources of the targets can be used to partition input targets and refer to them separately**. You can access groups of input files with the syntax `step_input[group_name]`.

In [5]:
input: data = ['a.txt', 'b.txt'], reference='ref.txt'

print(f'Input of step is {_input} with sources {step_input.sources}')
print(f'Data is {_input["data"]}')
print(f'Reference is {_input["reference"]}')

Input of step is a.txt b.txt ref.txt with sources ['data', 'data', 'reference']
Data is a.txt b.txt
Reference is ref.txt
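
The behaviour shown above can be mimicked in plain Python to make the model concrete. The sketch below is only an illustration and not how SoS implements `sos_targets`: the class `LabelledTargets` is a hypothetical name, and it assumes that each target simply carries one source string.

```python
from typing import Iterable, List, Tuple, Union


class LabelledTargets:
    """Toy stand-in for sos_targets (hypothetical, not the SoS class)."""

    def __init__(self, default_source: str, *targets: str,
                 **named: Union[str, Iterable[str]]) -> None:
        self._items: List[Tuple[str, str]] = []
        # Positional targets get the default source (in SoS, the step name).
        for target in targets:
            self._items.append((target, default_source))
        # Keyword arguments override the default source with the keyword.
        for source, value in named.items():
            values = [value] if isinstance(value, str) else list(value)
            for target in values:
                self._items.append((target, source))

    @property
    def sources(self) -> List[str]:
        """Source of each target, in the order the targets were given."""
        return [source for _, source in self._items]

    def __getitem__(self, source: str) -> List[str]:
        """All targets that carry the requested source."""
        return [target for target, s in self._items if s == source]

    def __str__(self) -> str:
        return " ".join(target for target, _ in self._items)


step_input = LabelledTargets("scratch_0", data=["a.txt", "b.txt"], reference="ref.txt")
print(f"Input is {step_input} with sources {step_input.sources}")
print(f"Data is {step_input['data']}")
print(f"Reference is {step_input['reference']}")
```

Storing the source next to each target, rather than keeping one list per name, preserves the original order of the targets while still allowing lookup by name.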


## Named inputs and outputs

 <div class="bs-callout bs-callout-primary" role="alert">
    <h4>Named inputs and outputs</h4>
    <p>Keyword arguments in input and output statements allow you to refer to subsets of inputs and outputs by name</p>
 </div>

For example, in the following workflow, the input files are labelled with `data` and the reference is labelled with `reference`. In the output statement, the `data` part of the input (`_input["data"]`) is used to generate results with label `result`.

In the following `print` statement, `_input["reference"]`, `_output['result']`, etc. are used to obtain subsets of `_input` and `_output`. These subsets of inputs or outputs are usually called **named inputs** and **named outputs**.

In [12]:
input: data = ['a.txt', 'b.txt'], reference='ref.txt'
output: result=[x.with_suffix('.res') for x in _input["data"]]
_output.touch()                              

print(f'''\
Input of step is {_input} with sources {step_input.sources}

Input data is {_input["data"]}
Reference is {_input["reference"]}

Output is {_output}
Result of output is {_output['result']}
''')

Input of step is a.txt b.txt ref.txt with sources ['data', 'data', 'reference']

Input data is a.txt b.txt
Reference is ref.txt

Output is a.res b.res
Result of output is a.res b.res
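
The output statement above derives one result file per data file by swapping the suffix, and leaves the reference untouched. A minimal plain-Python sketch of that mapping, assuming the targets behave like `pathlib.Path` objects with respect to `with_suffix` (the dictionaries `inputs_by_name` and `outputs_by_name` are illustrative names, not SoS variables):

```python
from pathlib import Path

# Inputs keyed by name, mirroring `data=[...]` and `reference=...`.
inputs_by_name = {
    "data": [Path("a.txt"), Path("b.txt")],
    "reference": [Path("ref.txt")],
}

# Only the "data" subset is used to derive outputs.
outputs_by_name = {
    "result": [p.with_suffix(".res") for p in inputs_by_name["data"]],
}

print("Result of output is", " ".join(str(p) for p in outputs_by_name["result"]))
```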



## Slices of `sos_targets` with groups *

If a step has multiple substeps, the variables `step_input` and `step_output` consist of multiple groups. For example, the `_output` of each substep of step `[10]` below has the named outputs `A` and `B`. The output of the entire step consists of 4 groups, which are retrieved by the function `output_from(-1)` (`-1` refers to the previous step). The expression

```python
input: output_from(-1)['A']
```
obtains all targets with source `A` while keeping the groups, so each `_input` of step `20` consists only of the target with source `A` from the corresponding group.

In [5]:
%run -v0
[10]
input: for_each=dict(i=range(4))
output: A=f'a_{i}.txt', B=f'b_{i}.txt'
_output.touch()       

print(f'Output step is {_output} with sources {_output.sources}')

[20]
input: output_from(-1)['A']
print(f'input of substep is {_input}')



input of substep is a_0.txt
input of substep is a_1.txt
input of substep is a_2.txt
input of substep is a_3.txt
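
To see why slicing by source keeps one input per substep, the grouping can be modelled in plain Python. This is a sketch under the assumption that the step output is a list of groups and that each target is a `(name, source)` pair; the helper `slice_by_source` is a hypothetical name and not part of SoS.

```python
from typing import List, Tuple

Target = Tuple[str, str]  # (file name, source)

# One group per substep of step 10, each holding an A target and a B target.
groups: List[List[Target]] = [
    [(f"a_{i}.txt", "A"), (f"b_{i}.txt", "B")] for i in range(4)
]


def slice_by_source(groups: List[List[Target]], source: str) -> List[List[Target]]:
    """Keep only targets with the given source, group by group."""
    return [[t for t in group if t[1] == source] for group in groups]


# The number of groups is unchanged; only the content of each group shrinks.
for group in slice_by_source(groups, "A"):
    print("input of substep is", " ".join(name for name, _ in group))
```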


## Further reading
*  [How to use named output in data-flow style workflows](doc/user_guide/named_output.html)