# Attaching variables to output targets and groups

* **Difficulty level**: intermediate
* **Time needed to learn**: 20 minutes or less
* **Key points**:
  * Option `paired_with` attaches variables to output targets
  * Option `group_with` attaches variables to output groups (`_output`)
  

## Passing of `step_output`

If a SoS step contains multiple substeps, defined by option `group_by` or `for_each`, `_input` becomes one group of `step_input`, `_output` becomes the corresponding group of `step_output`, and the step is executed once for each group.

Moreover, the group information of `step_output` will be passed as the default input to the next step in a simple forward-style workflow, or as input to another step with functions `output_from` or `named_output`. As shown in the following example, the `step_output` of step `A` becomes the input of step `B`, creating two substeps.

In [1]:
!touch a.txt b.txt
%run B -v1

[A]
input: 'a.txt', 'b.txt', group_by=1
output: _input.with_suffix('.bak')

_output.touch()
print(f'step_input={step_input}, _input={_input}, _output={_output}')

[B]
input: output_from('A')
print(f'step_output={step_input}, _input={_input}')


step_output=a.bak b.bak, _input=a.bak


step_output=a.bak b.bak, _input=b.bak

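The way group information travels from one step to the next can be modeled outside SoS. The following is a minimal plain-Python sketch, not SoS code: the helper `group_by` and all variable names are hypothetical stand-ins chosen to mirror the example above, and lists of strings stand in for SoS targets.

```python
# Plain-Python model of substep grouping; this is NOT SoS code.
def group_by(targets, size):
    """Split a flat list of targets into consecutive groups of `size`."""
    return [targets[i:i + size] for i in range(0, len(targets), size)]

# Step A: two inputs, one per substep (mirrors group_by=1).
step_input_a = ["a.txt", "b.txt"]
groups_a = group_by(step_input_a, 1)

# Each substep derives its own _output from its _input.
output_groups_a = [
    [name.rsplit(".", 1)[0] + ".bak" for name in group] for group in groups_a
]

# step_output is the flattened collection; the grouping is kept alongside it.
step_output_a = [target for group in output_groups_a for target in group]

# Step B receives the targets together with their grouping,
# so it also runs once per group.
step_input_b = step_output_a
substeps_b = [
    f"step_input={' '.join(step_input_b)}, _input={' '.join(group)}"
    for group in output_groups_a
]
for line in substeps_b:
    print(line)
```

The point of the model is that the next step sees the full set of targets and the group boundaries, which is why step `B` above runs two substeps without declaring `group_by` itself.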

## Attaching attributes to output targets

Recall that the input option `paired_with` associates each input target with one or more attributes.

In [2]:
input: 'a.txt', 'b.txt', group_by=1, paired_with=dict(sample=['A', 'B'])
output: _input.with_suffix('.bak')

_output.touch()
print(f'step_input={step_input}, _input={_input} for sample {_input.sample}, _output={_output}')


step_input=a.txt b.txt, _input=a.txt for sample A, _output=a.bak


step_input=a.txt b.txt, _input=b.txt for sample B, _output=b.bak


We can do the same for `_output`, but it is trickier because the `output` statement usually defines `_output` and only in rare cases sees the entire `step_output` (see [output option `group_by`](output_group_by.html) for details). In either case, the `paired_with` option applies to whatever the `output` statement defines.

For example, with `paired_with`, the `_input` is associated with an attribute `sample`, and we can assign it to `_output` as well.


In [3]:
input: 'a.txt', 'b.txt', group_by=1, paired_with=dict(sample=['A', 'B'])
output: _input.with_suffix('.bak'), paired_with=dict(sample=_input.sample)

_output.touch()
print(f'step_input={step_input}, _input={_input}, _output={_output} for sample {_output.sample}')


step_input=a.txt b.txt, _input=a.txt, _output=a.bak for sample A


step_input=a.txt b.txt, _input=b.txt, _output=b.bak for sample B


However, if the `output` statement defines `step_output` with `group_by`, option `paired_with` must supply a list with one value per target, not a single value such as `_input.sample` above.

In [4]:
in_files = ['a.txt', 'b.txt']
out_files = ['a.bak', 'b.bak']
samples = ['A', 'B']
input: in_files, group_by=1, paired_with=dict(sample=samples)
output: out_files, group_by=1, paired_with=dict(sample=samples)

_output.touch()
print(f'step_input={step_input}, _input={_input}, _output={_output} for sample {_output.sample}')


step_input=a.txt b.txt, _input=a.txt, _output=a.bak for sample A


step_input=a.txt b.txt, _input=b.txt, _output=b.bak for sample B

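The "one value per target" requirement can be illustrated with a small plain-Python sketch. This is not SoS code: the helper `pair_with` is hypothetical, targets are modeled as dictionaries, and the length check is this sketch's own behavior rather than a statement about how SoS reports a mismatch.

```python
# Plain-Python model of pairing attributes with targets; this is NOT SoS code.
def pair_with(targets, **attrs):
    """Return one record per target, each carrying its own attribute values."""
    for name, values in attrs.items():
        # Sketch-only validation: every attribute list must match the targets.
        if len(values) != len(targets):
            raise ValueError(
                f"{name!r} has {len(values)} values for {len(targets)} targets"
            )
    return [
        {"target": t, **{name: values[i] for name, values in attrs.items()}}
        for i, t in enumerate(targets)
    ]

out_files = ["a.bak", "b.bak"]
samples = ["A", "B"]

# Pair first, then split into groups of one target each (mirrors group_by=1).
paired = pair_with(out_files, sample=samples)
substeps = [[record] for record in paired]

for _output in substeps:
    print(f"_output={_output[0]['target']} for sample {_output[0]['sample']}")

# Supplying too few values is rejected by the sketch's own check.
try:
    pair_with(out_files, sample=["A"])
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)
```

Because pairing happens against the whole list of targets before grouping, each substep ends up with the attribute that belongs to its own target.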

With attributes attached to `_output` targets, the attributes are passed to the next step implicitly, or to another step explicitly through `output_from`. This information makes it easier to identify what each substep is processing.

In [5]:
%run B -v1

[A]
in_files = ['a.txt', 'b.txt']
out_files = ['a.bak', 'b.bak']
samples = ['A', 'B']
input: in_files, group_by=1, paired_with=dict(sample=samples)
output: out_files, group_by=1, paired_with=dict(sample=samples)

_output.touch()
print(f'step_input={step_input}, _input={_input}, _output={_output} for sample {_output.sample}')

[B]
input: output_from('A')
print(f'Continue processing {_input} for sample {_input.sample}')

Continue processing a.bak for sample A


Continue processing b.bak for sample B


## The `group_with` output option

Just like the `group_with` option of the `input` statement, the `group_with` output option assigns a sequence of variables to each of the output groups (variable `_output`). Again, the situation is trickier because the `output` statement usually defines `_output` and only in rare cases sees the entire `step_output` (see [output option `group_by`](output_group_by.html) for details). In either case, the `group_with` option applies to whatever the `output` statement defines.

That is to say, if `output` defines `_output`, `group_with` simply associates the dictionary with it, and the values should be specific to this particular substep.

In [6]:
!touch a.txt b.txt
%run -v1

samples = ['A', 'B']

[1]
input: 'a.txt', 'b.txt', group_by=1
output: _input.with_suffix('.bak'), group_with=dict(sample=samples[_index])

_output.touch()
print(f'step_input={step_input}, _input={_input}, _output={_output}, _output.sample={_output.sample}')

[2]
print(f'step_output={step_input}, _output={_input}, _output.sample={_input.sample}')


step_output=a.bak b.bak, _output=a.bak, _output.sample=A


step_output=a.bak b.bak, _output=b.bak, _output.sample=B


If `output` defines `step_output` with `group_by`, then `group_with` should specify lists with one element for each substep.

In [7]:
!touch a.txt b.txt
%run -v1

samples = ['A', 'B']

[1]
input: 'a.txt', 'b.txt', group_by=1
output: 'a.bak', 'b.bak', group_by=1, group_with=dict(sample=samples)

_output.touch()
print(f'step_input={step_input}, _input={_input}, _output={_output}, _output.sample={_output.sample}')

[2]
print(f'step_output={step_input}, _output={_input}, _output.sample={_input.sample}')

step_output=a.bak b.bak, _output=a.bak, _output.sample=A


step_output=a.bak b.bak, _output=b.bak, _output.sample=B


## Difference between `paired_with` and `group_with`

The difference between `paired_with` and `group_with` is simple to state, but the examples so far do not reveal it. More specifically,

* `paired_with` pairs variables with each target of `_output`
* `group_with` attaches variables to `_output` itself

We did not see any difference because our `_output` had only one element, so `_output.sample` could be used in place of `_output[0].sample`. The following example creates `_input` of size 2 and demonstrates the difference between target variables (`replicate`) and group variables (`group`).

In [8]:
!touch a1.txt a2.txt b1.txt b2.txt
%run -s force -v1

[1]
input: 'a1.txt', 'a2.txt', 'b1.txt', 'b2.txt', group_by=2,
    paired_with=dict(replicate=[1, 2, 1, 2]),
    group_with=dict(group=['A', 'B'])
output: [x.with_suffix('.bak') for x in _input], 
    paired_with=dict(replicate=[x.replicate for x in _input]),
    group_with=dict(group=_input.group)

_output.touch()
print(f'step 1 step_input={step_input}')
print(f'  _input={_input}, _output={_output}, _output.group={_output.group}')

[2]
print(f'step 2 step_input={step_input}')
print(f'  _input={_input} with replicate {[x.replicate for x in _input]},  _input.group={_input.group}')


step 1 step_input=a1.txt a2.txt b1.txt b2.txt


  _input=a1.txt a2.txt, _output=a1.bak a2.bak, _output.group=A


step 1 step_input=a1.txt a2.txt b1.txt b2.txt


  _input=b1.txt b2.txt, _output=b1.bak b2.bak, _output.group=B


step 2 step_input=a1.bak a2.bak b1.bak b2.bak


  _input=a1.bak a2.bak with replicate [1, 2],  _input.group=A


step 2 step_input=a1.bak a2.bak b1.bak b2.bak


  _input=b1.bak b2.bak with replicate [1, 2],  _input.group=B

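The two levels of attachment can also be modeled with plain Python data classes. This is a minimal sketch, not SoS code: `Target`, `Group`, and `build_groups` are hypothetical names, and the sketch only mirrors the data layout of the example above.

```python
# Plain-Python model of target-level vs group-level variables; NOT SoS code.
from dataclasses import dataclass, field

@dataclass
class Target:
    name: str
    attrs: dict = field(default_factory=dict)  # per-target (like paired_with)

@dataclass
class Group:
    targets: list
    attrs: dict = field(default_factory=dict)  # per-group (like group_with)

def build_groups(names, size, paired, grouped):
    """Pair values with each target, then attach values to each group."""
    targets = [
        Target(n, {k: v[i] for k, v in paired.items()})
        for i, n in enumerate(names)
    ]
    chunks = [targets[i:i + size] for i in range(0, len(targets), size)]
    return [
        Group(chunk, {k: v[j] for k, v in grouped.items()})
        for j, chunk in enumerate(chunks)
    ]

groups = build_groups(
    ["a1.bak", "a2.bak", "b1.bak", "b2.bak"],
    2,
    paired={"replicate": [1, 2, 1, 2]},  # one value per target
    grouped={"group": ["A", "B"]},       # one value per group
)

summary = [
    (g.attrs["group"], [(t.name, t.attrs["replicate"]) for t in g.targets])
    for g in groups
]
for group_name, members in summary:
    print(group_name, members)
```

In this model the `paired` lists have four entries (one per target) while the `grouped` lists have two (one per group), which matches the lengths used for `replicate` and `group` in the SoS example.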

## Further reading

* [`output` statement](output_statement.html)