# Pass variables between SoS steps

* **Difficulty level**: intermediate
* **Time needed to learn**: 20 minutes or less
* **Key points**:
  * Variables defined in steps are not accessible from other steps
  * Variables can be `shared` with steps that depend on them through the target `sos_variable`

## Section option `shared` <a id="Option_shared"></a>

SoS executes each step in a separate process and by default does not return any result to the master SoS process. Option `shared` is used to share variables between steps. This option accepts:

* A string (a variable name),
* A map from variable names to expressions (strings) that are evaluated upon completion of the step, or
* A sequence of strings (variable names) or maps.

For example,

In [1]:
%run -v1
[10: shared='myvar']
myvar = 100

[20]
print(myvar)

100


In [2]:
%run -v1
[10: shared=['v1', 'v2']]
v1 = 100
v2 = 200

[20]
print(v1)
print(v2)

100


200


The `dict` format of the `shared` option specifies expressions that are evaluated after the completion of the step, and can be used to pass pieces of `step_output` as follows:

In [3]:
%run -v1
[10: shared={'res': 'step_output["res"]', 'stat': 'step_output["stat"]'}]
output: res='a.res', stat='a.txt'

_output.touch()

[20]
print(res)
print(stat)

a.res


a.txt

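The three accepted forms can be pictured as reducing to a single name-to-expression map that is evaluated in the step's namespace once the step finishes. The following is a minimal Python sketch of that idea, not SoS's actual implementation: `normalize_shared` and `collect_shared` are hypothetical helpers, and a plain `dict` stands in for `step_output`.

```python
from collections.abc import Mapping, Sequence


def normalize_shared(shared):
    """Hypothetical helper: reduce any accepted form of `shared` to {name: expression}."""
    if isinstance(shared, str):
        # A bare variable name is shared under its own name.
        return {shared: shared}
    if isinstance(shared, Mapping):
        return dict(shared)
    if isinstance(shared, Sequence):
        # A sequence may mix variable names and maps.
        result = {}
        for item in shared:
            result.update(normalize_shared(item))
        return result
    raise ValueError(f"Unsupported value for shared: {shared!r}")


def collect_shared(shared, step_namespace):
    """Evaluate every expression against a copy of the finished step's namespace."""
    return {
        name: eval(expr, dict(step_namespace))
        for name, expr in normalize_shared(shared).items()
    }


# Assumed stand-in for the namespace of a completed step.
ns = {"v1": 100, "v2": 200, "step_output": {"res": "a.res", "stat": "a.txt"}}

print(collect_shared("v1", ns))
# {'v1': 100}
print(collect_shared(["v1", {"res": 'step_output["res"]'}], ns))
# {'v1': 100, 'res': 'a.res'}
```

Treating a plain name as the expression that evaluates to itself keeps a single code path for all three forms.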

## `sos_variable` targets

When we share variables from a step, the variables become available to the step that is executed after it. This is why `res` and `stat` are accessible from step `20` after the completion of step `10`. In the more general case of a non-forward-style workflow, however, a step needs to depend on a `sos_variable` target to access the shared variable.

For example, in the following workflow, the two `sos_variable` targets create dependencies on steps `notebookCount` and `lineCount`, so these two steps are executed before `default` and provide the required variables.

In [4]:
%run -v1

[notebookCount: shared='numNotebooks']
import glob
numNotebooks = len(glob.glob('*.ipynb'))

[lineCount: shared='lineOfThisNotebook']
with open('shared_variables.ipynb') as nb:
    lineOfThisNotebook = len(nb.readlines())

[default]
depends: sos_variable('numNotebooks'), sos_variable('lineOfThisNotebook')
print(f"There are {numNotebooks} notebooks in this directory")
print(f"Current notebook has {lineOfThisNotebook} lines")

There are 94 notebooks in this directory


Current notebook has 632 lines

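One way to picture how `sos_variable` targets turn into dependencies is a lookup from each shared variable to the step that provides it, followed by a depth-first walk. The sketch below is plain Python under that assumption; the `steps` table and `execution_order` function are hypothetical and do not reflect how SoS builds its dependency graph internally.

```python
# Hypothetical description of the workflow above: the variables each step
# shares and the sos_variable targets it depends on.
steps = {
    "notebookCount": {"shared": ["numNotebooks"], "depends": []},
    "lineCount": {"shared": ["lineOfThisNotebook"], "depends": []},
    "default": {"shared": [], "depends": ["numNotebooks", "lineOfThisNotebook"]},
}


def execution_order(steps, target):
    """Return the steps needed for `target`, with providers listed first."""
    # Map every shared variable to the step that provides it.
    providers = {var: name for name, spec in steps.items() for var in spec["shared"]}
    order, in_progress = [], set()

    def visit(name):
        if name in order:
            return
        if name in in_progress:
            raise ValueError(f"Circular dependency involving step {name!r}")
        in_progress.add(name)
        for var in steps[name]["depends"]:
            if var not in providers:
                raise KeyError(f"No step shares variable {var!r}")
            visit(providers[var])
        in_progress.discard(name)
        order.append(name)

    visit(target)
    return order


print(execution_order(steps, "default"))
# ['notebookCount', 'lineCount', 'default']
```

The list shows one valid order. The only constraint expressed by the targets is that both providers finish before `default` starts.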

## Sharing variables from substeps

When you share a variable from a step with multiple substeps, each substep has its own copy of the variable, and which copy SoS returns is not defined. The current implementation returns the variable from the last substep, but this is not guaranteed.

For example, in the following workflow multiple random seeds are generated, but only the last `seed` is shared outside of step `1` and obtained by step `2`.

In [5]:
%run -v1
[1: shared='seed']
input: for_each={'i': range(5)}

import random
seed = random.randint(0, 1000)
print(seed)

[2]
print(f'Got seed {seed} at step 2')

50


606


267


52


701


Got seed 701 at step 2


Got seed 701 at step 2


Got seed 701 at step 2


Got seed 701 at step 2


Got seed 701 at step 2


If you would like to obtain the variable from all substeps, you can prefix the variable name with `step_`, which is a convention for option `shared` to collect variables from all substeps.

In [6]:
%run -v1
[1: shared='step_seed']
input: for_each={'i': range(5)}
import random
seed = random.randint(0, 1000)

[2]
print(step_seed[_index])

17


114


688


99


253

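The difference between sharing `seed` and sharing `step_seed` can be modelled in a few lines of plain Python. This is only an illustration of the behavior described above, with each substep represented by its own dictionary; `run_substeps` and `share` are hypothetical names, and the fixed random seed is an assumption made for reproducibility.

```python
import random


def run_substeps(n, random_seed=0):
    """Run n substeps, each with its own namespace holding a `seed` variable."""
    rng = random.Random(random_seed)
    namespaces = []
    for i in range(n):
        namespace = {"i": i}
        namespace["seed"] = rng.randint(0, 1000)
        namespaces.append(namespace)
    return namespaces


def share(namespaces, name):
    """Return what a later step would receive for a shared variable name."""
    if name.startswith("step_"):
        # `step_` prefix: one value per substep, in substep order.
        plain = name[len("step_"):]
        return [ns[plain] for ns in namespaces]
    # Plain name: the copy from the last substep (current behavior, not guaranteed).
    return namespaces[-1][name]


substeps = run_substeps(5)
all_seeds = share(substeps, "step_seed")
last_seed = share(substeps, "seed")
print(all_seeds)
print(last_seed == all_seeds[-1])
```

Indexing `step_seed` with `_index`, as in the workflow above, picks the value produced by the substep at the same position.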

You can also use the `step_*` variables in expressions, as in the following example:

In [7]:
%run -v1
[1: shared={'summed': 'sum(step_rng)', 'rngs': 'step_rng'}]
input: for_each={'i': range(10)}
import random
rng = random.randint(0, 10)

[2]
input: group_by='all'
print(rngs)
print(summed)

[5, 2, 5, 2, 7, 10, 5, 0, 2, 2]


40


Here we used `group_by='all'` to collapse multiple substeps into one.

## Sharing variables from tasks

Variables generated by external tasks add another layer of complexity because tasks usually do not share variables with the substep they belong to. To solve this problem, use the `shared` option of `task` to return the variable to the substep:

In [8]:
%run -v1 -q localhost
[1: shared={'summed': 'sum(step_rng)', 'rngs': 'step_rng'}]
input: for_each={'i': range(5)}

task: shared='rng'
import random
rng = random.randint(0, 10*i)

[2]
input: group_by='all'
print(rngs)
print(summed)

[0, 7, 2, 23, 24]


56

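The two layers of sharing, from task to substep and from substeps to the step, can be sketched in plain Python as follows. This is a simplified model rather than how SoS dispatches tasks: `run_task` and `run_step` are hypothetical functions, the `scratch` variable is invented for illustration, and the task runs in-process with a fixed random seed.

```python
import random


def run_task(i, task_shared, rng):
    """Run the task body in its own namespace and return only the shared names."""
    task_ns = {"i": i}
    task_ns["rng"] = rng.randint(0, 10 * i)
    task_ns["scratch"] = "stays inside the task"  # never listed, so never returned
    return {name: task_ns[name] for name in task_shared}


def run_step(n, task_shared, random_seed=1):
    """Run n substeps, merge what each task returns, then evaluate the step's `shared`."""
    rng = random.Random(random_seed)
    substeps = []
    for i in range(n):
        substep_ns = {"i": i}
        # Only variables named in the task's `shared` reach the substep.
        substep_ns.update(run_task(i, task_shared, rng))
        substeps.append(substep_ns)
    # Raises KeyError if the task did not share `rng` with its substep.
    step_rng = [ns["rng"] for ns in substeps]
    return {"rngs": step_rng, "summed": sum(step_rng)}


result = run_step(5, task_shared=["rng"])
print(result["rngs"])
print(result["summed"])
```

Calling `run_step(5, task_shared=[])` fails with a `KeyError` in this model, which mirrors why the step-level `shared` option alone is not enough when the variable is created inside a task.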

## Further reading

* [Make-file style pattern matching](auxiliary_steps.html)