# How to (not) execute substeps in parallel

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * By default substeps are executed in parallel
  * Option `concurrent=False` stops the substeps from being executed in parallel
  * Certain options and statements prevent substeps from being executed in parallel
  

## Input option `concurrent` <a id="Option_concurrent"></a>

By default, the substeps of a step are executed concurrently, subject to any dependencies among them. For example,

In [5]:
sum = 0
import time
start_time = time.time()
input: for_each={'i': range(4)}
time.sleep(4)
print(f'sum is {sum} at index {_index}, completed in {time.time() - start_time:.1f} seconds')

sum is 0 at index 0, completed in 4.7 seconds
sum is 0 at index 1, completed in 4.7 seconds
sum is 0 at index 2, completed in 4.7 seconds
sum is 0 at index 3, completed in 4.7 seconds


As you can see, `start_time` is recorded once, before any substep starts, and all substeps complete at about the same time because they are executed concurrently.
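The timing pattern above can be reproduced with a small plain-Python analogy. This is only a sketch that uses threads from the standard library. It is not how SoS schedules substeps, and the names `substep` and `run_concurrently` are hypothetical. The sleep is shortened to 0.2 seconds so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def substep(index, start_time, duration=0.2):
    """Hypothetical stand-in for one substep: sleep, then report elapsed time."""
    time.sleep(duration)
    return index, time.time() - start_time


def run_concurrently(n=4, duration=0.2):
    # start_time is recorded once, before any substep starts
    start_time = time.time()
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(substep, i, start_time, duration) for i in range(n)]
        # collect results in submission order
        return [f.result() for f in futures]


results = run_concurrently()
for index, elapsed in results:
    print(f'index {index} completed in {elapsed:.1f} seconds')
```

Because the four sleeps overlap, every substep reports roughly the duration of a single sleep rather than the sum of all four.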

Concurrent execution can cause some unexpected results. For example, the following step has four substeps. Each of them adds `i` to a variable `sum`, but the results are not accumulated because each substep works on its own copy of `sum`.

In [7]:
sum = 0
input: for_each=dict(i=range(4))
sum += i
print(f'sum is {sum} at index {_index}')

sum is 0 at index 0
sum is 1 at index 1
sum is 2 at index 2
sum is 3 at index 3


To accumulate `sum` across all substeps, you can execute the substeps sequentially by adding option `concurrent=False`.

In [8]:
sum = 0
input: for_each=dict(i=range(4)), concurrent=False
sum += i
print(f'sum is {sum} at index {_index}')

sum is 0 at index 0
sum is 1 at index 1
sum is 3 at index 2
sum is 6 at index 3
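
The difference between the two runs above can be mimicked in plain Python. The sketch below is an analogy, not SoS's implementation: it assumes that each concurrent substep starts from its own copy of the step-level variables, while sequential substeps share them. The helper names `add_index`, `run_isolated`, and `run_shared` are hypothetical.

```python
import copy


def add_index(namespace, i):
    """Body of a hypothetical substep: add i to `sum` in the given namespace."""
    namespace['sum'] += i
    return namespace['sum']


def run_isolated(n=4):
    # Mimics concurrent substeps: each one starts from its own copy of the
    # step-level variables, so its update is invisible to the other substeps.
    step_vars = {'sum': 0}
    return [add_index(copy.deepcopy(step_vars), i) for i in range(n)]


def run_shared(n=4):
    # Mimics concurrent=False: substeps run one after another on the same
    # variables, so each update is visible to the next substep.
    step_vars = {'sum': 0}
    return [add_index(step_vars, i) for i in range(n)]


isolated = run_isolated()
shared = run_shared()
print(isolated)  # [0, 1, 2, 3]
print(shared)    # [0, 1, 3, 6]
```

The two lists match the `sum` values printed by the concurrent and the `concurrent=False` runs, respectively.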


## Concurrency for the execution of nested subworkflows

Substeps containing nested subworkflows (function `sos_run`) are also executed concurrently by default. For example, in the following workflow, four `sleep` subworkflows are executed with different values of parameter `duration`. The subworkflows are executed in parallel and complete in no fixed order.

In [2]:
%run -v0

[sleep]
parameter: index=int
parameter: duration=int
import time
time.sleep(duration)
print(f'I am process {index}, I have slept for {duration} seconds')


[default]
import random
input: for_each=dict(i=range(4))
sos_run('sleep', index=_index, duration=random.randint(1, 10))

I am process 2, I have slept for 2 seconds
I am process 1, I have slept for 8 seconds
I am process 3, I have slept for 8 seconds
I am process 0, I have slept for 9 seconds


<div class="bs-callout bs-callout-warning" role="alert">
    <h4>Substeps with statements after <code>sos_run</code> are not executed in parallel</h4>
    <p>Because of the way subworkflows are executed, a subworkflow must be the last statement in the step process to allow the substeps to be executed in parallel. That is to say, subworkflows in</p>
    <pre>
    input: ...
    sos_run('sub')
    print('Done')
    </pre>
    and 
    <pre>
    input: ...
    sos_run('sub1')
    sos_run('sub2')
    </pre>
    will not be executed in parallel. However, if <code>sub2</code> does not have to be executed after <code>sub1</code>, the latter case can be executed in parallel by running the two subworkflows side by side with
    <pre>    
    input: ...
    sos_run(['sub1', 'sub2'])
    </pre>
</div>

There is a complication, though: substeps with subworkflows are executed in parallel only if `sos_run` is their last statement. For example, with one statement added after the `sos_run` call, the subworkflows from the previous example are executed sequentially.

In [4]:
%run -v0

[sleep]
parameter: index=int
parameter: duration=int
import time
time.sleep(duration)
print(f'I am process {index}, I have slept for {duration} seconds')


[default]
import random
input: for_each=dict(i=range(4))
sos_run('sleep', index=_index, duration=random.randint(1, 10))
print(f'{_index} is done')

Workflow `default` (ID c3c3bfb6674b8ed9, index #4): completed, ran for 23 sec


I am process 0, I have slept for 5 seconds
0 is done
I am process 1, I have slept for 7 seconds
1 is done
I am process 2, I have slept for 5 seconds
2 is done
I am process 3, I have slept for 5 seconds
3 is done


This is somewhat limiting for users who are used to calling multiple subworkflows from a `default` step as follows:

In [6]:
%run -v0

import time

[sub1]
time.sleep(6)
print(f'step {step_name} is done')

[sub2]
time.sleep(2)
print(f'step {step_name} is done')

[default]
sos_run('sub1')
sos_run('sub2')

step sub1 is done
step sub2 is done


However, because function `sos_run` accepts multiple subworkflows and executes them in parallel, you can execute the steps in parallel as long as they do not depend on each other:

In [7]:
%run -v0

import time

[sub1]
time.sleep(6)
print(f'step {step_name} is done')

[sub2]
time.sleep(2)
print(f'step {step_name} is done')

[default]
sos_run([
    'sub1',
    'sub2'
])

step sub2 is done
step sub1 is done
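
The change in completion order between the two `default` steps above can be illustrated with a plain-Python analogy. This sketch uses threads from the standard library and is not how `sos_run` is implemented. The names `sub`, `run_one_by_one`, and `run_side_by_side` are hypothetical, and the sleep durations are scaled down from 6 and 2 seconds.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def sub(name, duration):
    """Hypothetical stand-in for a subworkflow: sleep, then return its name."""
    time.sleep(duration)
    return name


def run_one_by_one(tasks):
    # Analogous to two separate sos_run calls: the second task starts
    # only after the first one has finished.
    return [sub(name, duration) for name, duration in tasks]


def run_side_by_side(tasks):
    # Analogous to sos_run(['sub1', 'sub2']): both tasks start together
    # and are collected in the order in which they finish.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(sub, name, duration) for name, duration in tasks]
        return [f.result() for f in as_completed(futures)]


tasks = [('sub1', 0.6), ('sub2', 0.2)]
sequential_order = run_one_by_one(tasks)
parallel_order = run_side_by_side(tasks)
print(sequential_order)  # ['sub1', 'sub2']
print(parallel_order)    # ['sub2', 'sub1']
```

When the tasks run side by side, the shorter `sub2` finishes first, which mirrors the output order of the last workflow.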


## Further reading

*  [How to execute other workflows in a SoS step](doc/user_guide/nested_workflow.html)