# Getting Started with SoS Workflow System

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * SoS uses classic Jupyter or JupyterLab with a SoS kernel as its IDE
  * SoS steps can be developed and executed in SoS Notebook
  * SoS workflows are embedded in Jupyter notebooks
  * Complete SoS workflows can be executed in a Jupyter notebook with the magics `%run` and `%sosrun`, or with the `sos` command from the command line

## SoS Workflow System in Jupyter

SoS Workflow System uses SoS Notebook as its IDE. The following figure illustrates the overall design of SoS Workflow System and SoS Notebook:

[![JupyterCon18 SoS Talk](https://vatlab.github.io/sos-docs/doc/media/SoS_Notebook_and_Workflow.png)](https://www.youtube.com/watch?v=U75eKosFbp8)

Basically,

* SoS Notebook is a [Jupyter Notebook](https://jupyter.org/) with a SoS kernel.
* SoS Notebook serves as a super kernel to all other Jupyter kernels and allows the use of multiple kernels in a single notebook.
* SoS Notebook also serves as the IDE for SoS Workflow System.

The figure links to a [YouTube video](https://www.youtube.com/watch?v=U75eKosFbp8) of a [presentation on SoS at JupyterCon 2018](https://github.com/vatlab/JupyterCon2018), which introduces both SoS Notebook and SoS Workflow System and is a good starting point for learning SoS. The SoS Workflow part starts at the [20-minute mark](https://www.youtube.com/watch?v=U75eKosFbp8#t=20m).

## Running SoS

The Running SoS section of the [SoS Homepage](https://vatlab.github.io/sos-docs/) contains all the instructions on how to install SoS. Briefly, you have the following options for using SoS:

* Try SoS using our live server [http://vatlab.github.io/sos/live](http://vatlab.github.io/sos/live).
* Start a Jupyter notebook server from our docker image [mdabioinfo/sos-notebook](https://hub.docker.com/r/mdabioinfo/sos-notebook/).
* Install `sos` and `sos-notebook` locally if you have a local Python (3.6 or higher) installation and a working Jupyter server with kernels of interest.
* Check with your system administrator if you have access to an institutional JupyterHub server with SoS installed.

For the purpose of this tutorial, it is sufficient to use our live server [http://vatlab.github.io/sos/live](http://vatlab.github.io/sos/live). After you see the following interface, select New -> SoS to create a SoS notebook. You can also go to `examples` and open existing SoS notebooks.

[![LiveServer](https://vatlab.github.io/sos-docs/doc/media/Live_Server.png)](http://vatlab.github.io/sos/live)

## Using the SoS kernel

This tutorial is written in a SoS Notebook, which consists of multiple **markdown cells** and **code cells**. With the SoS kernel, each code cell can have its own kernel. SoS Notebook allows you to use multiple kernels in a single notebook and exchange variables among live kernels. This allows you to develop scripts and analyze data in different languages.

For example, the following three code cells perform a multi-language data analysis: the first cell defines a few variables (in Python, as SoS is based on Python), the second cell runs a Bash script that converts an Excel file to CSV format, and the last cell uses R to read the CSV file and generate a plot. Three different kernels, SoS, [bash_kernel](https://github.com/takluyver/bash_kernel), and [IRkernel](https://github.com/IRkernel/IRkernel), are used, and the `%expand` magic passes filenames from the SoS kernel to the other kernels.

In [1]:
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

In [2]:
%expand
xlsx2csv {excel_file} > {csv_file}

In [3]:
%expand
data <- read.csv('{csv_file}')
pdf('{figure_file}')
plot(data$log2FoldChange, data$stat)
dev.off()

The SoS cell above is called a **scratch cell** because it does not contain a formal SoS step. Such cells accept:
* **Any Python statements** because SoS is extended from Python 3.6,
* **SoS magics** as documented [here](https://vatlab.github.io/sos-docs/doc/documentation/SoS_Magics.html), and
* **Any SoS step definition without header**, which we will introduce later.
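
To make the role of `%expand` in the cells above more concrete: the magic fills the `{ }` placeholders in a cell with values from the SoS kernel before the cell is passed to the other kernel. The Python sketch below imitates that substitution with `str.format`. The helper `expand_cell` is hypothetical and only illustrates the idea; it is not how SoS implements the magic.

```python
# Variables as defined in the first (SoS) cell of the example above
variables = {
    'excel_file': 'data/DEG.xlsx',
    'csv_file': 'DEG.csv',
    'figure_file': 'output.pdf',
}


def expand_cell(cell_text, namespace):
    """Hypothetical helper: replace {name} placeholders with values from namespace."""
    return cell_text.format(**namespace)


# Cell contents as they appear before expansion
bash_cell = "xlsx2csv {excel_file} > {csv_file}"
r_cell = "data <- read.csv('{csv_file}')\npdf('{figure_file}')"

print(expand_cell(bash_cell, variables))
print(expand_cell(r_cell, variables))
```

The first printed line is `xlsx2csv data/DEG.xlsx > DEG.csv`, which is the command the Bash kernel would receive under this reading of `%expand`.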

## Embedded SoS script

Scripts in different languages can be [easily converted to SoS workflows](https://www.youtube.com/watch?v=U75eKosFbp8#t=22m40s), which consist of SoS sections marked by section headers in the following format:

```
[header_name: optional options]
```

For example, the following code defines a SoS workflow step with a header and a simple Python print statement.

```
[10]
print('this is a SoS step')
```

These steps define the embedded SoS script of the notebook.

<div class="bs-callout bs-callout-info" role="alert">
  <h4>Embedded SoS script</h4>
  <p>An embedded SoS script consists of the SoS sections in all SoS cells of a notebook.</p>
</div>
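
As a rough illustration of this definition, the Python sketch below collects the SoS cells of a notebook that contain a section header and joins them into one script. It is not the SoS implementation. The function `embedded_script` and the toy notebook are hypothetical, and the sketch assumes that each cell records its kernel under `metadata['kernel']`.

```python
import re

# A line such as "[10]" or "[step_10: option=value]" is treated as a section header.
# Simplification: a Python list literal alone on a line would also match.
SECTION_HEADER = re.compile(r'^\[[^\]]+\]\s*$', re.MULTILINE)


def embedded_script(notebook):
    """Collect SoS code cells that contain at least one section header."""
    sections = []
    for cell in notebook.get('cells', []):
        if cell.get('cell_type') != 'code':
            continue
        # Assumption: the kernel of each cell is stored in metadata['kernel'].
        if cell.get('metadata', {}).get('kernel') != 'SoS':
            continue
        source = cell.get('source', '')
        if isinstance(source, list):  # source may be stored as a list of lines
            source = ''.join(source)
        # Drop magic lines such as %run so that only the workflow text remains.
        body = '\n'.join(
            line for line in source.splitlines()
            if not line.startswith(('%', '!'))
        ).strip()
        if SECTION_HEADER.search(body):
            sections.append(body)
    return '\n\n'.join(sections)


toy_notebook = {
    'cells': [
        {'cell_type': 'markdown', 'metadata': {}, 'source': '# Title'},
        # Scratch cell: no section header, so it is not part of the script
        {'cell_type': 'code', 'metadata': {'kernel': 'SoS'},
         'source': "csv_file = 'DEG.csv'"},
        {'cell_type': 'code', 'metadata': {'kernel': 'SoS'},
         'source': ["%run\n", "[hello_world]\n", "print('hello')"]},
        {'cell_type': 'code', 'metadata': {'kernel': 'R'},
         'source': "plot(1:10)"},
    ]
}

print(embedded_script(toy_notebook))
```

In this toy notebook only the third cell contributes to the embedded script; the scratch cell and the R cell are skipped.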

The easiest way to view the embedded workflow of a SoS notebook is to use the `%preview --workflow` magic as follows (the option `-n` displays the script in the notebook instead of the side panel). The embedded script consists of steps from the entire notebook, including cells that appear after this one.

In [4]:
%preview -n --workflow

## Execute workflow cells using magic `%run`

<div class="bs-callout bs-callout-info" role="alert">
  <h4>%run</h4>
    <p> The <code>%run</code> magic executes the content of the cell as a complete SoS workflow.</p>
</div>

If you define a workflow in a SoS cell, you can use magic `%run` to execute it:

In [5]:
%run
[hello_world]
print('This is our first hello world workflow')

| Workflow | Workflow ID | Index | Status | Duration |
|---|---|---|---|---|
| hello_world | 74fbf4d9ab411228 | #1 | completed | Ran for 0 sec |

This is our first hello world workflow


SoS starts an external `sos` process, executes the workflow, and displays the output in the notebook. A status table lists the workflow name, ID, and other information; you can remove it by clicking the status icon.

The `%run` magic executes the content of the cell as a SoS workflow, whether the cell contains no section header or multiple steps. For example,

In [6]:
%run
[step_10]
print('This is step 10 of a SoS workflow step')

[step_20]
print('This is step 20 of a SoS workflow step')

| Workflow | Workflow ID | Index | Status | Duration |
|---|---|---|---|---|
| step | ea60ad8b15451823 | #2 | completed | Ran for 0 sec |


This is step 10 of a SoS workflow step
This is step 20 of a SoS workflow step
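
The status table above reports the workflow name `step` even though the cell defines the sections `step_10` and `step_20`: sections that share a name prefix and differ in a numeric suffix are steps of one workflow. The Python sketch below shows one way to model that grouping. The helpers `split_step_name` and `group_workflows` are hypothetical, and the name `default` for sections that consist of a number only is an assumption.

```python
import re
from collections import defaultdict


def split_step_name(step_name):
    """Split 'step_10' into ('step', 10); names without a numeric index stand alone."""
    match = re.fullmatch(r'(?:(?P<workflow>.+)_)?(?P<index>\d+)', step_name)
    if match is None:
        return step_name, None
    # Assumption: a purely numeric header such as [10] belongs to a workflow named 'default'.
    return match.group('workflow') or 'default', int(match.group('index'))


def group_workflows(step_names):
    """Map each workflow name to its step names, ordered by numeric index."""
    workflows = defaultdict(list)
    for name in step_names:
        workflow, index = split_step_name(name)
        workflows[workflow].append((index, name))
    return {
        workflow: [name for _, name in sorted(steps, key=lambda s: s[0] or 0)]
        for workflow, steps in workflows.items()
    }


headers = ['hello_world', 'step_20', 'step_10']
print(group_workflows(headers))
```

With the section names used in this tutorial, the sketch yields two workflows: `hello_world` with a single step, and `step` with `step_10` followed by `step_20`.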


## Execute embedded workflows using magic `%sosrun`

<div class="bs-callout bs-callout-info" role="alert">
  <h4>%sosrun</h4>
  <p> The <code>%sosrun</code> magic executes workflows defined in the embedded SoS script of a notebook.</p>
</div>

As you can see from the output of `%preview --workflow`, the entire embedded workflow consists of sections from all SoS cells. The `%sosrun` magic can be used to execute any of the defined workflows.

For example, the following magic executes the workflow `step` defined in the section above. Because this notebook defines more than one workflow (for example `hello_world` and `step`), a workflow name is required for this magic.

In [7]:
%sosrun step

| Workflow | Workflow ID | Index | Status | Duration |
|---|---|---|---|---|
| step | 49be8d1fa78f3018 | #3 | completed | Ran for 0 sec |


This is step 10 of a SoS workflow step
This is step 20 of a SoS workflow step


<div class="bs-callout bs-callout-warning" role="alert">
  <h4>Warning</h4>
    <p>Workflow cells can only be executed by SoS magics <code>%run</code> and <code>%sosrun</code>. SoS will not produce any output if you execute a workflow cell directly.</p>  
</div>

## Execute embedded workflows with command `sos`

The `%run` and `%sosrun` magics call an external command, `sos`, to execute the workflows. In fact, the `sos` command recognizes the embedded script of a notebook and can execute it directly.

Using the `!` magic, which executes any shell command, we can mimic the execution of this notebook from the command line:

In [8]:
!sos run sos_in_notebook.ipynb step

INFO: Running step_10: 
This is step 10 of a SoS workflow step


INFO: Running step_20: 
This is step 20 of a SoS workflow step
INFO: Workflow step (ID=a0dabdc1992e98bb) is executed successfully with 2 completed steps.
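
The same command can also be launched from a Python script. The sketch below builds the argument list for `sos run` as used above and runs it only when a `sos` executable is found on the `PATH`. The function names `sos_run_command` and `run_workflow` are hypothetical, and the notebook name is the one used in this tutorial.

```python
import shutil
import subprocess


def sos_run_command(notebook, workflow=None):
    """Build the argument list for `sos run NOTEBOOK [WORKFLOW]`."""
    command = ['sos', 'run', notebook]
    if workflow is not None:
        command.append(workflow)
    return command


def run_workflow(notebook, workflow=None):
    """Run the workflow if `sos` is installed; return its exit code, or None otherwise."""
    if shutil.which('sos') is None:
        return None
    return subprocess.run(sos_run_command(notebook, workflow)).returncode


# Show the command that corresponds to the `!sos run ...` cell above
print(' '.join(sos_run_command('sos_in_notebook.ipynb', 'step')))
```

Calling `run_workflow('sos_in_notebook.ipynb', 'step')` from the directory that contains the notebook would then execute the `step` workflow, provided SoS is installed.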


## Further reading

* [Inclusion of scripts](Inclusion_of_scripts.html)