# Getting Started with SoS Workflow System

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Take home messages**:
  1. SoS uses classic Jupyter or JupyterLab with a SoS kernel as its IDE
  2. SoS steps can be developed and executed in SoS Notebook
  3. SoS workflows are embedded in Jupyter notebook
  4. Complete SoS workflows can be executed in Jupyter notebook with magics `%run` and `%sosrun`, or with the `sos` command from command line

## SoS Workflow System in Jupyter

SoS Workflow System uses SoS Notebook as its IDE. The following figure illustrates the overall design of SoS Workflow System and SoS Notebook:

[![JupyterCon18 SoS Talk](https://vatlab.github.io/sos-docs/doc/media/SoS_Notebook_and_Workflow.png)](https://www.youtube.com/watch?v=U75eKosFbp8)

Basically,

1. SoS Notebook is a [Jupyter Notebook](https://jupyter.org/) with a SoS kernel.
2. SoS Notebook serves as a super kernel to all other Jupyter kernels and allows the use of multiple kernels in a single notebook.
3. SoS Notebook also serves as the IDE for SoS Workflow System.

The figure links to a [YouTube video](https://www.youtube.com/watch?v=U75eKosFbp8) of a [presentation on SoS at JupyterCon 2018](https://github.com/vatlab/JupyterCon2018), which introduces both SoS Notebook and SoS Workflow System and is a good starting point for learning SoS. The SoS Workflow part starts at the 20-minute mark.

## Running SoS

The Running SoS section of the [SoS Homepage](https://vatlab.github.io/sos-docs/) contains all the instructions on how to install SoS. Briefly, you have the following options for using SoS:

1. Try SoS using our live server [http://vatlab.github.io/sos/live](http://vatlab.github.io/sos/live).
2. Start a Jupyter notebook server from our docker image [mdabioinfo/sos-notebook](https://hub.docker.com/r/mdabioinfo/sos-notebook/).
3. Install `sos` and `sos-notebook` locally if you have a local Python (3.6 or higher) installation and a working Jupyter server with kernels of interest.
4. Check with your system administrator if you have access to an institutional JupyterHub server with SoS installed.
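
Option 3 above assumes a suitable local Python installation. As a rough way to check this from Python, the sketch below tests the interpreter version and whether two modules can be located. The module names `sos` and `sos_notebook` are assumptions about how the `sos` and `sos-notebook` packages are imported, and `check_local_requirements` is a hypothetical helper, not part of SoS.

```python
import importlib.util
import sys


def check_local_requirements(min_python=(3, 6), packages=("sos", "sos_notebook")):
    """Report whether this interpreter appears to meet the local-install requirements.

    The default package names are assumed import names, not confirmed by this tutorial.
    """
    report = {"python_ok": sys.version_info[:2] >= tuple(min_python)}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_local_requirements().items():
        print(f"{item}: {'yes' if ok else 'no'}")
```

A `no` for either module only means that this particular interpreter cannot see it; the live server and the docker image remain available.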

For the purpose of this tutorial, it is good enough to use our live server [http://vatlab.github.io/sos/live](http://vatlab.github.io/sos/live). After you see the following interface, select New -> SoS to create a SoS notebook. You can also go to `examples` and open existing SoS notebooks.

[![LiveServer](https://vatlab.github.io/sos-docs/doc/media/Live_Server.png)](http://vatlab.github.io/sos/live)

## Using the SoS kernel

This tutorial is written in a SoS Notebook, which consists of multiple **markdown cells** and **code cells**. With the SoS kernel, the code cells can have their own kernels. SoS Notebook allows you to use multiple kernels in a notebook and exchange variables among live kernels. This allows you to develop scripts and analyze data in different languages.

For example, the following cell is a SoS cell (based on Python) that defines a few variables. The next cell is a Bash cell that converts an Excel file to CSV format, with filenames filled in by the `%expand` magic, and the last cell, an R cell, reads the CSV file and generates a plot, again using the `%expand` magic to pass filename information.

In [2]:
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

In [3]:
%expand
xlsx2csv {excel_file} > {csv_file}

In [4]:
%expand
data <- read.csv('{csv_file}')
pdf('{figure_file}')
plot(data$log2FoldChange, data$stat)
dev.off()

The SoS cell above is called a **scratch cell**. It accepts:
* **Any Python statements** because SoS is extended from Python 3.6.
* **SoS magics** as documented [here](https://vatlab.github.io/sos-docs/doc/documentation/SoS_Magics.html).
* **Any SoS step statements** such as `input` and `task`, which will be introduced in other tutorials.
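
To make the role of `%expand` in the Bash and R cells above more concrete, here is a minimal Python sketch of the idea: expressions in braces are replaced by their values before the cell content is sent to the other kernel. The `expand` function is a hypothetical stand-in written for illustration; it is not how SoS implements the magic, and it does not handle cell content that contains literal braces.

```python
import re


def expand(text, namespace):
    """Replace each {expression} in text with the value of that expression."""
    def substitute(match):
        # Evaluate the expression using only the supplied variables.
        return str(eval(match.group(1), {"__builtins__": {}}, namespace))

    return re.sub(r"\{([^{}]+)\}", substitute, text)


# Variables defined in the SoS cell of the example above.
variables = {
    "excel_file": "data/DEG.xlsx",
    "csv_file": "DEG.csv",
    "figure_file": "output.pdf",
}

bash_cell = "xlsx2csv {excel_file} > {csv_file}"
r_cell = "data <- read.csv('{csv_file}')\npdf('{figure_file}')"

print(expand(bash_cell, variables))  # xlsx2csv data/DEG.xlsx > DEG.csv
print(expand(r_cell, variables))
```

The Bash and R kernels therefore only ever see plain file names, which is why the same variables can be reused across languages.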

## Embedded SoS script

Scripts in different languages can be [easily converted to SoS workflows](https://www.youtube.com/watch?v=U75eKosFbp8#t=22m40s), which consist of SoS sections marked by section headers in the following format:

```
[header_name (: optional options)]
```

For example, the following code defines a SoS workflow step with a header and a simple Python print statement.

```
[10]
print('this is a SoS step')
```

However, **SoS sections are part of the embedded SoS script and will be executed outside of SoS Notebook**. Formally speaking:

> **Embedded SoS script:**<br>
> An embedded SoS script consists of SoS sections in SoS cells.
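
As an illustration of how section headers delimit sections, the following Python sketch splits a script into sections at lines that look like `[name]` or `[name: options]`. The regular expression and the `split_sections` helper are simplifying assumptions made for this example; the actual SoS parser handles more cases, and this sketch would also mistake a Python line such as `[10]` inside a step body for a header.

```python
import re

# Assumed header shape: [name] or [name: options], alone on a line.
HEADER = re.compile(r"^\[(?P<name>\w+)\s*(?::\s*(?P<options>.*))?\]\s*$")


def split_sections(script):
    """Return a list of (name, options, body) tuples, one per section."""
    sections = []
    for line in script.splitlines():
        match = HEADER.match(line)
        if match:
            # A header line starts a new section.
            sections.append([match.group("name"), match.group("options") or "", []])
        elif sections:
            # Any other line belongs to the most recent section.
            sections[-1][2].append(line)
        # Lines before the first header are ignored in this sketch.
    return [(name, options, "\n".join(body).strip()) for name, options, body in sections]


script = """[A]
print('This is step A of a SoS workflow')

[B]
print('This is step B of a SoS workflow')
"""

for name, options, body in split_sections(script):
    print(name, "->", body)
```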


The easiest way to view the embedded workflow of a SoS notebook is to use the `%preview --workflow` magic, as follows (the option `-n` displays the script in the notebook instead of in the side panel). As you can see, the embedded script consists of steps from the entire notebook, including cells that come after this one.

In [5]:
%preview -n --workflow

## Execute workflow cells using magic `%run`

If you open a SoS notebook and execute the following cell, nothing will happen because **embedded workflows can only be executed by SoS magics**.

In [6]:
[A]
print('This is step A of a SoS workflow')

The correct way to execute the above cell is to use a `%run` magic:

In [7]:
%run
[B]
print('This is step B of a SoS workflow')

| Workflow | Workflow ID | Index | Status |
|---|---|---|---|
| B | 85678f0d686d8bb6 | #1 | completed, ran for < 5 seconds |


This is step B of a SoS workflow


SoS starts an external `sos` process, executes the workflow, and displays the output in the notebook. A status table is created to list the workflow name, ID and other information, which can be removed if you click the status icon.

The `%run` magic executes the content of the cell as a SoS workflow, even if it contains no section header or contains multiple steps. For example,

In [8]:
%run
[step_10]
print('This is step 10 of a SoS workflow step')

[step_20]
print('This is step 20 of a SoS workflow step')

| Workflow | Workflow ID | Index | Status |
|---|---|---|---|
| step | ea60ad8b15451823 | #2 | completed, ran for < 5 seconds |


This is step 10 of a SoS workflow step
This is step 20 of a SoS workflow step


So in summary:
> **%run**<br>
> The `%run` magic executes the content of the cell as a complete SoS workflow.

## Execute embedded workflows using magic `%sosrun`

As you can see from the output of `%preview --workflow`, the entire embedded workflow consists of sections from all SoS cells. 

> **%sosrun**<br>
> The `%sosrun` magic executes workflows defined in the embedded SoS script of a notebook.

For example, the following magic executes the workflow `step` defined in the above section. Because multiple workflows are defined in this notebook (`A`, `B`, and `step`), a workflow name is required for this magic.

In [9]:
%sosrun step

| Workflow | Workflow ID | Index | Status |
|---|---|---|---|
| step | 9d30892e37966444 | #3 | completed, ran for < 5 seconds |


This is step 10 of a SoS workflow step
This is step 20 of a SoS workflow step
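
The way sections `step_10` and `step_20` are combined into a single workflow named `step`, while `A` and `B` each form their own workflow, can be sketched in Python as follows. The grouping rule (a trailing `_number` is the step index) is inferred from the examples in this tutorial, `group_workflows` is a hypothetical helper, and the name `default` for purely numeric headers such as `[10]` is an assumption.

```python
import re
from collections import defaultdict


def group_workflows(section_names):
    """Map each workflow name to its sections, ordered by numeric index."""
    workflows = defaultdict(list)
    for section in section_names:
        indexed = re.fullmatch(r"(?P<name>[A-Za-z]\w*?)_(?P<index>\d+)", section)
        if indexed:
            # e.g. "step_10" -> workflow "step", index 10
            name, index = indexed.group("name"), int(indexed.group("index"))
        elif section.isdigit():
            # e.g. "10" -> unnamed workflow; the name "default" is an assumption
            name, index = "default", int(section)
        else:
            # e.g. "A" -> a single-step workflow named "A"
            name, index = section, 0
        workflows[name].append((index, section))
    return {name: [s for _, s in sorted(steps)] for name, steps in workflows.items()}


print(group_workflows(["A", "B", "step_20", "step_10"]))
# {'A': ['A'], 'B': ['B'], 'step': ['step_10', 'step_20']}
```

With three workflow names in the result, a magic such as `%sosrun` cannot guess which one is meant, which is why the name `step` has to be given explicitly.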


## Execute embedded workflows with command `sos`

The `%run` and `%sosrun` magics call an external `sos` command to execute the workflows. The `sos` command also recognizes the embedded script of a notebook and can execute it directly.

Using the `!` magic, which executes any shell command, we can mimic the execution of this notebook from the command line:

In [10]:
!sos run Getting_Started.ipynb step

INFO: Running step_10:
This is step 10 of a SoS workflow step
INFO: Running step_20:
This is step 20 of a SoS workflow step
INFO: Workflow step (ID=9d30892e37966444) is executed successfully with 2 completed steps.
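
The same command can also be launched from an ordinary Python script. The sketch below builds the argument list for `sos run Getting_Started.ipynb step` and runs it only if a `sos` executable is found on the `PATH`. The helper names `build_sos_command` and `run_workflow` are hypothetical and chosen for this example.

```python
import shutil
import subprocess


def build_sos_command(notebook, workflow):
    """Return the argument list for `sos run <notebook> <workflow>`."""
    return ["sos", "run", notebook, workflow]


def run_workflow(notebook, workflow):
    """Run the workflow and return its exit code, or None if `sos` is not installed."""
    if shutil.which("sos") is None:
        return None
    return subprocess.run(build_sos_command(notebook, workflow)).returncode


if __name__ == "__main__":
    # Show the command that would be executed for the example in this section.
    print(" ".join(build_sos_command("Getting_Started.ipynb", "step")))
```

Passing the arguments as a list avoids shell quoting issues when a notebook name contains spaces.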


## Further reading

* [Inclusion of scripts](Inclusion_of_scripts.html)