# Use of variables and command line parameters

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * SoS (Python) variables can be used to compose scripts in different languages, in the same way as Python f-strings
  * A `parameter` statement defines a parameter that can be passed from the command line
  * The `parameter` statement accepts either a default value or a type
  * Parameters without default values are required

## Global and local variables

Now let us have a look at the example workflow from [our first tutorial](sos_in_notebook.html) in more detail. 

In [1]:
%run

[global]
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

[plot_10]
run: expand=True
    xlsx2csv {excel_file} > {csv_file}

[plot_20]
R: expand=True
    data <- read.csv('{csv_file}')
    pdf('{figure_file}')
    plot(data$log2FoldChange, data$stat)
    dev.off()

xlsx2csv data/DEG.xlsx > DEG.csv



null device 
          1 


This workflow has a `global` section, which defines variables that are visible to all workflow steps. The three variables are available in `plot_10` and `plot_20`, so they can be used in the actions `run` and `R` with the option `expand=True`. More explicitly, step `plot_10` can be thought of as the following Python script:

```python
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

run(f'''\
xlsx2csv {excel_file} > {csv_file}
''')
```
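
The same idea applies to the `R` action in `plot_20`. The following is a plain-Python sketch, not SoS itself, that shows how the three variables are interpolated into both script bodies using ordinary f-strings. The names `shell_script` and `r_script` are introduced here only for illustration.

```python
# Plain-Python illustration of what `expand=True` does to a script body.
# `shell_script` and `r_script` are hypothetical names used only in this sketch.
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

# Body of the `run` action in step plot_10
shell_script = f'xlsx2csv {excel_file} > {csv_file}'

# Body of the `R` action in step plot_20
r_script = f'''\
data <- read.csv('{csv_file}')
pdf('{figure_file}')
plot(data$log2FoldChange, data$stat)
dev.off()
'''

print(shell_script)
print(r_script)
```

Only the expressions inside braces are replaced, so the R code itself (for example `data$log2FoldChange`) passes through unchanged.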

<div class="bs-callout bs-callout-primary" role="alert">
  <h4>The global section</h4>
  <p>The content of the global section can be considered as part of all workflow steps</p>  
</div>
<div class="bs-callout bs-callout-info" role="alert">
  <h4>Unnamed and multiple global sections (advanced)</h4>
    <ul>
        <li>The <code>[global]</code> section header can be omitted if it is the first section of a SoS script. In other words, statements before the definition of any SoS section constitute an unnamed global section.</li>  
        <li>Multiple named and unnamed global sections are allowed and their contents are merged to a single <code>[global]</code> section of the workflow.</li>
    </ul>
</div>

In contrast, **variables defined in individual steps are not available to other steps**. For example, the following workflow fails because `csv_file` is defined in step `plot_10` and is therefore not available in step `plot_20`:

In [2]:
%run

[plot_10]
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'

run: expand=True
    xlsx2csv {excel_file} > {csv_file}

[plot_20]
figure_file = 'output.pdf'

R: expand=True
    data <- read.csv('{csv_file}')
    pdf('{figure_file}')
    plot(data$log2FoldChange, data$stat)
    dev.off()

xlsx2csv data/DEG.xlsx > DEG.csv

ERROR: [plot_20]:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
script_8039746696757261604 in <module>
      plot(data$log2FoldChange, data$stat)
      dev.off()
----> """)

NameError: name 'csv_file' is not defined


<div class="bs-callout bs-callout-warning" role="alert">
  <h4>Local (step-level) variables</h4>
  <p>Variables defined at the step level are local to the step and are not accessible from other SoS steps.</p>  
</div>

If you really need to pass a locally defined variable to other steps, you have to return it as part of the step's output or explicitly share it with other steps. Please refer to the [Further reading](#further_reading) section of this tutorial for details.
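
The scoping rule can be pictured with a small plain-Python analogy. This is only a sketch of the visible behavior, assuming each step runs in its own namespace seeded with the global variables; it is not how SoS is implemented, and `run_step`, `step_10`, and `step_20` are hypothetical names.

```python
# Analogy only: each "step" runs in its own namespace seeded with the globals.
global_vars = {'excel_file': 'data/DEG.xlsx'}

def run_step(source, shared):
    """Execute step source in a fresh namespace that starts from `shared`."""
    namespace = dict(shared)   # copy, so assignments stay local to the step
    exec(source, namespace)
    return namespace

step_10 = "csv_file = 'DEG.csv'"   # defines a step-level variable
step_20 = "result = csv_file"      # tries to use it from another step

ns_10 = run_step(step_10, global_vars)

try:
    run_step(step_20, global_vars)
    error = None
except NameError as exc:
    error = str(exc)

print(error)   # name 'csv_file' is not defined
```

The variable exists in the namespace of the first step (`ns_10`) but never reaches `global_vars`, so the second step cannot see it.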

## Workflow parameters <a id="parameter"></a>

SoS allows you to define parameters that accept values from command line options.  

In [3]:
%run --excel-file data/DEG.xlsx

[global]
parameter: excel_file = str
parameter: csv_file = 'DEG.csv'
parameter: figure_file = 'output.pdf'

[plot_10]
run: expand=True
    xlsx2csv {excel_file} > {csv_file}

[plot_20]
R: expand=True
    data <- read.csv('{csv_file}')
    pdf('{figure_file}')
    plot(data$log2FoldChange, data$stat)
    dev.off()

xlsx2csv data/DEG.xlsx > DEG.csv

null device 
          1 


<div class="bs-callout bs-callout-primary" role="alert">
    <h4>The <code>parameter</code> statement</h4>
 <ul>
     <li>Parameters are defined with a <code>parameter</code> statement in the format of <br><code>parameter: name = value</code></li>
     <li><code>name</code> is the name of the parameter</li>
     <li><code>value</code> can be a type (<code>str</code> in the example) or a default value</li>
<li>Parameters are specified with a double-dash syntax on the command line, either through the <code>%run</code> or <code>%sosrun</code> magics or after the <code>sos run</code> command</li>
    </ul>
</div>


In the above example, three parameters `excel_file`, `csv_file`, and `figure_file` are defined. Parameter `excel_file` has no default value, so it is required and is specified as a command line option of the `%run` magic. The other two parameters have default values. Note that parameter `excel_file` can be specified as either `--excel_file` or `--excel-file` from the command line.
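
The behavior described above resembles a conventional command line parser. The following sketch uses Python's standard `argparse` module to mimic the three parameters. It is an illustration under that assumption, not SoS's actual implementation, and the program name `workflow` is made up.

```python
import argparse

# Hypothetical parser that mimics the three `parameter` statements above.
parser = argparse.ArgumentParser(prog='workflow')

# A type without a default value makes the parameter required.
# Both the dash and the underscore spelling map to the same variable.
parser.add_argument('--excel-file', '--excel_file', dest='excel_file',
                    type=str, required=True)
# A default value makes the parameter optional.
parser.add_argument('--csv-file', '--csv_file', dest='csv_file',
                    default='DEG.csv')
parser.add_argument('--figure-file', '--figure_file', dest='figure_file',
                    default='output.pdf')

args_dash = parser.parse_args(['--excel-file', 'data/DEG.xlsx'])
args_underscore = parser.parse_args(['--excel_file', 'data/DEG.xlsx'])

print(args_dash.excel_file, args_dash.csv_file, args_dash.figure_file)
```

Both spellings yield the same `excel_file` value, and the two optional parameters fall back to their defaults.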

If you execute the workflow without specifying the required parameter, an error will be raised.

In [4]:
%run 

[global]
parameter: excel_file = str
parameter: csv_file = 'DEG.csv'
parameter: figure_file = 'output.pdf'

[plot_10]
run: expand=True
    xlsx2csv {excel_file} > {csv_file}

[plot_20]
R: expand=True
    data <- read.csv('{csv_file}')
    pdf('{figure_file}')
    plot(data$log2FoldChange, data$stat)
    dev.off()


ERROR: Argument excel_file of type str is required


## Type of parameters

The type of a parameter is determined by its default value or type specification, and determines how values should be passed from the command line.

### Simple Python types

SoS automatically determines the type of the default value and converts your input to that type. For example, the type of `cutoff` is determined to be an integer, so it accepts an integer value from the command line:

In [5]:
%run --cutoff 5
parameter: cutoff = 0

print(cutoff)

5


An error will be raised if you pass a string:

In [16]:
%run -v0 --cutoff zero
parameter: cutoff = 0

print(cutoff)

ERROR: [default]: argument --cutoff: invalid int value: 'zero'


Even a float value is rejected:

In [18]:
%run -v0 --cutoff 5.1
parameter: cutoff = 0

print(cutoff)

ERROR: [default]: argument --cutoff: invalid int value: '5.1'


If you intend to accept a float value, use a float as the default value:

In [9]:
%run --cutoff 5.1
parameter: cutoff = 0.

print(cutoff)

5.1
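
The conversion rule can be summarized in a few lines of plain Python. This is a minimal sketch, assuming the command line value is simply converted with the type of the default value; `coerce` is a hypothetical helper, not a SoS function.

```python
def coerce(default, raw):
    """Convert a command line string to the type of `default`."""
    return type(default)(raw)

print(coerce(0, '5'))      # int default, so the value becomes 5
print(coerce(0., '5.1'))   # float default, so the value becomes 5.1

# With an int default, neither a word nor a float literal is accepted.
rejected = []
for raw in ('zero', '5.1'):
    try:
        coerce(0, raw)
    except ValueError:
        rejected.append(raw)

print(rejected)   # ['zero', '5.1']
```

This mirrors the examples above: `'5.1'` is rejected for an integer parameter because `int('5.1')` is not a valid conversion in Python.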


### List of strings

A list would be created if the parameter has a default value of type list. For example, a list `['A']` is returned because the default value has a list type.

In [10]:
%run --sample-names A
parameter: sample_names = []

print(sample_names)

['A']


SoS even understands the type of the values of the list and tries to follow it:

In [11]:
%run --values 4 5
parameter: values = [1, 2, 3]

print(values)

[4, 5]


However, it is not yet possible to specify the type of the values for a required parameter of type `list`, so all values are passed as strings:

In [12]:
%run --values 4 5
parameter: values = list

print(values)

['4', '5']
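
The list behavior shown above can be imitated with the standard `argparse` module. The sketch below assumes the element type is taken from the first item of the default list and falls back to `str` when no item is available. `parse_list_parameter` is a hypothetical helper, and this is not necessarily how SoS implements it.

```python
import argparse

def parse_list_parameter(name, default, argv):
    """Parse `--name v1 v2 ...` using the element type of `default`."""
    # Without any element to inspect, values stay strings.
    element_type = type(default[0]) if default else str
    parser = argparse.ArgumentParser()
    parser.add_argument(f'--{name}', dest=name, nargs='*',
                        type=element_type, default=default)
    return getattr(parser.parse_args(argv), name)

values_typed = parse_list_parameter('values', [1, 2, 3], ['--values', '4', '5'])
values_untyped = parse_list_parameter('values', [], ['--values', '4', '5'])

print(values_typed)     # [4, 5]
print(values_untyped)   # ['4', '5']
```

If integers are needed from a parameter declared as `list`, the strings can be converted in the step itself, for example with `[int(x) for x in values]`.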


### SoS types *

This is a bit advanced, but for completeness we also show parameters with SoS types such as `path`, `paths`, and `file_targets`. Generally speaking, you can pass filenames as `str` or lists of `str`, but using SoS types such as `path` allows you to create variables of these types without manual type conversion.

For example, the `path` type is derived from [`pathlib.Path`](https://docs.python.org/3.6/library/pathlib.html) with automatic expansion of `~` and other features. If you define a parameter with type `path`, SoS will convert the passed string into a `path` object so that you can use it directly.

In [13]:
%run --infile ~/project/a.txt
parameter: infile = path

print(infile.name)
print(infile.parent)
print(infile.exists())

a.txt
/Users/bpeng1/project
False


Similarly, if you define a parameter of type `paths` (a sequence of `path`), your parameter will be of type `paths`:

In [15]:
%run --infiles a.txt b.txt
parameter: infiles = paths('a.txt')

print(infiles)

a.txt b.txt
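
For comparison, the attributes used in the `path` example are available on the standard `pathlib.Path` class. The following sketch uses only the standard library, where `~` has to be expanded explicitly with `expanduser()`. It does not use the SoS `path` type.

```python
from pathlib import Path

# Standard-library counterpart of the `infile` example above;
# unlike the SoS `path` type, `~` must be expanded explicitly.
infile = Path('~/project/a.txt').expanduser()

print(infile.name)      # a.txt
print(infile.parent)    # the home directory followed by /project
print(infile.exists())  # depends on whether the file exists on this machine
```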


## Further reading <a id='further_reading'></a>

* [SoS Data Types](doc/user_guide/sos_datatypes.html)