# String Interpolation

Bo Peng, Nov, 2016, with latest version available [here](http://bopeng.github.org/sos)

In addition to Python's string formatting facilities (the `%` operator and the [`str.format`](https://docs.python.org/3/library/stdtypes.html#str.format) method), and similar to the [formatted string literals](https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings) introduced in Python 3.6, SoS uses string interpolation to replace variables and expressions in string literals with their values.

Unlike Python formatted string literals, SoS string interpolation **does not require any prefix**, and is **applied only to double-quoted strings** (`" "`, `""" """`, `r" "`, and `r""" """`). Single-quoted strings (`' '`, `''' '''`, `r' '`, and `r''' '''`) are not interpolated.

Although configurable, the default sigil for SoS string interpolation is `'${ }'`, which means that by default any expression between `${` and `}` is evaluated by SoS.

For example, the expressions `resource_path`, `sample_names[0]`, and `sample_names` are replaced by their values in the variables `ref_genome`, `title`, and `all_names`, but not in `single_quoted`, because that string literal uses single quotes. For convenience, we use the `%preview` magic of the SoS kernel to display the values of variables after the cell has been evaluated.

In [1]:
%preview ref_genome title all_names single_quoted

resource_path = '~/.sos/resources'
ref_genome    = "${resource_path}/hg19/refGenome.fasta"

sample_names  = ['A', 'B', 'C']
title         = "Sample ${sample_names[0]} results"
all_names     = "Samples ${sample_names}"

single_quoted = '${sample_names} is not interpolated'

'~/.sos/resources/hg19/refGenome.fasta'

'Sample A results'

'Samples A B C'

'${sample_names} is not interpolated'
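
The substitution of simple expressions can be approximated in plain Python. The following is a minimal sketch, not the SoS implementation: the helper `interpolate` and the `variables` dictionary are hypothetical names introduced for illustration, and the sketch uses `eval`, so it should only be applied to trusted input. Joining of lists is left out here.

```python
import re

# Hypothetical helper, not part of SoS: matches the default sigil ${ ... }.
_SIGIL = re.compile(r"\$\{(.*?)\}")

def interpolate(text, namespace):
    """Replace each ${expr} with str() of the evaluated expression."""
    def substitute(match):
        # Evaluate the expression with the given variables as local names.
        value = eval(match.group(1), {}, namespace)
        return str(value)
    return _SIGIL.sub(substitute, text)

variables = {
    "resource_path": "~/.sos/resources",
    "sample_names": ["A", "B", "C"],
}
ref_genome = interpolate("${resource_path}/hg19/refGenome.fasta", variables)
title = interpolate("Sample ${sample_names[0]} results", variables)
print(ref_genome)
print(title)
```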

SoS actions specified in **script format** are treated as raw triple-quoted strings and are interpolated. For example, the variable `num` is passed from SoS to a shell script in the following example:

In [2]:
import random
num = random.randint(1, 6)
run:
    echo "Random number is ${num}"

Random number is 4


This works because the code is equivalent to

In [3]:
import random
num = random.randint(1, 6)
run(r"""
echo "Random number is ${num}"
""")

Random number is 6


## String representation of Python objects


SoS evaluates an expression and returns the string representation of its value.

If the value is of a simple Python type such as a string, boolean, or number, the standard Python string form of the value (`str(obj)`) is returned, so a string appears without surrounding quotes.

In [4]:
"${2**10}"

'1024'

In [4]:
user = 'Bob'
"${\"Hi, \" + user}"

'Hi, Bob'

For objects with an iterator interface (e.g. Python `list`, `tuple`, `dict`, and `set`), SoS joins the string representation of each item with a space (or with a comma if the `,` conversion flag is given). For example,

  * A list of strings is converted to a string by joining the strings with a space or comma.
  * A dictionary of strings is converted to a string by joining the dictionary keys, with no guarantee on the order of the keys.

In [6]:
names = ['James', 'Bob', 'Kathy']
"${names}"

'James Bob Kathy'

In [7]:
salary = {'James': 20, 'Bob': 25, 'Kathy': 18}
"Employees: ${salary}"

'Employees: Kathy James Bob'
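
The joining rule can be pictured with a small Python function. This is a sketch under the assumption that items are joined recursively; `to_text` is a hypothetical name, not a SoS function. Note that in Python 3.7 and later a `dict` iterates in insertion order, so the key order produced by this sketch may differ from the captured output above.

```python
def to_text(value, sep=" "):
    """Return a string; iterables are joined item by item (hypothetical helper)."""
    # Strings are iterable too, so they are handled before the generic case.
    if isinstance(value, str):
        return value
    try:
        items = iter(value)
    except TypeError:
        # Not iterable (e.g. a number): fall back to str().
        return str(value)
    return sep.join(to_text(item, sep) for item in items)

names_text = to_text(["James", "Bob", "Kathy"])
# Iterating over a dict yields its keys.
keys_text = to_text({"James": 20, "Bob": 25, "Kathy": 18}, sep=", ")
print(names_text)
print(keys_text)
```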

It is worth noting that the step input and output variables (`input`, `output`, `depends`, and their looped versions `_input`, `_output`, and `_depends`) are always lists of targets. However, if the list contains only one filename, `"${input}"` is the same as `"${input[0]}"`.

SoS string interpolation supports all the format and conversion specifications of the [Python string format specifier](https://docs.python.org/3/library/string.html#formatspec). That is to say, you can add `: specifier` at the end of the expression to control the format of the output. For example

In [8]:
"${1/3. :.2f}"

'0.33'

In [9]:
filename = 'test.sos'
"${filename:>20}"

'            test.sos'
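
For comparison, the same specifiers can be applied with Python's built-in `format` function and `str.format`. This sketch only shows the Python side of the equivalence; the variable names are chosen for illustration.

```python
# The part after ':' in "${1/3. :.2f}" is a standard format specifier.
ratio = format(1 / 3.0, ".2f")

# '>20' right-aligns the value in a field that is 20 characters wide.
padded = format("test.sos", ">20")

# str.format accepts the same specifier after the colon.
same_ratio = "{:.2f}".format(1 / 3.0)

print(ratio)
print(repr(padded))
```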

SoS also extends the conversion operators of the standard Python format string to give you more control over the string representation of objects, particularly file and directory names. The conversion operators are specified after a `!` character.

SoS currently supports the following converters:


| converter | effect | input | output |
| :----------| :----- | :----- | :-------|
| `s`         | `str()`  | `${file1!s}` | `file 1.txt` |
| `r`         | `repr()`  | `${file1!r}` | `'file 1.txt'` |
| `q`         | `quoted()` | `${file1!q}` | `file\ 1.txt`|
| `e`         | `expanduser()` | `${file2!e}` | `/path/to/user/SoS/test.sos`|
| `a`         | `abspath(expanduser())` |  `${file2!a}` | `/path/to/user/SoS/test.sos` |
| `b`         | `basename()` |  `${file2!b}` | `test.sos` |
| `d`         | `dirname()` |  `${file2!d}` | `/path/to/user/SoS/` |
| `,`         | `', '.join()` | `${files!,}` | `a.txt, b.txt`|

Here we assume

```
file1='file 1.txt'
file2='~/SoS/test.sos'
files=['a.txt', 'b.txt']
```
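
To give a feel for what the converters do, the sketch below maps each flag to a standard-library function. The mapping is an assumption made for illustration, not the SoS implementation: `CONVERTERS` and `convert` are hypothetical names, `shlex.quote` wraps a name in single quotes rather than backslash-escaping the space as in the table, and `os.path.dirname` is applied here without expanding `~`.

```python
import os.path
import shlex

# Assumed mapping from conversion flag to a standard-library function.
CONVERTERS = {
    "s": str,
    "r": repr,
    "q": shlex.quote,  # single-quotes the value instead of backslash-escaping
    "e": os.path.expanduser,
    "a": lambda path: os.path.abspath(os.path.expanduser(path)),
    "b": os.path.basename,
    "d": os.path.dirname,
}

def convert(value, flags=""):
    """Apply each flag to every item, then join the items (hypothetical helper)."""
    separator = ", " if "," in flags else " "
    # A lone string is treated as a one-item sequence.
    items = [value] if isinstance(value, str) else list(value)
    for flag in flags.replace(",", ""):
        items = [CONVERTERS[flag](item) for item in items]
    return separator.join(str(item) for item in items)

file1 = "file 1.txt"
files = ["a.txt", "b.txt"]
print(convert(file1, "r"))
print(convert(file1, "q"))
print(convert("~/SoS/test.sos", "b"))
print(convert(files, ","))
print(convert(["James", "Bob"], ",r"))
```

Applying the flags item by item before joining is what lets a combination such as `,r` quote every element and then separate the elements with commas.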


For example, if we need to create a file called `Bon Jovi.txt` and run

In [10]:
filename = 'Bon Jovi.txt'
run:
    echo "test" > ${filename}
    cat ${filename}

test Jovi.txt


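Instead of creating `Bon Jovi.txt`, this writes `test Jovi.txt` to a file named `Bon`, because the shell splits the unquoted filename into the two words `Bon` and `Jovi.txt` and the commands actually executed were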

```
    echo "test" > Bon Jovi.txt
    cat Bon Jovi.txt
```

To avoid such problems, you can quote the filename using the `q` (quoted) convertor

In [11]:
run:
   echo "test" > ${filename!q}
   cat ${filename!q}

test
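
The effect of quoting on word splitting can be checked from Python with the standard `shlex` module, which follows POSIX shell splitting rules. This is an illustration only; it uses `shlex.quote` as a stand-in for the `q` converter, which is an assumption about equivalent behaviour rather than a statement about how SoS implements it.

```python
import shlex

filename = "Bon Jovi.txt"

# Without quoting, the shell sees two separate arguments.
unquoted = shlex.split("cat " + filename)

# With quoting, the filename survives as a single argument.
quoted = shlex.split("cat " + shlex.quote(filename))

print(unquoted)
print(quoted)
```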


Depending on how your scripts handle filenames, it can be handy to pass filenames to scripts in expanded form. For example, it is perfectly fine to pass `~/a.txt` to a shell script, but an `e` converter should be added if you are passing the filename to a script that does not understand `~` in filenames.

Finally, the `,` converter can be used to output Python sequences with items separated by `,` instead of a space. For example, if you are passing a Python sequence to R as a vector of literals, you can write:

In [12]:
salary = {'James': 20, 'Bob': 25, 'Kathy': 18}
R:
    employee = c(${salary!,r})
    print(employee)

[1] "Kathy" "James" "Bob"  


Here the `r` converter quotes the strings, and the `,` converter joins the strings with `,`.

In [13]:
"${salary!,r}"

"'Kathy', 'James', 'Bob'"

Although the SoS format specifiers are convenient to use, you are not limited to these rules and can define your own ways to present objects. For example

In [14]:
def r_list(obj):
    return 'c(' + ','.join('{!r}'.format(x) for x in obj) + ')'

"${r_list(salary)}"

"c('Kathy','James','Bob')"

## Inclusion of sigil in string

If you need to stop SoS from interpolating some expressions in a string, you can precede the SoS sigil with a backslash (`\`). For example, the following script includes a shell script with shell variable `${file}` that is not interpolated by SoS:

In [15]:
[10]
title = "Sample ${sample_names[0]} results"
run:
    echo ${title}
    for file in a.txt b.txt c.txt
    do
        echo Processing \${file} ...
    done


Sample A results
Processing a.txt ...
Processing b.txt ...
Processing c.txt ...
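
One way to picture the escaping rule is a regular expression that skips any sigil preceded by a backslash. The sketch below is a hypothetical illustration, not the SoS implementation; `interpolate` is an assumed helper name, and it relies on `eval`, so it is only suitable for trusted input.

```python
import re

# (?<!\\) is a negative lookbehind: a sigil preceded by a backslash is skipped.
_UNESCAPED = re.compile(r"(?<!\\)\$\{(.*?)\}")

def interpolate(text, namespace):
    """Interpolate ${expr} but leave backslash-escaped sigils for the shell."""
    replaced = _UNESCAPED.sub(
        lambda match: str(eval(match.group(1), {}, namespace)), text
    )
    # Drop the protective backslash so the shell receives a plain ${...}.
    return replaced.replace(r"\${", "${")

script = interpolate(
    r"echo ${title}; echo Processing \${file} ...",
    {"title": "Sample A results"},
)
print(script)
```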


## Alternative sigil

If your SoS script contains long bash, perl, or other scripts in which `${ }` is used frequently, it can be tedious and error-prone to escape every sigil in these scripts with a backslash. In this case you can assign an alternative sigil to a step, or even assign a different sigil for your entire SoS script.

For example, the example above could be written as 

In [16]:
[10: sigil='%( )']
title = "Sample %(sample_names[0]) results"
run:
    echo %(title)
    for file in a.txt b.txt c.txt
    do
        echo Processing ${file} ...
    done

Sample A results
Processing a.txt ...
Processing b.txt ...
Processing c.txt ...


This uses an alternative sigil for this particular step through the step option `sigil`. You can also change the system default sigil:

In [17]:
%set_options sigil='%( )'

[10]
title = "Sample %(sample_names[0]) results"
run:
    echo %(title)
    for file in a.txt b.txt c.txt
    do
        echo Processing ${file} ...
    done

Sample A results
Processing a.txt ...
Processing b.txt ...
Processing c.txt ...
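
A sigil written as a left part and a right part separated by a space, such as `%( )`, can be turned into a matching pattern quite directly. The sketch below assumes this split-on-space convention; `make_pattern` and `interpolate` are hypothetical helpers, not SoS functions.

```python
import re

def make_pattern(sigil):
    """Build a regex from a sigil such as '${ }' or '%( )' (hypothetical helper)."""
    left, right = sigil.split(" ")
    # re.escape protects characters such as $, { and ( that are special in regexes.
    return re.compile(re.escape(left) + r"(.*?)" + re.escape(right))

def interpolate(text, namespace, sigil="${ }"):
    pattern = make_pattern(sigil)
    return pattern.sub(
        lambda match: str(eval(match.group(1), {}, namespace)), text
    )

variables = {"title": "Sample A results"}
# With '%( )' as the sigil, the shell variable ${file} is left untouched.
line = interpolate("echo %(title); echo ${file}", variables, sigil="%( )")
print(line)
```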


You can define any sigil as long as it has a left sigil and a right sigil separated by a space. You can even use sigils with identical left and right parts (e.g. `# #`). However, only sigils with different left and right parts allow the use of nested expressions, as in the following example; sigils with identical left and right parts cannot be used in this way.

In [18]:
%set_options sigil='${ }'

In [19]:
%preview file
idx = 1
filenames = ['a.txt', 'b.txt', 'c.txt']
file = "${idx}-th file is ${filenames[${idx}]}"

unexpected EOF while parsing (<string>, line 1): filenames[]


> Failed to preview expression file: name 'file' is not defined
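
Nested expressions can be resolved by evaluating the innermost expression first and repeating until no sigil remains. The following is only a sketch of that idea under simplifying assumptions (expressions contain no `}` of their own, and substituted values are scanned again); `interpolate_nested` is a hypothetical name and this is not necessarily how SoS evaluates nested sigils.

```python
import re

# Matches a ${...} whose body contains neither '}' nor another '${',
# that is, an innermost expression.
_INNERMOST = re.compile(r"\$\{((?:(?!\$\{)[^}])*)\}")

def interpolate_nested(text, namespace):
    """Resolve innermost expressions repeatedly until none is left."""
    while True:
        text, count = _INNERMOST.subn(
            lambda match: str(eval(match.group(1), {}, namespace)), text
        )
        if count == 0:
            return text

variables = {"idx": 1, "filenames": ["a.txt", "b.txt", "c.txt"]}
result = interpolate_nested(
    "${idx}-th file is ${filenames[${idx}]}", variables
)
print(result)
```

This also shows why identical left and right sigils cannot nest: with `# #` there is no way to tell where an inner expression starts and where an outer one ends.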

Finally, if you feel that Python's own string functions are good enough, you can set the sigil to `None` to disable SoS string interpolation completely.

In [25]:
%set_options sigil='%{ }'
price = 1000
"The price is ${}".format(price)

unexpected EOF while parsing (<string>, line 0): 