# Configuration Files

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * SoS reads multiple configuration files and merges the results
  * User configuration files can be specified with option `-c`
  * Content of configuration file is available through variable `CONFIG`
  * Host-specific paths can be accessed by `path(name, default)`
  

## SoS configuration files <a id="Configuration_files"></a>

SoS reads configurations from 

* A site configuration file `site_config.yml` under the sos package directory. This is where system administrators define system-wide configurations (e.g. host definitions) for all users.
* A host configuration file `~/.sos/hosts.yml` that defines properties of local and remote hosts.
* A global sos configuration file `~/.sos/config.yml` that defines other user-specific settings.
* And a configuration file specified by command line option `-c` that defines workflow-specific settings.

The configuration files should be in the format of [`YAML`](http://yaml.org/) or its subset format [`JSON`](http://json-schema.org/implementations.html). When a SoS script is loaded, SoS looks for and parses site and global configuration files, then optionally a configuration file specified by command line option `-c`. The results are stored in a global variable `CONFIG` that is available to the script.

### Merge of multiple configuration files

All configurations from the aforementioned files are merged into a single dictionary. A dictionary could therefore contain keys defined in different configuration files, and a later file could overwrite keys defined in an earlier file. For example, if 

* `{'A': {'B': 'old', 'C': 'old'}}` is defined in `~/.sos/config.yml` using
  
  ```
  A:
      B: old
      C: old
  ```
  
* `{'A': {'B': 'new', 'D': 'new'}}` is defined in `my_config.yml` using
  ```
  A:
      B: new
      D: new
  ```

then the final result using `-c my_config.yml` would be `{'A': {'B': 'new', 'C': 'old', 'D': 'new'}}`, as if a single configuration file with content
  ```
  A:
      B: new
      C: old
      D: new
  ```
is used. This is how site or global configurations can be overridden by user configurations.
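
The merge rule described above can be illustrated with a short recursive function. This is only a sketch in plain Python, not the code SoS itself uses; the name `merge_config` is hypothetical and the two dictionaries mirror the example above.

```python
def merge_config(base, override):
    """Return a new dict in which `override` is merged recursively into `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the later file simply wins.
            merged[key] = value
    return merged


global_config = {'A': {'B': 'old', 'C': 'old'}}   # ~/.sos/config.yml
user_config = {'A': {'B': 'new', 'D': 'new'}}     # my_config.yml
merged = merge_config(global_config, user_config)
print(merged)  # {'A': {'B': 'new', 'C': 'old', 'D': 'new'}}
```

Because the function copies dictionaries instead of updating them in place, the earlier configuration is left untouched, which makes the order of merging easy to reason about.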

### String interpolation

SoS interpolates string values if they contain `{ }`. The expressions enclosed by `{ }` would be evaluated with a local namespace that is the dictionary in which the key exists, and a global namespace that is the complete `CONFIG` dictionary. That is to say, if a configuration file contains


```
user_name: user
hosts:
  cluster:
    address: "{user_name}@domain.com:{port}"
    port: 123
```

`CONFIG['hosts']['cluster']['address']` would be interpolated as

```
user@domain.com:123
```

using `port` from `CONFIG['hosts']['cluster']` and `user_name` from the top-level `CONFIG['user_name']`. You will need to double the braces (`{{ }}`) to include literal `{ }` in the config file.

Because key `user_name` is frequently used in `hosts.yml`, SoS automatically defines `user_name` as the local user ID (all lower case) in `CONFIG` if it is not defined in any of the configuration files.
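
The lookup order (enclosing dictionary first, then the complete configuration) can be approximated in plain Python with `str.format_map` and `collections.ChainMap`. This is a simplified sketch, not SoS's implementation: the helper name `interpolate` is hypothetical, and `format_map` only substitutes field names, whereas SoS evaluates the expressions inside `{ }`.

```python
from collections import ChainMap


def interpolate(node, root=None):
    """Format string values, resolving names locally first and then at the top level."""
    root = node if root is None else root
    result = {}
    for key, value in node.items():
        if isinstance(value, dict):
            result[key] = interpolate(value, root)
        elif isinstance(value, str):
            # ChainMap searches `node` (local namespace) before `root` (global namespace).
            result[key] = value.format_map(ChainMap(node, root))
        else:
            result[key] = value
    return result


config = {
    'user_name': 'user',
    'hosts': {
        'cluster': {'address': '{user_name}@domain.com:{port}', 'port': 123},
    },
}
resolved = interpolate(config)
print(resolved['hosts']['cluster']['address'])  # user@domain.com:123
```

As in the configuration files, doubled braces survive as literal braces in this sketch, so `'{{x}}'` becomes `'{x}'`.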

### Derived dictionary keys

A special key `based_on` is processed after all configuration files are loaded. The value of `based_on` should be one or more keys to other dictionaries in the configuration (e.g. `hosts.cluster`). The effect of this key is that items from the referred dictionaries are merged into the present dictionary if they do not already exist there. This allows you to derive a dictionary from an existing one. For example, 

```
hosts:
    head_node:
        description: head_node of cluster
        address: "{user_name}@domain.com:{port}"
        port: 123
        paths:
            home:   "/home/{user_name}"
    cluster:
        description: Cluster
        based_on: hosts.head_node
        queue_type: pbs
```

allows `hosts["cluster"]` to be derived from `hosts["head_node"]`, and

```
hosts:
    cat:
        based_on: hosts.a_very_long_name
```
effectively creates an alias `cat` to another host with `a_very_long_name`.
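
The effect of `based_on` can be sketched in plain Python as follows. The helper names `lookup` and `resolve_based_on` are hypothetical, and the sketch makes two assumptions that the text above does not confirm: the `based_on` key itself is dropped from the result, and only missing top-level items are filled in (nested dictionaries are not merged).

```python
def lookup(config, dotted_key):
    """Return the dictionary referred to by a dotted key such as 'hosts.head_node'."""
    node = config
    for part in dotted_key.split('.'):
        node = node[part]
    return node


def resolve_based_on(config, node):
    """Return a copy of `node` with missing items filled in from its `based_on` parents."""
    resolved = {k: v for k, v in node.items() if k != 'based_on'}
    parents = node.get('based_on', [])
    if isinstance(parents, str):
        parents = [parents]  # a single key is treated as a one-element list
    for parent_key in parents:
        parent = resolve_based_on(config, lookup(config, parent_key))
        for key, value in parent.items():
            # Items already present in the derived dictionary are kept.
            resolved.setdefault(key, value)
    return resolved


config = {
    'hosts': {
        'head_node': {'description': 'head_node of cluster', 'port': 123},
        'cluster': {'description': 'Cluster', 'based_on': 'hosts.head_node',
                    'queue_type': 'pbs'},
    }
}
cluster = resolve_based_on(config, config['hosts']['cluster'])
print(cluster)  # {'description': 'Cluster', 'queue_type': 'pbs', 'port': 123}
```

The sketch does not guard against circular `based_on` references, which a real implementation would need to detect.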

## Command `sos config`

Although `yaml` is not a difficult format to learn, it is often easier to use the command `sos config` to check and set values in configuration files, especially for complex data types.

### Set configuration

`sos config` by default works on `~/.sos/config.yml` file. For example

In [1]:
!sos config --set cutoff 0.5

Set cutoff to '0.5'


creates `~/.sos/config.yml` if it does not exist, or appends to this file otherwise, with content

In [2]:
!cat ~/.sos/config.yml

cutoff: '0.5'


You can specify a configuration file and add the content to it with option `-c`:

In [22]:
!sos config -c new_config.yml --set cutoff.low 1

Set cutoff to {'low': 1}


This creates a configuration file `new_config.yml`:

In [3]:
!cat new_config.yml

cutoff:
  low: 1


Note that `cutoff.low` is interpreted as dictionary `cutoff` with key `low`, and the command is clever enough to handle partial values (e.g. of a dictionary). For example, the following command will update instead of replacing `cutoff`

In [4]:
!sos config -c new_config.yml --set cutoff.high 2

Set cutoff to {'low': 1, 'high': 2}
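
The way a dotted key updates part of a dictionary, as in the two commands above, can be sketched in plain Python. The function `set_dotted` is a hypothetical illustration and not part of the `sos` command line tool.

```python
def set_dotted(config, dotted_key, value):
    """Set config['a']['b'] = value for dotted_key 'a.b', creating dictionaries as needed."""
    *parents, last = dotted_key.split('.')
    node = config
    for part in parents:
        # Reuse an existing sub-dictionary so that sibling keys are preserved.
        node = node.setdefault(part, {})
    node[last] = value
    return config


config = {}
set_dotted(config, 'cutoff.low', 1)
set_dotted(config, 'cutoff.high', 2)
print(config)  # {'cutoff': {'low': 1, 'high': 2}}
```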


The `--set` option can handle lists:

In [7]:
!sos config -c new_config.yml --set values 1 2 3

Set values to [1, 2, 3]


And it accepts Python expressions such as a dictionary. The tricky part here is that SoS would interpolate command line (`!` magic) if you execute the command in SoS notebook, so you will have to double the braces here. You do not need to do it if you execute the command from a terminal.

In [1]:
!sos config -c new_config.yml --set samples "{{'A': 'A.txt'}}"

Set samples to {'A': 'A.txt'}


### View configurations

Running the command `sos config` without any parameter lists all configurations in a dictionary format. Because we set `cutoff` to `0.5` in `~/.sos/config.yml`, the following command shows `cutoff` and a `user_name` key generated by SoS.

In [2]:
!sos config

{'cutoff': '0.5', 'user_name': 'bpeng1'}


If you are interested in only one of the items, you can use option `--get` to list it.

In [3]:
!sos config --get cutoff

cutoff	'0.5'


Of course you can use `-c` to include another configuration file

In [4]:
!sos config --get cutoff -c new_config.yml

cutoff.high	2
cutoff.low	1


or only one of the keys

In [5]:
!sos config -c new_config.yml --get cutoff.low

cutoff.low	1


### Remove a key from a configuration file

Finally, if you would like to remove a key from a configuration file, you can use option `--unset`.

In [7]:
!sos config -c new_config.yml --unset cutoff

Unset cutoff


Running `sos config` again will show the `cutoff` from `~/.sos/config.yml`, which was previously overridden by the `cutoff` defined in `new_config.yml`.

In [10]:
!sos config --get cutoff -c new_config.yml

cutoff	'0.5'


## Variable `CONFIG`

As stated above, you can create a configuration file and load it with option `-c`, and the results would be available as a magic variable `CONFIG` to the workflow.

Let us create a yaml file with some simple content using a `report` action.

In [7]:
report: output='myconfig.yml'
    # An employee record
    martin:
        name: Martin D'vloper
        job: Developer
        skill: Elite
    manager: Martin

When you execute any workflow with option `-c myconfig.yml`, the content of the configuration file would be available as keys of variable `CONFIG`.

Configuration files are frequently used to specify system configurations. For example, with the following definition of parameter `manager`, the workflow will take default value `Bob` if run without option,

In [4]:
%run
parameter: manager = CONFIG.get('manager', 'Bob')
print(manager)

Bob


take user specified value from command line

In [5]:
%run --manager Me
parameter: manager = CONFIG.get('manager', 'Bob')
print(manager)

Me


or values from a configuration file if a configuration file is specified

In [8]:
%run -c myconfig.yml
parameter: manager = CONFIG.get('manager', 'Bob')
print(manager)

Martin
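
The three cases above follow a common precedence pattern: a command line value wins, otherwise the configuration file is consulted, otherwise the hard-coded default is used. The sketch below reproduces that pattern in plain Python with `argparse`. It is an analogy, not how SoS implements `parameter:`, and `resolve_manager` is a hypothetical name.

```python
import argparse


def resolve_manager(argv, config):
    """Resolve `manager` from the command line, then the configuration, then 'Bob'."""
    parser = argparse.ArgumentParser()
    # The configuration value (or 'Bob') is only the default of the option,
    # so an explicit --manager on the command line overrides it.
    parser.add_argument('--manager', default=config.get('manager', 'Bob'))
    return parser.parse_args(argv).manager


print(resolve_manager([], {}))                       # Bob
print(resolve_manager(['--manager', 'Me'], {}))      # Me
print(resolve_manager([], {'manager': 'Martin'}))    # Martin
```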


## Host-dependent paths

<div class="bs-callout bs-callout-primary" role="alert">
    <h4><code>path(name, default)</code></h4>
    <p>The <code>path</code> datatype of SoS is derived from <code>pathlib.Path</code>. One of the additions of this datatype is the parameters <code>name</code> and <code>default</code>, which return a pre-defined <code>path</code> defined in </p>
    <pre>
    CONFIG["hosts"][current-host]["paths"]
    </pre>
    <p>where <code>current-host</code> is normally <code>localhost</code> but can be one of the remote hosts if the function is called from a remote host. A <code>default</code> value could be returned if <code>name</code> is not available in the configuration.</p>
</div>

The `hosts` definitions in `~/.sos/hosts.yml` allow the definition of paths for different hosts. For clarity, let us define a local configuration file that points `localhost` to an `example_host` configuration.

In [13]:
report: output='myconfig.yml'
    localhost: example_host
    hosts:
        example_host:
            address: localhost
            paths:
                home: /Users/{user_name}
                project:  /Users/{user_name}/Documents
                tmp: /tmp

Without worrying about the `localhost` part for now, this configuration file defines a few paths for the localhost. The `paths` could be retrieved using `path(name='project')` so that you can write your script in a host-independent way. For example, the following workflow uses `path(name='project')` to get the host-specific `project` directory, which is defined as `/Users/bpeng1/Documents` in `myconfig.yml`.

In [3]:
%run -c myconfig.yml

sh: workdir=path(name='project')
   echo Working on `pwd`

Working on /Users/bpeng1/Documents


If you are uncertain whether a path is defined for the current host, you can use `default` to specify a fallback value:

In [6]:
%run -c myconfig.yml

import os
sh: workdir=path(name='scratch', default='~')
   echo Working on `pwd`

Working on /Users/bpeng1
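
The lookup performed by `path(name, default)` can be approximated with a few lines of plain Python. This is a sketch under stated assumptions, not the SoS implementation: `host_path` is a hypothetical function, the top-level `localhost` key is assumed to name the current host, and the default is assumed to be expanded with `expanduser()` as the output above suggests.

```python
from pathlib import Path


def host_path(config, name, default=None):
    """Look up a named path for the current host, falling back to `default`."""
    # Assumption: the top-level `localhost` key names the entry under `hosts`.
    host = config.get('localhost', 'localhost')
    paths = config.get('hosts', {}).get(host, {}).get('paths', {})
    if name in paths:
        return Path(paths[name])
    if default is not None:
        return Path(default).expanduser()
    raise KeyError(f'path {name!r} is not defined for host {host!r}')


config = {
    'localhost': 'example_host',
    'hosts': {
        'example_host': {
            'address': 'localhost',
            'paths': {'project': '/Users/me/Documents', 'tmp': '/tmp'},
        }
    },
}
print(host_path(config, 'project'))
print(host_path(config, 'scratch', default='/tmp'))
```

Raising an error when neither the name nor a default is available is a design choice of this sketch; it makes a missing path definition visible early instead of silently using an unexpected directory.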


## Further reading
* [Using remote filesystems](remote_filesystem.html) for `hosts.yml`