# Configuration files

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * SoS reads multiple configuration files and merges the results
  * User configuration files can be specified with option `-c`
  * Content of configuration file is available through variable `CONFIG`
  * Host-specific paths can be accessed by `path(name, default)`
  

## SoS configuration files

SoS reads configurations from:

* A site configuration file `site_config.yml` under the sos package directory. This is where system administrators define system-wide configurations (e.g. host definitions) for all users.
* A host configuration file `~/.sos/hosts.yml` that defines properties of local and remote hosts.
* A global sos configuration file `~/.sos/config.yml` that defines other user-specific settings.
* And an optional configuration file specified by command line option `-c` that defines workflow-specific settings.

The configuration files should be in the format of [`YAML`](http://yaml.org/) or its subset format [`JSON`](http://json-schema.org/implementations.html). When a SoS script is loaded, SoS looks for and parses site and global configuration files and an optional user-specified configuration file. The results are used by SoS for the execution of workflows, and are available to the workflow as a global variable `CONFIG`.

### Merge of multiple configuration files

All configurations from the aforementioned files are merged into a single dictionary. A dictionary can therefore contain keys defined in different configuration files, and a later file can overwrite keys defined in an earlier file. For example, if 

* `{'A': {'B': 'old', 'C': 'old'}}` is defined in `~/.sos/config.yml` using
  
  ```
  A:
      B: old
      C: old
  ```
  
* `{'A': {'B': 'new', 'D': 'new'}}` is defined in `my_config.yml` using
  ```
  A:
      B: new
      D: new
  ```

then the final result using `-c my_config.yml` would be `{'A': {'B': 'new', 'C': 'old', 'D': 'new'}}`, as if a single configuration file with content
  ```
  A:
      B: new
      C: old
      D: new
  ```
is used. This is how **site or global configurations can be overridden by user configurations**.
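The merge rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not SoS's actual implementation; the function name `merge_config` is hypothetical.

```python
def merge_config(base, override):
    """Return a new dict in which `override` is merged into `base`, recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the value from the later file wins.
            merged[key] = value
    return merged


global_config = {'A': {'B': 'old', 'C': 'old'}}   # from ~/.sos/config.yml
user_config = {'A': {'B': 'new', 'D': 'new'}}     # from my_config.yml

print(merge_config(global_config, user_config))
# {'A': {'B': 'new', 'C': 'old', 'D': 'new'}}
```

Files loaded later are passed as `override`, so applying the function once per file in loading order reproduces the precedence described above.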

### Derived dictionary keys

A special key `based_on` is processed after all configuration files are loaded. The value of `based_on` should be one or more keys that refer to other dictionaries in the configuration (e.g. `hosts.cluster`). Items from the referred dictionaries are merged into the present dictionary if they do not already exist there. This allows you to derive a dictionary from an existing one. For example, 

In [1]:
%save my_config.yml -f

hosts:
    head_node:
        description: head_node of cluster
        address: domain.com
    cluster:
        description: Cluster
        based_on: hosts.head_node
        queue_type: pbs

In [2]:
%run -c my_config.yml -v1
print(CONFIG['hosts']['cluster'])

{'description': 'Cluster', 'queue_type': 'pbs', 'address': 'domain.com'}
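The effect of `based_on` can be approximated by the following Python sketch. It assumes that dotted keys such as `hosts.head_node` are resolved from the root of the configuration and that the `based_on` key itself is dropped from the result, as the output above suggests. The helper names `lookup` and `resolve_based_on` are hypothetical, and the sketch does not guard against circular references.

```python
import copy


def lookup(config, dotted_key):
    """Follow a dotted key such as 'hosts.head_node' from the root dictionary."""
    node = config
    for part in dotted_key.split('.'):
        node = node[part]
    return node


def resolve_based_on(config, section):
    """Return a copy of `section` with missing items filled in from its parents."""
    result = {k: v for k, v in section.items() if k != 'based_on'}
    parents = section.get('based_on', [])
    if isinstance(parents, str):
        parents = [parents]
    for dotted_key in parents:
        parent = resolve_based_on(config, lookup(config, dotted_key))
        for key, value in parent.items():
            # Keys already present in the derived dictionary are kept as-is.
            result.setdefault(key, copy.deepcopy(value))
    return result


config = {
    'hosts': {
        'head_node': {'description': 'head_node of cluster', 'address': 'domain.com'},
        'cluster': {'description': 'Cluster', 'based_on': 'hosts.head_node', 'queue_type': 'pbs'},
    }
}

print(resolve_based_on(config, config['hosts']['cluster']))
# {'description': 'Cluster', 'queue_type': 'pbs', 'address': 'domain.com'}
```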


### String interpolation

SoS interpolates string values if they contain `{ }`. The expressions enclosed by `{ }` are evaluated using variables defined in the root dictionary of `CONFIG`, variables defined in the dictionary in which the value is defined, or, in the case of task or workflow templates, variables provided by users.

For example, let us define a config file using magic `%save`

In [3]:
%save my_config.yml -f

user_name: user
hosts:
  cluster:
    address: "{user_name}@domain.com:{port}"
    port: 123

When the configuration file is loaded with option `-c`, the `address` in `hosts.cluster` is expanded with `user_name` (here `user`) defined in the root dictionary, and `port` defined in the local dictionary. 

In [4]:
%run -c my_config.yml

print(CONFIG['hosts']['cluster'])

{'address': 'user@domain.com:123', 'port': 123}


Because key `user_name` is frequently used in `hosts.yml`, **SoS automatically defines `user_name` as the local user ID (all lower case) in `CONFIG` if it is not defined in any of the configuration files**.
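As a rough illustration of this lookup, the sketch below expands placeholders with Python's `str.format`. It is a simplification under several assumptions: only plain names are supported rather than arbitrary expressions, local keys are assumed to shadow root-level keys, and `getpass.getuser()` stands in for however SoS determines the local user ID. The function name `interpolate_section` is hypothetical.

```python
import getpass


def interpolate_section(config, section):
    """Expand `{name}` placeholders in the string values of `section`."""
    # Root-level scalar values form the outer namespace.
    namespace = {k: v for k, v in config.items() if not isinstance(v, dict)}
    # Fall back to the lower-cased local user ID if `user_name` is undefined.
    if 'user_name' not in namespace:
        namespace['user_name'] = getpass.getuser().lower()
    # Keys of the local dictionary are added last (assumed to take precedence).
    namespace.update(section)
    return {
        key: value.format(**namespace) if isinstance(value, str) else value
        for key, value in section.items()
    }


config = {
    'user_name': 'user',
    'hosts': {'cluster': {'address': '{user_name}@domain.com:{port}', 'port': 123}},
}

print(interpolate_section(config, config['hosts']['cluster']))
# {'address': 'user@domain.com:123', 'port': 123}
```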

String interpolation happens after `based_on`, so the following usage is allowed:

In [5]:
%save my_config.yml -f
hosts:
  host_r:
    address: localhost
    R_version: 3.1
    workflow_template: |
      echo module load R/{R_version}
      {command}
  host_r33:    
    based_on: hosts.host_r
    R_version: 3.3

This configuration file defines hosts named `host_r` and `host_r33` with address `localhost`. The `workflow_template` is used if the host name is specified with option `-r`. Although the example is meant for a cluster system that loads the appropriate module with the command `module load`, this example just echoes the `module load` line to show how the `workflow_template` is expanded.

First, if we use host `host_r`, `R_version=3.1` will be used:

In [6]:
%run -r host_r -c my_config.yml -v0
print('Hello')

module load R/3.1
[#] 1 step processed (1 job completed)


If we use host `host_r33`, `R_version=3.3` will be used to expand `workflow_template` derived from `host_r`.

In [7]:
%run -r host_r33 -c my_config.yml -v0
print('Hello')

module load R/3.3
[#] 1 step processed (1 job completed)


Finally, if we provide a value for `R_version` on the command line, it overrides any value defined in the configuration file.


In [8]:
%run -r host_r R_version=4.3 -c my_config.yml -v0
print('Hello')

module load R/4.3
[#] 1 step processed (1 job completed)

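The three runs above suggest a lookup order of command-line values first, then the keys of the (derived) host dictionary. One way to model that order in Python is `collections.ChainMap`, as in the sketch below. This is an assumption-laden illustration rather than SoS's implementation: `expand_template` is a hypothetical helper, `host_r33` is written out as it would look after `based_on` has been applied, and the `command` value is a placeholder for whatever SoS substitutes.

```python
from collections import ChainMap


def expand_template(host, overrides=None, **task_vars):
    """Expand a host's `workflow_template`; earlier mappings take precedence."""
    namespace = ChainMap(overrides or {}, host, task_vars)
    return host['workflow_template'].format_map(namespace)


host_r = {
    'address': 'localhost',
    'R_version': 3.1,
    'workflow_template': 'echo module load R/{R_version}\n{command}\n',
}
# `host_r33` after `based_on: hosts.host_r` has been resolved.
host_r33 = {**host_r, 'R_version': 3.3}

print(expand_template(host_r, command='<command>').splitlines()[0])
# echo module load R/3.1
print(expand_template(host_r33, command='<command>').splitlines()[0])
# echo module load R/3.3
print(expand_template(host_r, {'R_version': '4.3'}, command='<command>').splitlines()[0])
# echo module load R/4.3
```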

## Use of configuration files

### Variable `CONFIG`

As shown above, the dictionary loaded from SoS configuration files is available to SoS workflow as variable `CONFIG`. This allows a workflow to retrieve settings from configuration files.

For example, a workflow could be defined as follows, which uses `Bob` as the default value for `manager`:

In [9]:
%run -v0
parameter: manager = CONFIG.get('manager', 'Bob')
print(manager)

[#] 1 step processed (1 job completed)


It uses `Elena` if that value is specified on the command line:

In [10]:
%run --manager Elena -v0
parameter: manager = CONFIG.get('manager', 'Bob')
print(manager)

[#] 1 step processed (1 job completed)


Or, with the following configuration file

In [11]:
%save myconfig.yml -f
manager: Martin

the workflow takes its default value from the configuration file:

In [12]:
%run -c myconfig.yml -v0
parameter: manager = CONFIG.get('manager', 'Bob')
print(manager)

[#] 1 step processed (1 job completed)

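The precedence at work in these three runs (command line, then configuration file, then hard-coded default) can be mimicked in plain Python with `argparse`. The sketch below is only an analogy for the `parameter:` statement; `get_manager` is a hypothetical helper, and the dictionaries stand in for the loaded `CONFIG`.

```python
import argparse


def get_manager(config, argv):
    """Resolve `manager` from command-line arguments, then config, then 'Bob'."""
    parser = argparse.ArgumentParser()
    # The configuration value, if any, becomes the default of the option.
    parser.add_argument('--manager', default=config.get('manager', 'Bob'))
    return parser.parse_args(argv).manager


print(get_manager({}, []))                        # Bob
print(get_manager({}, ['--manager', 'Elena']))    # Elena
print(get_manager({'manager': 'Martin'}, []))     # Martin
```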

### Host-dependent paths

<div class="bs-callout bs-callout-primary" role="alert">
    <h4><code>path(name, default)</code></h4>
    <p>The <code>path</code> datatype of SoS is derived from <code>pathlib.Path</code>. One of the additions of this datatype is the parameters <code>name</code> and <code>default</code>, which return a pre-defined <code>path</code> defined in </p>
    <pre>
    CONFIG["hosts"][current-host]["paths"]
    </pre>
    <p>where <code>current-host</code> is normally <code>localhost</code> but can be one of the remote hosts if the function is called from a remote host. A <code>default</code> value could be returned if <code>name</code> is not available in the configuration.</p>
</div>

The `hosts` definitions in `~/.sos/hosts.yml` allow the definition of paths for different hosts. For clarity, let us define a local configuration file that points `localhost` to an `example_host` configuration. 

In [13]:
%save myconfig.yml -f
localhost: example_host
hosts:
    example_host:
        address: localhost
        paths:
            home: /Users/{user_name}
            project:  /Users/{user_name}/Documents
            tmp: /tmp

Without worrying about the `localhost` part for now, this configuration file defines a few paths for the localhost. The `paths` can be retrieved using `path(name='project')` so that you can write your script in a host-independent way. For example, the following workflow uses `path(name='project')` to get the host-specific `project` directory, which is defined as `/Users/{user_name}/Documents` in `myconfig.yml` and expands to `/Users/bpeng1/Documents` for user `bpeng1`.

In [14]:
%run -c myconfig.yml -v1

sh: workdir=path(name='project')
   echo Working on `pwd`

Working on /Users/bpeng1/Documents


If you are uncertain whether `project` is defined for the current host, you can use `default` to specify a default value:

In [15]:
%run -c myconfig.yml -v1

import os
sh: workdir=path(name='scratch', default='~')
   echo Working on `pwd`

Working on /Users/bpeng1
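Conceptually, the lookup performed by `path(name, default)` resembles the sketch below, which reads `CONFIG["hosts"][current-host]["paths"]` and falls back to `default`. This is a simplified model built on `pathlib.Path`, not the SoS `path` class itself: the function `host_path` is hypothetical, the configuration is assumed to be already interpolated, and the current host is assumed to be given by the root-level `localhost` key.

```python
from pathlib import Path


def host_path(config, name, default=None):
    """Return the path registered as `name` for the current host, or `default`."""
    host = config.get('localhost', 'localhost')
    paths = config.get('hosts', {}).get(host, {}).get('paths', {})
    if name in paths:
        return Path(paths[name]).expanduser()
    if default is not None:
        # `expanduser` also resolves a leading `~` in the default value.
        return Path(default).expanduser()
    raise KeyError(f'path {name!r} is not defined for host {host!r}')


config = {
    'localhost': 'example_host',
    'hosts': {
        'example_host': {
            'address': 'localhost',
            'paths': {'home': '/Users/user', 'project': '/Users/user/Documents', 'tmp': '/tmp'},
        }
    },
}

print(host_path(config, 'project'))
print(host_path(config, 'scratch', default='/tmp'))
```

Raising an error when neither the name nor a default is available is a design choice of this sketch; it makes a missing host definition visible early instead of silently using the wrong directory.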
