# Title

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less

## h2

### Configure local and remote hosts

Now you need to configure your local and remote hosts so that SoS knows how to communicate between them. Host configurations are defined in `~/.sos/hosts.yml` and should look similar to

```
hosts:
    desktop:
        paths:
            home: /Users/myuser
    monster:
        address: dcdrlue8ee.yourdomain.com
        paths:
            home: /home/myuser
```

The format is easy enough to edit directly, but you can also use commands such as

```
% sos config --hosts --set hosts.monster.address dcdrlue8ee.yourdomain.com
```

to add or change key `hosts['monster']['address']` to `dcdrlue8ee.yourdomain.com`.
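
The dotted key passed to `--set` addresses a nested entry in the configuration. As a rough illustration of that mapping only (this is not how SoS implements it, and `set_dotted` is a hypothetical helper), the following Python sketch applies a dotted key to a nested dictionary:

```python
def set_dotted(config, dotted_key, value):
    """Set config[k1][k2]...[kn] = value for a key such as 'k1.k2.kn'.

    Hypothetical helper for illustration; not part of SoS.
    """
    *parents, last = dotted_key.split(".")
    node = config
    for key in parents:
        # Create intermediate dictionaries as needed.
        node = node.setdefault(key, {})
    node[last] = value
    return config


config = {"hosts": {"monster": {"paths": {"home": "/home/myuser"}}}}
set_dotted(config, "hosts.monster.address", "dcdrlue8ee.yourdomain.com")
print(config["hosts"]["monster"]["address"])
```

Existing entries under `hosts.monster` (here, `paths`) are left untouched; only the addressed key is added or overwritten.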

### Configure `address`

You should specify the address of the remote host in the `hosts` section, similar to

```
hosts:
  monster:
    address: dcdrlue8ee.yourdomain.com
```

If your account name differs between the local and remote servers, the complete address should be `username@address`. In this example, the address would be `john@dcdrlue8ee.yourdomain.com` if the remote server account is `john`.

You can also specify `address` for your local host if you plan to log in to it remotely.

### Configure `paths`

`paths` is a set of named directories that will be translated between hosts. For example, if you work locally on a Mac with home directory `/Users/myuser`, and the remote server is a Linux machine with home directory `/home/myuser`, you should define `paths` with an entry named `home` on both hosts as follows:

```
hosts:
    desktop:
        paths:
            home: /Users/myuser
    monster:
        address: dcdrlue8ee.yourdomain.com
        paths:
            home: /home/myuser
```

In this way, if the local data is `/Users/myuser/projects/input.fastq`, the path will be translated to `/home/myuser/projects/input.fastq` during remote execution.

In more complicated cases where the directory layouts differ, more than one entry can be specified under `paths`. For example, if you have directories under different volumes, you can map them differently using

```
hosts:
    desktop:
        paths:
            home: /Users/myuser
            project: /Users/myuser/projects
            resource: /Volumes/Resource
    monster:
        address: dcdrlue8ee.yourdomain.com
        paths:
            home: /home/myuser
            project: /home/myuser/scratch/projects
            resource: /home/myuser/resource
```

Note that

1. You can define multiple named `paths` such as `home`, `scratch`, `working`, and `resource`, but **every path must be defined for all hosts**.
2. All `paths` should be absolute (start with `/` on Linux-like systems).
3. SoS expands local directories to absolute paths before matching them against `paths`.
4. If there are multiple matches, SoS chooses the longest matching path. For example, path `/Users/myuser/projects/input.txt` would be identified as `project` (not `home`) and be mapped to `/home/myuser/scratch/projects/input.txt`.
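
The matching rules above can be sketched in a few lines of Python. This is only an illustration of the longest-match idea under stated assumptions, not SoS's actual implementation: `translate_path` is a hypothetical helper, and raising an error for an unmatched path is a choice made for this sketch.

```python
import posixpath


def translate_path(path, local_paths, remote_paths):
    """Map a local path to its remote counterpart using named paths.

    Hypothetical helper for illustration; not part of SoS.
    """
    # Normalize to an absolute path before matching (note 3).
    path = posixpath.abspath(path)
    best = None
    for name, local_dir in local_paths.items():
        local_dir = local_dir.rstrip("/")
        # Match whole path components only, so that /Users/myuser2
        # is not treated as being under /Users/myuser.
        if path == local_dir or path.startswith(local_dir + "/"):
            # Keep the longest matching directory (note 4).
            if best is None or len(local_dir) > len(best[1]):
                best = (name, local_dir)
    if best is None:
        # Assumption of this sketch: unmatched paths are an error.
        raise ValueError(f"{path} is not under any defined path")
    name, local_dir = best
    return remote_paths[name].rstrip("/") + path[len(local_dir):]


desktop = {
    "home": "/Users/myuser",
    "project": "/Users/myuser/projects",
    "resource": "/Volumes/Resource",
}
monster = {
    "home": "/home/myuser",
    "project": "/home/myuser/scratch/projects",
    "resource": "/home/myuser/resource",
}

print(translate_path("/Users/myuser/projects/input.txt", desktop, monster))
# /home/myuser/scratch/projects/input.txt
print(translate_path("/Users/myuser/notes.txt", desktop, monster))
# /home/myuser/notes.txt
```

The lookup `remote_paths[name]` fails with a `KeyError` if a name is missing on the remote side, which is one way to see why every path has to be defined for all hosts.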

### Configure `shared`

Option `shared` tells SoS which file systems are shared between local and remote hosts so that it does not have to synchronize files under these directories between the hosts.

* SoS assumes independent file systems, so you do not have to specify option `shared` if the local and remote hosts do not share any file system.
 
* If your local and remote hosts share all file systems, you should list `/` as shared.

    ```
    hosts:
        desktop:
            shared:
                ALL: /
        monster:
            shared:
                ALL: /
    ```
    The name `ALL` does not matter as long as they match between hosts.

  
* If your local and remote hosts share one or more volumes, you can specify them with

    ```    
    hosts:
        desktop:
            shared:
                project: /project
        monster:
            shared:
                project: /scratch/project
    ```
  
  to indicate that local files under `/project` are shared with `monster`, where they appear under `/scratch/project`.

Items under `shared` are treated as special `paths`. Files under these directories are mapped, but not synchronized.
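
As a rough sketch of the "mapped, but not synchronized" rule (not SoS's implementation; `needs_sync` is a hypothetical helper and the file names are made up for illustration), a file needs to be copied only if it lies outside every shared directory:

```python
def needs_sync(path, shared):
    """Return True if `path` must be copied to the remote host.

    `shared` maps names to shared directories on the local host.
    Hypothetical helper for illustration; not part of SoS.
    """
    for shared_dir in shared.values():
        prefix = shared_dir.rstrip("/")
        # Files at or below a shared directory are already visible remotely.
        if path == prefix or path.startswith(prefix + "/"):
            return False
    return True


desktop_shared = {"project": "/project"}

print(needs_sync("/project/data/input.fastq", desktop_shared))  # False
print(needs_sync("/Users/myuser/input.fastq", desktop_shared))  # True
# With everything shared, nothing needs to be copied.
print(needs_sync("/Users/myuser/input.fastq", {"ALL": "/"}))    # False
```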

Note that it is a bad idea to use Dropbox or Google Drive as shared drives. Files under these directories are synchronized copies rather than a truly shared file system, so a file created locally will not be available instantly on the remote host.

### Specify `localhost`

After you configure both the local and remote hosts, you need to tell SoS which entry in the `hosts` list is your `localhost` using the command

```
% sos config --global --set localhost desktop
```

which writes `localhost: desktop` to the global configuration file.

If you have defined multiple hosts in the `hosts.yml` file, you should distribute this file to all hosts and set `localhost` accordingly, so that all machines know how to communicate with each other.

### Sample configurations

The server settings are critically important for the successful execution of commands on remote servers. As an example, I am working on a Mac mini (with limited CPU/RAM) and have access to a Mac Pro workstation and a Linux server. The host configurations for these machines are

```
hosts:
  mini:
    paths:
      home: /Users/bpeng1
      resource: /Users/bpeng1/.sos/resource
  macpro:
    address: mp-bpeng.mdanderson.edu
    paths:
      home: /Users/bpeng1
      resource: /Volumes/HOME/resource
  linux:
    address: dcdrlpmcfd.mdanderson.edu
    paths:
      home: /home/bpeng1
      resource: /home/bpeng1/.sos/resource
```

I defined two `paths` named `home` and `resource` because, although `resource` is at `~/.sos/resource` on `mini` and `linux`, it is on a dedicated volume `/Volumes/HOME/` on `macpro`.

With this `hosts.yml` and proper definition of `localhost` on each machine, it is possible to submit jobs from `mini` to `macpro` and `linux`, from `macpro` to `linux` and from `linux` to `macpro`. It is not possible to submit jobs remotely to `mini` because no `address` is defined for this host.
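
The rule in the previous paragraph (a host can receive jobs only if it has an `address`) can be checked with a small Python sketch. This is illustrative only: `remote_targets` is a hypothetical helper, not an SoS function, and the dictionary keeps just the `address` entries from the configuration above.

```python
hosts = {
    "mini": {},
    "macpro": {"address": "mp-bpeng.mdanderson.edu"},
    "linux": {"address": "dcdrlpmcfd.mdanderson.edu"},
}


def remote_targets(hosts, localhost):
    """List hosts that `localhost` can submit jobs to.

    Hypothetical helper for illustration; not part of SoS.
    """
    return [
        name
        for name, cfg in hosts.items()
        # A remote host is reachable only if it defines an address.
        if name != localhost and "address" in cfg
    ]


for local in hosts:
    print(local, "->", remote_targets(hosts, local))
# mini -> ['macpro', 'linux']
# macpro -> ['linux']
# linux -> ['macpro']
```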

### Variable translation

Each task has a **context dictionary** that contains variables that will be used to, for example, compose scripts to be executed. When the script is executed on the remote host, targets specified in variables `step_input` (`_input`), `step_output` (`_output`), and `step_depends` (`_depends`) will be **translated and synchronized from the local to the remote file system** (unless they are of type `remote`). The values of these variables are translated for the remote host.

If there are additional files that you would like to synchronize, you should put them in the `input:` or `depends:` statement. If for any reason you would like to translate a variable from the local to the remote host but do not want to synchronize the file, you can add it to option `map_vars`.

### Running task

With all the pieces put together, you can now execute the task on the remote host using `task` options

```sos
depends:  hg19_fasta, hg19_genes_gtf
output:   f"{hg19_star_index}/chrName.txt"
task:     queue='monster', from_host=hg19_star_index
run:  expand=True
    STAR \
        --runThreadN 8 \
        --runMode genomeGenerate \
        --genomeDir {hg19_star_index} \
        --genomeFastaFiles {hg19_fasta} \
        --sjdbGTFfile {hg19_genes_gtf} \
        --sjdbOverhang 100
```

For this example,

1. SoS automatically transfers all input files (none in this example) and dependent files (`hg19_fasta` and `hg19_genes_gtf` in this example), so no `to_host` is needed.
2. Option `from_host` is needed because we need to transfer not only the representative output file (`{hg19_star_index}/chrName.txt`), but also the whole directory containing the indexes (`hg19_star_index`).

SoS tries its best to automate the process while allowing you to tweak the details with runtime options. To recap the use of these options:

* `to_host` is needed to transfer **additional input** files or directories to the remote host.
* `from_host` is needed to transfer **additional output** files or directories from the remote host.

