# Specification and synchronization of remote files

* **Difficulty level**: intermediate
* **Time needed to learn**: 20 minutes or less
* **Key points**:
  * Paths that are relative to the current working directory are portable across hosts.
  * Use named paths (`#name`) to specify absolute paths that differ between local and remote hosts.
  * Options `to_host` and `from_host` specify files and directories to send before task execution and to retrieve after task execution, respectively.

## Path definitions and named paths

When local and remote hosts do not share file systems (or share only some file systems), things can get a bit complicated because SoS will need to decide what paths to use on the remote host. The most important thing to remember here is that **paths across local and remote hosts are linked by named paths defined in the SoS host definition file**.

For example, a host definition file (usually `~/.sos/hosts.yml`) could have the following (incomplete) `paths` definitions:

```yaml
localhost: office
hosts:
    office:
        paths:
            home:  /Users/{user_name}
            projects: /Users/{user_name}/projects
            scratch: /Users/{user_name}/scratch
    cluster:
        paths:
            home:  /home/{user_name}
            projects: /home/projects/{user_name}
            scratch: /mount/scratch
```

so that paths under `home`, `projects`, or `scratch` could be linked across `office` and `cluster`.

Just as `~/result.txt` refers to `result.txt` under the user's home directory, which can differ across hosts, **named paths, namely paths that start with `#name`, such as `#projects/RNASeq`, are context-specific**. If you specify `_output='#projects/RNASeq/genes.txt'`, the path refers to different files on hosts that define `#projects` differently.
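The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SoS's implementation: the helper names `resolve_named_path` and `translate_path` and the user name `alice` are hypothetical, and the path table simply copies the example `hosts.yml` with `{user_name}` filled in.

```python
import posixpath

# Named paths copied from the example hosts.yml above, with the hypothetical
# user name "alice" substituted for {user_name}.
PATHS = {
    "office": {
        "home": "/Users/alice",
        "projects": "/Users/alice/projects",
        "scratch": "/Users/alice/scratch",
    },
    "cluster": {
        "home": "/home/alice",
        "projects": "/home/projects/alice",
        "scratch": "/mount/scratch",
    },
}


def resolve_named_path(path, host):
    """Expand a leading '#name' using the named paths defined for `host`."""
    if not path.startswith("#"):
        return path
    name, _, rest = path[1:].partition("/")
    base = PATHS[host][name]
    return posixpath.join(base, rest) if rest else base


def translate_path(path, src, dest):
    """Map an absolute path on `src` to its counterpart on `dest`."""
    # Try the longest prefix first so that '/Users/alice/projects'
    # wins over '/Users/alice'.
    for name, prefix in sorted(PATHS[src].items(), key=lambda kv: -len(kv[1])):
        if path == prefix or path.startswith(prefix + "/"):
            return PATHS[dest][name] + path[len(prefix):]
    raise ValueError(f"{path} is not under any named path of {src}")


print(resolve_named_path("#projects/RNASeq/genes.txt", "office"))
print(resolve_named_path("#projects/RNASeq/genes.txt", "cluster"))
```

In this sketch, a path that is not covered by any named path raises an error, which mirrors the point that hosts are linked only through the paths they both define.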

## Use of relative path

Let us execute a task on a remote host defined in a docker image. The task does nothing but report the value of `_output` and its current working directory. The output file `result.txt` is sent back to the local host after the completion of the task.

As expected, the value of `_output` is the relative path `result.txt`. The working directory is `vatlab/sos-docs/src/user_guide` under `/root`, which corresponds to the local working directory.

In [1]:
%preview result.txt

%run -c ~/docker.yml -q docker 

output: 'result.txt'
task:

sh: expand=True
    echo {_output} > {_output}
    echo `pwd` >> {_output}


result.txt
/root/vatlab/sos-docs/src/user_guide

## Absolute paths and named paths

If you would like to specify an absolute path, you can use either `~` as home directory, or any of the named paths.

In the following workflow, the output is specified as `#home/result.txt` (which is the same as `~/result.txt`). It is `/root/result.txt` on the remote host, and the current working directory remains the same.

In [2]:
%preview ~/result.txt

%run -c ~/docker.yml -q docker -s force

output: '#home/result.txt'
task:

sh: expand=True
    echo {_output} > {_output}
    echo `pwd` >> {_output}


/root/result.txt
/root/vatlab/sos-docs/src/user_guide

## Working directory of tasks (Option `workdir`)

The working directory of a task defaults to the current working directory or, in the case of remote execution, the remote counterpart of the current working directory. Option `workdir` controls the working directory of the task.

For example, the following example specifies `workdir='#home'`. The current working directory of the shell script is changed to `/root`, and `_output` remains `#home/result.txt`.

In [3]:
%preview ~/result.txt

%run -c ~/docker.yml -q docker -s force

output: '#home/result.txt'
task:

sh: expand=True, workdir='#home'
    echo {_output} > {_output}
    echo `pwd` >> {_output}


/root/result.txt
/root

However, **changing `workdir` might cause output files to be misplaced**. For example, if we remove `#home` from `_output` and still specify `workdir`, the output is written to the specified `workdir`, but SoS still assumes that `_output` is under the current project directory and fails to retrieve the file.

In [5]:
%run -c ~/docker.yml -q docker -s force

output: 'result_error.txt'
task:

sh: expand=True, workdir='#home'
    echo {_output} > {_output}
    echo `pwd` >> {_output}

ERROR: [default]: Failed to copy /root/vatlab/sos-docs/src/user_guide/result_error.txt from docker using command "rsync -a --no-g -e 'ssh -o 'ControlMaster=auto' -o 'ControlPath=/Users/bpeng/.ssh/controlmasters/%r@%h:%p' -o 'ControlPersist=10m' -p 32798' root@localhost:/root/vatlab/sos-docs/src/user_guide/result_error.txt "/Users/bpeng/vatlab/sos-docs/src/user_guide"": command return 23


RuntimeError: Workflow exited with code 1
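The mismatch behind this error can be reproduced with plain path arithmetic. The following Python sketch is an illustration only, not SoS code: the two helper functions are hypothetical, and the directories are taken from the outputs and the error message above.

```python
import posixpath


def where_task_writes(output, workdir):
    """The shell resolves a relative output against the task's workdir."""
    return posixpath.normpath(posixpath.join(workdir, output))


def where_sos_looks(output, remote_cwd):
    """A relative output is assumed to sit under the remote counterpart
    of the current project directory."""
    return posixpath.normpath(posixpath.join(remote_cwd, output))


remote_cwd = "/root/vatlab/sos-docs/src/user_guide"  # from the error message
remote_home = "/root"  # '#home' on the docker host, as shown by pwd above

written = where_task_writes("result_error.txt", remote_home)
expected = where_sos_looks("result_error.txt", remote_cwd)
print(written)
print(expected)
print(written == expected)
```

An absolute output such as `/root/result.txt` (the remote form of `#home/result.txt`) is unaffected by `workdir`, because joining an absolute path onto any directory returns the absolute path unchanged. That is why the earlier `workdir='#home'` example succeeded.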

## Sending additional files before task execution (Option `to_host`)

Option `to_host` specifies additional files or directories that are synchronized to the remote host before tasks are executed. It can be specified as

* A single file or directory (with respect to the local file system), or
* A list of files or directories

The files or directories are translated using the host-specific path maps. Note that if a symbolic link is specified in `to_host`, both the symbolic link and the path it refers to are synchronized to the remote host.
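As a rough illustration of these rules, the Python sketch below normalizes a `to_host` value into the list of local paths that would be sent. The function `files_to_send` is hypothetical and not part of SoS. The wildcard expansion is an assumption, based only on the fact that the example in this section passes a pattern such as `task*.ipynb`.

```python
import glob
import os


def files_to_send(to_host):
    """Return the local paths a `to_host` value would cover in this sketch."""
    # Accept either a single path or a list of paths.
    specs = [to_host] if isinstance(to_host, str) else list(to_host)
    selected = []
    for spec in specs:
        # Expand wildcards such as 'task*.ipynb'; keep unmatched specs as given.
        matches = sorted(glob.glob(spec)) or [spec]
        for path in matches:
            selected.append(path)
            # A symbolic link is sent together with the path it refers to.
            if os.path.islink(path):
                selected.append(os.path.realpath(path))
    return selected
```

Each returned path would still need to be translated with the host-specific path maps before anything is copied.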

Just to demonstrate how to use this option, let us copy the notebooks in this directory whose names match `task*.ipynb` to a remote host and count the number of lines in each of them.

In [3]:
%preview -n wc.txt 
output: 'wc.txt'
task: to_host='task*.ipynb', queue='bcb' 
sh: expand=True
  wc -l *.ipynb > {_output}


     363 task_files.ipynb
     386 task_management.ipynb
     817 task_statement.ipynb
     223 task_tags.ipynb
     390 task_template.ipynb

## Retrieving additional files after task completion (Option `from_host`)

Option `from_host` specifies additional files or directories that are synchronized from the remote host after tasks are executed. It can be specified as

* A single file or directory (with respect to the local file system), or
* A list of files or directories

The files or directories will be translated using the host-specific path maps to determine what remote files to retrieve.
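To make the translation step concrete, the Python sketch below turns a `from_host` value into pairs of remote source and local destination directory, in the shape of the `rsync` command shown in the error message earlier. It is a simplified illustration, not SoS's implementation: `retrieval_plan` is a hypothetical helper, and it assumes a single path map that swaps the local working directory for its remote counterpart.

```python
import posixpath


def retrieval_plan(from_host, local_cwd, remote_cwd, login):
    """List (remote source, local destination directory) pairs for `from_host`."""
    specs = [from_host] if isinstance(from_host, str) else list(from_host)
    plan = []
    for spec in specs:
        # Paths are given with respect to the local file system.
        local = posixpath.normpath(posixpath.join(local_cwd, spec))
        # Hypothetical one-entry path map: replace the local working
        # directory with its remote counterpart.
        rel = posixpath.relpath(local, local_cwd)
        remote = posixpath.normpath(posixpath.join(remote_cwd, rel))
        plan.append((f"{login}:{remote}", posixpath.dirname(local)))
    return plan


for source, dest in retrieval_plan(
    ["result.txt", "logs/run.log"],
    "/Users/bpeng/vatlab/sos-docs/src/user_guide",
    "/root/vatlab/sos-docs/src/user_guide",
    "root@localhost",
):
    print(source, "->", dest)
```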