# Specifying and synchronizing remote files

* **Difficulty level**: intermediate
* **Time needed to learn**: 20 minutes or less
* **Key points**:
  * Paths that are relative to the current working directory are portable across hosts.
  * Use named paths (`#name`) to specify absolute paths that differ between local and remote hosts.
  * Options `to_host` and `from_host` specify files and directories to send before task execution and to retrieve after task execution, respectively.

## Path definitions for hosts

When local and remote hosts do not share file systems (or share only some file systems), things can get a bit complicated because SoS needs to decide what paths to use on the remote host. The most important thing to remember here is that **paths across local and remote hosts are linked by named paths defined in the SoS host definition file**.

For example, a host definition file (usually `~/.sos/hosts.yml`) could have the following `paths` definitions:

```yaml
localhost: office
hosts:
    office:
        paths:
            home:  /Users/{user_name}
            projects: /Users/{user_name}/projects
            scratch: /usr/{user_name}/scratch
    cluster:
        paths:
            home:  /home/{user_name}
            projects: /home/projects/{user_name}
            scratch: /mount/scratch
```

so that paths under `home`, `projects`, or `scratch` could be linked across `office` and `cluster`.
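To make this linkage concrete, the following minimal Python sketch translates a local path into its remote counterpart by finding the named path that contains it. It only illustrates the idea and is not SoS's actual implementation: the function name `translate_path`, the two dictionaries, and the user name `jdoe` are hypothetical, and the path values are modeled on the `paths` definitions above with `{user_name}` already filled in.

```python
from pathlib import PurePosixPath

# Hypothetical, already-expanded `paths` definitions for the two hosts
# (the user name `jdoe` is made up for illustration).
OFFICE = {
    "home": "/Users/jdoe",
    "projects": "/Users/jdoe/projects",
    "scratch": "/usr/jdoe/scratch",
}
CLUSTER = {
    "home": "/home/jdoe",
    "projects": "/home/projects/jdoe",
    "scratch": "/mount/scratch",
}


def translate_path(path, local_paths, remote_paths):
    """Map an absolute local path to the remote host via a shared named path."""
    path = PurePosixPath(path)
    # Assumption of this sketch: try the longest local prefix first, so that
    # `projects` wins over `home` for files under /Users/jdoe/projects.
    names = sorted(local_paths, key=lambda n: len(local_paths[n]), reverse=True)
    for name in names:
        if name not in remote_paths:
            continue  # named path is not defined on the remote host
        try:
            relative = path.relative_to(local_paths[name])
        except ValueError:
            continue  # path is not under this named path
        return str(PurePosixPath(remote_paths[name]) / relative)
    raise ValueError(f"{path} is not under any named path shared by both hosts")


print(translate_path("/Users/jdoe/projects/demo/data.csv", OFFICE, CLUSTER))
# -> /home/projects/jdoe/demo/data.csv
print(translate_path("/Users/jdoe/notes.txt", OFFICE, CLUSTER))
# -> /home/jdoe/notes.txt
```

A path that falls outside every named path raises an error in this sketch, which reflects why files outside the named paths cannot be linked across hosts.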


## Working directory of tasks (Option `workdir`)

By default, the `workdir` of a task is the current working directory or, in the case of remote execution, the remote counterpart of the current working directory. The `workdir` must be under one of the named paths.

Option `workdir` controls the working directory of the task. For example, the following step downloads a file to the `resource` directory using [action `download`](download.html). Note that SoS creates `workdir` if it does not exist.

In [1]:
task: queue='localhost', workdir='resource'

download:
  ftp://speedtest.tele2.net/512KB.zip

Task 85ea891331ab4bcb (5057fa441d6e1755cell_5c57eee2user_guide): ran for < 5 seconds, completed


In [2]:
!ls resource

512KB.zip


## Sending additional files before task execution (Option `to_host`)

Option `to_host` specifies additional files or directories that would be synchronized to the remote host before tasks are executed. It can be specified as

* A single file or directory (with respect to the local file system), or
* A list of files or directories.

The files or directories will be translated using the host-specific path maps. Note that if a symbolic link is specified in `to_host`, both the symbolic link and the path it refers to would be synchronized to the remote host.
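The two accepted forms and the handling of symbolic links can be sketched in plain Python as follows. This is a hedged illustration rather than SoS's own code: the helper `files_to_send` is hypothetical, it does not expand wildcard patterns, and it only shows how a single path or a list of paths could be normalized and how a symbolic link brings its target along.

```python
import os
import tempfile


def files_to_send(to_host):
    """Return the paths to synchronize for a `to_host` value.

    Accepts a single path or a list of paths. For a symbolic link, both the
    link itself and the path it refers to are included.
    """
    items = [to_host] if isinstance(to_host, str) else list(to_host)
    result = []
    for item in items:
        result.append(item)
        if os.path.islink(item):
            # Also send the file or directory that the link points to.
            result.append(os.path.realpath(item))
    return result


# Small demonstration in a temporary directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.txt")
    link = os.path.join(tmp, "latest.txt")
    with open(target, "w") as f:
        f.write("example\n")
    os.symlink(target, link)
    for path in files_to_send([link]):
        print(os.path.basename(path))
```

In this demonstration the link `latest.txt` is listed first, followed by its target `data.txt`. Each resulting path would then be translated to the remote host through the path maps.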

Just to demonstrate how to use this option, let us copy the notebooks in this directory that match `task*.ipynb` to a remote host and count the number of lines in each of them.

In [3]:
%preview -n wc.txt 
output: 'wc.txt'
task: to_host='task*.ipynb', queue='bcb' 
sh: expand=True
  wc -l *.ipynb > {_output}

Task 9e7b75df6a5d3767 (5b7627b1ac52aa8fscratch_0user_guide): ran for < 5 seconds, missing


     363 task_files.ipynb
     386 task_management.ipynb
     817 task_statement.ipynb
     223 task_tags.ipynb
     390 task_template.ipynb

## Retrieving additional files after task completion (Option `from_host`)

Option `from_host` specifies additional files or directories that would be synchronized from the remote host after tasks are executed. It can be specified as

* A single file or directory (with respect to the local file system), or
* A list of files or directories.

The files or directories will be translated using the host-specific path maps to determine what remote files to retrieve.

## Absolute paths and named paths

The use of relative paths is highly recommended because relative paths are not system dependent. Although `data/sample1.csv` can be under different absolute paths on local and remote hosts, SoS maps the current project directory across hosts, so `data/sample1.csv` represents the same file on both.

If you have to specify an absolute path, you will need to specify it with a named path as follows:

In [7]:
%run -r htc-headnode

output: '#home/sos/sos-docs/src/user_guide/random_output.txt'

import random
with open(_output, 'w') as out:
  out.write(f'Random number is {random.randint(0, 1000)}')

INFO: Running default: 
INFO: default (index=0) is ignored due to saved signature
INFO: default output:   /home/bpeng1/sos/sos-docs/src/user_guide/random_output.txt
INFO: Workflow default (ID=95ff85f084c10b32) is ignored with 1 ignored step.
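The way a `#name` prefix resolves differently on each host can be illustrated with a short Python sketch. This is a simplified model under stated assumptions, not the SoS implementation: `expand_named_path` is a hypothetical helper, and the `home` values and the user name `jdoe` are made up.

```python
def expand_named_path(path, host_paths):
    """Expand a leading `#name` using one host's `paths` definitions."""
    if not path.startswith("#"):
        return path  # not a named path; leave it unchanged
    # Split "#home/sos/file.txt" into the name "home" and the rest "sos/file.txt".
    name, sep, rest = path[1:].partition("/")
    if name not in host_paths:
        raise ValueError(f"named path #{name} is not defined for this host")
    return host_paths[name].rstrip("/") + sep + rest


# Hypothetical definitions of `home` on two hosts.
office = {"home": "/Users/jdoe"}
cluster = {"home": "/home/jdoe"}

named = "#home/sos/sos-docs/src/user_guide/random_output.txt"
print(expand_named_path(named, office))
# -> /Users/jdoe/sos/sos-docs/src/user_guide/random_output.txt
print(expand_named_path(named, cluster))
# -> /home/jdoe/sos/sos-docs/src/user_guide/random_output.txt
```

The same `#home/...` string therefore names one logical file while resolving to a host-specific absolute path, which is what the `/home/bpeng1/...` output in the log above shows for the remote host.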
