Skip to content

Bash script utility for running secure copy of a file in chunks in parallel

License

Notifications You must be signed in to change notification settings

yinonavraham/pscp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Parallel Secure Copy (pscp)

Bash script utility for running secure copy of a file in chunks in parallel.

Installation

From your favorite terminal, run:

curl -s https://raw.githubusercontent.com/yinonavraham/pscp/master/bin/setup/install.sh | bash

Usage

pscp [OPTIONS] FILE DEST

Example

In order to copy a local file named myfile to path /path/to/ in a remote host named some-other-host, authenticating as myuser (based on SSH key), use:

pscp myfile myuser@some-other-host:/path/to/

Basic Options

  • --help
    Show usage help. Use this to inspect all options the utility supports.
  • --threads=<value> Specify the number of chunks to split the file to. These chunks are then sent in parallel.
  • --verbose
    Print verbose output of the actions done by the script.

Implementation

The script performs the following main operations:

  1. Create a temporary local transaction directory. This directory is used to save all temporary files created during the transfer.
  2. Split the source file to chunks (according to the threads value). These chunk files are saved in the local transaction directory.
  3. Create a temporary transaction directory in the remote host.
  4. Start background processes to transfer each chunk file (using scp). Each such process appends a line to a control file (saved in the local transaction directory).
  5. Wait for all background processes to finish. This is done by polling on the control file.
  6. Assemble the file in the remote host from all the chunk files.
  7. Verify the checksum of the source file and the remote file is the same.
  8. Cleanup the transaction directories (local and remote).

Maintenance

Update
To update pscp to its latest version, use:

pscp --setup=update

Uninstall
To remove pscp installation, use:

pscp --setup=uninstall

Statistics

This script started as an experiment. As part of the experiment files of various sizes were copied between AWS instances, from Frankfurt DE to Oregon US. The same files were copied 10 times using scp (i.e. single process) and 10 times using pscp with default threads (i.e. 10 parallel scp processes, each transfers a single chunk). The results, as can be seen below, transfer time in parallel (using pscp) is approximately 30%-40% of the time compared to using scp.

Average Transfer Time vs. Size

Average Transfer Rate vs. Size

Notes

There are several overheads which are not included in the calculations above:

  1. Splitting the file in the source and re-assembling it in the destination.
    This is a mandatory overhead, but it usually takes no more than 2-3 seconds.
  2. In order to verify that the file was transferred successfully, the script performs a checksum verification. It calculates the checksum (SHA-1) of the file in the source, the checksum of the file in the destination and compares them. This verification also takes ~2 seconds, but can be disabled using the --no-verify flag.
  3. The script cleans up temporary files (mainly the chunks and a control file) from both the source and the destination. This cleanup also takes ~2 seconds, but can be disabled using the --no-cleanup flag.

Contributing

Contributions of all kinds are more than welcome (bug fixes, new features, documentation, etc.).
Feel free to create an issue, or even better - open a pull request.

About

Bash script utility for running secure copy of a file in chunks in parallel

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages