
Perl Multithreaded multipart sync to Amazon AWS Glacier service.


Amazon AWS Glacier is an archive/backup service with a very low storage price, but with some caveats in usage and archive retrieval prices. Read more about Amazon AWS Glacier.

mt-aws-glacier is a client application for Glacier.


  • Version 0.74 beta


Features

  • Does not use any existing AWS library, so it can be flexible in implementing advanced features
  • Glacier Multipart upload
  • Multithreaded upload
  • Multipart+Multithreaded upload
  • Multithreaded retrieval, deletion and download
  • Tracking of all uploaded files with a local journal file (opened for write in append mode only)
  • Checking integrity of local files using journal
  • Ability to limit number of archives to retrieve

Coming-soon features

  • Multipart download (using HTTP Range header)
  • Ability to limit amount of archives to retrieve, by size, or by traffic/hour
  • Use journal file as flock() mutex
  • Checking integrity of remote files
  • Upload from STDIN
  • Some integration with external world, ability to read SNS topics
  • Simplified distribution for Debian/RedHat
  • Split code into reusable modules and publish them on CPAN (currently there are great existing Glacier modules on CPAN, see Net::Amazon::Glacier by Tim Nordenfur)
  • Create/Delete vault function

Planned next version features

  • Amazon S3 support

Important bugs/missing features

  • Zero length files are ignored
  • chunk size hardcoded as 2MB
  • Only multipart upload implemented, no plain upload
  • Retrieval works as a proof of concept, so you can't initiate a retrieval job twice (until the previous job is completed)
  • No way to specify SNS topic
  • HTTP only, no way to configure HTTPS yet (however it works fine in HTTPS mode)
  • Internal refactoring needed, no comments in source yet, unit tests not published
  • A journal file is required to restore a backup. To be fixed: file meta-information will be stored in the archive description.

Production ready

  • Not recommended for use in production until the first "Release" version. Currently Beta.


Installation

  • Install the following CPAN modules:

      		LWP::UserAgent JSON::XS

That's all.

  • In case you use HTTPS, also install

  • Some CPAN modules are better installed as OS packages (example for Ubuntu/Debian)

      		libjson-xs-perl liblwp-protocol-https-perl liburi-perl

Warnings ( MUST READ )

  • When playing with Glacier, make sure you will be able to delete all your archives: it is currently impossible to delete an archive or a non-empty vault in the Amazon console. Also make sure you have read all of the AWS Glacier pricing/FAQ.

  • Read their pricing FAQ again, really. Beware of retrieval fee.

  • Back up your local journal file. Currently it's impossible to correctly restore a backup without the journal file. (Remote metadata storage will be implemented soon.)

  • With low "partsize" option you pay a bit more (Amazon charges for each upload request)

  • With high partsize*concurrency there is a risk of getting network timeouts HTTP 408/500 or even signature expiration errors.

  • Memory usage (for 'sync') formula is ~ min(NUMBER_OF_FILES_TO_SYNC, max-number-of-files) + partsize*concurrency
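
The partsize and memory warnings above can be turned into rough numbers. The sketch below is illustrative Python and is not part of mt-aws-glacier; the helper names `upload_requests` and `sync_memory_terms` are hypothetical. It assumes that "partsize" is given in megabytes of 1024*1024 bytes and that each uploaded part costs one request. The README gives no per-file byte cost for the first term of the memory formula, so the two terms are returned separately rather than added.

```python
MB = 1024 * 1024  # assumption: one "Megabyte" of partsize is 1024*1024 bytes


def upload_requests(file_size_bytes, partsize_mb):
    """Part-upload requests for one file, assuming one request per part.

    Uses integer ceiling division, so a zero-length file yields 0 parts.
    """
    part_bytes = partsize_mb * MB
    return (file_size_bytes + part_bytes - 1) // part_bytes


def sync_memory_terms(files_to_sync, max_number_of_files, partsize_mb, concurrency):
    """Return the two terms of the README's memory formula separately.

    First value: number of files tracked in memory (a count, not bytes).
    Second value: upload buffer size in MB (partsize * concurrency).
    """
    tracked_files = min(files_to_sync, max_number_of_files)
    buffer_mb = partsize_mb * concurrency
    return tracked_files, buffer_mb


if __name__ == "__main__":
    one_gib = 1024 * MB
    # Smaller parts mean more (billable) requests for the same file.
    for partsize in (1, 16, 64):
        print(partsize, "MB parts:", upload_requests(one_gib, partsize), "requests")
    # Larger partsize*concurrency means a larger upload buffer.
    print(sync_memory_terms(5000, 100, 16, 4))
```

Lowering partsize raises the request count, while raising partsize or concurrency raises the buffer term, so the two warnings pull in opposite directions.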


Usage

  1. Create a directory containing the files to back up, for example /data/backup

  2. Create config file, say, glacier.cfg

     		region=us-east-1 #eu-west-1, us-east-1 etc
  3. Create a vault in specified region, using Amazon Console (myvault)

  4. Choose a filename for the Journal, for example, journal.log

  5. Sync your files

     		./ sync --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --concurrency=3
  6. Add more files and sync again

  7. Check that your local files have not been modified since the last sync

     		./ check-local-hash --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
  8. Delete some files from your backup location

  9. Initiate archive restore job on Amazon side

     		./ restore --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --max-number-of-files=10
  10. Wait 4+ hours

  11. Download restored files back to backup location

    		./ restore-completed --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
  12. Delete all your files from vault

    		./ purge-vault --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
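
Step 7 compares local files with what the journal recorded at upload time. This README does not say which hash the journal stores. Assuming it is the SHA-256 tree hash that the Glacier API uses for uploaded data, a local check could be sketched as follows. This is illustrative Python, not the tool's own code, and the names `tree_hash` and `file_chunks` are hypothetical.

```python
import hashlib

MIB = 1024 * 1024  # the Glacier tree hash is built from 1 MiB leaves


def tree_hash(chunks):
    """SHA-256 tree hash over an iterable of byte chunks of at most 1 MiB.

    Each chunk is hashed, then adjacent hashes are concatenated and hashed
    again, level by level, until one hash remains. An unpaired hash at the
    end of a level is carried up unchanged.
    """
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    if not level:
        # Empty input: fall back to the hash of the empty string.
        level = [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                paired.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                paired.append(pair[0])
        level = paired
    return level[0].hex()


def file_chunks(path):
    """Yield a file's contents in 1 MiB blocks."""
    with open(path, "rb") as fh:
        while True:
            block = fh.read(MIB)
            if not block:
                break
            yield block
```

For a file, `tree_hash(file_chunks(path))` gives the value to compare against the stored one. For data of 1 MiB or less the result equals the plain SHA-256 of the data.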

Additional command line options

  1. "concurrency" (with 'sync' command) - number of parallel upload streams to run. (default 4)

  2. "partsize" (with 'sync' command) - size of file chunk to upload at once, in Megabytes. (default 16)

  3. "max-number-of-files" (with 'sync' or 'restore' commands) - limit the number of files to sync/restore. The program finishes when it reaches this limit.
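
When choosing "partsize", the Glacier multipart upload API has its own limits: the part size must be 1 MiB multiplied by a power of two, between 1 MiB and 4 GiB, and one upload can have at most 10,000 parts. Whether mt-aws-glacier enforces these limits is not stated here, so the following is only an illustrative Python sketch with hypothetical helper names.

```python
MAX_PARTS = 10_000       # Glacier limit on parts per multipart upload
MAX_PARTSIZE_MB = 4096   # 4 GiB, the largest part size Glacier accepts


def is_valid_partsize(partsize_mb):
    """True if partsize_mb is a power of two between 1 and 4096 (MiB)."""
    if not isinstance(partsize_mb, int):
        return False
    if partsize_mb < 1 or partsize_mb > MAX_PARTSIZE_MB:
        return False
    # A power of two has exactly one bit set.
    return partsize_mb & (partsize_mb - 1) == 0


def max_archive_mb(partsize_mb):
    """Largest archive, in MiB, that fits into MAX_PARTS parts of this size."""
    if not is_valid_partsize(partsize_mb):
        raise ValueError("partsize must be a power of two between 1 and 4096")
    return partsize_mb * MAX_PARTS


if __name__ == "__main__":
    for size in (1, 2, 16, 4096):
        print(size, "MB parts allow archives up to", max_archive_mb(size), "MB")
```

A small part size therefore also caps the largest file that can be uploaded in one multipart upload.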


Test/Play with it

  1. Create an empty dir MYDIR

  2. Set vault name inside

  3. Run

     ./ init MYDIR
     ./ retrieve MYDIR
     ./ restore MYDIR


	./ init MYDIR
	./ purge MYDIR

Minimum AWS permissions

Something like this:

			"Statement": [
  				"Effect": "Allow",
