mt-aws-glacier

Perl multithreaded multipart sync to the Amazon AWS Glacier service.

Intro

Amazon AWS Glacier is an archive/backup service with a very low storage price, but with some caveats in usage and in archive retrieval prices. Read more about Amazon AWS Glacier.

mt-aws-glacier is a client application for Glacier.

Version

Version 0.74 beta

Features

Does not use any existing AWS library, so it can be flexible in implementing advanced features
Glacier Multipart upload
Multithreaded upload
Multipart+Multithreaded upload
Multithreaded retrieval, deletion and download
Tracking of all uploaded files with a local journal file (opened for write in append mode only)
Checking integrity of local files using journal
Ability to limit number of archives to retrieve

Coming-soon features

Multipart download (using HTTP Range header)
Ability to limit amount of archives to retrieve, by size, or by traffic/hour
Use journal file as flock() mutex
Checking integrity of remote files
Upload from STDIN
Some integration with the external world, e.g. the ability to read SNS topics
Simplified distribution for Debian/RedHat
Split code into reusable modules and publish them on CPAN (there are already great Glacier modules on CPAN; see Net::Amazon::Glacier by Tim Nordenfur https://metacpan.org/module/Net::Amazon::Glacier)
Create/Delete vault function

Planned next version features

Amazon S3 support

Important bugs/missing features

Zero length files are ignored
chunk size hardcoded as 2MB
Only multipart upload implemented, no plain upload
Retrieval works as a proof of concept, so you can't initiate a retrieval job twice (until the previous job is completed)
No way to specify SNS topic
HTTP only; no way to configure HTTPS yet (although it works fine in HTTPS mode)
Internal refactoring needed, no comments in source yet, unit tests not published
Journal file is required to restore a backup. To be fixed: file metainformation will be stored in the archive description.

Production ready

Not recommended for production use until the first "Release" version. Currently beta.

Installation

Install the following CPAN modules:
```
LWP::UserAgent JSON::XS
```

That's all.

If you use HTTPS, also install
```
LWP::Protocol::https
```
Some CPAN modules are better installed as OS packages (example for Ubuntu/Debian):
```
libjson-xs-perl liblwp-protocol-https-perl liburi-perl
```

Warnings (MUST READ)

When playing with Glacier, make sure you will be able to delete all your archives: it is currently impossible to delete an archive or a non-empty vault in the Amazon console. Also make sure you have read all of the AWS Glacier pricing/FAQ.
Read their pricing FAQ again, really. Beware of the retrieval fee.
Back up your local journal file. Currently it is impossible to correctly restore a backup without the journal file (remote metadata storage will be implemented soon).
With a low "partsize" value you pay a bit more (Amazon charges for each upload request).
With a high partsize*concurrency there is a risk of network timeouts (HTTP 408/500) or even signature expiration errors.
Memory usage for 'sync' is approximately min(NUMBER_OF_FILES_TO_SYNC, max-number-of-files) + partsize*concurrency
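The part-buffer term of the memory formula above is easy to estimate by hand. The sketch below is a hypothetical helper (not part of mt-aws-glacier) that multiplies partsize by concurrency in megabytes, falling back to the documented defaults of 16 and 4. The min(...) term depends on per-file bookkeeping whose size the formula does not state, so it is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the part-buffer term of the 'sync'
# memory formula, partsize * concurrency, in megabytes.
part_buffer_mb() {
    local partsize_mb=${1:-16}   # default --partsize
    local concurrency=${2:-4}    # default --concurrency
    echo $(( partsize_mb * concurrency ))
}

part_buffer_mb          # defaults: 16 MB * 4 streams
part_buffer_mb 64 8     # --partsize=64 --concurrency=8
```

With the defaults this comes to about 64 MB of part buffers; raising either option scales that figure linearly, and it is the same product that drives the timeout risk mentioned above.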

Usage

Create a directory containing the files to back up, for example /data/backup

Create a config file, for example glacier.cfg:

		key=YOURKEY
		secret=YOURSECRET
 		region=us-east-1 #eu-west-1, us-east-1 etc
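If you prefer to create the config from a shell, a heredoc does it in one step. This is a sketch: YOURKEY and YOURSECRET are the placeholders from above, the region note is dropped to keep the file to plain key=value lines, and the `chmod 600` is an added precaution (the file holds your secret key), not something mt-aws-glacier is documented to require.

```shell
#!/usr/bin/env bash
# Write glacier.cfg with the three settings shown above.
# YOURKEY / YOURSECRET are placeholders; substitute your own credentials.
cfg=glacier.cfg
cat > "$cfg" <<'EOF'
key=YOURKEY
secret=YOURSECRET
region=us-east-1
EOF
# Assumption: restrict the file to its owner, since it contains the secret.
chmod 600 "$cfg"
```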

Create a vault (e.g. myvault) in the specified region, using the Amazon Console
Choose a filename for the Journal, for example, journal.log

Sync your files

 		./mtglacier.pl sync --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --concurrency=3

Add more files and sync again

Check that your local files have not been modified since the last sync

 		./mtglacier.pl check-local-hash --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log

Delete some files from your backup location

Initiate archive retrieval jobs on the Amazon side

 		./mtglacier.pl restore --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --max-number-of-files=10

Wait 4+ hours

Download restored files back to backup location

		./mtglacier.pl restore-completed --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log

Delete all your files from the vault

		./mtglacier.pl purge-vault --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
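Every command in this walkthrough repeats the same config, source directory, vault and journal options. A small wrapper keeps them in one place. This is a sketch only: the function name `mtg` and the `DRY_RUN` switch are hypothetical conveniences, not part of mt-aws-glacier, and the option spellings are copied from the sync command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around mtglacier.pl: shared options live in one array.
CONFIG=glacier.cfg
FROM_DIR=/data/backup
VAULT=myvault
JOURNAL=journal.log

common_opts=(--config="$CONFIG" --from-dir "$FROM_DIR" --to-vault="$VAULT" --journal="$JOURNAL")

# Usage: mtg COMMAND [EXTRA_OPTIONS...]
# With DRY_RUN=1 (the default here) the command line is printed, not run.
mtg() {
    local cmd=$1; shift
    local argv=(./mtglacier.pl "$cmd" "${common_opts[@]}" "$@")
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "${argv[*]}"
    else
        "${argv[@]}"
    fi
}

mtg sync --concurrency=3
mtg check-local-hash
mtg restore --max-number-of-files=10
# ...wait 4+ hours for the retrieval jobs to complete...
mtg restore-completed
```

Set `DRY_RUN=0` once the printed command lines look right. The destructive purge-vault step is deliberately left out of the example sequence.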

Additional command line options

"concurrency" (with 'sync' command) - number of parallel upload streams to run (default 4).
```
--concurrency=4
```
"partsize" (with 'sync' command) - size of the file chunk to upload at once, in megabytes (default 16).
```
--partsize=16
```
"max-number-of-files" (with 'sync' or 'restore' commands) - limit the number of files to sync/restore. The program will finish when it reaches this limit.
```
--max-number-of-files=100
```
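Since every part is a separate upload request (and Amazon charges per request), the partsize value determines how many requests one file costs. A minimal sketch, assuming whole-megabyte sizes and a hypothetical helper name `parts_needed`: the count is ceil(file_size / partsize), computed with integer arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of part uploads for one file,
# ceil(file_mb / partsize_mb), both sizes in whole megabytes.
parts_needed() {
    local file_mb=$1
    local partsize_mb=${2:-16}   # default --partsize
    echo $(( (file_mb + partsize_mb - 1) / partsize_mb ))
}

parts_needed 1000        # 1000 MB file at the default --partsize=16
parts_needed 1000 64     # same file at --partsize=64
```

A larger partsize means fewer requests per file, at the cost of more memory per upload stream.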

Test/Play with it

Create an empty directory MYDIR
Set the vault name inside cycletest.sh

Run

 ./cycletest.sh init MYDIR
 ./cycletest.sh retrieve MYDIR
 ./cycletest.sh restore MYDIR

OR

	./cycletest.sh init MYDIR
	./cycletest.sh purge MYDIR

Minimum AWS permissions

Something like this:

	{
	  "Statement": [
	    {
	      "Effect": "Allow",
	      "Resource": ["arn:aws:glacier:eu-west-1:XXXXXXXXXXXX:vaults/test1",
	                   "arn:aws:glacier:us-east-1:XXXXXXXXXXXX:vaults/test1",
	                   "arn:aws:glacier:eu-west-1:XXXXXXXXXXXX:vaults/test2",
	                   "arn:aws:glacier:eu-west-1:XXXXXXXXXXXX:vaults/test3"],
	      "Action": ["glacier:UploadArchive",
	                 "glacier:InitiateMultipartUpload",
	                 "glacier:UploadMultipartPart",
	                 "glacier:UploadPart",
	                 "glacier:DeleteArchive",
	                 "glacier:ListParts",
	                 "glacier:InitiateJob",
	                 "glacier:ListJobs",
	                 "glacier:GetJobOutput",
	                 "glacier:ListMultipartUploads",
	                 "glacier:CompleteMultipartUpload"]
	    }
	  ]
	}
