
Use tar archives to upload files #10

Closed
skin opened this issue Nov 20, 2012 · 7 comments


skin commented Nov 20, 2012

Hi @vsespb
First of all, let me say thank you for your tool, it's very good!
Would it be possible to add an option to archive data into tar chunks before uploading it to Glacier?
Creating tar files would save Glacier requests and would also speed up uploads when a directory is full of small files.

Thanks!

Cheers


vsespb commented Nov 20, 2012

Hello.
Thank you for your feedback.

Yes. I am planning to add an option for multithreaded upload of a single file from STDIN, so you will be able to
redirect tar output to mtglacier's input. Something like this: "tar cz ... | mtglacier"


skin commented Nov 20, 2012

That would be really good!
But how would it interact with the journal parameter?
And would it be possible to do the same during the restore phase?

Thank you again


vsespb commented Nov 20, 2012

I was planning to do something like this:

tar cz ... | mtglacier --journal=/path/to/journal --stdin --stdin-file-name=myarchive.tar --from-dir=/path/to/data

which will:

  1. put the file into Glacier
  2. give it the filename "myarchive.tar" in the journal
  3. possibly (and optionally) also write it to /path/to/data/myarchive.tar

and after restore:

  1. restore it to myarchive.tar, with no unpacking.

That's all.

This way the user saves the time of writing a huge archive to disk and then having mtglacier read it back again.

What you are asking for (automatic archiving/unarchiving) would make the workflow pretty complex for the end user.
So I think it's better/easier for the user to write a shell script which splits the data into archives (maybe several archives, maybe not only tar, etc.) and then uploads them to Glacier, and then possibly automate the restore from those archives with shell scripts as well.

Another thought: a full Glacier restore is pretty expensive, so the typical use case is as a secondary backup, not a primary one. In that case ease of restore is not so important.

But maybe you have an idea how to support, say, tar + tar.gz archiving/unarchiving without making the app extra complicated to implement and use, while keeping the Journal/Sync concept easy to understand?


skin commented Nov 20, 2012

I do like your upload approach; I would say it's exactly what I would like to have.
Yes, point #3 can be optional; every time I upload a tar file I don't need to keep it.

About the restore, my question was just to understand your intentions, but I see your point of view.
I know a full Glacier restore is pretty expensive, but it will be my primary backup system anyway, for now.

I guess I will implement the restore with a bash script that can be executed after mtglacier.
Something like this:

  1. iterate over the list of tar files (ordered by a naming convention defined during upload)
  2. extract every tar file into /targetfolder
  3. delete it
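
For example, a minimal sketch of such a script, assuming the retrieved archives already sit in /targetfolder and their names sort in upload order (backup-001.tar, backup-002.tar and so on; these names and paths are only illustrative):

  #!/bin/bash
  # 1. iterate over the tar files; the glob expands in sorted (upload) order
  for archive in /targetfolder/backup-*.tar; do
      # 2. extract the archive into the target folder
      tar xf "$archive" -C /targetfolder
      # 3. delete the archive once it has been extracted
      rm -f "$archive"
  done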


vsespb commented Nov 20, 2012

OK, cool.
Yes, use shell scripting for now. I am going to implement reading from STDIN, but it is not my first priority.
I will keep this issue open and post here once it's implemented, but the estimate is probably 2-3 months.


skin commented Nov 21, 2012

OK, let me know if I can help you!
In the meantime, I will use this workaround to simulate the STDIN approach:

  1. create the tar
  2. move the tar into a tmp folder
  3. mtglacier --journal=/somewhere/journal --from-dir=/tmpfolder
  4. remove the tar
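
For example, a rough sketch of that workaround as a shell script. The archive name, the tmp folder and the data path are only illustrative; only --journal and --from-dir come from the list above, and a real invocation also needs the sync command plus config and vault options as described in the README:

  #!/bin/bash
  # 1-2. create the tar directly inside the tmp folder
  tar cf /tmpfolder/backup-001.tar /path/to/data
  # 3. upload the tmp folder contents and record them in the journal
  #    (sync, --config and --to-vault are assumptions here; check the README for the exact options)
  mtglacier sync --config=glacier.cfg --journal=/somewhere/journal --from-dir=/tmpfolder --to-vault=myvault
  # 4. remove the local tar once it has been uploaded
  rm /tmpfolder/backup-001.tar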

Once your implementation is ready I will switch my bash script over to it.


vsespb commented Mar 4, 2013

Upload of a single file, or of a file from STDIN, is now implemented (but only together with the Journal functionality). It is multithreaded, with no intermediate file buffering.
Download of single files is not implemented yet.

See the upload-file command in the README.
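
For reference, a pipeline roughly along the lines discussed above; the exact upload-file option names (for example --stdin, --set-rel-filename and --check-max-file-size) are quoted from memory here and should be verified against the README:

  # pipe a compressed tar straight into mtglacier; the relative filename is what gets recorded in the journal
  # --check-max-file-size is the expected upper bound on the archive size (see the README for units and details)
  tar czf - /path/to/data | mtglacier upload-file --config=glacier.cfg --vault=myvault --journal=journal.log \
      --stdin --set-rel-filename myarchive.tar.gz --check-max-file-size 100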

So, I am closing this. Reopen if you have more questions.
