
Too much CPU used when using multipart - should have a way to throttle upload speed? #32

Closed
farialima opened this issue Sep 20, 2012 · 6 comments

@farialima

This may not be easy to fix, but it's feedback, and that's never bad to give...

I'm using glacier-cmd-interface to upload from DreamHost shared hosting to Amazon. However, for files that are bigger than 100 MB I get:

(ve)[tigers]$ amazon-glacier-cmd-interface/glacier/glacier.py upload my_backups my_backup_file


Yikes! One of your processes (python, pid 14861) was just killed for excessive resource usage.                                                                                  
Please contact DreamHost Support for details.


Killed
(ve)[tigers]$ 

If the file is less than 100 MB, things are OK.

The process is killed while in:


    # Adds the Glacier API version header, then delegates to the parent
    # connection's make_request(), which is where the request body is
    # actually sent out.
    def make_request(self, method, path, headers=None, data='', host=None,
                     auth_path=None, sender=None, override_num_retries=None):
        headers = headers or {}
        headers.setdefault("x-amz-glacier-version", "2012-06-01")
        return super(GlacierConnection, self).make_request(method, path, headers,
                                                           data, host, auth_path,
                                                           sender, override_num_retries)

So it may be that we are sending too much / too fast. I've tried to throttle CPU usage, but to no avail.

I would suggest adding a way to throttle the upload speed (as an option): I suppose it would fix this, and it would be useful for many people (you don't want a backup upload to take all the bandwidth...)

Probably not easy to implement - but who knows...

Since this library seems very useful, I thought it was worth reporting any issue I have! Thank you for this lib.

@gburca
Contributor

gburca commented Sep 21, 2012

If you want to throttle upload speed (and you're in control of the machine, and it's running some flavor of Linux, and etc...), take a look at /sbin/tc. It's not the most user-friendly tool out there, but it's very powerful. With a little bit of scripting you can run it before you start the glacier upload and it's probably the most effective way to throttle your bandwidth. For some inspiration, here's the relevant portion from the script I use:

TC=/sbin/tc
IF=eth0
REGION="us-east-1"
# Resolve the Glacier endpoint to its current IP addresses (drop CNAMEs).
IP=`dig +short +answer "glacier.${REGION}.amazonaws.com" A | grep -v '\.$' | tr '\n' ' '`
U32="$TC filter add dev $IF protocol ip parent 1:0 prio 1 u32"

# Root HTB qdisc; traffic not matched below falls into the default class 30.
$TC qdisc add dev $IF root handle 1: htb default 30
# Rate-limited class for Glacier traffic.
$TC class add dev $IF parent 1: classid 1:2 htb rate 200kbps
# Route anything destined for the Glacier IPs through the limited class.
for ip in $IP; do
    $U32 match ip dst $ip/32 flowid 1:2
done

And to remove the filtering

$TC qdisc del dev $IF root

The nice part is that this technique works for any application, not just the glacier command line tool.

@offlinehacker
Contributor

Well, the solution by @gburca is cool and I think it should solve most of the problems, but we might still implement speed throttling once there are no more important bugs to solve, so let's leave this ticket open.

@wvmarle
Contributor

wvmarle commented Oct 10, 2012

Looking back at this issue, I suspect it had to do with memory use rather than upload speed (the original upload code would use 4-5 times the block size - so files >100 MB would eat up 400-500 MB of RAM - not surprising that a cloud host would baulk at such a resource demand).

For throttling upload speed: at the moment glacier-cmd supports only a single upload thread at a time (now that could be an enhancement: allowing multiple uploads in parallel), and it will use only as much speed as the system allows. Beyond that I have no idea how to throttle speeds; I think this would have to be done in boto, which is where the data is actually sent out.

@uskudnik
Owner

Yup, since we are migrating to boto this will most probably have to be done in boto itself. Whether they will accept it, or even want it, I have no idea.

I did a bit of research on the subject and it appears it can be done, but it seems to be a bit complicated.

See http://stackoverflow.com/questions/456649/throttling-with-urllib2 and http://pastie.org/3120175.

It also appears Twisted can do it, but I would rather not mix Twisted into the equation if we can do it on our own. http://twistedmatrix.com/documents/10.1.0/api/twisted.protocols.policies.ThrottlingFactory.html
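
For reference, the gist of those approaches is to wrap the file object being uploaded and sleep between reads, so whatever consumes it (urllib2, httplib, boto) gets slowed down transparently. A minimal sketch with made-up names, not taken from boto or from the links above:

import time


class ThrottledReader(object):
    """Hypothetical file-like wrapper that caps read() throughput.

    Sleeps between reads so the average rate stays below
    max_bytes_per_sec; anything reading the wrapped object as a
    request body is throttled without knowing about it.
    """

    def __init__(self, fileobj, max_bytes_per_sec):
        self._fileobj = fileobj
        self._max_rate = float(max_bytes_per_sec)
        self._start = time.time()
        self._sent = 0

    def read(self, size=-1):
        data = self._fileobj.read(size)
        self._sent += len(data)
        expected = self._sent / self._max_rate   # how long this much data should have taken
        elapsed = time.time() - self._start
        if expected > elapsed:
            time.sleep(expected - elapsed)
        return data

In principle, passing something like ThrottledReader(open('my_backup_file', 'rb'), 200 * 1024) as the request body would keep the average around 200 KB/s.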

@wvmarle
Contributor

wvmarle commented Oct 10, 2012

I just had a quick look at the sources, and I think it'd be rather easy to implement, because basically what they do is "send some data, wait a bit, send some more data, wait again" so that the overall rate stays within a limit. We could do the same: send a part of data, wait a bit, send another part of data. But then you're not really limiting the rate; you're sending in bursts, saturating your pipe part of the time and sending nothing the rest of the time.
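In code, the burst-and-wait idea boils down to roughly this (just a sketch; send_part is a made-up stand-in for whatever would actually push a part out through boto):

import time

CHUNK_SIZE = 1024 * 1024       # send 1 MB at a time
MAX_RATE = 200 * 1024.0        # target average: ~200 KB/s

def throttled_upload(fileobj, send_part):
    # Send a chunk at full speed, then idle long enough that the
    # average rate over each chunk stays under MAX_RATE.
    while True:
        chunk = fileobj.read(CHUNK_SIZE)
        if not chunk:
            break
        started = time.time()
        send_part(chunk)                   # burst: saturates the pipe
        window = len(chunk) / MAX_RATE     # how long this chunk "should" take
        elapsed = time.time() - started
        if elapsed < window:
            time.sleep(window - elapsed)   # quiet for the rest of the window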
The key question: is this useful? Is it worth the effort? Should we attempt it to begin with? Normally people want as fast an upload as possible. And as I said, I suspect it was the memory it took, not the transfer speed - farialima could confirm this.
TC I'm sure is the best solution if you want to limit speeds. A bitch to set up, but overall more flexible, and it's designed to do just that. Figure out how it can be done, add an example to the docs, and forget about it. I think those who truly need it will be able to figure it out.

@uskudnik
Owner

An example in the docs will do. That's completely in line with the whole Linux philosophy of having one tool for the job, and in that fashion TC gives our users a lot more flexibility than we could ever provide.
