BaseStorage interface is missing callback parameter to optimize for Tornado Runloop #452

RobertBiehl · 2015-03-27T07:27:24Z

I noticed a huge bottleneck when using the thumbor storages and result storages:

The BaseStorage interface in
https://github.com/thumbor/thumbor/blob/master/thumbor/result_storages/__init__.py#L22

class BaseStorage(object):
    ...
    def put(self, bytes):
        ...

    def get(self):
        ...

supplies only synchronous methods that return values, instead of taking callbacks like the loader interface in
https://github.com/thumbor/thumbor/blob/master/thumbor/loaders/file_loader.py

def load(context, path, callback):

(perfectly fine)

https://github.com/thumbor/thumbor/blob/master/thumbor/result_storages/__init__.py
should look more like this:

#!/usr/bin/python
# -*- coding: utf-8 -*-
...
import os
from os.path import exists

class BaseStorage(object):
    def __init__(self, context):
        self.context = context

    def put(self, bytes, callback):
        raise NotImplementedError()

    def get(self, callback):
        raise NotImplementedError()

    def last_updated(self, callback):
        raise NotImplementedError()

    def ensure_dir(self, path, callback):
        raise NotImplementedError()
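To illustrate how a concrete storage could satisfy the proposed callback interface, here is a minimal sketch of a hypothetical in-memory storage. `MemoryStorage` and its `path` argument are illustrative assumptions, not thumbor code, and the base class is the proposal above trimmed to `put`/`get`. It simply invokes the callback immediately; a real S3 storage would instead call it from the completion handler of a non-blocking request.

```python
class BaseStorage(object):
    """Callback-style interface as proposed above (trimmed to put/get)."""

    def __init__(self, context):
        self.context = context

    def put(self, bytes, callback):
        raise NotImplementedError()

    def get(self, callback):
        raise NotImplementedError()


class MemoryStorage(BaseStorage):
    """Hypothetical storage keeping results in a dict keyed by request path."""

    _store = {}  # shared across instances, like a process-wide cache

    def __init__(self, context, path):
        super(MemoryStorage, self).__init__(context)
        self.path = path

    def put(self, bytes, callback):
        self._store[self.path] = bytes
        # Report completion through the callback instead of a return value.
        callback(self.path)

    def get(self, callback):
        # Pass None to the callback on a cache miss.
        callback(self._store.get(self.path))


received = []
storage = MemoryStorage(context=None, path="/unsafe/300x200/image.jpg")
storage.put(b"jpeg-bytes", received.append)
storage.get(received.append)
print(received)  # ['/unsafe/300x200/image.jpg', b'jpeg-bytes']
```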

ndc-empora · 2015-03-27T07:35:26Z

+1

vvh-empora · 2015-03-27T08:20:39Z

+1

navybk · 2015-03-27T09:22:18Z

+1

masom · 2015-03-27T14:43:55Z

There would definitely be a performance boost when using slower storages, although this will introduce a backwards-compatibility (BC) break for all third-party storages.

I'll see if I can come up with a PR that would work with callback / no-callback scenarios.
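One possible way to support both callback and no-callback storages during a transition is to inspect each method's signature: new-style methods get the callback passed through, while legacy synchronous methods are called normally and their return value is handed to the callback. This is only a sketch under that assumption; `call_storage` and the example classes are hypothetical, not code from any thumbor PR.

```python
import inspect


def call_storage(method, *args, callback):
    """Invoke a storage method that may or may not accept a callback."""
    params = inspect.signature(method).parameters
    if "callback" in params:
        # New-style storage: it reports its result through the callback.
        method(*args, callback=callback)
    else:
        # Legacy storage: blocking call, then hand the return value over.
        callback(method(*args))


class LegacyStorage(object):
    def get(self):
        return b"legacy"


class CallbackStorage(object):
    def get(self, callback):
        callback(b"async")


results = []
call_storage(LegacyStorage().get, callback=results.append)
call_storage(CallbackStorage().get, callback=results.append)
print(results)  # [b'legacy', b'async']
```

Note that the legacy branch still blocks the caller, so this only removes the compatibility problem, not the bottleneck.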

RobertBiehl · 2015-03-27T15:15:42Z

Thank you for looking into this.
We would like to use https://github.com/willtrking/thumbor_aws to load images from S3 and store results in S3 as well. We noticed that thumbor_aws completely ignores the fact that you are supposed to use the Tornado runloop, and while we could rewrite the loader, there was no way to fix the storages and result_storages.

Rob

masom · 2015-03-27T15:32:54Z

As of 4.12 thumbor uses a threadpool to load results from storage: https://github.com/thumbor/thumbor/blob/master/thumbor/handlers/__init__.py#L258

Instead of using callbacks I'll attempt to move the storage.get to the threadpool.
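The thread pool approach keeps the synchronous storage API and moves the blocking `get` call off the event loop. The sketch below shows the general idea with the standard library's `ThreadPoolExecutor`; `SlowStorage` and `fetch_in_pool` are hypothetical stand-ins, not the code in thumbor's handler.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SlowStorage(object):
    """Hypothetical blocking storage, e.g. one using a synchronous S3 client."""

    def get(self):
        time.sleep(0.2)  # simulate 200 ms of blocking network I/O
        return b"image-bytes"


def fetch_in_pool(executor, storage, callback):
    """Run the blocking storage.get in a worker thread, then call callback(result)."""
    future = executor.submit(storage.get)
    # Runs once get() has returned: in the worker thread, or immediately
    # in this thread if the future is already done.
    future.add_done_callback(lambda f: callback(f.result()))
    return future


received = []
executor = ThreadPoolExecutor(max_workers=5)
start = time.monotonic()
for _ in range(5):
    fetch_in_pool(executor, SlowStorage(), received.append)
executor.shutdown(wait=True)  # wait for workers (and their callbacks) to finish
elapsed = time.monotonic() - start
# The five 200 ms sleeps overlap, so this is well under the 1 s a serial loop would take.
print("%d results in %.2fs" % (len(received), elapsed))
```

In a real Tornado handler the result should be handed back to the IOLoop thread (for example with `IOLoop.add_callback`, which is documented as safe to call from other threads) rather than processed in the worker thread.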

masom · 2015-03-27T15:56:38Z

@RobertBiehl have you tried enabling the thread pool?

RobertBiehl · 2015-03-27T16:03:10Z

Not yet, I will have a look. Thanks.

masom · 2015-03-27T16:12:34Z

@RobertBiehl I'm about to submit a patch.

ab -n 100 -c 10

With _fetch in the ThreadPool:

Concurrency Level:      10
Time taken for tests:   34.929 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      9056800 bytes
HTML transferred:       9031300 bytes
Requests per second:    2.86 [#/sec] (mean)
Time per request:       3492.871 [ms] (mean)
Time per request:       349.287 [ms] (mean, across all concurrent requests)
Transfer rate:          253.22 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:  1139 3237 1348.8   3207    5222
Waiting:     1139 3229 1349.2   3172    5187
Total:       1139 3237 1348.8   3207    5222

Current Thumbor pretty much did not finish when the storage was blocking.

RobertBiehl · 2015-03-27T16:18:28Z

@masom what value of ENGINE_THREADPOOL_SIZE is reasonable?

RobertBiehl · 2015-03-27T16:19:52Z

We were seeing about 2 req/s and only 5% CPU usage without the threadpool (so not much slower than your benchmark).

guilhermef · 2015-03-27T16:22:51Z

Why don't we go with the first option, adding callbacks to the storage methods, and just deal with the breaking change?
It would only break people who implement their own storage, and we'll release it as a new major version.

RobertBiehl · 2015-03-27T16:25:06Z

@guilhermef I'd prefer that! :)

masom · 2015-03-27T17:27:19Z

Current thumbor with slow storage:

Concurrency Level:      10
Time taken for tests:   280.485 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      9056800 bytes
HTML transferred:       9031300 bytes
Requests per second:    0.36 [#/sec] (mean)
Time per request:       28048.457 [ms] (mean)
Time per request:       2804.846 [ms] (mean, across all concurrent requests)
Transfer rate:          31.53 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing: 19317 28014 7899.7  29231   54645
Waiting:    19227 27937 7898.0  29200   54489
Total:      19317 28014 7899.7  29231   54645

Percentage of the requests served within a certain time (ms)
  50%  29231
  66%  30281
  75%  30370
  80%  32265
  90%  40491
  95%  46588
  98%  50539
  99%  54645
 100%  54645 (longest request)

Thumbor with ThreadPool patch:

Concurrency Level:      10
Time taken for tests:   34.929 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      9056800 bytes
HTML transferred:       9031300 bytes
Requests per second:    2.86 [#/sec] (mean)
Time per request:       3492.871 [ms] (mean)
Time per request:       349.287 [ms] (mean, across all concurrent requests)
Transfer rate:          253.22 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:  1139 3237 1348.8   3207    5222
Waiting:     1139 3229 1349.2   3172    5187
Total:       1139 3237 1348.8   3207    5222

Percentage of the requests served within a certain time (ms)
  50%   3207
  66%   4147
  75%   4173
  80%   4252
  90%   5149
  95%   5159
  98%   5191
  99%   5222
 100%   5222 (longest request)
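The two runs can be compared directly. The figures below are copied from the ab output above; the ratios are simple arithmetic on them, not additional measurements.

```python
# Figures copied from the two ab runs above (ab -n 100 -c 10).
blocking = {"total_s": 280.485, "rps": 0.36, "median_ms": 29231, "max_ms": 54645}
threadpool = {"total_s": 34.929, "rps": 2.86, "median_ms": 3207, "max_ms": 5222}

# For times, lower is better: divide the blocking figure by the thread-pool one.
for key in ("total_s", "median_ms", "max_ms"):
    print("%-10s %.1fx lower with the thread pool" % (key, blocking[key] / threadpool[key]))

# For throughput, higher is better: divide the other way round.
print("%-10s %.1fx higher with the thread pool" % ("rps", threadpool["rps"] / blocking["rps"]))
```

That works out to roughly 8x on total time and throughput, about 9x on the median request and about 10x on the slowest one.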

I'll try to get the callback stuff in there.

Right now the file_storage and redis_storage will likely not benefit from tornado's async callback.
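To see why a callback signature alone does not help a storage whose body blocks, the sketch below uses asyncio's event loop as a stand-in for Tornado's IOLoop (an assumption; since Tornado 5 the IOLoop runs on asyncio on Python 3). A 10 ms ticker measures how long the loop stalls: a callback-style `get` built on `time.sleep` freezes it for the full 300 ms, while awaiting a non-blocking call does not. All function names are hypothetical.

```python
import asyncio
import time


def blocking_get(callback):
    """Callback-style API whose body still does blocking I/O (simulated)."""
    time.sleep(0.3)  # e.g. a synchronous file read or redis call
    callback(b"data")


async def run_blocking():
    blocking_get(lambda data: None)


async def run_nonblocking():
    await asyncio.sleep(0.3)  # stands in for a truly asynchronous client


async def max_tick_gap(work):
    """Run work() next to a 10 ms ticker; return the longest gap between ticks."""
    gaps = []
    last = time.monotonic()

    async def ticker():
        nonlocal last
        for _ in range(20):
            await asyncio.sleep(0.01)
            now = time.monotonic()
            gaps.append(now - last)
            last = now

    await asyncio.gather(ticker(), work())
    return max(gaps)


blocked = asyncio.run(max_tick_gap(run_blocking))
free = asyncio.run(max_tick_gap(run_nonblocking))
print("longest stall, blocking body:     %.2fs" % blocked)
print("longest stall, non-blocking body: %.2fs" % free)
```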

RobertBiehl · 2015-03-27T17:28:06Z

@masom

S3 storage will benefit from it, I bet.

W.r.t. the threadpool:
I tried ENGINE_THREADPOOL_SIZE = 5, but it doesn't work well yet. I have 4 thumbor instances running in total (one per core), and after a couple of requests they pause for tens of seconds while the CPU sits idle.

masom · 2015-03-27T17:29:37Z

Yeah, looking at the AWS storage, the async callback should give it a big benefit.

I'll open a PR; would you mind trying it with that thread pool size?

#454

RobertBiehl · 2015-03-27T17:37:21Z

@masom Thanks I will try it out and let you know how it affected performance.

RobertBiehl · 2015-03-27T18:15:51Z

@masom before I can give you feedback, I need to solve an issue I have with https://github.com/willtrking/thumbor_aws. After a while, hundreds of TCP connections get stuck in the CLOSE_WAIT state, and throughput stalls for a long time before some requests get through again. I'll let you know once I can really try out the threadpool patch.

RobertBiehl · 2015-03-27T19:25:13Z

@masom OK, the problem went away temporarily, long enough for me to run some benchmarks.

With your #454 I now get 100% CPU across the board with 5 thumbor instances and an ENGINE_THREADPOOL_SIZE of 5. Concurrency is 20, tested over about 1500 unique requests.
(It is running on a c3.large AWS EC2 instance, by the way.)

Test 1 (first-time image requests; loading from S3, generating new thumbnails and storing results in S3)

Requests per second: 22.6 [#/sec] (mean)

Test 2 (cached image requests; loading from the S3 result storage)

Requests per second: 113.2 [#/sec] (mean)

masom · 2015-03-27T19:40:12Z

@RobertBiehl Are these numbers better than without the thread pool?

The ThreadPool should give a huge performance boost for many blocking I/O tasks. Ideally, at some point there should be dedicated thread pools per layer (loader, storage, result_storage, engine) to allow fine-grained control.
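A minimal sketch of the per-layer idea, assuming one independently sized `ThreadPoolExecutor` per layer. The layer names come from the comment above; the `LayeredPools` class and the pool sizes are hypothetical, not thumbor settings.

```python
from concurrent.futures import ThreadPoolExecutor


class LayeredPools(object):
    """Hypothetical: one independently sized thread pool per thumbor layer."""

    def __init__(self, sizes):
        self.pools = {layer: ThreadPoolExecutor(max_workers=n) for layer, n in sizes.items()}

    def submit(self, layer, fn, *args):
        # A slow storage can only saturate its own pool, not the engine's.
        return self.pools[layer].submit(fn, *args)

    def shutdown(self):
        for pool in self.pools.values():
            pool.shutdown(wait=True)


pools = LayeredPools({"loader": 10, "storage": 4, "result_storage": 4, "engine": 2})
future = pools.submit("engine", sum, [1, 2, 3])
print(future.result())  # 6
pools.shutdown()
```

Sizing each pool separately lets an operator give I/O-bound layers many threads while keeping CPU-bound image processing close to the core count.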

RobertBiehl · 2015-03-27T22:34:58Z

Yes, it is definitely faster now. But I think the threadpool should only be used where there are usually no non-blocking options (e.g. for image processing tasks). I thought using Tornado's runloop was the best practice for most I/O situations. So I'd be all for new async storage interfaces in the next major version. :)

masom · 2015-03-27T23:41:11Z

Yes, although right now the storages that ship with thumbor block the IO loop.

dhardy92 · 2015-03-28T20:39:09Z

+1 for callbacks.
By the way, we are also working on moving to S3 storage here. Glad the thumbor_aws plugin is getting some love :)

This was referenced Mar 30, 2015:

Callback-based storage. #457 (closed)
Storage use tornado.concurrent.Future #459 (merged)

guilhermef closed this as completed in #459 on Apr 2, 2015.
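The merged fix (#459) moved storages to futures rather than callbacks. The sketch below illustrates that shape using `asyncio.Future` as a stand-in (in recent Tornado releases `tornado.concurrent.Future` is an alias of `asyncio.Future`). `FutureMemoryStorage` and its method bodies are hypothetical and are not the code from #459.

```python
import asyncio


class FutureMemoryStorage(object):
    """Hypothetical storage whose methods return futures instead of taking callbacks."""

    def __init__(self):
        self._data = {}

    def _resolved(self, value):
        # Create a future on the running loop and resolve it right away;
        # a non-blocking backend (e.g. S3) would resolve it when I/O completes.
        future = asyncio.get_running_loop().create_future()
        future.set_result(value)
        return future

    def put(self, path, bytes):
        self._data[path] = bytes
        return self._resolved(path)

    def get(self, path):
        return self._resolved(self._data.get(path))


async def main():
    storage = FutureMemoryStorage()
    await storage.put("/a.jpg", b"jpeg-bytes")
    # The caller awaits the future; the event loop stays free meanwhile.
    return await storage.get("/a.jpg")


print(asyncio.run(main()))  # b'jpeg-bytes'
```

Returning a future keeps the method signatures unchanged apart from the return type, and callers can either await it in a coroutine or attach a callback with `add_done_callback`.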
