Patch: workaround for libcurl CURL_MAX_WRITE_SIZE #101

Closed
elephantum opened this Issue Jun 19, 2010 · 4 comments

2 participants

@elephantum

The problem is the following: even though libcurl's documentation says there is no need to call .perform() again after E_OK is returned, the truth is more subtle. E_OK is returned in two different cases: 1) there is really nothing more to do, 2) a CURL_MAX_WRITE_SIZE chunk of data was just passed to the write callback. There is no way to check whether more data is available right now; even epoll'ing wouldn't help in the cases where the data has already been read into libcurl's internal buffers. This behavior significantly degrades performance when fetching HTTP resources larger than 16KB (CURL_MAX_WRITE_SIZE), and it is especially visible when synchronous work is done in the same loop (50-100ms XSL transformations in my case).
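To make the failure mode concrete, here is a minimal sketch of the usual perform() loop and why stopping at E_OK is not enough (hypothetical harness code, not Tornado's actual implementation):

```python
import pycurl

def handle_socket_event(multi):
    # Called whenever epoll reports activity on one of libcurl's sockets.
    while True:
        ret, num_handles = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break
    # ret is now E_OK, but that does NOT guarantee libcurl is drained:
    # E_OK is also returned right after a single CURL_MAX_WRITE_SIZE
    # (16KB) chunk has been handed to the write callback. Waiting for
    # the next epoll event before calling perform() again stalls the
    # rest of the buffered data behind any synchronous work in the loop.
```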

The problem can be illustrated with this snippet of code: http://gist.github.com/444860/ — run it before and after applying the patch to see the difference in download speed.
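In case the gist goes away, a benchmark along these lines shows the effect (a hypothetical sketch using the Tornado 1.x callback API, not the gist's actual contents):

```python
import time
import tornado.httpclient
import tornado.ioloop

def main():
    io_loop = tornado.ioloop.IOLoop.instance()
    client = tornado.httpclient.AsyncHTTPClient(io_loop=io_loop)

    # Simulate ~50ms of synchronous work per loop iteration, like the
    # XSL transformations mentioned above.
    def busy_work():
        time.sleep(0.05)
    tornado.ioloop.PeriodicCallback(busy_work, 100, io_loop=io_loop).start()

    start = time.time()

    def on_response(response):
        print("fetched %d bytes in %.2fs" %
              (len(response.body), time.time() - start))
        io_loop.stop()

    # Any URL noticeably larger than 16KB (CURL_MAX_WRITE_SIZE) shows
    # the slowdown: one 16KB chunk per IOLoop iteration.
    client.fetch("http://example.com/large-file", on_response)
    io_loop.start()

if __name__ == "__main__":
    main()
```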

I've developed a workaround for this problem, which I propose for merging into Tornado: http://github.com/elephantum/tornado/commit/384ce35b9c4b5de9cb247ac4bd5c810ca632daa0
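The idea of the workaround, roughly: remember whether the write callback received data during the last perform() pass, and if it did, schedule another pass on the next IOLoop iteration instead of waiting for a socket event. A sketch with hypothetical names, not the literal patch:

```python
import pycurl

class CurlClient(object):
    def __init__(self, io_loop, multi):
        self.io_loop = io_loop
        self.multi = multi
        self._data_written = False

    def _write_callback(self, chunk):
        # libcurl never passes more than CURL_MAX_WRITE_SIZE per call.
        self._data_written = True
        # ... buffer the chunk ...

    def _perform(self):
        self._data_written = False
        while True:
            ret, num_handles = self.multi.perform()
            if ret != pycurl.E_CALL_MULTI_PERFORM:
                break
        if self._data_written:
            # Data arrived on this pass, so libcurl may still be holding
            # more in its internal buffers; run perform() again on the
            # next IOLoop iteration instead of waiting for a socket event.
            self.io_loop.add_callback(self._perform)
```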

@elephantum

A screenshot from hh.ru production: http://skitch.com/elephantum/de6j4/monik

@elephantum

The same problem exists in the recently introduced AsyncHTTPClient2. Patch: http://github.com/elephantum/tornado/commit/947d2d0124ed8a9512fc440435c0de2732874a81

@elephantum

Some more data from hh.ru production:

Distribution of the number of multi.perform() calls at a time: http://skitch.com/elephantum/djh23/figure-1
Distribution of the duration of a multi.perform() chain: http://skitch.com/elephantum/djh3f/figure-2

@bdarnell
tornadoweb member

Closing since I'm not sure whether this is still an issue, and the proposed patch has been deleted. If it's still a problem, feel free to reopen.

@bdarnell bdarnell closed this Apr 28, 2014