New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use BytesIO to make interface of read-only data file-like #1110

Closed
kmike opened this Issue Jul 15, 2014 · 6 comments

Comments

Projects
None yet
3 participants
@kmike
Contributor

kmike commented Jul 15, 2014

Hi,

When doing #1109 and checking scrapy/scrapy#803 I've run a couple of benchmarks:

The problem is that BytesIO copies the data, so using it just to make interface file-like for readonly data could be quite wasteful.

I believe this is a problem in curl_httpclient, simple_httpclient and wsgi tornado modules, and by using another wrapper (or at least by falling back to cStringIO in Python 2.x) they can be made faster and more memory efficient.

@bdarnell

This comment has been minimized.

Show comment
Hide comment
@bdarnell

bdarnell Jul 16, 2014

Member

It's unfortunate that BytesIO is so slow. We have designed parts of the API around the untested assumption that StringIO or BytesIO can be faster because they don't have to allocate a contiguous block of memory until getvalue(). If that's not true and it is in fact slower, we should probably make HTTPResponse.body a regular variable and make HTTPResponse.buffer the lazy property.

Member

bdarnell commented Jul 16, 2014

It's unfortunate that BytesIO is so slow. We have designed parts of the API around the untested assumption that StringIO or BytesIO can be faster because they don't have to allocate a contiguous block of memory until getvalue(). If that's not true and it is in fact slower, we should probably make HTTPResponse.body a regular variable and make HTTPResponse.buffer the lazy property.

@bdarnell bdarnell added the multiple label Jul 16, 2014

@kmike

This comment has been minimized.

Show comment
Hide comment
@kmike

kmike Jul 16, 2014

Contributor

BytesIO uses PyMem_Realloc / memset / memcpy to copy constructor argument contents to the internal buffer; it doesn't store a reference to original data. In pure-Python version the code is

def __init__(self, initial_bytes=None):
    buf = bytearray()
    if initial_bytes is not None:
        buf += initial_bytes
    self._buffer = buf
    self._pos = 0
Contributor

kmike commented Jul 16, 2014

BytesIO uses PyMem_Realloc / memset / memcpy to copy constructor argument contents to the internal buffer; it doesn't store a reference to original data. In pure-Python version the code is

def __init__(self, initial_bytes=None):
    buf = bytearray()
    if initial_bytes is not None:
        buf += initial_bytes
    self._buffer = buf
    self._pos = 0
@kmike

This comment has been minimized.

Show comment
Hide comment
@kmike
Contributor

kmike commented Jul 22, 2014

@greedo

This comment has been minimized.

Show comment
Hide comment
@greedo

greedo Sep 8, 2014

It looks like this issue has been resolved
http://hg.python.org/cpython/rev/79a5fbe2c78f

io.BytesIO() now defers making a copy until it is mutated

greedo commented Sep 8, 2014

It looks like this issue has been resolved
http://hg.python.org/cpython/rev/79a5fbe2c78f

io.BytesIO() now defers making a copy until it is mutated

@kmike

This comment has been minimized.

Show comment
Hide comment
@kmike

kmike Sep 8, 2014

Contributor

Yes, kudos to David Wilson!

It is likely the fix will be in Python 3.5 (~ November 2015?), but until it is released the issue affects all Python 3.x versions and all Python 2.x versions. It seems the fix won't be in 3.4.x.

Contributor

kmike commented Sep 8, 2014

Yes, kudos to David Wilson!

It is likely the fix will be in Python 3.5 (~ November 2015?), but until it is released the issue affects all Python 3.x versions and all Python 2.x versions. It seems the fix won't be in 3.4.x.

@bdarnell

This comment has been minimized.

Show comment
Hide comment
@bdarnell

bdarnell Jul 14, 2018

Member

Support for python 3.4 and older versions has been dropped, so this issue is moot.

Member

bdarnell commented Jul 14, 2018

Support for python 3.4 and older versions has been dropped, so this issue is moot.

@bdarnell bdarnell closed this Jul 14, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment