Skip to content

Standardize HTTPResponse.read(X) behavior regardless of compression #2798

Merged
sethmlarson merged 20 commits intourllib3:mainfrom
pquentin:extract-raw-read
Nov 14, 2022
Merged

Standardize HTTPResponse.read(X) behavior regardless of compression #2798
sethmlarson merged 20 commits intourllib3:mainfrom
pquentin:extract-raw-read

Conversation

@pquentin
Copy link
Member

@pquentin pquentin commented Nov 11, 2022

Using the findings from #2787, I reimplemented #2712 to only do one copy at worst, using 4GiB of memory for a 2GiB read. This is much better than #2712 which was doing three copies at worst, and two copies when using memory view.

More importantly, this is equivalent to the current solution where we also have one copy at worst when calling self._decode()! So the blockers from #2718 are gone.

Closes #709 (given the decision made in #2769), Closes #2712, Closes #2128

Open questions:

Co-authored-by: Franek Magiera <framagie@gmail.com>

When asking for amt bytes, we'll always return that amount unless there
is not enough data left. When no data is left, we return b'' to signal
EOF. This is the only situation where we can return an empty number of
bytes.
The buffer was "losing" in test_requesting_large_resources_via_ssl, but
only because there was no compression.
sethmlarson
sethmlarson previously approved these changes Nov 11, 2022
Copy link
Member

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, only had a few nits and two question then we can ship! :shipit:

while fetched < n:
remaining = n - fetched
chunk = self.buffer.popleft()
if remaining < len(chunk):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(chunk) is calculated 3 times, we can cache this value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually don't add micro optimizations without benchmarking, but this time I was lazy: b24d1da (#2798)

left_chunk, right_chunk = chunk[:remaining], chunk[remaining:]
ret.write(left_chunk)
self.buffer.appendleft(right_chunk)
self._size -= len(left_chunk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading the logic right, len(left_chunk) will always be "remaining", right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b24d1da (#2798)


def get(self, n: int) -> bytes:
if not self.buffer:
raise ValueError("buffer is empty")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should this be a RuntimeError instead of ValueError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 0dc26a3 (#2798).

flush_decoder = True

data = self._decode(data, decode_content, flush_decoder)
if not data and len(self._decoded_buffer) == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the locations you're referencing for #2799 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. More precisely, #2799 is an issue since urllib3 1.11, and this is a new place that is wrong and will need to be updated if we decide to fix #2799.

@pquentin
Copy link
Member Author

My implementation of #2800 breaks the requests test suite, not sure why yet.

@sethmlarson
Copy link
Member

@pquentin I wonder if we need to change the call to read() with decode_content=False inside of the "drain" method on HTTPResponse? I know Requests allows for raw data access.

sethmlarson
sethmlarson previously approved these changes Nov 13, 2022
Copy link
Member

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks excellent, let's merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Standardize HTTPResponse.read(X) behavior regardless of compression .read should only return empty (false) when there is nothing left to read (EOS)

2 participants