Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize HTTPResponse.read(X) behavior regardless of compression #2798

Merged
merged 20 commits into from
Nov 14, 2022

Conversation

pquentin
Copy link
Member

@pquentin pquentin commented Nov 11, 2022

Using the findings from #2787, I reimplemented #2712 to only do one copy at worst, using 4GiB of memory for a 2GiB read. This is much better than #2712 which was doing three copies at worst, and two copies when using memory view.

More importantly, this is equivalent to the current solution where we also have one copy at worst when calling self._decode()! So the blockers from #2718 are gone.

Closes #709 (given the decision made in #2769), Closes #2712, Closes #2128

Open questions:

Co-authored-by: Franek Magiera <framagie@gmail.com>

When asking for amt bytes, we'll always return that amount unless there
is not enough data left. When no data is left, we return b'' to signal
EOF. This is the only situation where we can return an empty number of
bytes.
The buffer was "losing" in test_requesting_large_resources_via_ssl, but
only because there was no compression.
sethmlarson
sethmlarson previously approved these changes Nov 11, 2022
Copy link
Member

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, only had a few nits and two question then we can ship! :shipit:

while fetched < n:
remaining = n - fetched
chunk = self.buffer.popleft()
if remaining < len(chunk):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(chunk) is calculated 3 times, we can cache this value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually don't add micro optimizations without benchmarking, but this time I was lazy: b24d1da (#2798)

left_chunk, right_chunk = chunk[:remaining], chunk[remaining:]
ret.write(left_chunk)
self.buffer.appendleft(right_chunk)
self._size -= len(left_chunk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading the logic right, len(left_chunk) will always be "remaining", right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b24d1da (#2798)

src/urllib3/response.py Show resolved Hide resolved

def get(self, n: int) -> bytes:
if not self.buffer:
raise ValueError("buffer is empty")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should this be a RuntimeError instead of ValueError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 0dc26a3 (#2798).


data = self._decode(data, decode_content, flush_decoder)
if not data and len(self._decoded_buffer) == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the locations you're referencing for #2799 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. More precisely, #2799 is an issue since urllib3 1.11, and this is a new place that is wrong and will need to be updated if we decide to fix #2799.

test/test_response.py Show resolved Hide resolved
@pquentin
Copy link
Member Author

My implementation of #2800 breaks the requests test suite, not sure why yet.

@sethmlarson
Copy link
Member

@pquentin I wonder if we need to change the call to read() with decode_content=False inside of the "drain" method on HTTPResponse? I know Requests allows for raw data access.

sethmlarson
sethmlarson previously approved these changes Nov 13, 2022
Copy link
Member

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks excellent, let's merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants