Standardize HTTPResponse.read(X) behavior regardless of compression #2798
sethmlarson merged 20 commits into urllib3:main
Co-authored-by: Franek Magiera <framagie@gmail.com>
When asking for amt bytes, we always return that amount unless there is not enough data left. When no data is left, we return b'' to signal EOF. This is the only situation in which we return an empty bytes object.
Co-authored-by: Franek Magiera <framagie@gmail.com>
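The contract described in this commit message can be illustrated with a small standalone sketch. It is not urllib3's implementation: `DecodingReader`, its `chunk_size` parameter, and the `bytearray` it uses for decoded data are all hypothetical names and choices made for illustration. The point is only that `read(amt)` keeps pulling and decoding compressed input until `amt` decoded bytes are available, and returns `b""` only at EOF.

```python
import io
import zlib


class DecodingReader:
    """Hypothetical wrapper, not urllib3's HTTPResponse."""

    def __init__(self, raw, chunk_size: int = 16) -> None:
        self._raw = raw                      # file-like object with compressed bytes
        self._chunk_size = chunk_size        # how much compressed data to pull at once
        self._decoder = zlib.decompressobj()
        self._decoded = bytearray()          # decoded bytes not yet handed out
        self._eof = False

    def read(self, amt: int) -> bytes:
        # Keep decoding until amt decoded bytes are buffered or input is exhausted.
        while len(self._decoded) < amt and not self._eof:
            data = self._raw.read(self._chunk_size)
            if data:
                self._decoded += self._decoder.decompress(data)
            else:
                self._decoded += self._decoder.flush()
                self._eof = True
        out = bytes(self._decoded[:amt])
        del self._decoded[:amt]
        return out


payload = b"hello world" * 10
reader = DecodingReader(io.BytesIO(zlib.compress(payload)))
first = reader.read(7)        # exactly 7 decoded bytes
rest = reader.read(1000)      # fewer than requested: only what is left
end = reader.read(5)          # b"" signals EOF
```

Without the loop, a single `decompress()` call could yield more or fewer than `amt` bytes depending on the compression ratio, which is the inconsistency the PR title refers to.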
The buffer was "losing" in test_requesting_large_resources_via_ssl, but only because there was no compression.
src/urllib3/response.py
Outdated
    while fetched < n:
        remaining = n - fetched
        chunk = self.buffer.popleft()
        if remaining < len(chunk):
len(chunk) is calculated 3 times, we can cache this value.
I usually don't add micro optimizations without benchmarking, but this time I was lazy: b24d1da (#2798)
src/urllib3/response.py
Outdated
    left_chunk, right_chunk = chunk[:remaining], chunk[remaining:]
    ret.write(left_chunk)
    self.buffer.appendleft(right_chunk)
    self._size -= len(left_chunk)
If I'm reading the logic right, len(left_chunk) will always be "remaining", right?
src/urllib3/response.py
Outdated
    def get(self, n: int) -> bytes:
        if not self.buffer:
            raise ValueError("buffer is empty")
nit: Should this be a RuntimeError instead of ValueError?
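For readers following the review threads above, here is a minimal, self-contained sketch of a chunk buffer in the style of the quoted snippets. The class name `ChunkBuffer` and the `put` method are hypothetical, and the sketch assumes the three review suggestions are adopted (caching the chunk length, subtracting `remaining` instead of `len(left_chunk)`, and raising `RuntimeError` on an empty buffer). It is not claimed to match the merged code.

```python
import collections
import io


class ChunkBuffer:
    """Hypothetical deque-backed byte buffer; a sketch, not urllib3's class."""

    def __init__(self) -> None:
        self.buffer = collections.deque()  # queue of bytes chunks
        self._size = 0                     # total number of buffered bytes

    def __len__(self) -> int:
        return self._size

    def put(self, data: bytes) -> None:
        self.buffer.append(data)
        self._size += len(data)

    def get(self, n: int) -> bytes:
        if not self.buffer:
            raise RuntimeError("buffer is empty")
        fetched = 0
        ret = io.BytesIO()
        while fetched < n and self.buffer:
            remaining = n - fetched
            chunk = self.buffer.popleft()
            chunk_length = len(chunk)  # computed once per chunk
            if remaining < chunk_length:
                # Split the chunk and push the unread tail back to the front.
                left_chunk, right_chunk = chunk[:remaining], chunk[remaining:]
                ret.write(left_chunk)
                self.buffer.appendleft(right_chunk)
                self._size -= remaining  # len(left_chunk) == remaining here
                break
            ret.write(chunk)
            self._size -= chunk_length
            fetched += chunk_length
        return ret.getvalue()


buf = ChunkBuffer()
buf.put(b"foo")
buf.put(b"bar")
head = buf.get(4)    # spans two chunks, splits the second one
tail = buf.get(10)   # asks for more than is left, gets the remainder
```

Because `remaining < chunk_length` holds inside the split branch, `chunk[:remaining]` always has exactly `remaining` bytes, which is why the two expressions are interchangeable there.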
    flush_decoder = True

    data = self._decode(data, decode_content, flush_decoder)
    if not data and len(self._decoded_buffer) == 0:
Is this one of the locations you're referencing for #2799?
My implementation of #2800 breaks the requests test suite, not sure why yet.

@pquentin I wonder if we need to change the call to read() with decode_content=False inside of the "drain" method on HTTPResponse? I know Requests allows for raw data access.
sethmlarson left a comment
This looks excellent, let's merge!
The deadsnakes PPA no longer supports 3.11-dev it seems.
Using the findings from #2787, I reimplemented #2712 to make at most one copy, using 4GiB of memory for a 2GiB read. This is much better than #2712, which made three copies at worst, and two copies when using a memoryview.
More importantly, this is equivalent to the current solution, where we also have one copy at worst when calling self._decode()! So the blockers from #2718 are gone.

Closes #709 (given the decision made in #2769), Closes #2712, Closes #2128
Open questions:
- read(decode_content=True) followed by read(decode_content=False)? #2800