Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yield streamed bytes as soon as they arrive #2125

Closed
wants to merge 1 commit into from

Conversation

djrobstep
Copy link

@djrobstep djrobstep commented Dec 31, 2020

Right now, calling read with amt = None (or requests.iter_content(chunk_size=None)) seems to wait for the entire response contents to arrive before yielding, which contradicts the docs in both urllib3 and requests. This is discussed in #2123.

For instance, this change makes the following correctly yield bytes as they arrive, provided CHUNK_SIZE is set to -1:

import requests

URL = 'https://httpbin.org/drip?duration=2'

r = requests.get(URL, stream=True)

for x in r.iter_content(chunk_size=CHUNK_SIZE):
    print(f'response: {x}')

The requests docs say this should already be happening if CHUNK_SIZE is set to None, but it doesn't:

chunk_size must be of type int or None. A value of None will function differently depending on the value of stream. stream=True will read data as it arrives in whatever size the chunks are received. If stream=False, data is returned as a single chunk

And so do the docs for HTTPResponse's stream( that this value is passed through to:

How much of the content to read. The generator will return up to this much data per iteration, but may return less. This is particularly likely when using compressed data. However, the empty string will never be returned.

But in fact None forces the entire response to be read before yielding.

I was hesitant to make a change that would affect the existing behaviour of None or positive integers, so here you specify a chunk size of -1 to activate this "stream on any bytes" behaviour. Hopefully this is suitable.

Base automatically changed from master to main January 16, 2021 20:06
@sethmlarson
Copy link
Member

Sorry that this PR has stalled completely for over a year and now has conflicts. Unless there's someone on our team that wants to push this one over the finish line I may close it.

I'm not sure I love the .read(-1) being used to mean .read1(), shouldn't we instead implement HTTPResponse.read1()?

@pquentin
Copy link
Member

Agreed that read1() is the way to go here.

@sethmlarson
Copy link
Member

Going to close this PR in favor of one implementing read1().

smason added a commit to smason/urllib3 that referenced this pull request Nov 8, 2023
Method suggested in urllib3#2125 as a nicer way to get resp.stream(None)
working for non-chunked responses.

Passes a parameter down the _*_read chain so that their implemention can
be reused.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants