
No Retry in HTTPResponse.read() method. #542

Open
dstufft opened this issue Feb 3, 2015 · 11 comments

Comments

@dstufft
Contributor

dstufft commented Feb 3, 2015

It appears that the retry support in urllib3 doesn't cover the HTTPResponse.read() method. This means that if you pass preload_content=False to urlopen(), as requests does, any errors that occur while read()ing the body will not automatically be retried. Doing this transparently would be difficult, since urllib3 doesn't know what the calling library has done with the parts of the body it has already received.

What would be nice, though, is to expose the Retry() object on the response object and document how the caller of the library can use it to implement retries themselves. Perhaps this could look something like:

response = pool.urlopen(...)
while not response.closed and not response.retries.is_exhausted():
    try:
        data = []
        data.append(response.read(4096))
    except response.retries.exceptions as exc:
        # This would raise an error if there are no more retries
        response = response.retry()

Thoughts?

@shazow
Member

shazow commented Feb 3, 2015

I think you're right, we do need something like this. Especially because we ultimately want to make preload_content=False the default (see #436).

I agree it makes sense to attach the retry object to the response, but keep in mind that it's immutable, so you'd need to replace the response on retries somehow. I'm not sure the API you proposed would work in this scenario.

Ideally, the response should know to retry based on the request's retry configuration (which would be easier if the request's retry object is attached to the response). I'm not 100% sure that would be feasible for a response to trigger a request implicitly with the way the direction of responsibility is laid out in the library, but it's worth exploring first.
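The immutability point can be illustrated with a toy sketch (this is not urllib3's actual Retry class, just the pattern it follows): consuming an attempt produces a *new* object rather than mutating the old one, which is why the response carrying it would have to be replaced on retry.

```python
class ToyRetry:
    """Toy immutable retry counter, loosely modeled on the idea of
    urllib3's Retry object (not its real API)."""

    def __init__(self, total):
        self.total = total

    def increment(self):
        # Return a NEW object with one fewer attempt; never mutate self.
        if self.total <= 0:
            raise RuntimeError("retries exhausted")
        return ToyRetry(self.total - 1)

    def is_exhausted(self):
        return self.total <= 0


retries = ToyRetry(total=2)
fewer = retries.increment()

# The original object is untouched; a retry hands back a fresh one,
# so any response holding `retries` must itself be replaced.
assert retries.total == 2
assert fewer.total == 1
assert fewer is not retries
```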

@dstufft
Contributor Author

dstufft commented Feb 3, 2015

Wouldn't my last line do that? The response.retry() just calls the poolmanager.urlopen() again with the same arguments as before and returns the response, so you have to set response = response.retry() on it. Then the code loops back around and since the response object is a new response it'll reinitialize the data = [] variable and read off the response again. The response.retry() would raise an exception if there are no more retries so you'd still get your MaxRetryError.

@piotr-dobrogost

You are setting data = [] after every read. Is it really what you want?

@dstufft
Contributor Author

dstufft commented Feb 3, 2015

My example is a little wrong; it'd likely be read(), not read(4096). If there isn't a retry, the loop would only be executed once. The loop is how the block of code gets re-executed if it needs to retry.

@shazow
Member

shazow commented Feb 3, 2015

@dstufft Ah I see, yeah, that might work. Ideally it would be implicit, based on the response's retries object, rather than necessarily explicit?

@dstufft
Contributor Author

dstufft commented Feb 3, 2015

I'm not sure I follow, maybe it would help if I annotated this?

# Make our original request, any connection errors that occur during this will
# be handled by the built in urllib3 retries.
response = pool.urlopen(...)

# Now we want to read the data from the response. We use a loop here so that
# this block of code can be repeated until we either run out of retries or
# successfully read the entire response. It is important that end users read
# the *entire* body of the response within this loop, or otherwise ensure
# that the response has been closed within the body of this loop.
while not response.closed:
    # *Everything* that works with data from the response must be within this
    # loop, including initializing any buffers, etc. If any of this is done
    # outside of the loop then we will leak state between retries.
    data = []

    # We need a try: except block to catch any of the errors that we want to
    # retry on that might occur while reading the response.
    try:
        # Actually read our data from the response
        read_data = response.read(4096)
        while read_data:
            data.append(read_data)
            read_data = response.read(4096)
    # We catch an object here which lists all of the normal retry exceptions,
    # this will ensure that end users do not need to maintain a list of
    # different exceptions themselves.
    except response.retries.exceptions as exc:
        # We've encountered an error that we can retry on, so we'll call the
        # response.retry() method, which will implicitly make a new connection,
        # get a new response object, and return that response object. The retry
        # method will also handle checking if we have any retries left and, if
        # not, it will raise a MaxRetryError.
        response = response.retry(exc)

    # If we got here, then it means we've successfully read all of the data off
    # the response and we do not want to repeat this anymore. urllib3 will
    # close the response when it detects it's read the last bit of data so
    # there is no need for anything else to happen here, the loop condition
    # will evaluate to False and we'll exit this loop and continue on with the
    # execution of the program.

# Now we've handled the response and gotten all of our data including handling
# retries. So do something with our data.
with open(...) as fp:
    for chunk in data:
        fp.write(chunk)

@dstufft
Contributor Author

dstufft commented Feb 3, 2015

Note: I dropped the check if the retries are exhausted from the loop conditional. We don't want to skip the loop if we've used up all of our retries trying to connect to the server but on our very last attempt we successfully connected.
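That subtlety can be seen with a stub: if the loop condition also checked `is_exhausted()`, a response that was successfully obtained on the very last connection attempt would never be read at all. A toy illustration (hypothetical names, not urllib3 API):

```python
class StubRetries:
    def __init__(self, total):
        self.total = total

    def is_exhausted(self):
        return self.total <= 0


class StubResponse:
    """Stub response obtained on the very last connection attempt:
    all retries are used up, but the body is still unread."""

    def __init__(self):
        self.closed = False
        self.retries = StubRetries(total=0)  # exhausted

    def read(self):
        self.closed = True
        return b"body"


# Loop condition WITH the exhaustion check: the body is silently skipped.
resp = StubResponse()
skipped = []
while not resp.closed and not resp.retries.is_exhausted():
    skipped.append(resp.read())
assert skipped == []

# Loop condition WITHOUT it: the body is read as intended.
resp = StubResponse()
chunks = []
while not resp.closed:
    chunks.append(resp.read())
assert chunks == [b"body"]
```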

@shazow
Member

shazow commented Feb 3, 2015

+1, I think this is a valuable recipe that we should probably add to the docs once it is supported.

@dstufft
Contributor Author

dstufft commented Feb 3, 2015

As an additional thing, we can provide a helper function that will wrap up some of this but it will force people to write a function that does the actual reading. The urllib3 provided function would look something like:

def read_with_retries(response, reader):
    while not response.closed:
        try:
            return reader(response)
        except response.retries.exceptions as exc:
            response = response.retry(exc)

You can see that read_with_retries is basically the thing I wrote above, except wrapped up in a function; it takes a function which is basically the application-specific part of my recipe from above.

The end user would then do something like this:

def reader(response):
    data = []
    read_data = response.read(4096)
    while read_data:
        data.append(read_data)
        read_data = response.read(4096)
    return data

response = pool.urlopen(...)
data = read_with_retries(response, reader)

with open(...) as fp:
    for chunk in data:
        fp.write(chunk)
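As a sanity check of the shape of this recipe, here is a self-contained sketch with stub objects standing in for the real pool and response. None of these names are urllib3 API; `retry()` and `retries.exceptions` are the hypothetical additions under discussion.

```python
class FlakyResponse:
    """Stub response that fails once mid-read, then succeeds on retry."""

    failures_left = 1  # class-level, so it survives the "retried" instance

    def __init__(self):
        self.closed = False
        self.chunks = [b"hello ", b"world", b""]
        self.retries = self  # stand-in for the attached Retry object
        self.exceptions = (IOError,)  # hypothetical retries.exceptions

    def read(self, amt):
        # Simulate a connection reset partway through the body, once.
        if FlakyResponse.failures_left and len(self.chunks) == 2:
            FlakyResponse.failures_left -= 1
            raise IOError("connection reset mid-body")
        data = self.chunks.pop(0) if self.chunks else b""
        if not data:
            self.closed = True
        return data

    def retry(self, exc):
        # Hypothetical: re-issue the request, return a fresh response.
        return FlakyResponse()


def read_with_retries(response, reader):
    while not response.closed:
        try:
            return reader(response)
        except response.retries.exceptions as exc:
            response = response.retry(exc)


def reader(response):
    data = []  # re-initialized on every attempt, so no state leaks
    read_data = response.read(4096)
    while read_data:
        data.append(read_data)
        read_data = response.read(4096)
    return data


data = read_with_retries(FlakyResponse(), reader)
assert b"".join(data) == b"hello world"
```

Note that `reader` rebuilds its buffer from scratch on each attempt, which is exactly the state-reset property the loop in the recipe above relies on.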

@gaborbernat

any progress on this?

@Lukasa
Sponsor Contributor

Lukasa commented Nov 14, 2016

Nope
