
No Retry in HTTPResponse.read() method. #542

Open
dstufft opened this issue Feb 3, 2015 · 11 comments

Comments

@dstufft
Contributor

dstufft commented Feb 3, 2015

It appears that the retry support in urllib3 doesn't cover the HTTPResponse.read() method. This means that if you pass preload_content=False to urlopen(), as requests does, any errors that occur while read()ing the body will not automatically be retried. Doing this transparently would be difficult, since urllib3 doesn't know what the calling library has done with the parts of the body it has already received.

What would be nice, though, is to expose the Retry() object on the response object and document how the caller of the library can use it to implement retries themselves. Perhaps this could look something like:

response = pool.urlopen(...)
while not response.closed and not response.retries.is_exhausted():
    try:
        data = []
        data.append(response.read(4096))
    except response.retries.exceptions as exc:
        # This would raise an error if there are no more retries
        response = response.retry()

Thoughts?

@shazow
Member

shazow commented Feb 3, 2015

I think you're right, we do need something like this. Especially because we ultimately want to make preload_content=False the default (see #436).

I agree it makes sense to attach the retry object to the response, but keep in mind that it's immutable, so you'd need to replace the response on retries somehow. I'm not sure the API you proposed would work in this scenario.

Ideally, the response should know to retry based on the request's retry configuration (which would be easier if the request's retry object is attached to the response). I'm not 100% sure that would be feasible for a response to trigger a request implicitly with the way the direction of responsibility is laid out in the library, but it's worth exploring first.
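The immutability point can be illustrated with a toy sketch (this is not urllib3's actual Retry class, just the pattern it follows): consuming an attempt produces a *new* object rather than mutating the old one, which is why the response carrying it would have to be replaced on retry.

```python
class ToyRetry:
    """Toy immutable retry counter, loosely modeled on the idea of
    urllib3's Retry object (not its real API)."""

    def __init__(self, total):
        self.total = total

    def increment(self):
        # Return a NEW object with one fewer attempt; never mutate self.
        if self.total <= 0:
            raise RuntimeError("retries exhausted")
        return ToyRetry(self.total - 1)

    def is_exhausted(self):
        return self.total <= 0


retries = ToyRetry(total=2)
fewer = retries.increment()

# The original object is untouched; a retry hands back a fresh one,
# so any response holding `retries` must itself be replaced.
assert retries.total == 2
assert fewer.total == 1
assert fewer is not retries
```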

@dstufft
Contributor Author

dstufft commented Feb 3, 2015

Wouldn't my last line do that? The response.retry() just calls the poolmanager.urlopen() again with the same arguments as before and returns the response, so you have to set response = response.retry() on it. Then the code loops back around and since the response object is a new response it'll reinitialize the data = [] variable and read off the response again. The response.retry() would raise an exception if there are no more retries so you'd still get your MaxRetryError.

@piotr-dobrogost

You are setting data = [] after every read. Is it really what you want?

@dstufft
Contributor Author

dstufft commented Feb 3, 2015

My example is a little wrong; it'd likely be read(), not read(4096). If there isn't a retry, the loop would only be executed once. The loop is how the block of code gets re-executed if it needs to retry.

@shazow
Member

shazow commented Feb 3, 2015

@dstufft Ah I see, yeah, that might work. Ideally it would be implicit, based on the response's retries object, rather than necessarily explicit?

@dstufft
Contributor Author

dstufft commented Feb 3, 2015

I'm not sure I follow, maybe it would help if I annotated this?

# Make our original request, any connection errors that occur during this will
# be handled by the built in urllib3 retries.
response = pool.urlopen(...)

# Now we want to read the data from the response. We use a loop here so that
# this block of code can be repeated until we either run out of retries or
# successfully read the entire response. It is important that end users read
# the *entire* body of the response within this loop, or otherwise ensure
# that the response has been closed within the body of this loop.
while not response.closed:
    # *Everything* that works with data from the response must be within this
    # loop, including initializing any buffers, etc. If any of this is done
    # outside of the loop then we will leak state between retries.
    data = []

    # We need a try: except block to catch any of the errors that we want to
    # retry on that might occur while reading the response.
    try:
        # Actually read our data from the response
        read_data = response.read(4096)
        while read_data:
            data.append(read_data)
            read_data = response.read(4096)
    # We catch an object here which lists all of the normal retry exceptions,
    # this will ensure that end users do not need to maintain a list of
    # different exceptions themselves.
    except response.retries.exceptions as exc:
        # We've encountered an error that we can retry on, so we'll call the
        # response.retry() method, which will implicitly make a new connection,
        # get a new response object, and return that response object. The retry
        # method will also handle checking if we have any retries left and, if
        # not, it will raise a MaxRetryError.
        response = response.retry(exc)

    # If we got here, then it means we've successfully read all of the data off
    # the response and we do not want to repeat this anymore. urllib3 will
    # close the response when it detects it's read the last bit of data so
    # there is no need for anything else to happen here, the loop condition
    # will evaluate to False and we'll exit this loop and continue on with the
    # execution of the program.

# Now we've handled the response and gotten all of our data including handling
# retries. So do something with our data.
with open(...) as fp:
    for chunk in data:
        fp.write(chunk)

@dstufft
Contributor Author

dstufft commented Feb 3, 2015

Note: I dropped the check if the retries are exhausted from the loop conditional. We don't want to skip the loop if we've used up all of our retries trying to connect to the server but on our very last attempt we successfully connected.
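That subtlety can be seen with a stub: if the loop condition also checked `is_exhausted()`, a response that was successfully obtained on the very last connection attempt would never be read at all. A toy illustration (hypothetical names, not urllib3 API):

```python
class StubRetries:
    def __init__(self, total):
        self.total = total

    def is_exhausted(self):
        return self.total <= 0


class StubResponse:
    """Stub response obtained on the very last connection attempt:
    all retries are used up, but the body is still unread."""

    def __init__(self):
        self.closed = False
        self.retries = StubRetries(total=0)  # exhausted

    def read(self):
        self.closed = True
        return b"body"


# Loop condition WITH the exhaustion check: the body is silently skipped.
resp = StubResponse()
skipped = []
while not resp.closed and not resp.retries.is_exhausted():
    skipped.append(resp.read())
assert skipped == []

# Loop condition WITHOUT it: the body is read as intended.
resp = StubResponse()
chunks = []
while not resp.closed:
    chunks.append(resp.read())
assert chunks == [b"body"]
```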

@shazow
Member

shazow commented Feb 3, 2015

+1, I think this is a valuable recipe that we should probably add to the docs once it is supported.

@dstufft
Contributor Author

dstufft commented Feb 3, 2015

As an additional thing, we can provide a helper function that will wrap up some of this but it will force people to write a function that does the actual reading. The urllib3 provided function would look something like:

def read_with_retries(response, reader):
    while not response.closed:
        try:
            return reader(response)
        except response.retries.exceptions as exc:
            response = response.retry(exc)

You can see that read_with_retries is basically the thing I wrote above, except wrapped up in a function; it takes a function which is basically the application-specific part of my recipe from above.

The end user would then do something like this:

def reader(response):
    data = []
    read_data = response.read(4096)
    while read_data:
        data.append(read_data)
        read_data = response.read(4096)
    return data

response = pool.urlopen(...)
data = read_with_retries(response, reader)

with open(...) as fp:
    for chunk in data:
        fp.write(chunk)
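As a sanity check of the shape of this recipe, here is a self-contained sketch with stub objects standing in for the real pool and response. None of these names are urllib3 API; `retry()` and `retries.exceptions` are the hypothetical additions under discussion.

```python
class FlakyResponse:
    """Stub response that fails once mid-read, then succeeds on retry."""

    failures_left = 1  # class-level, so it survives the "retried" instance

    def __init__(self):
        self.closed = False
        self.chunks = [b"hello ", b"world", b""]
        self.retries = self  # stand-in for the attached Retry object
        self.exceptions = (IOError,)  # hypothetical retries.exceptions

    def read(self, amt):
        # Simulate a connection reset partway through the body, once.
        if FlakyResponse.failures_left and len(self.chunks) == 2:
            FlakyResponse.failures_left -= 1
            raise IOError("connection reset mid-body")
        data = self.chunks.pop(0) if self.chunks else b""
        if not data:
            self.closed = True
        return data

    def retry(self, exc):
        # Hypothetical: re-issue the request, return a fresh response.
        return FlakyResponse()


def read_with_retries(response, reader):
    while not response.closed:
        try:
            return reader(response)
        except response.retries.exceptions as exc:
            response = response.retry(exc)


def reader(response):
    data = []  # re-initialized on every attempt, so no state leaks
    read_data = response.read(4096)
    while read_data:
        data.append(read_data)
        read_data = response.read(4096)
    return data


data = read_with_retries(FlakyResponse(), reader)
assert b"".join(data) == b"hello world"
```

Note that `reader` rebuilds its buffer from scratch on each attempt, which is exactly the state-reset property the loop in the recipe above relies on.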

@gaborbernat

any progress on this?

@Lukasa
Sponsor Contributor

Lukasa commented Nov 14, 2016

Nope
