GZip decoding regression #206

Closed
maxcountryman opened this issue Jul 19, 2013 · 24 comments

@maxcountryman

See this issue on Requests: https://github.com/kennethreitz/requests/issues/1472

This is the truncated traceback:

...
/project/virtualenv/lib/python2.6/requests/packages/urllib3/response.py", line 188, in read
    "failed to decode it." % content_encoding)
DecodeError: Received response with content-encoding: gzip, but failed to decode it.

cc @schlamar

@shazow
Member

shazow commented Jul 19, 2013

@schlamar Looks like this is related to d00d305, could you look into this please?

@maxcountryman If you'd like to participate, a urllib3 test showing this failure (and making sure it doesn't regress in the future) would be helpful. :)

@maxcountryman
Author

@shazow Photobucket's API will yield the error. I'm assuming any gzipped response would trigger it. Perhaps httpbin.org could assist with this?

@shazow
Member

shazow commented Jul 19, 2013

@maxcountryman For urllib3, we write tests that can be run without an internet connection. Take a look at some of the tests here for examples: https://github.com/shazow/urllib3/tree/master/test/with_dummyserver

We already have several GZip-related tests which are passing.

@maxcountryman
Author

@shazow I probably don't have time for that, sorry.

@shazow
Member

shazow commented Jul 19, 2013

Nps. If anyone else wants to do this, help is welcome. :)
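
Roughly, the kind of test I have in mind would look like the sketch below: a throwaway socket server standing in for our dummyserver helpers, asserting that a plain body labelled Content-Encoding: gzip surfaces as a DecodeError. It only reproduces the reported behaviour (the helper and test names are made up), so treat it as a starting point rather than a finished test.

import socket
import threading
import unittest

import urllib3
from urllib3.exceptions import DecodeError


def serve_mislabelled_gzip_once(sock, body=b'{"not": "gzipped"}'):
    # Accept a single connection and answer with an uncompressed body that
    # nevertheless claims to be gzip-encoded.
    conn, _ = sock.accept()
    conn.recv(65536)  # read and discard the request
    conn.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: application/json\r\n"
        b"Content-Encoding: gzip\r\n"
        b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
        b"Connection: close\r\n"
        b"\r\n" + body
    )
    conn.close()


class TestMislabelledGzip(unittest.TestCase):
    def test_plain_body_with_gzip_header(self):
        sock = socket.socket()
        sock.bind(("127.0.0.1", 0))
        sock.listen(1)
        port = sock.getsockname()[1]
        threading.Thread(target=serve_mislabelled_gzip_once, args=(sock,)).start()

        http = urllib3.PoolManager()
        with self.assertRaises(DecodeError):
            http.request("GET", "http://127.0.0.1:%d/" % port)
        sock.close()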

@schlamar
Contributor

@maxcountryman gzip on httpbin works fine for me with the latest urllib3 master:

>>> import urllib3
>>> http = urllib3.PoolManager()
>>> r = http.request('GET', 'http://httpbin.org/gzip')
>>> r.status
200
>>> r.data
'{\n  "headers": {\n    "Host": "httpbin.org",\n    "Accept-Encoding": "identity",\n    "Connection": "close"\n  },\n  "origin": "217.238.83.99",\n  "gzipped": true,\n  "method": "GET"\n}'

And requests 1.2.3:

>>> requests.get('http://httpbin.org/gzip')
<Response [200]>

Can you give me an exact URL which is failing? Preferably one that doesn't require registering for an API key.

@maxcountryman
Author

Nope: you'll need to register for Photobucket and set up an OAuth application in order to access the API. This is a regression from requests 1.2.0 as previously stated.

@maxcountryman
Author

Here's the response from curl, which is the best I can do for you. This exact URL yields the above error. To reproduce, you may need to set up an application with Photobucket or model the response.

$ curl -v -L "http://api.photobucket.com/album/zentropa?<snip>"
* About to connect() to api.photobucket.com port 80 (#0)
*   Trying 209.17.80.224...
* connected
* Connected to api.photobucket.com (209.17.80.224) port 80 (#0)
> GET /album/zentropa?<snip> HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8x zlib/1.2.5
> Host: api.photobucket.com
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 301 Redirect
< Date: Sun, 21 Jul 2013 14:55:04 GMT
< Server: Apache
< Set-Cookie: flash=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.photobucket.com
< Location: http://api1260.photobucket.com/album/zentropa?<snip>
< Content-Length: 507
< Connection: close
< Content-Type: application/json
< 
* Closing connection #0
* Issue another request to this URL: 'http://api1260.photobucket.com/album/zentropa?<snip>'
* About to connect() to api1260.photobucket.com port 80 (#0)
*   Trying 209.17.68.33...
* connected
* Connected to api1260.photobucket.com (209.17.68.33) port 80 (#0)
> GET /album/zentropa?<snip> HTTP/1.0
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8x zlib/1.2.5
> Host: api1260.photobucket.com
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Sun, 21 Jul 2013 14:55:04 GMT
< Server: Apache
< Set-Cookie: flash=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; domain=.photobucket.com
< Set-Cookie: dMess=%5B%5D; expires=Sun, 21-Jul-2013 16:55:04 GMT; path=/; domain=.photobucket.com
< Set-Cookie: persisted=YTowOnt9; path=/album
< Content-Length: 2422
< Connection: close
< Content-Type: application/json
< 
* Closing connection #0

The same URL raises the reported exception with Requests 1.2.3; this is a regression from Requests 1.2.0 which does not raise the error against the URL.

>>> r = requests.get('http://api.photobucket.com/album/zentropa?<snip>')
>>> r.status_code
200
>>> requests.__version__
'1.2.0'
>>> r = requests.get('http://api.photobucket.com/album/zentropa?<snip>')
Traceback (most recent call last):
...
requests.packages.urllib3.exceptions.DecodeError: Received response with content-encoding: gzip, but failed to decode it.
>>> requests.__version__
'1.2.3'

@t-8ch
Contributor

t-8ch commented Jul 21, 2013

@maxcountryman Can you reproduce the issue on http://api.photobucket.com/ping?

Edit: removed stuff about proxy. Only the redirect is HTTP/1.0.

@maxcountryman
Author

There is no proxy: Photobucket redirects to silos, and it's on the same machine. I don't have time to test the ping endpoint, although I doubt it would reproduce the issue if it doesn't redirect. That's about all the help I can provide. I might suggest rolling back the commit that broke this in the meantime if the problem isn't obvious. Bear in mind that the browser and curl handle this same endpoint without issue, so this is clearly a problem for any consumer expecting urllib3 to behave in a sane way. HTH.

@shazow
Member

shazow commented Jul 21, 2013

@t-8ch Could it be something to do with switching from using gzip to the zlib module for decoding both?
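
(For reference, the difference being asked about, sketched with the stock library calls rather than the actual urllib3 decoder: the gzip module parses the gzip framing itself, while zlib only accepts it when given a widened window.)

import gzip, io, zlib

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as f:
    f.write(b'hello')
payload = buf.getvalue()  # gzip-framed bytes

# The gzip module handles the gzip header/trailer itself:
assert gzip.GzipFile(fileobj=io.BytesIO(payload)).read() == b'hello'

# zlib accepts the same framing only with a widened window (16 + MAX_WBITS);
# a plain zlib.decompress() on these bytes fails with "incorrect header check".
assert zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(payload) == b'hello'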

@maxcountryman I'd rather not start rolling back commits from previous releases until we fully understand the issue. If it seems like this is Photobucket-specific and not a widespread issue, we could wait until someone else who has more time to investigate it runs into the bug.

It's important to remember that urllib3 grows thanks to community contributions; we don't have any corporate sponsors paying for developer time to satisfy specific consumers (though this can be arranged if your employer is interested in sponsoring some specific development). :)

@t-8ch
Contributor

t-8ch commented Jul 21, 2013

@shazow I am not sure, but I don't think so.

@maxcountryman
Your curl example doesn't use gzip, and browsers implement a huge number of workarounds for every type of brokenness one can imagine.
If the data itself is not confidential, could you edit requests/packages/urllib3/response.py:HTTPResponse.CONTENT_DECODERS to be an empty list and then post the result of requests.get(url).content here, or somewhere we can get it?
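
Something along these lines (the URL stands in for the failing Photobucket URL from above):

import requests
from requests.packages.urllib3 import response as urllib3_response

# With no registered decoders, urllib3 hands the body back untouched,
# so we can see what Photobucket actually sends.
urllib3_response.HTTPResponse.CONTENT_DECODERS = []

url = 'http://api.photobucket.com/album/zentropa?<snip>'
print(repr(requests.get(url).content))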

You could also remove the try/except around the decoding, so that the underlying exception is re-raised and we see a proper message.
I will open a pull request to keep the original message in the re-raised exception.

It would also be great if you could use git bisect to identify the exact source of the regression. Try requests first, and if it points to a commit which updates urllib3, continue bisecting there.

@shazow
Member

shazow commented Oct 18, 2013

Looks like this issue stalled. Feel free to reopen if there are new developments. :)

@shazow shazow closed this as completed Oct 18, 2013
@shazow
Member

shazow commented Feb 13, 2014

Moving the conversation here from https://github.com/Runscope/requests-runscope/issues/6#issuecomment-34945852.

@maxcountryman wrote:

@shazow in particular, I think you should look at reversing this commit: d00d305

If I manually reverse those changes, it seems to correct the issue. i.e. this error: Error -3 while decompressing: incorrect header check

See also: http://stackoverflow.com/questions/3122145/zlib-error-error-3-while-decompressing-incorrect-header-check

I hope you will once again consider reversing the commit, as I had proposed previously. It would be greatly appreciated! :)

@maxcountryman It's not just a matter of "just revert this and everything is dandy."

  1. There are other cascading changes that could be affected.
  2. We could reintroduce a similar breaking change next week and unless there is a test, this exercise will be in vain.
  3. There could be other bugs in the reversed version.

@schlamar do you recall why we stopped using the gzip module?

@maxcountryman
Author

Applied against master, this clearly illustrates the problem:

diff --git a/test/test_response.py b/test/test_response.py
index ecfcbee..2f95d40 100644
--- a/test/test_response.py
+++ b/test/test_response.py
@@ -8,20 +8,17 @@ from urllib3.exceptions import DecodeError

 from base64 import b64decode

-# A known random (i.e, not-too-compressible) payload generated with:
-#    "".join(random.choice(string.printable) for i in xrange(512))
-#    .encode("zlib").encode("base64")
-# Randomness in tests == bad, and fixing a seed may not be sufficient.
+# zlib-failing payload
 ZLIB_PAYLOAD = b64decode(b"""\
-eJwFweuaoQAAANDfineQhiKLUiaiCzvuTEmNNlJGiL5QhnGpZ99z8luQfe1AHoMioB+QSWHQu/L+
-lzd7W5CipqYmeVTBjdgSATdg4l4Z2zhikbuF+EKn69Q0DTpdmNJz8S33odfJoVEexw/l2SS9nFdi
-pis7KOwXzfSqarSo9uJYgbDGrs1VNnQpT9f8zAorhYCEZronZQF9DuDFfNK3Hecc+WHLnZLQptwk
-nufw8S9I43sEwxsT71BiqedHo0QeIrFE01F/4atVFXuJs2yxIOak3bvtXjUKAA6OKnQJ/nNvDGKZ
-Khe5TF36JbnKVjdcL1EUNpwrWVfQpFYJ/WWm2b74qNeSZeQv5/xBhRdOmKTJFYgO96PwrHBlsnLn
-a3l0LwJsloWpMbzByU5WLbRE6X5INFqjQOtIwYz5BAlhkn+kVqJvWM5vBlfrwP42ifonM5yF4ciJ
-auHVks62997mNGOsM7WXNG3P98dBHPo2NhbTvHleL0BI5dus2JY81MUOnK3SGWLH8HeWPa1t5KcW
-S5moAj5HexY/g/F8TctpxwsvyZp38dXeLDjSQvEQIkF7XR3YXbeZgKk3V34KGCPOAeeuQDIgyVhV
-nP4HF2uWHA==""")
+eyJzdGF0dXMiOiJFeGNlcHRpb24iLCJtZXNzYWdlIjoiUmVkaXJlY3QgcmVxdWlyZWQgdG8gYWNjZ
+XNzIHVzZXIgZGF0YSIsImNvZGUiOjMwMSwiY29udGVudCI6eyJzdWJkb21haW4iOiJodHRwOlwvXC
+9hcGkxMjYwLnBob3RvYnVja2V0LmNvbSIsInVybCI6Imh0dHA6XC9cL2FwaTEyNjAucGhvdG9idWN
+rZXQuY29tXC9hbGJ1bVwvemVudHJvcGE/b2F1dGhfbm9uY2U9MDM1ZTU1MWZmOTk1MDIyZTMwM2E5
+MGVmNWE5YWYxNDI5NDQwNzcyMSZmb3JtYXQ9anNvbiZvYXV0aF9jb25zdW1lcl9rZXk9MTQ5ODMyN
+DAwJm9hdXRoX3RpbWVzdGFtcD0xMzkyMjY0MjkwJm9hdXRoX3NpZ25hdHVyZV9tZXRob2Q9SE1BQy
+1TSEExJm9hdXRoX3ZlcnNpb249MS4wJm9hdXRoX3Rva2VuPTM0LjIyNDc1ODlfMTM5MjI2NDI4OSZ
+vYXV0aF9zaWduYXR1cmU9VGE2ZmRrUDljYlV6OWhLc05sczJJTko2NktFJTNEIn0sImZvcm1hdCI6
+Impzb24iLCJtZXRob2QiOiJHRVQiLCJ0aW1lc3RhbXAiOjEzOTIyNjQyOTB9""")


 class TestLegacyResponse(unittest.TestCase):

@maxcountryman
Author

Looking at that, it seems like it's trying to decode non-compressed data, which leads me to believe that previously urllib3 was not considering this request's data as compressed, while in later versions (i.e. now) it does.

In fact, in Requests 1.2.0, this data exits the read method here: https://github.com/kennethreitz/requests/blob/d06908d655eca4467521c28d93a68d329d48d524/requests/packages/urllib3/response.py#L158

@shazow
Member

shazow commented Feb 13, 2014

which leads me to believe that previously urllib3 was not considering this request's data as compressed

Can you expand on what you mean by that? urllib3 decides whether the data is compressed simply by looking at the Content-Encoding header: https://github.com/shazow/urllib3/blob/master/urllib3/response.py#L169

Also, I get the same error by replacing the zlib payload with any kind of garbage (e.g. aaaaaaaaa).
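
(For reference, that kind of garbage reproduces the underlying zlib error directly:)

>>> import zlib
>>> zlib.decompress(b'aaaaaaaaa')
Traceback (most recent call last):
...
zlib.error: Error -3 while decompressing data: incorrect header check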

@schlamar
Contributor

At that time requests did the decompression on its own. Can someone spot any difference?
https://github.com/kennethreitz/requests/blob/d06908d655eca4467521c28d93a68d329d48d524/requests/utils.py#L349:L378

@maxcountryman
Author

@schlamar yes, the difference is that urllib3 falls out of the read method in the section I've highlighted above: with Requests 1.2.0 it never tries to decompress the response at all for this input.

@schlamar
Contributor

@maxcountryman Yes, that's obvious. However, requests at that time called the decompression method linked above. This method had a fallback to return the data untouched if decompression failed, but that fallback was flawed: https://github.com/kennethreitz/requests/issues/1249.
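
Roughly, that old fallback amounted to the following (a paraphrase with a made-up name, not the exact requests code):

import zlib

def decompress_with_fallback(data):
    # Try to gunzip the data; if that fails, silently hand the bytes back untouched.
    try:
        return zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(data)
    except zlib.error:
        return data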

So I assume that Photobucket returns non-gzipped data while sending a Content-Encoding: gzip header (which would be a bug in their application or web server). You can check whether setting response.raw.decode_content = False and then calling response.raw.read() works.
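
For example (the URL again stands in for the failing one; stream=True so the body isn't consumed and decoded before you get to raw):

import requests

url = 'http://api.photobucket.com/album/zentropa?<snip>'
r = requests.get(url, stream=True)   # don't read (and decode) the body up front
r.raw.decode_content = False         # tell urllib3 to skip gzip/deflate decoding entirely
print(repr(r.raw.read()[:200]))      # readable JSON here would mean the body was never gzipped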

Maybe you could provide a raw HTTP response (including headers) from the Photobucket API; that would be really helpful.

@schlamar
Contributor

Maybe you could provide a raw HTTP response (including headers) from the Photobucket API; that would be really helpful.

I have just seen the curl example above. Interestingly, curl doesn't get gzip as the Content-Encoding. Maybe there is some notable difference in the request headers which might explain this behavior.

@schlamar
Contributor

Curl does not send an Accept-Encoding header by default. What happens if you run curl with the --compressed flag?

Alternatively, you can try using a requests session and removing Accept-Encoding from its headers attribute (https://github.com/kennethreitz/requests/blob/c042c08179cae74e7b2c520b33ea19881c0ec842/requests/sessions.py#L213).
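
For example (again with the failing URL from above):

import requests

s = requests.Session()
s.headers.pop('Accept-Encoding', None)   # mimic curl's default of not advertising compression
r = s.get('http://api.photobucket.com/album/zentropa?<snip>')
print(r.status_code, r.headers.get('Content-Encoding'))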

@schlamar
Contributor

@maxcountryman Please check out https://github.com/kennethreitz/requests/pull/1944, this should resolve this issue.
