
Getting exception in _update_chunk_length #1516

Closed
shivam05011996 opened this issue Jan 9, 2019 · 20 comments · Fixed by #1888

@shivam05011996

Hi,
While requesting a particular URL, I came across this error:

File "/root/shivam/python3_env/lib/python3.5/site-packages/urllib3/response.py", line 601, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/shivam/python3_env/lib/python3.5/site-packages/urllib3/response.py", line 360, in _error_catcher
    yield
  File "/root/shivam/python3_env/lib/python3.5/site-packages/urllib3/response.py", line 666, in read_chunked
    self._update_chunk_length()
  File "/root/shivam/python3_env/lib/python3.5/site-packages/urllib3/response.py", line 605, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/shivam/python3_env/lib/python3.5/site-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/root/shivam/python3_env/lib/python3.5/site-packages/urllib3/response.py", line 490, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/root/shivam/python3_env/lib/python3.5/site-packages/urllib3/response.py", line 694, in read_chunked
    self._original_response.close()
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/root/shivam/python3_env/lib/python3.5/site-packages/urllib3/response.py", line 378, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "parse_dumped_data.py", line 87, in <module>
    Parse(f, entity)
  File "parse_dumped_data.py", line 17, in __init__
    self.parse_records()
  File "parse_dumped_data.py", line 32, in parse_records
    data_fields = self.get_data(record.get('data'))
  File "parse_dumped_data.py", line 50, in get_data
    data['image_url'] = self.get_image_url(data.get('image'), _id)
  File "parse_dumped_data.py", line 64, in get_image_url
    resp = requests.get(url)
  File "/root/shivam/python3_env/lib/python3.5/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/root/shivam/python3_env/lib/python3.5/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/root/shivam/python3_env/lib/python3.5/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/shivam/python3_env/lib/python3.5/site-packages/requests/sessions.py", line 686, in send
    r.content
  File "/root/shivam/python3_env/lib/python3.5/site-packages/requests/models.py", line 828, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/root/shivam/python3_env/lib/python3.5/site-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

This was reported in the requests module here.

I fixed it as follows:

def _update_chunk_length(self):
    # First, we'll figure out length of a chunk and then
    # we'll try to read it from socket.
    if self.chunk_left is not None:
        return
    line = self._fp.fp.readline()
    line = line.split(b';', 1)[0]
    try:
        if len(line) == 0:
            self.chunk_left = 0
        else:
            self.chunk_left = int(line, 16)
    except ValueError:
        # Invalid chunked protocol response, abort.
        self.close()
        raise httplib.IncompleteRead(line)

or as a one-liner:

def _update_chunk_length(self):
    # First, we'll figure out length of a chunk and then
    # we'll try to read it from socket.
    if self.chunk_left is not None:
        return
    line = self._fp.fp.readline()
    line = line.split(b';', 1)[0]
    line = line if len(line) > 0 else b"0"   # added this line: treat a missing length as 0
    try:
        self.chunk_left = int(line, 16)
    except ValueError:
        # Invalid chunked protocol response, abort.
        self.close()
        raise httplib.IncompleteRead(line)

Is it worth opening a PR for this?

@sethmlarson
Member

Is this URL publicly available? I'd like to see the exact HTTP response.

@shivam05011996
Author

shivam05011996 commented Jan 10, 2019

Yes, I faced this issue while requesting a URL for a Wikidata entry. I'm not sure of the exact URL because this happened after roughly the 94,000th iteration.
The URLs were constructed like https://commons.wikimedia.org/wiki/File:<image_name>.jpg
Example URL - https://commons.wikimedia.org/wiki/File:Belfast_City_Hall_2.jpg

@sethmlarson
Member

If you could get the exact URL where this happens, that'd be great. The URL you've given as an example doesn't use Transfer-Encoding: chunked, so it shouldn't hit this logic. Below is what I'm seeing when hitting this URL:

>>> import urllib3
>>> p = urllib3.PoolManager()
>>> r = p.request('GET', 'https://commons.wikimedia.org/wiki/File:Belfast_City_Hall_2.jpg', preload_content=False)
>>> [x for x in r.read_chunked()]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda3\lib\site-packages\urllib3\response.py", line 647, in read_chunked
    "Response is not chunked. "
urllib3.exceptions.ResponseNotChunked: Response is not chunked. Header 'transfer-encoding: chunked' is missing.

@dulitz

dulitz commented Feb 2, 2019

I'm having this problem as well, but my webpage isn't public and its contents are sensitive. On a request that succeeds (e.g. from Chrome), the response headers look like this:

HTTP/1.1 200 OK
Date: Sat, 02 Feb 2019 14:22:35 GMT
Server: BarracudaServer.com (Posix)
Content-Type: text/html; charset=utf-8
Cache-Control: no-store, no-cache, must-revalidate, max-age=0
Transfer-Encoding: chunked
Keep-Alive: Keep-Alive
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=60000; includeSubDomains

but unfortunately I can't see the chunks in Chrome and it's a TLS request so sniffing on the wire is hard.

Not being able to see the chunks, I won't opine on whether this is "really a bug."

@sethmlarson
Member

Is there a way you can get curl to show the chunks? You can remove all the content; we just need the boundaries and how much data is between them. If you've got a reliable reproducer, we would love to see it. :)
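
For anyone who needs to see the chunk boundaries without sniffing the wire, something like the following should work: an untested sketch that dumps the response exactly as it arrives, so the hexadecimal chunk-size lines stay visible. The host and path are placeholders, not the server from this thread.

# Untested sketch: send a minimal GET over a raw TLS socket and print the
# response without any transfer decoding, so the chunk framing is visible.
import socket
import ssl

host, path = "example.com", "/"   # placeholders

ctx = ssl.create_default_context()
with socket.create_connection((host, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        tls.sendall(
            (
                f"GET {path} HTTP/1.1\r\n"
                f"Host: {host}\r\n"
                "Accept-Encoding: identity\r\n"
                "Connection: close\r\n\r\n"
            ).encode("ascii")
        )
        raw = b""
        while True:
            data = tls.recv(4096)
            if not data:
                break
            raw += data

# The hexadecimal chunk-size lines appear between the chunks themselves.
print(raw.decode("latin-1"))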

@dulitz

dulitz commented Feb 2, 2019

I stepped through urllib3 during the read. The server was returning a 500. The response headers said it was chunked but it wasn't -- I believe the body was empty.

So in my case this wasn't a bug in urllib3; the server definitely sent a spec-violating response. I'm just going to leave this here to remind other folks that an "incomplete read" may be because the data was never written.
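
If that theory is right, the traceback at the top of the thread should be reproducible with a deliberately broken local server. A minimal sketch, assuming only that the server advertises chunked encoding and then sends nothing else:

# Sketch: a server that claims "Transfer-Encoding: chunked" but closes the
# connection without ever sending a chunk-size line.
import socket
import threading

import requests

def broken_server(listener):
    conn, _ = listener.accept()
    conn.recv(65536)                       # read the request
    conn.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Transfer-Encoding: chunked\r\n"
        b"\r\n"                            # headers end here...
    )
    conn.close()                           # ...and no chunk ever follows

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=broken_server, args=(listener,), daemon=True).start()

# Raises requests.exceptions.ChunkedEncodingError, wrapping urllib3's
# ProtocolError / http.client.IncompleteRead(0 bytes read).
requests.get(f"http://127.0.0.1:{port}/")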

@dulitz

dulitz commented Feb 4, 2019

Could this be caught in urllib3 and re-raised with a "The server indicated a chunked response. Did the server send a non-chunked response or no chunks at all?" wrapper exception? ValueError is less than ideal to indicate a protocol error.

In my case and issue 4248 above, I'm going to guess that the body was empty; 4248 had another response on the connection while @shivam05011996 and I did not.

I'd suggest that the wrapper exception contain the HTTP response code since that might help diagnose a broken server.
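
A rough sketch of what such a wrapper might look like; the class name, attributes and constructor below are made up for illustration and are not existing urllib3 API:

# Hypothetical wrapper exception carrying the status code and read progress.
from urllib3.exceptions import ProtocolError

class InvalidChunkLength(ProtocolError):
    """The server indicated a chunked response but sent a missing or bad chunk length."""

    def __init__(self, response, length):
        self.status = response.status    # HTTP status code, to help diagnose a broken server
        self.partial = response.tell()   # bytes read before the bad length line
        self.length = length             # the line that failed to parse as hex
        super().__init__(
            "InvalidChunkLength(got length %r, %i bytes read, HTTP status %s)"
            % (length, self.partial, self.status)
        )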

@sigmavirus24
Contributor

I'd suggest that the wrapper exception contain the HTTP response code since that might help diagnose a broken server.

I was thinking about this as well. In the original post, we clearly have a response we could attach. Perhaps read_chunked could include self in the exception to make it easier?

@rexwangcc

We ran into a similar issue; it turned out to be a problem on the server side, which could not properly compress and chunk the response. Our workaround was to pass Accept-Encoding: identity in the request headers, for those who have the same issue and want to bypass the error.
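
Spelled out, the workaround is simply the following (the URL is a placeholder):

# Workaround from the comment above: ask the server not to compress the body.
import requests

resp = requests.get(
    "https://example.com/some/chunked/endpoint",   # placeholder URL
    headers={"Accept-Encoding": "identity"},
)
print(resp.status_code, len(resp.content))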

@mbatle

mbatle commented Jul 6, 2019

I am able to reproduce the same bug with this code:

import requests
requests.get('https://www.telecreditobcp.com/tlcnp/index.do')

I think this is a bug on the server side, but maybe urllib3 could do a better job of working around it, as other libraries and applications do (the same URL works fine in a web browser like Chrome or Firefox).

This is the traceback I get when it fails:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/urllib3/response.py", line 603, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b'HTTP/1.1 200 OK\r\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/urllib3/response.py", line 362, in _error_catcher
    yield
  File "/usr/lib/python3.7/site-packages/urllib3/response.py", line 668, in read_chunked
    self._update_chunk_length()
  File "/usr/lib/python3.7/site-packages/urllib3/response.py", line 607, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(17 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/lib/python3.7/site-packages/urllib3/response.py", line 492, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/lib/python3.7/site-packages/urllib3/response.py", line 696, in read_chunked
    self._original_response.close()
  File "/usr/lib64/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3.7/site-packages/urllib3/response.py", line 380, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(17 bytes read)', IncompleteRead(17 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.7/site-packages/requests/sessions.py", line 686, in send
    r.content
  File "/usr/lib/python3.7/site-packages/requests/models.py", line 828, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/usr/lib/python3.7/site-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(17 bytes read)', IncompleteRead(17 bytes read))

@mbatle

mbatle commented Jul 10, 2019

As far as I can see, the server (in the example above) sends a chunked transfer, but it does not properly send the final 0-length chunk and instead sends a new status line. That's where urllib3 raises the exception, trying to decode a chunk length from the response status line (see the framing note after this list). Some insights I could find so far:

  • the same request works in a browser without complaint (I tried sending the same headers as the browser, but didn't see a difference for the sample code reproducing the bug)

  • sometimes the request works perfectly with the sample code for a while, and then it stops working again. Since nothing has really changed on our side, I assume there is some load balancing and we are sometimes served by another server, possibly running a slightly different software version.

  • the following code delivers part of the content, but still not the full content:

import requests

r = requests.get('https://www.telecreditobcp.com/tlcnp/index.do', stream=True)
for line in r.iter_lines():
    print(line)

  • the bug in my local environment happens with Python 3.6.8, 3.7.3 and 2.7.16 on Linux, all with the same kernel (5.1.16-300.fc30.x86_64).

  • I have tried Ubuntu and Fedora; both seem to fail the same way.

  • I have a server in AWS running Linux with Python 3.5.2, and it always works, no bugs.

  • I've been told the bug does not seem reproducible in a Windows environment with any version.

  • I've been told that routing traffic through a VPN makes it work in a Linux environment that used to fail.

  • the same bug seems to happen with curl:
    curl -X GET https://www.telecreditobcp.com/tlcnp/index.do

curl: (56) Illegal or missing hexadecimal sequence in chunked-encoding
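
For reference, this is the framing urllib3 expects from a chunked body, versus what the ValueError in the traceback above suggests this server sends; the bytes below are illustrative only, not captured from this server.

# A well-formed chunked body: each chunk is "<hex size>\r\n<data>\r\n",
# and the body ends with a zero-length chunk.
well_formed = (
    b"4\r\n"
    b"Wiki\r\n"
    b"5\r\n"
    b"pedia\r\n"
    b"0\r\n"
    b"\r\n"
)

# What the ValueError above suggests this server sends after its last chunk:
# a fresh status line where a chunk size should be, so urllib3 ends up trying
# int(b'HTTP/1.1 200 OK\r\n', 16) and fails.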

@mbatle

mbatle commented Jul 12, 2019

Some more insights:

  • browsers like Chrome or Firefox always work; one difference is that they use TLS 1.3, while requests/urllib3 uses TLS 1.2 (I tried to force TLS 1.3 in the Python client as well, without success)

  • I found that old versions of the kernel (4.x) and OpenSSL (1.0) work, while the newest versions of the kernel and OpenSSL don't, but I haven't yet

  • analyzing the traffic in Wireshark was not very helpful as it is encrypted

  • when it fails, it looks like the client keeps receiving the response from the beginning again and again, non-stop (starting with the headers HTTP/1.1 200 OK ...), instead of receiving the rest of the page. This can be reproduced with the openssl command line directly (so I should probably report it there):

openssl s_client -connect www.telecreditobcp.com:443 -servername www.telecreditobcp.com

then send this

GET /tlcnp/index.do HTTP/1.1
Host: www.telecreditobcp.com
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9,es-419;q=0.8,es;q=0.7

@mbatle

mbatle commented Jul 13, 2019

OK, I confirmed this is a bug in OpenSSL 1.1 versions; it works with OpenSSL 1.0.
I will file the bug there, sorry for the noise!

@shivam05011996
Author

@mbatle Thank you for digging deeper into this issue. If it's no longer needed, please go ahead and close this issue.

@vrolijken

@mbatle I guess I should be watching openssl/openssl#9360?

@zenkj

zenkj commented Feb 20, 2020

I got exactly the same bug on a plain HTTP request, so I think it's not about OpenSSL.

@dulitz

dulitz commented Feb 20, 2020 via email

@sitronet

sitronet commented May 8, 2020

I get the same exception here with the header Transfer-Encoding: chunked,
on Debian 10:
requests.__version__ = 2.21.0
urllib3.__version__ = 1.24.1 (from the dist-package)
I removed the dist-package and installed the site-package version 1.25.9 -> same exception.

Finally I added a line as suggested in #4248.

I cannot give you the URL; it is a private server.
I can only say that it is the cometd interface of the Logitech Media Server.

@h-rummukainen

I ran into the same issue with yet another private service.

In my case the issue is that the server sends "Transfer-Encoding: chunked" in a 204 NO CONTENT response to a PUT request. The server then follows RFC 7230 section 3.3.3 point 1 and does not send any message body; in particular, it does not send even the chunk length, which urllib3 expects to receive. The RFC seems to be somewhat ambiguous, and the urllib3 behaviour is understandable in view of section 3.3.3 point 3.

curl used to have the same issue, but it was fixed in this commit: http: don't parse body-related headers bodyless responses

Should urllib3 follow the same logic as curl here, and ignore the header fields Content-Encoding, Content-Length, Content-Range, Last-Modified and Transfer-Encoding whenever the response is supposed to be bodyless?
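
A minimal sketch of the curl-style rule, in case urllib3 wanted to apply the same logic; the helper below is illustrative only, not existing urllib3 code:

# Per RFC 7230 section 3.3.3: responses to HEAD, 1xx, 204 and 304 responses,
# and 2xx responses to CONNECT never carry a body, so body-related headers
# such as Transfer-Encoding can safely be ignored for them.
def response_may_have_body(method, status):
    if method.upper() == "HEAD":
        return False
    if 100 <= status < 200 or status in (204, 304):
        return False
    if method.upper() == "CONNECT" and 200 <= status < 300:
        return False
    return True

assert response_may_have_body("PUT", 204) is False   # the case described above
assert response_may_have_body("GET", 200) is True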

@dulitz

dulitz commented Jun 9, 2020 via email
