Message body length not matching Content-Length should raise httplib.IncompleteRead even when using stream() #311

patricklaw · 2014-01-10T21:44:23Z

More details in the issue I originally filed with requests (https://github.com/kennethreitz/requests/issues/1855). Here is a repro, slightly modified from the repro in that issue:

import SocketServer as socketserver
import threading
import urllib3
import time


class MyTCPHandler(socketserver.BaseRequestHandler):
    def handle(self):
        self.data = self.request.recv(1024)
        self.request.sendall('HTTP/1.1 200 OK\r\n'
                             'Server: truncator/0.0\r\n'
                             'Content-Length: 20\r\n'
                             'Connection: close\r\n\r\n'
                             '12345')

server = None
def background_server():
    global server
    HOST, PORT = "localhost", 9999
    server = socketserver.TCPServer((HOST, PORT), MyTCPHandler)
    server.serve_forever()


if __name__ == "__main__":
    t = threading.Thread(target=background_server)
    t.daemon = True
    t.start()
    time.sleep(1)
    http = urllib3.PoolManager()
    try:
        g = http.request('GET', 'http://localhost:9999', preload_content=False).stream()
        data = list(g) # This should raise httplib.IncompleteRead
        print(data)
    finally:
        server.shutdown()

shazow · 2014-01-10T21:47:12Z

I'm open to a PR which changes this behaviour, but we need to be careful to take into account what @sigmavirus24 and @Lukasa said.

Are you interested in submitting a patch for this, @patricklaw?

Lukasa · 2014-01-10T23:11:31Z

Again, I have really mixed feelings about doing this on a stream() call. If you're stream()ing, you should be prepared to accept more or less data than the server claims it'll send, and do the appropriate thing in that case, rather than have urllib3 vomit in your face.

shazow · 2014-01-10T23:26:39Z

Could add some kind of .stream(strict=True) which would raise an exception? :/

patricklaw · 2014-01-10T23:27:29Z

Why should you be prepared to accept more or less data than the server claims? And even if you should, why would you expect to not be informed as the spec instructs?

Lukasa · 2014-01-10T23:47:11Z

@patricklaw What I'm getting at here is that it's totally reasonable to have a use case that says "I know that the server incorrectly indicated the length of the content, and I'm ok with that." There are plenty of reasons you might be OK with that: for instance, you might already know the server is faulty, or you might be testing a server you wrote that makes mistakes.

When streaming data, in particular, the user of urllib3 is taking control of how data gets read away from urllib3. In effect, the user is saying "please let me choose how much data I read and when". In that world, the user doesn't need to be informed that they've downloaded more or less data than the server told them, because they should already know. I'd argue that if they don't know they should not be using the streaming API to begin with.

There is added ambiguity in the spec caused by the phrase 'user-agent'. You believe (not necessarily incorrectly, by the way) that urllib3 is a user-agent. I believe that it needn't be a user-agent, and in fact whether it is depends entirely on the particular consumer of urllib3's APIs. For instance, it can be a lower-lying layer inside a higher-level user-agent (such as Requests, which also is sometimes not a user-agent).

I'm entirely open to @shazow's suggestion of having a strict kwarg that decides whether an exception gets raised or not, that's totally reasonable. But I think forcing an exception to be raised is not just unnecessary, but obnoxious: careless users will get burned when the exception they weren't prepared for unwinds their stack and causes them to lose their data. And to be clear, most users are careless, as this kind of error is deeply uncommon.

Lukasa · 2014-01-29T19:39:54Z

I followed up on this, because I was wondering why httplib wasn't raising this anyway. It turns out that httplib does exactly what we do: if you use HTTPResponse.read() with an argument specifying how many bytes to read, it never throws an IncompleteRead error.

With that in mind, I think we can close this and the associated requests issue. Everyone is doing the same thing here, and I think that thing is fine.

shazow · 2014-01-29T20:06:49Z

I'm personally happy to leave this as-is too. @patricklaw If you still feel very strongly about it, I'm willing to review a PR for a .stream(strict=True) param.

amarchandole · 2017-09-06T22:38:54Z

If we are not throwing any exception for incomplete reads, what would be a way to catch such a case? In my case, I am streaming content 4kb at a time and I do not want to go ahead if the fetched content is incomplete. However, the status code is 200 and there is no way at all to find out if the content read is incomplete, as if chunk encoding is present, I do not even have content-length to compare against.

Lukasa · 2017-09-07T05:28:19Z

If chunk encoding is present you should get errors if it is incomplete.

shazow closed this as completed Jan 29, 2014

Lukasa mentioned this issue Jan 29, 2014

When Content-Length differs from received message body length, an exception should be raised psf/requests#1855

Closed

kwlzn mentioned this issue May 17, 2018

pex resolver can end up with truncated reads of binary dists, causing an opaque untranslateable error pex-tool/pex#475

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Message body length not matching Content-Length should raise httplib.IncompleteRead even when using stream() #311

Message body length not matching Content-Length should raise httplib.IncompleteRead even when using stream() #311

patricklaw commented Jan 10, 2014

shazow commented Jan 10, 2014

Lukasa commented Jan 10, 2014

shazow commented Jan 10, 2014

patricklaw commented Jan 10, 2014

Lukasa commented Jan 10, 2014

Lukasa commented Jan 29, 2014

shazow commented Jan 29, 2014

amarchandole commented Sep 6, 2017 •

edited

Loading

Lukasa commented Sep 7, 2017

Message body length not matching Content-Length should raise httplib.IncompleteRead even when using stream() #311

Message body length not matching Content-Length should raise httplib.IncompleteRead even when using stream() #311

Comments

patricklaw commented Jan 10, 2014

shazow commented Jan 10, 2014

Lukasa commented Jan 10, 2014

shazow commented Jan 10, 2014

patricklaw commented Jan 10, 2014

Lukasa commented Jan 10, 2014

Lukasa commented Jan 29, 2014

shazow commented Jan 29, 2014

amarchandole commented Sep 6, 2017 • edited Loading

Lukasa commented Sep 7, 2017

amarchandole commented Sep 6, 2017 •

edited

Loading