New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Message body length not matching Content-Length should raise httplib.IncompleteRead even when using stream() #311

Closed
patricklaw opened this Issue Jan 10, 2014 · 9 comments

Comments

Projects
None yet
4 participants
@patricklaw

patricklaw commented Jan 10, 2014

More details in the issue I originally filed with requests (requests/requests#1855). Here is a repro, slightly modified from the repro in that issue:

import SocketServer as socketserver
import threading
import urllib3
import time


class MyTCPHandler(socketserver.BaseRequestHandler):
    def handle(self):
        self.data = self.request.recv(1024)
        self.request.sendall('HTTP/1.1 200 OK\r\n'
                             'Server: truncator/0.0\r\n'
                             'Content-Length: 20\r\n'
                             'Connection: close\r\n\r\n'
                             '12345')

server = None
def background_server():
    global server
    HOST, PORT = "localhost", 9999
    server = socketserver.TCPServer((HOST, PORT), MyTCPHandler)
    server.serve_forever()


if __name__ == "__main__":
    t = threading.Thread(target=background_server)
    t.daemon = True
    t.start()
    time.sleep(1)
    http = urllib3.PoolManager()
    try:
        g = http.request('GET', 'http://localhost:9999', preload_content=False).stream()
        data = list(g) # This should raise httplib.IncompleteRead
        print(data)
    finally:
        server.shutdown()
@shazow

This comment has been minimized.

Collaborator

shazow commented Jan 10, 2014

I'm open to a PR which changes this behaviour, but we need to be careful to take into account what @sigmavirus24 and @Lukasa said.

Are you interested in submitting a patch for this, @patricklaw?

@Lukasa

This comment has been minimized.

Contributor

Lukasa commented Jan 10, 2014

Again, I have really mixed feelings about doing this on a stream() call. If you're stream()ing, you should be prepared to accept more or less data than the server claims it'll send, and do the appropriate thing in that case, rather than have urllib3 vomit in your face.

@shazow

This comment has been minimized.

Collaborator

shazow commented Jan 10, 2014

Could add some kind of .stream(strict=True) which would raise an exception? :/

@patricklaw

This comment has been minimized.

patricklaw commented Jan 10, 2014

Why should you be prepared to accept more or less data than the server claims? And even if you should, why would you expect to not be informed as the spec instructs?

@Lukasa

This comment has been minimized.

Contributor

Lukasa commented Jan 10, 2014

@patricklaw What I'm getting at here is that it's totally reasonable to have a use case that says "I know that the server incorrectly indicated the length of the content, and I'm ok with that." There are plenty of reasons you might be OK with that: for instance, you might already know the server is faulty, or you might be testing a server you wrote that makes mistakes.

When streaming data, in particular, the user of urllib3 is taking control of how data gets read away from urllib3. In effect, the user is saying "please let me choose how much data I read and when". In that world, the user doesn't need to be informed that they've downloaded more or less data than the server told them, because they should already know. I'd argue that if they don't know they should not be using the streaming API to begin with.

There is added ambiguity in the spec caused by the phrase 'user-agent'. You believe (not necessarily incorrectly, by the way) that urllib3 is a user-agent. I believe that it needn't be a user-agent, and in fact whether it is depends entirely on the particular consumer of urllib3's APIs. For instance, it can be a lower-lying layer inside a higher-level user-agent (such as Requests, which also is sometimes not a user-agent).

I'm entirely open to @shazow's suggestion of having a strict kwarg that decides whether an exception gets raised or not, that's totally reasonable. But I think forcing an exception to be raised is not just unnecessary, but obnoxious: careless users will get burned when the exception they weren't prepared for unwinds their stack and causes them to lose their data. And to be clear, most users are careless, as this kind of error is deeply uncommon.

@Lukasa

This comment has been minimized.

Contributor

Lukasa commented Jan 29, 2014

I followed up on this, because I was wondering why httplib wasn't raising this anyway. It turns out that httplib does exactly what we do: if you use HTTPResponse.read() with an argument specifying how many bytes to read, it never throws an IncompleteRead error.

With that in mind, I think we can close this and the associated requests issue. Everyone is doing the same thing here, and I think that thing is fine.

@shazow

This comment has been minimized.

Collaborator

shazow commented Jan 29, 2014

I'm personally happy to leave this as-is too. @patricklaw If you still feel very strongly about it, I'm willing to review a PR for a .stream(strict=True) param.

@amarchandole

This comment has been minimized.

amarchandole commented Sep 6, 2017

If we are not throwing any exception for incomplete reads, what would be a way to catch such a case? In my case, I am streaming content 4kb at a time and I do not want to go ahead if the fetched content is incomplete. However, the status code is 200 and there is no way at all to find out if the content read is incomplete, as if chunk encoding is present, I do not even have content-length to compare against.

@Lukasa

This comment has been minimized.

Contributor

Lukasa commented Sep 7, 2017

If chunk encoding is present you should get errors if it is incomplete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment