
HEAD requests on large static files are very slow #988

Closed
MatsDahlberg opened this Issue Feb 21, 2014 · 3 comments

I have a Tornado application that receives HEAD requests for very large static files (15-25 GB). These files happen to be located on a remote server that is mounted over sshfs.

Each HEAD request took 140-250 seconds, which seemed strange to me. I found that even for a HEAD request, Tornado reads the whole file just to determine its length.

From web.py, lines 1947-1952:

        content_length = 0
        for chunk in content:
            if include_body:
                self.write(chunk)
            else:
                content_length += len(chunk)
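To illustrate why this is slow, here is a minimal standalone sketch (not Tornado code) contrasting two ways of getting a file's length: summing chunk lengths, which is what the loop above does when include_body is false and which reads every byte, versus os.stat(), which only reads filesystem metadata. The helper names and the 64 KB chunk size are illustrative assumptions.

```python
import os
import tempfile


def length_by_reading(path, chunk_size=64 * 1024):
    """Count bytes by reading the whole file, like the HEAD branch above.
    Cost grows with file size (and network latency on sshfs)."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def length_by_stat(path):
    """Ask the filesystem for the size; no file data is read."""
    return os.stat(path).st_size


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 200_000)
        path = tmp.name
    try:
        # Same answer either way; only the amount of I/O differs.
        print(length_by_reading(path), length_by_stat(path))  # 200000 200000
    finally:
        os.remove(path)
```

Both return the same number; the difference is that the first one transfers the entire file over the mount.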

In my server I have changed this to:

        content_length = 0
        if include_body:
            for chunk in content:
                self.write(chunk)
        else:
            content_length = os.stat(self.absolute_path).st_size

Now the HEAD request takes 2 ms instead of 140-250 seconds.

bdarnell commented Feb 22, 2014

Your change breaks support for Range: requests, because the Content-Length must be set to the amount of data returned, not the total size of the file. (I'm not sure it makes sense to combine HEAD and Range:, but it's legal HTTP.) The else: block needs to take into account start, end, and get_content_size() rather than calling stat() directly.
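A minimal sketch of what that else: branch needs to compute, assuming start and end have already been parsed from the Range: header using Python slice conventions (end exclusive, None for an open end, and a negative start for a suffix range such as bytes=-500). The function name head_content_length is hypothetical; inside Tornado the total size would come from get_content_size() rather than os.stat().

```python
def head_content_length(start, end, size):
    """Return the Content-Length for a response covering [start, end) of a
    resource that is `size` bytes long.

    Assumed conventions: None means "from the beginning" / "to the end",
    end is exclusive, and a negative start counts back from the end of the
    resource (suffix range, e.g. "bytes=-500").
    """
    if start is not None and start < 0:
        start = max(size + start, 0)  # suffix range: last -start bytes
    if end is not None and end > size:
        end = size  # clamp an over-long range to the resource size
    if start is None:
        start = 0
    if end is None:
        end = size
    return max(end - start, 0)


if __name__ == "__main__":
    size = 4619243976
    print(head_content_length(None, None, size))  # no Range: 4619243976
    # "Range: bytes=20001-36907" -> start=20001, end=36908 (exclusive)
    print(head_content_length(20001, 36908, size))  # 16907
    print(head_content_length(-500, None, size))  # 500
```

The second call gives 16907, matching the Content-Length that curl reports for the same range later in this thread.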

MatsDahlberg commented

In my application, Range: requests work just fine. The application that uses the Tornado server sends a lot of HEAD requests and GET requests with Range:, and there are no problems. When I test a Range: request with curl, the headers and content look good as well.

Below is the output from curl on a data file.

    (test)mats.dahlberg@clinical-db:~/test$ curl -i --header "Range: bytes=20001-36907" http://localhost:8082/static/46-2-1U/mosaik/GATK/46-2-1U.121217_AD1HRHACXX_GATCAG.lane6_sorted_pmd_rreal_brecal_reduced.bam
    HTTP/1.1 206 Partial Content
    Content-Length: 16907
    Accept-Ranges: bytes
    Server: TornadoServer/3.1.1
    Last-Modified: Thu, 06 Feb 2014 02:06:31 GMT
    Content-Range: bytes 20001-36907/4619243976
    Date: Mon, 24 Feb 2014 07:14:50 GMT
    Content-Type: text/html; charset=UTF-8

It works because, for a Range: request, the Content-Range: header is set earlier in the method (line 1926 or 1940), and Content-Length: is set by the finish() method (line 829).
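A simplified model of the mechanism described above, assuming finish() fills in Content-Length from the buffered body only when the handler has not set it. This is not Tornado's actual code; the function name finish_headers and the plain dict of headers are hypothetical. It shows why a GET with Range: comes out right (the body is exactly the requested slice) while a HEAD request, which writes no body, depends on the handler setting Content-Length explicitly.

```python
def finish_headers(headers, write_buffer):
    """Simplified model: if the handler did not set Content-Length,
    derive it from the body chunks that were written."""
    if "Content-Length" not in headers:
        headers["Content-Length"] = str(sum(len(part) for part in write_buffer))
    return headers


if __name__ == "__main__":
    data = bytes(range(256)) * 200  # stand-in for the file contents
    start, end = 20, 120  # slice selected by a Range: header (end exclusive)

    # GET with Range: the written body is the slice, so the length is right.
    print(finish_headers({}, [data[start:end]])["Content-Length"])  # 100

    # HEAD: no body is written, so the model would report 0 unless the
    # handler sets Content-Length itself (and that value must honor the range).
    print(finish_headers({}, [])["Content-Length"])  # 0
```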

bdarnell commented May 25, 2014

To clarify, the problem I was referring to is when HEAD and Range: are used together: the Content-Length header must be restricted to the selected range, but your proposed change would return the size of the entire file. I've just pushed a fix.

bdarnell closed this in 59fee55 on May 25, 2014