HEAD requests on large static files are very slow #988

Closed
MatsDahlberg opened this Issue · 3 comments

@MatsDahlberg

I have a tornado application that receives HEAD requests on very large static files (15-25 GB each). These files happen to be located on a remote server that is mounted over sshfs.

Each HEAD request took 140-250 seconds, which seemed strange to me. I found that even though it was a HEAD request, tornado would read the whole file just to determine its length.

From web.py, lines 1947-1952:

        content_length = 0
        for chunk in content:
            if include_body:
                self.write(chunk)
            else:
                content_length += len(chunk)

In my server I have changed this to:

        content_length = 0
        if include_body:
            for chunk in content:
                self.write(chunk)
        else:
            content_length = os.stat(self.absolute_path).st_size

Now the HEAD request takes 2 ms instead of 140-250 seconds.
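As a rough illustration of the difference, here is a minimal standalone sketch (not tornado code) comparing the two ways of getting a file's length. The helper names `length_by_reading` and `length_by_stat` are made up for this example, and a small temporary file stands in for the real data. Both give the same number, but the first has to pull every byte through the filesystem, which is what hurts on a 15-25 GB file over sshfs.

```python
import os
import tempfile


def length_by_reading(path, chunk_size=64 * 1024):
    """Count bytes by reading the file chunk by chunk (hypothetical helper)."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def length_by_stat(path):
    """Ask the filesystem for the size without reading any data (hypothetical helper)."""
    return os.stat(path).st_size


# Small temporary file standing in for a large static file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 200000)
    demo_path = tmp.name

try:
    read_len = length_by_reading(demo_path)
    stat_len = length_by_stat(demo_path)
finally:
    os.remove(demo_path)
```

The cost of the first function grows with the file size, while the second is a single metadata lookup.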

@bdarnell
Owner

Your change breaks support for Range: requests, because the Content-Length must be set to the amount of data returned, not the total data size. (I'm not sure it makes sense to combine HEAD and Range:, but it's legal HTTP.) The else: block needs to take into account start, end, and get_content_size(), rather than calling stat() directly.
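To make that concrete, here is a minimal sketch of the arithmetic involved. It is not the code from the actual fix. The function name `head_content_length` is hypothetical, `size` is assumed to be whatever get_content_size() returns, and `start`/`end` are assumed to be an already-parsed range with an exclusive `end` and `None` for a missing bound. Unsatisfiable ranges (for example, start beyond the end of the file) are assumed to have been rejected earlier and are not handled here.

```python
def head_content_length(size, start=None, end=None):
    """Return the number of bytes a matching GET would send.

    Assumptions for this sketch: `end` is exclusive, None means
    "bound not given", and a negative `start` means "last N bytes".
    """
    if start is not None and start < 0:
        # Suffix range such as "bytes=-500": count back from the end.
        start = max(size + start, 0)
    if end is not None and end > size:
        # Clamp a range that runs past the end of the file.
        end = size
    if start is not None and end is not None:
        return end - start
    if end is not None:
        return end
    if start is not None:
        return size - start
    # No Range: header at all, so the whole file would be sent.
    return size
```

For a 4619243976-byte file and `Range: bytes=20001-36907` (exclusive end 36908), this gives 16907 rather than the full file size, and none of it requires reading file contents.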

@MatsDahlberg

In my application the Range: requests work just fine. The application that uses the tornado server sends a lot of HEAD requests and GET requests with Range:, and there are no problems. If I test a Range: request with curl, the headers and content look good as well.

Below is the output from curl on a data file.

(test)mats.dahlberg@clinical-db:~/test$ curl -i --header "Range: bytes=20001-36907" http://localhost:8082/static/46-2-1U/mosaik/GATK/46-2-1U.121217_AD1HRHACXX_GATCAG.lane6_sorted_pmd_rreal_brecal_reduced.bam
HTTP/1.1 206 Partial Content
Content-Length: 16907
Accept-Ranges: bytes
Server: TornadoServer/3.1.1
Last-Modified: Thu, 06 Feb 2014 02:06:31 GMT
Content-Range: bytes 20001-36907/4619243976
Date: Mon, 24 Feb 2014 07:14:50 GMT
Content-Type: text/html; charset=UTF-8

The reason it works is that if the request has a Range: header, the Content-Range: header is set earlier in the method (line 1926 or 1940), and the Content-Length: is handled by the finish() method on line 829.

@bdarnell
Owner

To clarify, the problem I was referring to is when both HEAD and Range: are used - the Content-Length header must be restricted by the selected range; your proposed change would return the size of the entire file. I've just pushed a fix.

@bdarnell bdarnell closed this issue from a commit
@bdarnell bdarnell Improve performance of HEAD requests on large static files.
StaticFileHandler would previously read the entire file and throw it
away just to compute its length; now it uses get_content_size()
instead.

Added extra validation in tests by performing both GET and HEAD
versions of all requests and ensuring the content headers match.

Closes #988.
59fee55
@bdarnell bdarnell closed this in 59fee55