Memory leak when processing big file uploads #740

Closed
JerryKwan opened this Issue · 26 comments

5 participants

@JerryKwan

Something is wrong in tornado v3.0.1.
When I upload a large file (about 101M, larger than the default max_buffer_size), the tornado server raises the following exception:
ERROR:tornado.application:Error in connection callback
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/tcpserver.py", line 228, in handle_connection
self.handle_stream(stream, address)
File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/httpserver.py", line 157, in handle_stream
self.no_keep_alive, self.xheaders, self.protocol)
File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/httpserver.py", line 190, in __init_

self.stream.read_until(b"\r\n\r\n", self._header_callback)
File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/iostream.py", line 148, in read_until
self._try_inline_read()
File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/iostream.py", line 398, in _try_inline_read
if self._read_to_buffer() == 0:
File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/iostream.py", line 432, in _read_to_buffer
raise IOError("Reached maximum read buffer size")
IOError: Reached maximum read buffer size
But after several big file upload requests there is a big memory increase in the tornado server, so I think maybe there is a memory leak when processing big files.
Any suggestions? How could I fix it?
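
For reference, a rough way to reproduce the symptom (this is an illustrative sketch, not from the original report; the port and upload path are assumptions) is to POST an oversized body a few times and watch the server process's resident memory:

    import requests  # any HTTP client works; requests is used here for brevity

    # ~101M payload, just above the default 100M max_buffer_size.
    payload = b"x" * (101 * 1024 * 1024)

    for _ in range(5):
        try:
            # Hypothetical server address and handler path.
            requests.post("http://localhost:8888/upload", data=payload)
        except requests.exceptions.ConnectionError:
            # The server closes the connection once the read buffer limit is hit.
            pass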

@schlamar

Try to set self._read_buffer = None in BaseIOStream.close.

A proper fix should clear the write buffer, too.

@wsantos

Should we dynamically increase max_buffer_size?

        if self._read_buffer_size >= self.max_buffer_size:
            gen_log.error("Reached maximum read buffer size")
            self.close()
            raise IOError("Reached maximum read buffer size")
        return len(chunk)
@schlamar

@wsantos I think the issue is not that the IOError is thrown, which is expected behavior. (Am I right @JerryKwan?)

The unexpected issue is that the memory of the read buffer does not get freed after the IOStream is closed (I assume it doesn't get garbage collected). So the right thing to do would be to free the read (and write) buffer while closing the IOStream.

@schlamar

Increasing a max_* value is really arbitrary behavior; I strongly discourage that.

@wsantos

@schlamar, so it's OK to have a buffer with size 104857600 (100M) and send 101M? How does the IOStream handle this?

@schlamar

@JerryKwan BTW, for such large uploads I would patch tornado to buffer directly on disk.

@wsantos It raises the IOError as expected. At the low level you can handle this gracefully by processing portions of the data (e.g. with read_bytes). The HTTP layer, however, does read_until(HTTP_END), so if the data sent is bigger than 100M (in one HTTP request, of course) it will fail. This is IMO fine, as you can either adjust the limit on your own or overcome the limitation completely by manually buffering on disk. The latter would require some tough work, but IMO it is the most suitable approach in this case.
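
A minimal sketch of that low-level approach, assuming the Tornado 3.x callback-style read_bytes API; the helper and callback names here are placeholders, not Tornado names:

    def read_in_portions(stream, content_length, handle_chunk, handle_done,
                         chunk_size=64 * 1024):
        """Read content_length bytes in fixed-size portions via read_bytes,
        so the IOStream's read buffer never has to hold the whole body."""
        remaining = [content_length]  # mutable cell shared with the callback

        def on_chunk(data):
            handle_chunk(data)  # process this portion (e.g. write it to disk)
            remaining[0] -= len(data)
            if remaining[0] > 0:
                stream.read_bytes(min(chunk_size, remaining[0]), on_chunk)
            else:
                handle_done()

        stream.read_bytes(min(chunk_size, remaining[0]), on_chunk)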

@wsantos

@schlamar I agree. Now we can start a server and set a limit we can handle; see #732.

@JerryKwan is it OK for you to have a configurable option at server start? Maybe that would be the best option.

@schlamar

FYI: bottle automatically switches to a temporary file for file uploads if the content length of the request is bigger than the max buffer size: https://github.com/defnull/bottle/blob/d89d676a6510541e679fac1ac1e8a5745eb87f76/bottle.py#L1050

Tornado should do the same IMO (and decrease the max buffer size to 1M or less).
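
For illustration, the same strategy can be sketched with just the standard library (this is not Tornado code; the 1M threshold is the value suggested above):

    import tempfile

    # SpooledTemporaryFile keeps data in memory until max_size bytes have been
    # written, then transparently rolls over to a real temporary file on disk.
    body = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)  # ~1M threshold

    def on_body_chunk(data):
        # Feed each received portion into the spool instead of one big buffer.
        body.write(data)

    def on_body_complete():
        body.seek(0)  # rewind so the request handler can read the upload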

@wsantos

We need to look at this with care; will it block tornado during a file upload?

@schlamar

> We need to look at this with care; will it block tornado during a file upload?

What exactly would block, and why should any of the proposals here block?

@wsantos

When we are writing to the tempfile.

@schlamar

> When we are writing to the tempfile.

Really? Which do you think blocks more seriously: writing 100M all at once to a file (which is what a user usually does after a file upload), or writing small chunks to a file?

@saghul

> The unexpected issue is that the memory of the read buffer does not get freed after the IOStream is closed (I assume it doesn't get garbage collected). So the right thing to do would be to free the read (and write) buffer while closing the IOStream.

The problem is that someone may want to read the remaining data in the buffer. I guess that reading until you get an exception would consume the buffer and 'fix' the issue. Or maybe have a dedicated cleanup method.

@schlamar

> I guess that reading until you get an exception would consume the buffer

It does already read until the buffer is full, then it closes the stream and throws the exception. The buffer doesn't get consumed at all if the content length > max buffer size.

Why should anyone read the buffer after the stream is closed? A close forbids any further interaction with the stream anyhow (see _checked_close).

@saghul

Imagine I'm using read_until_delimiter and the connection gets closed. There is data in the buffer which wasn't consumed.

You can actually read that data with read_bytes; the close check is done after attempting a read from the buffer: https://github.com/facebook/tornado/blob/master/tornado/iostream.py#L383

@wsantos
@schlamar

Sure. But this is IMO practically irrelevant. If you could consume the buffer with read_bytes, you would have done that in the first place and prevented the buffer overflow entirely :)
If your buffer is full, you have reached a point where the content of the buffer is not usable and can be thrown away. Otherwise you just designed your protocol wrong.

@schlamar

Or you could allow the user to register a callback for buffer overflow; the default case (no callback registered) would be raising the IOError and clearing the buffer.
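
A self-contained sketch of that proposal; the class and names are hypothetical, not Tornado API:

    class BoundedReadBuffer(object):
        """Accumulates data up to max_size; on overflow it either hands the
        buffered data to a user callback or raises and discards it."""

        def __init__(self, max_size, overflow_callback=None):
            self.max_size = max_size
            self.overflow_callback = overflow_callback
            self._chunks = []
            self._size = 0

        def feed(self, data):
            self._chunks.append(data)
            self._size += len(data)
            if self._size >= self.max_size:
                buffered = b"".join(self._chunks)
                self._chunks, self._size = [], 0  # the buffer is always cleared
                if self.overflow_callback is not None:
                    self.overflow_callback(buffered)  # user decides what to do
                else:
                    raise IOError("Reached maximum read buffer size")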

@wsantos
@schlamar

> and create another one to discuss the tempfile when the size is more than max_buffer_size. What do you think?

Exactly my thought :+1:

@wsantos

Issue for streams bigger than max_buffer_size: #743. I'm running some tests on the memory leak issue.

regards

@saghul

> Sure. But this is IMO practically irrelevant. If you could consume the buffer with read_bytes, you would have done that in the first place and prevented the buffer overflow entirely :)
> If your buffer is full, you have reached a point where the content of the buffer is not usable and can be thrown away. Otherwise you just designed your protocol wrong.

IIRC there are other cases where the connection is interrupted (for example when the remote side closes it) and the buffer is not emptied either. It can happen for reasons other than overflow.

@wsantos wsantos referenced this issue from a commit
@wsantos wsantos fix #740 49e49b6
@wsantos wsantos referenced this issue from a commit
@wsantos wsantos Fix #740 2129ff4
@schlamar

> IIRC there are other cases where the connection is interrupted (for example when the remote side closes it) and the buffer is not emptied either. It can happen for reasons other than overflow.

This case should be covered here: https://github.com/facebook/tornado/blob/master/tornado/iostream.py#L289

@JerryKwan

Sorry for the late reply.
As discussed before, I think the buffered data should be cleared when an exception occurs.
When I add the following buffer-clearing statements to close() in iostream.py, the memory leak problem seems to be solved.
Now the close() function looks like this:

    def close(self, exc_info=False):
        """Close this stream.

        If ``exc_info`` is true, set the ``error`` attribute to the current
        exception from `sys.exc_info` (or if ``exc_info`` is a tuple,
        use that instead of `sys.exc_info`).
        """
        if not self.closed():
            if exc_info:
                if not isinstance(exc_info, tuple):
                    exc_info = sys.exc_info()
                if any(exc_info):
                    self.error = exc_info[1]
            if self._read_until_close:
                callback = self._read_callback
                self._read_callback = None
                self._read_until_close = False
                self._run_callback(callback,
                                   self._consume(self._read_buffer_size))
            if self._state is not None:
                self.io_loop.remove_handler(self.fileno())
                self._state = None
            self.close_fd()
            # clear unneeded buffered data
            self._read_buffer = None
            self._write_buffer = None
            self._closed = True

        self._maybe_run_close_callback()
@wsantos wsantos referenced this issue from a commit
@wsantos wsantos Fix #740
reorganize the code.
bdeeccb
@bdarnell
Owner

The IOStream, buffered data and all, should be getting garbage collected when it's no longer needed. What's keeping the IOStreams alive in this case?

@bdarnell bdarnell closed this issue from a commit
@bdarnell bdarnell Fix a memory leak in HTTPServer with very large file uploads.
This is not quite a true leak since the garbage collector will
reclaim everything when it runs, but it might as well be a leak
because CPython's garbage collector uses heuristics based on the
number of allocated objects to decide when to run.  Since this
scenario resulted in a small number of very large objects, the process
could consume a large amount of memory.

The actual change is to ensure that HTTPConnection always sets a
close_callback on the IOStream instead of waiting for the Application
to set one.  I also nulled out a few more references to break up cycles.

Closes #740.
Closes #747.
Closes #760.
12780c1
@bdarnell bdarnell closed this in 12780c1
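
An illustrative sketch of the idea described in the commit message (not the actual diff; the Connection class here is hypothetical, while IOStream.set_close_callback is the real Tornado method):

    class Connection(object):
        """Hypothetical owner of an IOStream, showing why the fix helps."""

        def __init__(self, stream):
            self.stream = stream
            # Always register a close callback ourselves instead of waiting
            # for application code to set one.
            self.stream.set_close_callback(self._on_connection_close)

        def _on_connection_close(self):
            # Null out references to break the cycle so the stream and its
            # large read/write buffers are freed by reference counting,
            # instead of waiting for the cyclic garbage collector to run.
            self.stream.set_close_callback(None)
            self.stream = None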