Memory leak when processing big file upload #740
Comments
Try to set a bigger max_buffer_size.

A proper fix should clear the write buffer, too.
Should we dynamically increase max_buffer_size?

```python
# From tornado/iostream.py, _read_to_buffer (the check hit by the traceback below)
if self._read_buffer_size >= self.max_buffer_size:
    gen_log.error("Reached maximum read buffer size")
    self.close()
    raise IOError("Reached maximum read buffer size")
return len(chunk)
```
@wsantos I think the issue is not that the IOError is thrown, which is expected behavior. (Am I right @JerryKwan?) The unexpected issue is that the memory of the read buffer does not get freed after the IOStream is closed (I assume it doesn't get garbage collected). So the right thing to do would be to free the read (and write) buffers while closing the IOStream.
@schlamar, so it's OK to have a buffer with size 104857600 (100M) and send 101M? How does the IOStream handle this?
@JerryKwan BTW, for such large uploads I would patch Tornado to buffer directly on disk. @wsantos It raises the IOError as expected. At a low level you can handle this gracefully by processing the data in portions (e.g. with read_bytes). The HTTP layer, however, does not.
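A minimal sketch of that low-level approach, assuming the Tornado 3.x callback-style read_bytes; read_body, handle_chunk, and CHUNK_SIZE are illustrative names, not Tornado APIs:

```python
CHUNK_SIZE = 64 * 1024  # read the body in 64K portions instead of buffering it all

def read_body(stream, remaining, handle_chunk, on_done):
    """Pull `remaining` bytes off the stream in small pieces so the
    read buffer never approaches max_buffer_size."""
    def on_chunk(data):
        handle_chunk(data)  # e.g. append the piece to a temporary file on disk
        left = remaining - len(data)
        if left > 0:
            read_body(stream, left, handle_chunk, on_done)
        else:
            on_done()
    stream.read_bytes(min(CHUNK_SIZE, remaining), on_chunk)
```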
@schlamar I agree. Now we can start a server and set the limit we can handle, see #732. @JerryKwan is it OK for you to have a configurable option on server start? Maybe that would be the best option.
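For reference, a minimal sketch of that configurable option, assuming a Tornado version where HTTPServer accepts the max_buffer_size keyword added via #732:

```python
import tornado.httpserver
import tornado.ioloop
import tornado.web

class UploadHandler(tornado.web.RequestHandler):
    def post(self):
        self.write("received %d bytes" % len(self.request.body))

app = tornado.web.Application([(r"/upload", UploadHandler)])
# Raise the per-connection read buffer limit at server start (default is 100M)
server = tornado.httpserver.HTTPServer(app, max_buffer_size=200 * 1024 * 1024)
server.listen(8888)
tornado.ioloop.IOLoop.instance().start()
```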
FYI: bottle automatically switches to a temporary file for file uploads if the content length of the request is bigger than the max buffer size: https://github.com/defnull/bottle/blob/d89d676a6510541e679fac1ac1e8a5745eb87f76/bottle.py#L1050 Tornado should be doing the same IMO (and decrease the max buffer size to 1M or less).
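A rough sketch of what that would look like on the receiving side, using the standard library's SpooledTemporaryFile; the threshold and the callback names are assumptions, not Tornado or bottle APIs:

```python
import shutil
import tempfile

MEMORY_LIMIT = 1024 * 1024  # keep bodies up to 1M in memory, as suggested above

body = tempfile.SpooledTemporaryFile(max_size=MEMORY_LIMIT)

def on_body_chunk(chunk):
    # SpooledTemporaryFile rolls itself over to a real temporary file on
    # disk automatically once more than max_size bytes have been written.
    body.write(chunk)

def on_body_complete(destination_path):
    body.seek(0)  # rewind before consuming the upload
    with open(destination_path, "wb") as dst:
        shutil.copyfileobj(body, dst)  # streams in chunks, never 100M at once
    body.close()
```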
We need to look at this with care; will it block Tornado during the file upload?
What exactly would block, and why, in any of the proposals here?
When we are writing to the tempfile.
Really? Which do you think blocks more seriously: writing 100M all at once to a file (which is what a user usually does after a file upload), or writing small chunks to a file?
The problem is that someone may want to read the remaining data in the buffer. I guess that reading until you get an exception would consume the buffer and 'fix' the issue. Or maybe have a dedicated cleanup method.
It does already read until the buffer is full, then it closes the stream and throws the exception. The buffer doesn't get consumed at all if the content length > max buffer size. Why should anyone read the buffer after the stream is closed? A close forbids any further interaction with the stream anyhow (see …)
Imagine I'm using read_until_delimiter and the connection gets closed. There is data in the buffer which wasn't consumed. You can actually read that data with read_bytes; the close check is done after attempting a read from the buffer: https://github.com/facebook/tornado/blob/master/tornado/iostream.py#L383
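A sketch of that scenario, assuming the Tornado 3.x behavior described above (buffered data is consulted before the closed check); drain_leftover and handle_partial are illustrative names:

```python
def drain_leftover(stream, handle_partial):
    # _read_buffer_size is the private counter discussed throughout this
    # thread; if the peer closed mid-read, it may still be non-zero.
    pending = stream._read_buffer_size
    if pending:
        # Can still succeed on a closing stream, because read_bytes tries to
        # satisfy the read from the buffer before checking the closed flag.
        stream.read_bytes(pending, handle_partial)
```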
It's the way to go, but I'm just wondering how to implement it and avoid blocking.
Sure. But this is IMO practically irrelevant. If you could consume the buffer with read_bytes, you would have done that in the first place and prevented the buffer overflow at all :)
Or you allow the user to register a callback for buffer overflow, and the default case (no callback registered) would be raising the IOError and clearing the buffer.
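Purely as a sketch of that proposal (Tornado has no such hook; BufferedReader and overflow_callback are made-up names):

```python
class BufferedReader(object):
    def __init__(self, max_buffer_size, overflow_callback=None):
        self.max_buffer_size = max_buffer_size
        self.overflow_callback = overflow_callback
        self._buffer = bytearray()

    def feed(self, chunk):
        self._buffer.extend(chunk)
        if len(self._buffer) >= self.max_buffer_size:
            if self.overflow_callback is not None:
                # Let the application take the data (e.g. spill it to disk)
                self.overflow_callback(bytes(self._buffer))
                del self._buffer[:]
            else:
                # Default case: clear the buffer (the fix discussed in this
                # issue) and raise, instead of leaking the memory.
                self._buffer = bytearray()
                raise IOError("Reached maximum read buffer size")
```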
We need to only work on fixing the memory leak in this issue, and create another issue for streams bigger than max_buffer_size.
Exactly my thought 👍
Issue for streams bigger than max_buffer_size: #743. I'm making some tests of the memory leak issue. Regards.
IIRC there are other cases when the connection is interrupted (like when the remote closes it) and the buffer is not emptied either. It can happen for reasons other than overflow.
This case should be covered here: https://github.com/facebook/tornado/blob/master/tornado/iostream.py#L289
Sorry for the late reply.

```python
def close(self, exc_info=False):
    """Close this stream.

    If ``exc_info`` is true, set the ``error`` attribute to the current
    exception from `sys.exc_info` (or if ``exc_info`` is a tuple,
    use that instead of `sys.exc_info`).
    """
    if not self.closed():
        if exc_info:
            if not isinstance(exc_info, tuple):
                exc_info = sys.exc_info()
            if any(exc_info):
                self.error = exc_info[1]
        if self._read_until_close:
            callback = self._read_callback
            self._read_callback = None
            self._read_until_close = False
            self._run_callback(callback,
                               self._consume(self._read_buffer_size))
        if self._state is not None:
            self.io_loop.remove_handler(self.fileno())
            self._state = None
        self.close_fd()
        # clear unneeded buffered data
        self._read_buffer = None
        self._write_buffer = None
        self._closed = True
    self._maybe_run_close_callback()
```
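A quick, informal way to check the patch above (this assumes the patched close(); the private attribute names are the ones used in this thread):

```python
import socket
from tornado.iostream import IOStream

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
stream = IOStream(sock, max_buffer_size=1024)
stream.close()
# With the patch applied, a closed stream no longer pins its buffers.
assert stream._read_buffer is None
assert stream._write_buffer is None
```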
The IOStream, buffered data and all, should be getting garbage collected when it's no longer needed. What's keeping the IOStreams alive in this case?
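One way to investigate that question, sketched with only the standard gc module (a tool like objgraph would give nicer output):

```python
from __future__ import print_function  # the reporter is on Python 2.7

import gc
from tornado.iostream import IOStream

# After a few failed uploads, list the IOStream objects that are still
# reachable and see what keeps each of them alive.
for obj in gc.get_objects():
    if isinstance(obj, IOStream):
        print(obj, "closed:", obj.closed(),
              "referrers:", [type(r).__name__ for r in gc.get_referrers(obj)])
```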
Something is wrong in Tornado v3.0.1.
When I upload a large file (about 101M, larger than the default max_buffer_size), the Tornado server raises an exception as follows:
```
ERROR:tornado.application:Error in connection callback
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/tcpserver.py", line 228, in _handle_connection
    self.handle_stream(stream, address)
  File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/httpserver.py", line 157, in handle_stream
    self.no_keep_alive, self.xheaders, self.protocol)
  File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/httpserver.py", line 190, in __init__
    self.stream.read_until(b"\r\n\r\n", self._header_callback)
  File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/iostream.py", line 148, in read_until
    self._try_inline_read()
  File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/iostream.py", line 398, in _try_inline_read
    if self._read_to_buffer() == 0:
  File "/usr/local/lib/python2.7/dist-packages/tornado-3.0.1-py2.7.egg/tornado/iostream.py", line 432, in _read_to_buffer
    raise IOError("Reached maximum read buffer size")
IOError: Reached maximum read buffer size
```
But after several big file upload requests, there is a big memory increase in the Tornado server, so I think there may be a memory leak in processing big files.
Any suggestions? How could I fix it?