made SimpleAsyncHTTPClient header parsing more error-resistant. #341

Closed
wants to merge 10 commits

4 participants

@jeeyoungk

Implemented read_until_regex in tornado. This is similar to read_until(), except it matches a regular expression instead of a fixed delimiter. This function is used to parse the headers of sites that return non-standard headers delimited by '\n' only (like http://news.ycombinator.com/rss).
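
For illustration only (this snippet is not part of the pull request), here is a minimal sketch of how the new method could be driven directly from an IOStream; the host and handler names are placeholders, and the delimiter pattern is the one the diff below adds to SimpleAsyncHTTPClient:

    import socket

    from tornado import ioloop, iostream

    def on_headers(data):
        # data ends at the first blank line, whether the server used
        # "\r\n" or bare "\n" line endings.
        print(repr(data))
        ioloop.IOLoop.instance().stop()

    def send_request():
        stream.write(b"GET /rss HTTP/1.0\r\nHost: news.ycombinator.com\r\n\r\n")
        # read_until_regex() is the method added by this pull request.
        stream.read_until_regex(b"\r?\n\r?\n", on_headers)

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    stream = iostream.IOStream(s)
    stream.connect(("news.ycombinator.com", 80), send_request)
    ioloop.IOLoop.instance().start()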

Jeeyoung Kim and others added some commits
Jeeyoung Kim Implemented read_until_regex in tornado. This is similar to read_until(), except it looks for a specific delimiter. This function is used to parse the header of the website, where it is delimited by '\n' only (like http://news.ycombinator.com/rss) 8640440
Jeeyoung Kim added flag to ignore server side hangup (i.e. server closing socket) as an error event. this allows us to keep reading from the socket, even after the server closes the stream. fbccc00
@wchau wchau Removed error flag check in handle events, because it's unnecessary. bd31ef5
@wchau wchau Revert "added flag to ignore server side hangup (i.e. server closing socket) as an error event. this allows us to keep reading from the socket, even after the server closes the stream."

This reverts commit fbccc00.
e4e314b
@wchau wchau Revert "Revert "added flag to ignore server side hangup (i.e. server closing socket) as an error event. this allows us to keep reading from the socket, even after the server closes the stream.""

This reverts commit e4e314b.
ff8c391
Jeeyoung Kim added flag to ignore server side hangup (i.e. server closing socket) as an error event. this allows us to keep reading from the socket, even after the server closes the stream.
802c55b
@wchau wchau Ignore connection closed EPOLL events and continue reading
in a smart way. Separated error events from connection closed events.
TODO: Handle the case where there is nothing more to read in the socket buffer
but the connection is already closed.
118a79b
Jeeyoung Kim Revert "Ignore connection closed EPOLL events and continue reading"
This reverts commit 118a79b.
e041d40
Jeeyoung Kim created logger for tornado.simple_httpclient. 4865316
@wchau wchau Rename branch with contextlogic tag. f5fe197
@bdarnell
Owner

I've merged the read_until_regex change. If you want the other changes considered for inclusion, submit a separate pull request for them.

@bdarnell bdarnell closed this
@superduper

Hey there! Why was that changed? What if an "HTTP/1.0 404 \r\n" header is passed there (like PHP sometimes does)?

Owner

It'll still match - the \n is always present, but some buggy servers omit the \r. Both of the lines you commented on will work with "HTTP/1.0 404 \r\n" (although the space is required, so if a buggy server just sends "HTTP/1.0 404\r\n" it won't work).

Thanks for the prompt reply, I guess that's the reason, as it fails with PHP's built-in webserver (PHP 5.4, http://php.net/manual/en/features.commandline.webserver.php).
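
To make the exchange above concrete (this check is not part of the pull request), the delimiter pattern added by the diff finds the end of the headers whether or not the server sends carriage returns:

    import re

    HEADER_END = re.compile(b"\r?\n\r?\n")

    strict = b"HTTP/1.0 404 Not Found\r\nContent-Length: 0\r\n\r\nbody"
    sloppy = b"HTTP/1.0 404 Not Found\nContent-Length: 0\n\nbody"

    for raw in (strict, sloppy):
        m = HEADER_END.search(raw)
        print(m is not None, repr(raw[:m.end()]))  # both stop at the blank line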

2  setup.py
@@ -33,7 +33,7 @@
extensions.append(distutils.core.Extension(
"tornado.epoll", ["tornado/epoll.c"]))
-version = "2.0git"
+version = "2.0git-cl"
if major >= 3:
import setuptools # setuptools is required for use_2to3
2  tornado/__init__.py
@@ -16,5 +16,5 @@
"""The Tornado web server and tools."""
-version = "2.0git"
+version = "2.0git-cl"
version_info = (2, 0, 0)
36 tornado/iostream.py
@@ -23,6 +23,7 @@
import logging
import socket
import sys
+import re
from tornado import ioloop
from tornado import stack_context
@@ -78,17 +79,19 @@ def on_body(data):
"""
def __init__(self, socket, io_loop=None, max_buffer_size=104857600,
- read_chunk_size=4096):
+ read_chunk_size=4096, ignore_hangup=True):
self.socket = socket
self.socket.setblocking(False)
self.io_loop = io_loop or ioloop.IOLoop.instance()
self.max_buffer_size = max_buffer_size
self.read_chunk_size = read_chunk_size
+ self.ignore_hangup = ignore_hangup
self._read_buffer = collections.deque()
self._write_buffer = collections.deque()
self._read_buffer_size = 0
self._write_buffer_frozen = False
self._read_delimiter = None
+ self._read_regex = None
self._read_bytes = None
self._read_until_close = False
self._read_callback = None
@@ -98,6 +101,8 @@ def __init__(self, socket, io_loop=None, max_buffer_size=104857600,
self._connecting = False
self._state = None
self._pending_callbacks = 0
+ self._error_flags = ioloop.IOLoop.ERROR & ~ioloop.IOLoop._EPOLLRDHUP if\
+ ignore_hangup else ioloop.IOLoop.ERROR
def connect(self, address, callback=None):
"""Connects the socket to a remote address without blocking.
@@ -124,6 +129,20 @@ def connect(self, address, callback=None):
self._connect_callback = stack_context.wrap(callback)
self._add_io_state(self.io_loop.WRITE)
+ def read_until_regex(self, regex, callback):
+ """Call callback when we read the given regex pattern."""
+ assert not self._read_callback, "Already reading"
+ self._read_regex = re.compile(regex)
+ self._read_callback = stack_context.wrap(callback)
+ while True:
+ # See if we've already got the data from a previous read
+ if self._read_from_buffer():
+ return
+ self._check_closed()
+ if self._read_to_buffer() == 0:
+ break
+ self._add_io_state(self.io_loop.READ)
+
def read_until(self, delimiter, callback):
"""Call callback when we read the given delimiter."""
assert not self._read_callback, "Already reading"
@@ -233,13 +252,13 @@ def _handle_events(self, fd, events):
self._handle_write()
if not self.socket:
return
- if events & self.io_loop.ERROR:
+ if events & self._error_flags:
# We may have queued up a user callback in _handle_read or
# _handle_write, so don't close the IOStream until those
# callbacks have had a chance to run.
self.io_loop.add_callback(self.close)
return
- state = self.io_loop.ERROR
+ state = self._error_flags
if self.reading():
state |= self.io_loop.READ
if self.writing():
@@ -374,6 +393,15 @@ def _read_from_buffer(self):
self._run_callback(callback,
self._consume(loc + delimiter_len))
return True
+ elif self._read_regex is not None:
+ _merge_prefix(self._read_buffer, sys.maxint)
+ m = self._read_regex.search(self._read_buffer[0])
+ if m:
+ callback = self._read_callback
+ self._read_callback = None
+ self._read_regex = None
+ self._run_callback(callback, self._consume(m.end()))
+ return True
return False
def _handle_connect(self):
@@ -478,7 +506,7 @@ def _add_io_state(self, state):
# connection has been closed, so there can be no future events
return
if self._state is None:
- self._state = ioloop.IOLoop.ERROR | state
+ self._state = self._error_flags | state
with stack_context.NullContext():
self.io_loop.add_handler(
self.socket.fileno(), self._handle_events, self._state)
9 tornado/simple_httpclient.py
@@ -32,6 +32,7 @@
except ImportError:
ssl = None
+logger = logging.getLogger('tornado.simple_httpclient')
_DEFAULT_CA_CERTS = os.path.dirname(__file__) + '/ca-certificates.crt'
class SimpleAsyncHTTPClient(AsyncHTTPClient):
@@ -99,7 +100,7 @@ def fetch(self, request, callback, **kwargs):
self.queue.append((request, callback))
self._process_queue()
if self.queue:
- logging.debug("max_clients limit reached, request queued. "
+ logger.debug("max_clients limit reached, request queued. "
"%d active, %d queued requests." % (
len(self.active), len(self.queue)))
@@ -268,7 +269,7 @@ def _on_connect(self, parsed):
self.stream.write(b("\r\n").join(request_lines) + b("\r\n\r\n"))
if self.request.body is not None:
self.stream.write(self.request.body)
- self.stream.read_until(b("\r\n\r\n"), self._on_headers)
+ self.stream.read_until_regex(b("\r?\n\r?\n"), self._on_headers)
def _release(self):
if self.release_callback is not None:
@@ -288,7 +289,7 @@ def cleanup(self):
try:
yield
except Exception, e:
- logging.warning("uncaught exception", exc_info=True)
+ logger.warning("uncaught exception", exc_info=True)
self._run_callback(HTTPResponse(self.request, 599, error=e))
def _on_close(self):
@@ -298,7 +299,7 @@ def _on_close(self):
def _on_headers(self, data):
data = native_str(data.decode("latin1"))
- first_line, _, header_data = data.partition("\r\n")
+ first_line, _, header_data = data.partition("\n")
match = re.match("HTTP/1.[01] ([0-9]+)", first_line)
assert match
self.code = int(match.group(1))
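
As a standalone illustration of the last hunk (not part of the diff): partitioning on "\n" instead of "\r\n" leaves a trailing "\r" on the first line of a well-behaved response, which the status-line regex tolerates because re.match() does not need to consume the whole line, so strict and bare-"\n" responses parse the same way:

    import re

    # Mirrors the new first lines of _on_headers for a CRLF response and for a
    # bare-"\n" response such as the PHP built-in webserver can produce.
    for raw in ("HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n",
                "HTTP/1.0 200 OK\nContent-Length: 0\n\n"):
        first_line, _, header_data = raw.partition("\n")
        match = re.match("HTTP/1.[01] ([0-9]+)", first_line)
        print(int(match.group(1)), repr(header_data))  # 200 in both cases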