This repository has been archived by the owner on Mar 29, 2022. It is now read-only.

More developer documentation
vfaronov committed Aug 6, 2016
1 parent d9e63d1 commit 42863a1
Showing 15 changed files with 272 additions and 16 deletions.
62 changes: 60 additions & 2 deletions HACKING.rst
@@ -21,11 +21,12 @@ Run Pylint::
  $ # ... or selectively:
  $ pylint httpolice/response.py

The delivery pipeline (Travis CI) enforces various other checks;
if you want to run them locally before pushing to GitHub, see ``.travis.yml``.

Use isort if you like -- there's an ``.isort.cfg`` with the right options --
but this is not enforced automatically for now.

You may also want to use linters for HTML, CSS, and JS (see ``.travis.yml``).


Dependencies
------------
@@ -112,3 +113,60 @@ Feel free to add ``pragma: no cover`` to code
that would be hard to cover with a natural, functional test.

To run tests with coverage checks locally, use ``tools/pytest_all.sh``.


Typical workflows
~~~~~~~~~~~~~~~~~

Handling a new header
---------------------
Let's say RFC 9999 defines a new header called ``Foo-Bar``,
and you want HTTPolice to understand it.

#. Read RFC 9999.
#. Check for any updates and errata to RFC 9999
   (these are shown at the top of the page
   if you're using the HTML viewer at `tools.ietf.org`__).
   Note that not all errata are relevant: some may have been "rejected".
#. Rewrite the ABNF rules for ``Foo-Bar`` from RFC 9999
   into a new module ``httpolice.syntax.rfc9999``,
   using the parser combinators from ``httpolice.parse``.
   Consult other modules in ``httpolice.syntax`` to get the hang of it.
#. Add information about ``Foo-Bar`` into ``httpolice.known.header``.
   Consult the comments in that module regarding the fields you can fill in.
#. Some complex headers may need special-casing in ``httpolice.header``.
   See ``CacheControlView`` for an example.

__ https://tools.ietf.org/

All the basic checks for this header (no. 1000, 1063, etc.) should now work.
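
As a rough illustration of the ``httpolice.known.header`` step, an entry for
``Foo-Bar`` might look like the sketch below. The field names are the ones
documented in that module's comments. Everything else is an assumption made
so that the snippet runs on its own: ``RFC``, ``FieldName``, ``MULTI`` and
``SINGLE`` are plain stand-ins for the real HTTPolice objects,
``foo_bar_parser`` is a hypothetical name for the symbol you would define in
``httpolice.syntax.rfc9999``, and the chosen field values are invented.

```python
from collections import namedtuple

# Stand-ins (assumptions) so that this sketch runs without HTTPolice.
# In the real module, these names are provided by HTTPolice itself.
RFC = namedtuple('RFC', ['num'])    # stand-in for the citation helper
FieldName = str                     # stand-in for the header name type
MULTI, SINGLE = 'MULTI', 'SINGLE'   # stand-ins for the ``rule`` constants
foo_bar_parser = object()           # hypothetical symbol from rfc9999

# A hypothetical entry for the imaginary ``Foo-Bar`` header.
foo_bar_entry = {
    '_': FieldName(u'Foo-Bar'),
    '_citations': [RFC(9999)],
    'rule': SINGLE,             # assumed: not a comma-separated list
    'parser': foo_bar_parser,   # parses one occurrence of the header
    'for_request': True,        # assumed: a request-only header
    'for_response': False,
}
```

In the real module, such a dictionary would be one item of the list passed to
``KnownDict``, next to existing entries like the one for ``A-IM``.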


Adding a notice
---------------
#. Write your notice at the end of ``httpolice/notices.xml``.
   Let's say the last notice in HTTPolice has an ID of 1678,
   so your new notice becomes 1679.
#. In ``test/combined_data/``, copy ``simple_ok`` to ``1679_1``.
   For some notices, it's convenient to start with another file (like ``put``)
   or use HAR files instead (``test/har_data/``).
#. Change the ``1679_1`` file in such a way that it should trigger notice 1679.
#. Write "1679" at the top of that file
   to indicate the expected outcome of this test case.
   In HAR files, use the ``_expected`` key instead.
   You can also write comments there. Consult existing files.
#. If necessary, add more test cases: ``1679_2``, and so on.
#. Run your tests and make sure they fail as expected::

     $ py.test -k1679

#. Write the actual check logic.
   Usually it goes into one of the four big functions described above,
   but sometimes a better place is in ``httpolice.syntax`` (see e.g. no. 1015)
   or somewhere else.
#. Run the tests again and make sure they pass.
#. Check the report for your test cases
   to make sure the explanation looks good::

     $ httpolice -i combined -o html test/combined_data/1679* >/tmp/report.html
     $ open /tmp/report.html
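
The numbering convention used in these steps can be sketched in a few lines
of Python. This is only an illustration: ``plan_test_cases`` is a
hypothetical helper, not part of HTTPolice, and it assumes that the new
notice simply takes the next free ID.

```python
def plan_test_cases(last_notice_id, how_many=1):
    """Return the new notice ID and the names of its test case files.

    Hypothetical helper: the new notice gets the next ID after the last one,
    and test cases are named ``<id>_1``, ``<id>_2``, and so on.
    """
    new_id = last_notice_id + 1
    names = ['%d_%d' % (new_id, i) for i in range(1, how_many + 1)]
    return new_id, names


new_id, names = plan_test_cases(1678, how_many=2)
print(new_id, names)   # prints: 1679 ['1679_1', '1679_2']
```

Because every test case name starts with the notice ID, a single
``py.test -k1679`` selects all of them.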
1 change: 1 addition & 0 deletions README.rst
@@ -34,6 +34,7 @@ __ https://redbot.org/

HTTPolice is hosted `on GitHub`__
and released under the MIT license (see ``LICENSE.txt``).
If you want to hack on HTTPolice, check out ``HACKING.rst``.

__ https://github.com/vfaronov/httpolice

10 changes: 7 additions & 3 deletions httpolice/framing1.py
@@ -21,15 +21,17 @@ def parse_streams(inbound, outbound, scheme=None):
is unreliable, because response framing depends on the request.
:param inbound:
- The inbound (request) stream as a byte string, or `None`.
+ The inbound (request) stream as a :class:`~httpolice.parse.Stream`,
+ or `None`.
:param outbound:
- The outbound (response) stream as a byte string, or `None`.
+ The outbound (response) stream as a :class:`~httpolice.parse.Stream`,
+ or `None`.
:param scheme:
The scheme of the request URI, as a Unicode string,
or `None` if unknown.
:return:
An iterable of :class:`Exchange` objects.
- Some of the exchanges may be "empty",
+ Some of the exchanges may be "empty" (aka "complaint boxes"):
containing neither request nor responses,
but only a notice that indicates some general problem with the streams.
"""
@@ -51,6 +53,7 @@ def parse_streams(inbound, outbound, scheme=None):
yield resp_box

if inbound and not inbound.eof:
# Some data remains on the inbound stream, but we can't parse it.
yield complaint_box(1007, stream=inbound,
nbytes=len(inbound.consume_rest()))

@@ -67,6 +70,7 @@ def parse_streams(inbound, outbound, scheme=None):
yield resp_box

if outbound and not outbound.eof:
# Some data remains on the outbound stream, but we can't parse it.
yield complaint_box(1010, stream=outbound,
nbytes=len(outbound.consume_rest()))

3 changes: 3 additions & 0 deletions httpolice/known/__init__.py
@@ -1,5 +1,8 @@
# -*- coding: utf-8; -*-

# Most modules in this package are
# synchronized with IANA registries using ``tools/iana.py``.

from httpolice import structure
from httpolice.known.alt_svc_param import known as altsvc
from httpolice.known.auth_scheme import known as auth
39 changes: 39 additions & 0 deletions httpolice/known/header.py
@@ -45,6 +45,45 @@ def parser_for(name):
return known.get_info(name).get('parser')


# When adding a new header, fill in the fields as follows:
#
# ``_``, ``_citations``
# Obvious, and usually filled by ``tools/iana.py``.
#
# ``rule``
# Usually, if the header is defined as a comma-separated list,
# set this to ``MULTI``
# to indicate that it can appear multiple times in a single message
# (see RFC 7230 Section 3.2.2);
# otherwise, set to ``SINGLE``.
#
# ``parser``
# The grammar symbol that can be used to parse
# *one occurrence* of this header (i.e. one `HeaderEntry.value`).
#
# ``for_request``, ``for_response``
# Whether this header can appear in requests and responses, respectively.
#
# ``precondition``
# Whether this header is a precondition (RFC 7232).
#
# ``proactive_conneg``
# Whether this header is for proactive content negotiation
# (RFC 7231 Section 5.3).
#
# ``bad_for_connection``
# You can set this to ``True`` if
# the presence of this header in a ``Connection`` header
# should trigger notice 1034.
#
# ``bad_for_trailer``
# You can set this to ``True`` if
# the presence of this header in a trailer
# should trigger notice 1026.
#
# ``iana_status``
# Filled by ``tools/iana.py``. You should not need to change it.

known = KnownDict(FieldName, [
{'_': FieldName(u'A-IM'), '_citations': [RFC(4229)]},
{'_': FieldName(u'Accept'),
20 changes: 20 additions & 0 deletions httpolice/known/media_type.py
@@ -21,6 +21,26 @@ def is_patch(name):
return known.get_info(name).get('patch')


# When adding a new media type, fill in the fields as follows:
#
# ``_``, ``_citations``
# Obvious, and usually filled by ``tools/iana.py``.
#
# ``patch``
# Whether this media type is a patch, usable with the PATCH method
# (see RFC 5789 errata).
#
# ``is_json``
# Set this to ``True`` if the media type uses JSON syntax
# but **does not end** with ``+json``.
#
# ``is_xml``
# Set this to ``True`` if the media type uses XML syntax
# but **does not end** with ``+xml``.
#
# ``deprecated``
# Filled by ``tools/iana.py``. You should not need to change it.

known = KnownDict(MediaType, [
{'_': MediaType(u'application/1d-interleaved-parityfec'),
'_citations': [RFC(6015)]},
14 changes: 14 additions & 0 deletions httpolice/known/method.py
@@ -25,6 +25,20 @@ def _name_for(cls, item):
return item['_'].replace(u'-', u'_')


# When adding a new method, fill in the fields as follows:
#
# ``_``, ``_citations``, ``safe``, ``idempotent``
# Obvious, and usually filled by ``tools/iana.py``.
#
# ``defines_body``
# Whether a meaning is defined for a payload body with this method.
# (For example, RFC 7231 Section 4.3.1 says
# "a payload within a GET request message has no defined semantics",
# so ``defines_body`` is ``False``.)
#
# ``cacheable``
# Whether responses to this method can be cached (RFC 7234).

known = KnownMethods([
{'_': Method(u'ACL'),
'_citations': [RFC(3744, section=(8, 1))],
14 changes: 14 additions & 0 deletions httpolice/known/status_code.py
@@ -14,6 +14,20 @@ def is_cacheable(code):
return known.get_info(code).get('cacheable')


# When adding a new status code, fill in the fields as follows:
#
# ``_``, ``_citations``
# Obvious, and usually filled by ``tools/iana.py``.
#
# ``_title``
# The default reason phrase, usually filled by ``tools/iana.py``.
#
# ``cacheable``
# If the status code is defined as cacheable by default,
# set this to ``BY_DEFAULT``.
# If it is defined as never cacheable, set to ``NOT_AT_ALL``.
# Otherwise, set to ``NOT_BY_DEFAULT``.

known = KnownDict(StatusCode, [
{'_': StatusCode(100),
'_citations': [RFC(7231, section=(6, 2, 1))],
23 changes: 23 additions & 0 deletions httpolice/notices.xml
@@ -1,5 +1,28 @@
<?xml version="1.0" encoding="utf-8"?>

<!--
The vocabulary here should mostly be obvious after studying some examples.
See also :mod:`httpolice.notice`.
One non-obvious thing is how references work
(i.e. the mouseover highlights in HTML reports):
- Constants marked up with elements like ``<h/>`` and ``<st/>``
are magically highlighted where they appear in the corresponding message(s)
(see :func:`httpolice.reports.html._magic_references` for details).
For example, if a notice includes ``<st>401</st>``
and appears in a response whose status code is 401,
then hovering the mouse on the notice will highlight the status code.
Sometimes you need to disable this with a ``ref="no"`` attribute.
- Anything referenced by a ``<var/>`` element is magically highlighted.
- If you want to highlight something else,
add an explicit ``<ref/>`` element at the end of the ``<notice/>``.
-->

<notices>

<error id="1000">
17 changes: 13 additions & 4 deletions httpolice/parse.py
@@ -11,7 +11,7 @@
They just work. And we get detailed error messages for free.
However, Earley is not very fast (at least this implementation).
- We use it for headers, which have complex grammars.
+ We use it for headers, which have complex grammars but are easy to memoize.
For HTTP/1.x message framing, we just use regexes,
which are derived automatically from the same code (:meth:`Symbol.as_regex`).
@@ -583,7 +583,13 @@ class Stream(object):

# pylint: disable=attribute-defined-outside-init

- """Wraps a string of bytes that are the input to parsers."""
+ """Wraps a string of bytes that are the input to parsers.
+
+ This class is directly used in :mod:`httpolice.framing1`,
+ and it encapsulates some state that is passed around there,
+ including complaints that are later "dumped" into the parsed message.
+ In most other cases, you probably want :func:`simple_parse` instead.
+ """

_cache = OrderedDict()

@@ -683,7 +689,7 @@ def parse(self, target, to_eof=False, annotate_classes=None):
annotate_classes = tuple(annotate_classes or ())
key = None
if self._is_empty_state() and to_eof:
- # Caching is really only useful
+ # Memoization is really only useful
# when we're parsing something small in its entirety,
# like a header value.
# The above ``if`` means that the cache won't get in our way
@@ -1078,7 +1084,10 @@ def _find_pivots(chart, symbol, start, stack=None):

def simple_parse(data, symbol, complain, fail_notice_id, annotate_classes=None,
**extra_context):
- """(Try to) parse an entire string as a single grammar symbol."""
+ """(Try to) parse an entire string as a single grammar symbol.
+
+ This wraps :class:`Stream` in a simpler interface for the common case.
+ """
stream = Stream(force_bytes(data))

try:
2 changes: 1 addition & 1 deletion httpolice/reports/common.py
@@ -45,7 +45,7 @@ def expand_header(hdr):

@singledispatch
def expand_error(error):
- return [error]
+ return [error] # A single paragraph consisting of the error message.

@expand_error.register(ParseError)
def expand_parse_error(error):