HTTP/2 (reopened #694) #1176

Closed
wants to merge 13 commits into from

Conversation

krizhanovsky (Contributor)

Just need to review the code and understand which changes/extensions/fixes are required for #309. After that the branch can be closed and the code can be recommitted to a new branch. This PR is just documentation of what is still TODO.

@krizhanovsky krizhanovsky changed the title HTP/2 (reopened #694) HTTP/2 (reopened #694) Feb 3, 2019
@krizhanovsky krizhanovsky mentioned this pull request Feb 4, 2019

@krizhanovsky (Contributor Author) left a comment

Only HPACK is implemented; the rest of RFC 7540 is left untouched. The code doesn't follow our coding style and depends on extra libs (hash, bitops and so on). Also, it's developed as a standalone library that ignores the current HTTP parser logic, so it requires plenty of data copies and multiple passes over the same data. All in all, I'm for rewriting the code from scratch.

Review threads marked resolved:
tempesta_fw/tls.c
tempesta_fw/tls.c
tempesta_fw/http2/errors.h
tempesta_fw/http2/hstatic.h
tempesta_fw/http2/common.h (outdated)
}
hpack_set_window(hp, window);
}
fields = hpack_decode(hp, &in, length, &out, &rc);

@krizhanovsky (Contributor Author)

hpack_encode() must be tested as well. There is copy & paste among the tests, so please factor out the common logic; this should make the tests simpler, as the current large functions are hard to read.

vp = hpack_hash_add(ht, vp, NULL);
}
Hash_Add(hp, static_table + i, static_table + i);
Hash_Add(hn, np, static_table + i);

@krizhanovsky (Contributor Author)

Something wrong is happening here: there is no reason to keep the static entries in a hash table with collision chains and so on, see https://github.com/nghttp2/nghttp2/blob/master/lib/nghttp2_hd.c#L120 - with our HTTP parser approach we can make the lookup even faster.

In general, HPACK must not be a bunch of separate calls. We do some parsing work in our parser and we should reuse it. Also, hash tables, especially with collision chains, aren't the most efficient data structure for HPACK. https://www.mew.org/~kazu/doc/paper/hpack-2017.pdf and at least https://github.com/nghttp2/nghttp2/ should be studied, but the H2O and Nginx implementations are also good to learn from.

We parse an HTTP header in chunks, and there are parser states processing particular pieces of an HTTP header. E.g. the parser matches Accept: and moves to the Req_HdrAcceptV state: accept is a header from the static table, so we know the static table index right at that point and can set it in code. There is even no need for a separate static table structure like static_table.
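
A minimal sketch of this idea (the helper and field names below are hypothetical, not the real Tempesta FW parser; only the index values come from RFC 7541 Appendix A):

/*
 * Static table indexes known at compile time from the parser state:
 * RFC 7541 Appendix A assigns 19 to "accept" and 16 to "accept-encoding".
 */
enum {
        HPACK_IDX_ACCEPT_ENCODING = 16,
        HPACK_IDX_ACCEPT          = 19,
};

/*
 * Would be called from the state entered right after the literal
 * "Accept:" has been matched (Req_HdrAcceptV above), so no run-time
 * table lookup is needed at all.
 */
static inline void
hdr_accept_name_matched(unsigned short *hpack_idx)
{
        *hpack_idx = HPACK_IDX_ACCEPT;
}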

Dynamic tables are trickier. Typically such headers are treated by RGEN_HDR_OTHER(). For the example, let's suppose that Accept, Accept-Range, Accept-Encoding and Referer are all handled via the dynamic table. Consider that the first three of them are already stored in the dynamic table and we're currently parsing Referer.

  1. at a particular point in time we may have only part of the header in the current data chunk, e.g. Ref, and still want to make progress in the table lookup - that's much better than assembling the whole header first and then traversing its chunks again for the table lookup;
  2. we need a memory limit and efficient pruning of old entries.

So I propose the following data structure, essentially a binary tree placed in a contiguous memory area (typically a page), and an algorithm for it.

Firstly, store the strings with a length prefix at the beginning of the page. We need at least two pointers: root (the current tree root) and gc_next (a pointer to the next entry to garbage-collect). Both are 0 initially. So we start with the Accept string at the beginning of the page:

6 Accept _ _
^
|
root | gc_next

The two _ immediately after the string are offsets of the next nodes, empty (zero) for now. We can place other necessary information alongside them (such as the string's index in the table).

If we store Accept-Encoding and Accept-Range, then the picture becomes:

6 Accept _ _ 15 Accept-Encoding  0 28 12 Accept-Range _ _
^             ^
|             |
gc_next       root

Now the root points to Accept-Encoding. As in a basic binary tree, Accept is lower than Accept-Encoding, so it's the left node (offset 0), and Accept-Range is greater, so it's the right node (offset 28, where that record begins within the page).

Accept is the first and oldest item in the page, so we should begin eviction from it:

  1. scan the tree, starting from the root, for the current gc_next index (currently 0), using string comparison to descend right or left;
  2. move gc_next to the next record, so its index becomes 9 (assuming the string length and each offset consume 1 byte; in reality they must be larger to address 4096 bytes);
  3. perform a classic tree rotation, as in an RB-tree, for the deleted element.

Now we have:

_ ______ _ _ 15 Accept-Encoding  0 28 12 Accept-Range _ _
              ^
              |
      gc_next | root

If at some parser state we have Ref, then we can look up the tree and find that there is no such entry; in that case we don't win anything. If we match Accept-R, then we continue matching ange from the second node, so on average the matched strings are shorter and each header string is used only once, during parsing, instead of being scanned a second time when the HPACK routines are called.

This algorithm is just the first thing that came to mind, but it's better than hashes from speed and memory considerations. I'd appreciate it if you can propose a better algorithm.
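
To make the layout above more concrete, here is a rough sketch under the stated assumptions (16-bit in-page offsets instead of the 1-byte ones in the example; all type and field names are illustrative, not an existing API):

#include <stdint.h>

/*
 * One tree node appended to the page. The child offsets are placed
 * before the string so that a flexible array member can be used; the
 * description above puts them after the string, which is equivalent.
 * Offsets are measured from the beginning of the page, so 0 (which
 * falls inside the page header) can denote an absent child.
 */
typedef struct {
        uint16_t len;           /* length of @str */
        uint16_t left;          /* in-page offset of the left child */
        uint16_t right;         /* in-page offset of the right child */
        uint16_t hpack_idx;     /* HPACK dynamic table index of the entry */
        char     str[];         /* the header name (or name and value) */
} TblNode;

/*
 * The page itself: root and gc_next are the two pointers from the
 * description above; tail is where the next record is appended.
 */
typedef struct {
        uint16_t root;
        uint16_t gc_next;
        uint16_t tail;
        char     data[];
} TblPage;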

Please write a benchmark, aggregated with the unit test, and compare the results with nghttp2's performance. We'll publish the results.

Contributor

Agreed that the approach with a static hash table is overcomplicated. But with a binary search tree, the need for full string comparisons at each node is a concern. I'll think deeper about the proposed structure. Also, some kind of prefix tree may be a suitable variant here (will think in this direction too).

@krizhanovsky (Contributor Author) Feb 20, 2019

I came to this idea exactly from the Patricia tree. The point is that in the example above we can, and probably should, treat the strings as long integers and compare longs instead of strings. We can set the most significant bit in the pointers and string lengths to guarantee that if we try to match an 8-byte integer against, say, a 3-byte string, the set most significant bit in the 4th byte will make the difference. The next idea is that we have 32-byte SIMD registers and, if we place the strings in the array at known offsets, we can load several string prefixes and match them concurrently. I don't have a ready-to-implement concept, but there are several opportunities to couple fast search with efficient insertion and eviction.
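
A rough illustration of the long-integer comparison (a concept sketch only; the padding convention and the helper are assumptions, not existing code):

#include <string.h>

/*
 * Compare an up-to-8-byte key prefix against a stored 8-byte node
 * prefix. Unused key bytes are padded with 0x80 (most significant bit
 * set), so a shorter string is guaranteed to differ from a longer one
 * within the first compared word; stored prefixes are assumed to be
 * padded the same way. memcmp() is used for clarity - with big-endian
 * loads this becomes a single integer comparison.
 */
static inline int
prefix_cmp8(const char *key, size_t key_len, const char *node_prefix)
{
        unsigned char k[8];

        memset(k, 0x80, sizeof(k));
        memcpy(k, key, key_len < 8 ? key_len : 8);

        return memcmp(k, node_prefix, 8);
}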

Using 4KB as the default size for the dynamic table (and I believe for now we shouldn't negotiate to change it), we must count the number of stored items using the RFC 7541 4.1 rule of 32 octets of per-record overhead. We may have a different overhead in the resulting data structure, and it's actually fine to use more than one page for the table.
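
For reference, a minimal sketch of that accounting (the helper names are made up; the 32-octet overhead is from RFC 7541 4.1 and the 4096-byte default is the SETTINGS_HEADER_TABLE_SIZE default):

#include <stddef.h>

#define HPACK_ENTRY_OVERHEAD    32      /* RFC 7541, 4.1 */
#define HPACK_DEF_TBL_SIZE      4096    /* default SETTINGS_HEADER_TABLE_SIZE */

/* Entry size per RFC 7541, 4.1: name length + value length + 32 octets. */
static inline size_t
hpack_entry_size(size_t name_len, size_t val_len)
{
        return name_len + val_len + HPACK_ENTRY_OVERHEAD;
}

/* Whether one more entry fits into a default-sized dynamic table. */
static inline int
hpack_entry_fits(size_t tbl_used, size_t name_len, size_t val_len)
{
        return tbl_used + hpack_entry_size(name_len, val_len)
                <= HPACK_DEF_TBL_SIZE;
}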

@@ -0,0 +1,372 @@
/**

@krizhanovsky (Contributor Author)

The table generation is up to you, but I'd use a Perl script called at module build time to automatically generate the table. Meanwhile, I'd leave a template file hfcode.h in the source tree with the variable definitions and template arguments, to keep ctags working, e.g.

typedef struct {
        int16_t symbol;
        uint32_t code;
        uint8_t length;
} hcode;

static hcode source[] = {
    /* hgen generated: source */
};

So in Perl we can find template arguments like /* hgen generated: source */ in the file and make a substitution. This way we avoid a huge generated source file in the tree and keep it tidy: it's no big deal to update a file comment or rename a variable.
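
For illustration only, the substituted result might look like this (the marker comment is replaced by the generated rows; the three entries shown are the first 5-bit codes from RFC 7541 Appendix B):

static hcode source[] = {
    { '0', 0x0, 5 },
    { '1', 0x1, 5 },
    { '2', 0x2, 5 },
    /* ... */
};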

HTTP2Index *const __restrict ip = hp->dynamic;
unsigned int rc;
unsigned int k;
uchar *__restrict dst = buffer_open(out, &k, 0);

@krizhanovsky (Contributor Author)

The initial aim for HPACK was to make it zero-copy and write the compressed data in place. It was argued that Huffman may produce strings larger than the original, and so the buffer machinery was introduced. Despite the complicated buffer management, the encoding still naively copies data around.

If we need to write an indexed header without Huffman encoding, then it's trivial to estimate whether the encoded string is larger than the original header string (it seems it never actually is).
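
A small sketch of that estimate (helper names are made up; the rules are RFC 7541 5.1 and 5.2: without Huffman the octets are sent verbatim after a length encoded as an integer with a 7-bit prefix, so the encoded string is never shorter than the original):

#include <stddef.h>

/* Size of an HPACK integer with an N-bit prefix (RFC 7541, 5.1). */
static size_t
hpack_int_size(size_t value, unsigned int prefix_bits)
{
        size_t max_prefix = (1u << prefix_bits) - 1;
        size_t n = 1;

        if (value < max_prefix)
                return 1;
        for (value -= max_prefix; value >= 128; value >>= 7)
                n++;
        return n + 1;
}

/* On-wire size of a string literal sent without Huffman coding. */
static size_t
hpack_raw_str_size(size_t len)
{
        return hpack_int_size(len, 7) + len;
}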

Next, Huffman encoding itself is doubtful. E.g. RFC 7541 C.4.3 encodes custom-key (10 bytes) and custom-value (12 bytes) into 8- and 9-byte strings, i.e. around 20% space savings, which is 5 bytes. Compression makes sense only if it reduces the number of network packets (in MTUs), so tens of bytes don't matter. To get a real benefit we would have to encode large strings, which for HTTP responses (we don't encode requests with their large URI and User-Agent) means Cookies, but Cookies compress badly. Even compressing all headers gives only about 1.4% of traffic savings on responses.

Huffman encoding is clearly a heavy operation, especially with the possible memory reallocations and copying on both our side and the client's, and it doesn't make sense to implement it just to save a few bytes of network transmission that most likely don't reduce the number of network packets and only increase CPU consumption on both sides.

#endif

unsigned int
huffman_decode(const char *__restrict source, char *__restrict dst, uintptr_t n)

@krizhanovsky (Contributor Author) Feb 18, 2019


HTTP2Field *
hpack_decode(HPack * __restrict hp,
HTTP2Input * __restrict source,

Contributor

It seems that 'source' must be passed here with an already initialized (i.e. parsed) TfwStr (as the underlying instance for HTTP2Input), but considering the HTTP/2 logic, we must first decode the HTTP/2 message and only after that pass it to the HTTP parser, where the TfwStr instances are initialized.

buffer_close(source, m);
hrc = huffman_decode_fragments(source, buffer, length);

Contributor

If I understand correctly, 'HTTP2Output *buffer' is intended for storing the result of the HPACK decoding process; regarding integration with the Tempesta FW code, it seems that the HTTP2Input and HTTP2Output buffers should be reworked so that a single resulting buffer remains (as the descriptor of the decoded output) and, possibly, is incorporated with the TfwStr descriptor.

@krizhanovsky (Contributor Author)

Yes, agree. The implementation is done as a standalone library with its own interfaces: the library does some processing on data that was prepared by some other layer and passes the result to another layer, so different logic runs against different representations of the same data. But it's more efficient to make all the layers aware of each other and run all the processing logic at once on the same data chunk.

This is the ideology behind the QUIC design, and Tempesta FW is built on the same principles (we interbreed TCP, TLS, and HTTP), which makes me think that we can natively integrate QUIC into our design.

index,
state,
buffer);
#endif

Contributor

I cannot find any static or dynamic table search during the 'hpack_decode()' procedure - only the adding of indexes into the dynamic table.

/* it is not used by the buffer_get() call: */
p->tail = 0;
p->offset = 0;
}

@aleksostapenko (Contributor) Feb 20, 2019

One more string descriptor (HTTP2Input) over the already existing one (TfwStr) seems too complicated; maybe it is worth extending TfwStr and applying this logic to it, or making a separate descriptor (see comment).

@krizhanovsky (Contributor Author)

The interfaces are really complicated (apparently in an attempt to provide a generic API for the library). Frankly, I didn't pay enough attention to this part of the code in review, because it's clear there shouldn't be such data structures at all.

@vankoven (Contributor)

Most of the changes from this PR were reworked and integrated into #1368, so the PR can be closed now.
