HTTP/2 (reopened #694) #1176

Closed
wants to merge 13 commits into from

Conversation

krizhanovsky (Contributor)

Just need to review the code and understand which changes/extensions/fixes are required for #309. After that the branch can be closed and the code can be recommitted to a new branch. This PR is just documentation of what is still TODO.

@krizhanovsky krizhanovsky changed the title HTP/2 (reopened #694) HTTP/2 (reopened #694) Feb 3, 2019
@krizhanovsky krizhanovsky mentioned this pull request Feb 4, 2019

@krizhanovsky (Contributor Author) left a comment

Only HPACK is implemented; the rest of RFC 7540 is left untouched. The code doesn't follow our coding style and depends on extra libs (hash, bitops and so on). Also, it's developed as a standalone library that ignores the current HTTP parser logic, so it requires plenty of data copies and multiple passes over the same data. All in all, I'm for rewriting the code from scratch.

Review threads marked resolved:
tempesta_fw/tls.c
tempesta_fw/tls.c
tempesta_fw/http2/errors.h
tempesta_fw/http2/hstatic.h
tempesta_fw/http2/common.h (outdated)
}
hpack_set_window(hp, window);
}
fields = hpack_decode(hp, &in, length, &out, &rc);

@krizhanovsky (Contributor Author)

hpack_encode() must be tested as well. There is copy & paste among the tests, so please factor out the common logic; this should make the tests simpler, as the current large functions are hard to read.

vp = hpack_hash_add(ht, vp, NULL);
}
Hash_Add(hp, static_table + i, static_table + i);
Hash_Add(hn, np, static_table + i);

@krizhanovsky (Contributor Author)

Something wrong is happening here: there is no reason to keep the static entries in a hash table with collision chains and so on, see https://github.com/nghttp2/nghttp2/blob/master/lib/nghttp2_hd.c#L120 - with our HTTP parser approach we can make the lookup even faster.

In general, HPACK must not be a bunch of separate calls. We do some parsing work in our parser and we should reuse it. Also, hash tables, especially with collision chains, aren't the most efficient data structure for HPACK. https://www.mew.org/~kazu/doc/paper/hpack-2017.pdf and at least https://github.com/nghttp2/nghttp2/ should be studied, but the H2O and Nginx implementations are also good to learn from.

We parse an HTTP header in chunks, and there are parser states processing particular pieces of an HTTP header. E.g. the parser matches Accept: and moves to the Req_HdrAcceptV state: accept is a header from the static table, so we know the static table index right at that point and can set it in code. There is even no need for a separate static table structure like static_table.
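
A minimal sketch of this idea (the helper and field names below are hypothetical, not the real Tempesta FW parser; only the index values come from RFC 7541 Appendix A):

/*
 * Static table indexes known at compile time from the parser state:
 * RFC 7541 Appendix A assigns 19 to "accept" and 16 to "accept-encoding".
 */
enum {
        HPACK_IDX_ACCEPT_ENCODING = 16,
        HPACK_IDX_ACCEPT          = 19,
};

/*
 * Would be called from the state entered right after the literal
 * "Accept:" has been matched (Req_HdrAcceptV above), so no run-time
 * table lookup is needed at all.
 */
static inline void
hdr_accept_name_matched(unsigned short *hpack_idx)
{
        *hpack_idx = HPACK_IDX_ACCEPT;
}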

Dynamic tables are trickier. Typically such headers are treated by RGEN_HDR_OTHER(). For the example, let's suppose that Accept, Accept-Range, Accept-Encoding and Referer are all handled via the dynamic table. Consider that the first three of them are already stored in the dynamic table and we're currently parsing Referer.

  1. at a particular point in time we may have only part of the header in the current data chunk, e.g. Ref, and still want to make progress in the table lookup - that's much better than assembling the whole header first and then traversing its chunks again for the table lookup;
  2. we need a memory limit and efficient pruning of old entries.

So I propose the following data structure, essentially a binary tree placed in a contiguous memory area (typically a page), and an algorithm for it.

Firstly, store the strings with a length prefix at the beginning of the page. We need at least two pointers: root (the current tree root) and gc_next (a pointer to the next entry to garbage-collect). Both are 0 initially. So we start with the Accept string at the beginning of the page:

6 Accept _ _
^
|
root | gc_next

The two _ immediately after the string are offsets of the next nodes, empty (zero) for now. We can place other necessary information alongside them (such as the string's index in the table).

If we store Accept-Encoding and Accept-Range, then the picture becomes:

6 Accept _ _ 15 Accept-Encoding  0 28 12 Accept-Range _ _
^             ^
|             |
gc_next       root

Now the root points to Accept-Encoding. As in a basic binary tree, Accept is lower than Accept-Encoding, so it's the left node (offset 0), and Accept-Range is greater, so it's the right node (offset 28, where that record begins within the page).

Accept is the first and oldest item in the page, so we should begin eviction from it:

  1. scan the tree, starting from the root, for the current gc_next index (currently 0), using string comparison to descend right or left;
  2. move gc_next to the next record, so its index becomes 9 (assuming the string length and each offset consume 1 byte; in reality they must be larger to address 4096 bytes);
  3. perform a classic tree rotation, as in an RB-tree, for the deleted element.

Now we have:

_ ______ _ _ 15 Accept-Encoding  0 28 12 Accept-Range _ _
              ^
              |
      gc_next | root

If at some parser state we have Ref, then we can look up the tree and find that there is no such entry; in that case we don't win anything. If we match Accept-R, then we continue matching ange from the second node, so on average the matched strings are shorter and each header string is used only once, during parsing, instead of being scanned a second time when the HPACK routines are called.

This algorithm is just the first thing that came to mind, but it's better than hashes from speed and memory considerations. I'd appreciate it if you can propose a better algorithm.
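
To make the layout above more concrete, here is a rough sketch under the stated assumptions (16-bit in-page offsets instead of the 1-byte ones in the example; all type and field names are illustrative, not an existing API):

#include <stdint.h>

/*
 * One tree node appended to the page. The child offsets are placed
 * before the string so that a flexible array member can be used; the
 * description above puts them after the string, which is equivalent.
 * Offsets are measured from the beginning of the page, so 0 (which
 * falls inside the page header) can denote an absent child.
 */
typedef struct {
        uint16_t len;           /* length of @str */
        uint16_t left;          /* in-page offset of the left child */
        uint16_t right;         /* in-page offset of the right child */
        uint16_t hpack_idx;     /* HPACK dynamic table index of the entry */
        char     str[];         /* the header name (or name and value) */
} TblNode;

/*
 * The page itself: root and gc_next are the two pointers from the
 * description above; tail is where the next record is appended.
 */
typedef struct {
        uint16_t root;
        uint16_t gc_next;
        uint16_t tail;
        char     data[];
} TblPage;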

Please write a benchmark, aggregated with the unit test, and compare the results with nghttp2's performance. We'll publish the results.

Contributor

Agreed that the approach with a static hash table is overcomplicated. But with a binary search tree, the need for full string comparisons at each node is a concern. I'll think deeper about the proposed structure. Also, some kind of prefix tree may be a suitable variant here (will think in this direction too).

@krizhanovsky (Contributor Author) Feb 20, 2019

I came to this idea exactly from the Patricia tree. The point is that in the example above we can, and probably should, treat the strings as long integers and compare longs instead of strings. We can set the most significant bit in the pointers and string lengths to guarantee that if we try to match an 8-byte integer against, say, a 3-byte string, the set most significant bit in the 4th byte will make the difference. The next idea is that we have 32-byte SIMD registers and, if we place the strings in the array at known offsets, we can load several string prefixes and match them concurrently. I don't have a ready-to-implement concept, but there are several opportunities to couple fast search with efficient insertion and eviction.
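
A rough illustration of the long-integer comparison (a concept sketch only; the padding convention and the helper are assumptions, not existing code):

#include <string.h>

/*
 * Compare an up-to-8-byte key prefix against a stored 8-byte node
 * prefix. Unused key bytes are padded with 0x80 (most significant bit
 * set), so a shorter string is guaranteed to differ from a longer one
 * within the first compared word; stored prefixes are assumed to be
 * padded the same way. memcmp() is used for clarity - with big-endian
 * loads this becomes a single integer comparison.
 */
static inline int
prefix_cmp8(const char *key, size_t key_len, const char *node_prefix)
{
        unsigned char k[8];

        memset(k, 0x80, sizeof(k));
        memcpy(k, key, key_len < 8 ? key_len : 8);

        return memcmp(k, node_prefix, 8);
}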

Using 4KB as the default size for the dynamic table (and I believe for now we shouldn't negotiate to change it), we must count the number of stored items using the RFC 7541 4.1 rule of 32 octets of per-record overhead. We may have a different overhead in the resulting data structure, and it's actually fine to use more than one page for the table.
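
For reference, a minimal sketch of that accounting (the helper names are made up; the 32-octet overhead is from RFC 7541 4.1 and the 4096-byte default is the SETTINGS_HEADER_TABLE_SIZE default):

#include <stddef.h>

#define HPACK_ENTRY_OVERHEAD    32      /* RFC 7541, 4.1 */
#define HPACK_DEF_TBL_SIZE      4096    /* default SETTINGS_HEADER_TABLE_SIZE */

/* Entry size per RFC 7541, 4.1: name length + value length + 32 octets. */
static inline size_t
hpack_entry_size(size_t name_len, size_t val_len)
{
        return name_len + val_len + HPACK_ENTRY_OVERHEAD;
}

/* Whether one more entry fits into a default-sized dynamic table. */
static inline int
hpack_entry_fits(size_t tbl_used, size_t name_len, size_t val_len)
{
        return tbl_used + hpack_entry_size(name_len, val_len)
                <= HPACK_DEF_TBL_SIZE;
}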

@@ -0,0 +1,372 @@
/**

@krizhanovsky (Contributor Author)

The table generation is up to you, but I'd use a Perl script called at module build time to automatically generate the table. Meanwhile, I'd leave a template file hfcode.h in the source tree with the variable definitions and template arguments, to keep ctags working, e.g.

typedef struct {
        int16_t symbol;
        uint32_t code;
        uint8_t length;
} hcode;

static hcode source[] = {
    /* hgen generated: source */
};

So in Perl we can find template arguments like /* hgen generated: source */ in the file and make a substitution. This way we avoid a huge generated source file in the tree and keep it tidy: it's no big deal to update a file comment or rename a variable.
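
For illustration only, the substituted result might look like this (the marker comment is replaced by the generated rows; the three entries shown are the first 5-bit codes from RFC 7541 Appendix B):

static hcode source[] = {
    { '0', 0x0, 5 },
    { '1', 0x1, 5 },
    { '2', 0x2, 5 },
    /* ... */
};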

HTTP2Index *const __restrict ip = hp->dynamic;
unsigned int rc;
unsigned int k;
uchar *__restrict dst = buffer_open(out, &k, 0);

@krizhanovsky (Contributor Author)

The initial aim for HPACK was to make it zero-copy and write the compressed data in place. It was argued that Huffman may produce strings larger than the original, and so the buffer machinery was introduced. Despite the complicated buffer management, the encoding still naively copies data around.

If we need to write an indexed header without Huffman encoding, then it's trivial to estimate whether the encoded string is larger than the original header string (it seems it never actually is).
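
A small sketch of that estimate (helper names are made up; the rules are RFC 7541 5.1 and 5.2: without Huffman the octets are sent verbatim after a length encoded as an integer with a 7-bit prefix, so the encoded string is never shorter than the original):

#include <stddef.h>

/* Size of an HPACK integer with an N-bit prefix (RFC 7541, 5.1). */
static size_t
hpack_int_size(size_t value, unsigned int prefix_bits)
{
        size_t max_prefix = (1u << prefix_bits) - 1;
        size_t n = 1;

        if (value < max_prefix)
                return 1;
        for (value -= max_prefix; value >= 128; value >>= 7)
                n++;
        return n + 1;
}

/* On-wire size of a string literal sent without Huffman coding. */
static size_t
hpack_raw_str_size(size_t len)
{
        return hpack_int_size(len, 7) + len;
}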

Next, Huffman encoding itself is doubtful. E.g. RFC 7541 C.4.3 encodes custom-key (10 bytes) and custom-value (12 bytes) into 8- and 9-byte strings, i.e. around 20% space savings, which is 5 bytes. Compression makes sense only if it reduces the number of network packets (in MTUs), so tens of bytes don't matter. To get a real benefit we would have to encode large strings, which for HTTP responses (we don't encode requests with their large URI and User-Agent) means Cookies, but Cookies compress badly. Even compressing all headers gives only about 1.4% of traffic savings on responses.

Huffman encoding is clearly a heavy operation, especially with the possible memory reallocations and copying on both our side and the client's, and it doesn't make sense to implement it just to save a few bytes of network transmission that most likely don't reduce the number of network packets and only increase CPU consumption on both sides.

#endif

unsigned int
huffman_decode(const char *__restrict source, char *__restrict dst, uintptr_t n)

@krizhanovsky (Contributor Author) Feb 18, 2019


HTTP2Field *
hpack_decode(HPack * __restrict hp,
HTTP2Input * __restrict source,

Contributor

It seems that 'source' must be passed here with an already initialized (i.e. parsed) TfwStr (as the underlying instance for HTTP2Input), but considering the HTTP/2 logic, we must first decode the HTTP/2 message and only after that pass it to the HTTP parser, where the TfwStr instances are initialized.

buffer_close(source, m);
hrc = huffman_decode_fragments(source, buffer, length);

Contributor

If I understand correctly, 'HTTP2Output *buffer' is intended for storing the result of the HPACK decoding process; regarding integration with the Tempesta FW code, it seems that the HTTP2Input and HTTP2Output buffers should be reworked so that a single resulting buffer remains (as the descriptor of the decoded output) and, possibly, is incorporated with the TfwStr descriptor.

@krizhanovsky (Contributor Author)

Yes, agree. The implementation is done as a standalone library with its own interfaces: the library does some processing on data that was prepared by some other layer and passes the result to another layer, so different logic runs against different representations of the same data. But it's more efficient to make all the layers aware of each other and run all the processing logic at once on the same data chunk.

This is the ideology behind the QUIC design, and Tempesta FW is built on the same principles (we interbreed TCP, TLS, and HTTP), which makes me think that we can natively integrate QUIC into our design.

index,
state,
buffer);
#endif

Contributor

I cannot find any static or dynamic table search during the 'hpack_decode()' procedure - only the adding of indexes into the dynamic table.

/* it is not used by the buffer_get() call: */
p->tail = 0;
p->offset = 0;
}

@aleksostapenko (Contributor) Feb 20, 2019

One more string descriptor (HTTP2Input) over the already existing one (TfwStr) seems too complicated; maybe it is worth extending TfwStr and applying this logic to it, or making a separate descriptor (see comment).

@krizhanovsky (Contributor Author)

The interfaces are really complicated (apparently in an attempt to provide a generic API for the library). Frankly, I didn't pay enough attention to this part of the code in review, because it's clear there shouldn't be such data structures at all.

@vankoven (Contributor)

Most of the changes from this PR were reworked and integrated into #1368, so the PR can be closed now.
