
HTTP/2 streams prioritization #1196

Open
4 tasks done
krizhanovsky opened this issue Feb 23, 2019 · 9 comments

@krizhanovsky (Contributor) commented Feb 23, 2019

Scope

According to RFC 7540 5.3 and RFC 9218, we should schedule requests, or serve them from the cache, according to their priority and dependencies. Reprioritization must also be implemented. This issue must implement prioritization suitable for both HTTP/2 and HTTP/3.

Note the priority index structure discussed in #1233 (comment). There are several approaches to priority algorithms, and we need a fast O(1) scheduler.

#712 also seems to require WFQ, so the algorithm should probably be generic enough to be applied to both h2 messages and server queueing. Note that the priority tree is volatile and can change during a web page load as new, higher-priority resources are requested.
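For illustration, the WFQ idea mentioned above can be sketched as a virtual-finish-time scheduler. This is a minimal plain-C sketch under stated assumptions; all names are hypothetical and none of this is Tempesta FW code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Minimal Weighted Fair Queuing sketch: each stream accumulates a
 * virtual finish time proportional to bytes sent divided by its
 * weight; the scheduler always picks the stream with the smallest
 * finish time, so heavier streams get a proportionally larger share.
 */
struct wfq_stream {
	unsigned int	weight;		/* relative share, > 0 if active */
	unsigned long	vfinish;	/* virtual finish time */
};

/* Pick the index of the next stream to service, or -1 if none. */
static int
wfq_pick(struct wfq_stream *s, int n)
{
	int i, best = -1;

	for (i = 0; i < n; i++) {
		if (!s[i].weight)
			continue;	/* inactive stream */
		if (best < 0 || s[i].vfinish < s[best].vfinish)
			best = i;
	}
	return best;
}

/* Account @bytes sent on stream @i: heavier streams advance slower. */
static void
wfq_charge(struct wfq_stream *s, int i, unsigned long bytes)
{
	s[i].vfinish += bytes / s[i].weight;
}
```

With weights 1 and 3 and equal-sized sends, the weight-3 stream is picked roughly three times as often, which is the bandwidth-sharing behavior WFQ is meant to give. The linear scan is O(N) here only for brevity; a heap or the h2o-style bit array discussed below makes the pick O(log N) or O(1).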

It seems that, to avoid TCP HoL blocking, we should move HTTP/2 fragmentation into the TLS layer (just as we form TLS records in a TCP callback). See Reorganizing Website Architecture for HTTP/2 and Beyond, slides 15-20.

We need to address equal QoS (root stream prioritization) for different clients. See The Ultimate Guide to HTTP Resource Prioritization at 17:00.

Hooks/an API for server-side prioritization (as extension logic) should be implemented; see Better HTTP/2 Prioritization for a Faster Web.

Security

The HTTP/2 dependency and priority feature is subject to security violations; see the chapter "Victim 3 - Dependency and Priority" in HTTP/2: In-depth analysis of the top four flaws of the next generation web protocol by Imperva. Like nghttp2, we should limit the dependency graph size by MAX_CONCURRENT_STREAMS and throw away old streams on reaching the limit.
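The eviction policy above can be sketched as a bounded FIFO of dependency-graph nodes. This is an illustrative sketch only (not nghttp2 or Tempesta FW code; the limit and names are hypothetical):

```c
#include <assert.h>
#include <string.h>

/*
 * Cap the number of nodes in the dependency graph and evict the
 * oldest stream when a new one would exceed the limit, preventing
 * a client from growing the priority tree without bound.
 */
#define MAX_DEP_STREAMS 4	/* stands in for MAX_CONCURRENT_STREAMS */

struct dep_graph {
	int	ids[MAX_DEP_STREAMS];	/* stream ids, oldest first */
	int	n;
};

/* Add @id to the graph; returns the evicted stream id, or 0 if none. */
static int
dep_graph_add(struct dep_graph *g, int id)
{
	int evicted = 0;

	if (g->n == MAX_DEP_STREAMS) {
		evicted = g->ids[0];	/* drop the oldest stream */
		memmove(g->ids, g->ids + 1, (g->n - 1) * sizeof(int));
		g->n--;
	}
	g->ids[g->n++] = id;
	return evicted;
}
```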

Testing

The feature is relatively complex, so a functional test must be developed along with it. At least the following must be tested:

The solution must also be benchmarked against the CloudFlare, H2O, HAProxy and Nginx implementations with ATF resources in mind (see Of the Utmost Importance: Resource Prioritization in HTTP/3 over QUIC for the methodology and references to benchmarking tools).


@aleksostapenko (Contributor)

Additional discussion regarding possible dependency/priority tree structure is in #1233 (comment).

@krizhanovsky (Contributor, Author)

TCP send queue and frames prioritization

At the moment Tempesta FW pushes all HTTP/2 frames to the TCP send queue, just as it does for HTTP/1 pipelining, so a higher-priority frame can be pushed to the queue long after many low-priority frames at the head of the queue. For prioritization we need to reorder the queue, but reordering introduces unnecessary overhead.

Generic considerations

  1. Prioritization should be done in tcp_write_xmit(), since stream priorities may change while we're processing the TCP send queue.
  2. Due to skb splitting, no skb may carry data from different streams (HTTP responses), so stream prioritization is equivalent to skb prioritization.
  3. We do our best to minimize kernel changes. However, with QUIC from QUIC & HTTP/3 #724, which we are also going to upstream, stream handling should go into the kernel. For this task we should keep as much logic as possible in the Tempesta code base and patch the kernel only minimally, but it would be good to lay the foundation for streams in the kernel for QUIC streams.

Proposal

  1. ss_do_send() should call a new function instead of skb_entail() to operate on the TCP write queue. The function should add an skb to a tree (or another data structure, depending on the scheduling algorithm) of lists, where each list corresponds to a stream (HTTP response).
  2. ss_do_send() should also call a new function instead of tcp_push(), which does work unnecessary for our case (e.g. URG data is never used with HTTP). The function is probably just a wrapper around __tcp_push_pending_frames().
  3. For HTTP/2 sockets, tcp_write_xmit() should call a Tempesta FW prioritization callback instead of tcp_send_head(). The callback will run the scheduling algorithm and return the skb it decides to send now. @EvgeniiMekhanik can this callback also be used in Flow control and pull-model for message forwarding #1394 as the required second callback?
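The proposal above can be sketched in plain userspace C. All names here are hypothetical stand-ins (fake_skb, stream_entail, stream_send_head), not the actual kernel or Tempesta FW API; the point is only the shape of the data structure and the scheduling callback:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Instead of one linear send queue, each stream keeps its own list of
 * buffers (step 1 of the proposal), and a scheduling callback plays
 * the role of tcp_send_head() (step 3): it returns the head buffer of
 * the highest-priority non-empty stream.
 */
struct fake_skb {
	struct fake_skb	*next;
	int		len;
};

struct h2_stream {
	int		prio;	/* smaller value = higher priority */
	struct fake_skb	*head;	/* per-stream send list */
};

/* Stand-in for the skb_entail() replacement: append to the stream. */
static void
stream_entail(struct h2_stream *s, struct fake_skb *skb)
{
	struct fake_skb **p = &s->head;

	while (*p)
		p = &(*p)->next;
	skb->next = NULL;
	*p = skb;
}

/* Stand-in for the tcp_send_head() replacement: run the scheduler. */
static struct fake_skb *
stream_send_head(struct h2_stream *ss, int n)
{
	struct h2_stream *best = NULL;
	int i;

	for (i = 0; i < n; i++) {
		if (!ss[i].head)
			continue;
		if (!best || ss[i].prio < best->prio)
			best = &ss[i];
	}
	return best ? best->head : NULL;
}
```

Because priorities are consulted only at pick time, a reprioritization between two calls simply changes which stream's head is returned next; nothing in the already-queued per-stream lists has to be reordered, which is the advantage over reordering a single send queue.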

@krizhanovsky (Contributor, Author) commented Apr 26, 2023

Comment from #1394:

Don't block the connection; continue to forward HEADERS frames while the flow control limit is exhausted.
Don't truncate or suspend any frames except DATA ones on approaching the flow control limit.

This will be done later in the stream prioritization task.

@EvgeniiMekhanik (Contributor)

We should also rework the tfw_h2_sk_prepare_xmit function into a T_FSM_STATE machine and make it clearer.
We should also send all skbs that correspond to one frame during one loop iteration in tcp_write_xmit.

@EvgeniiMekhanik (Contributor)

First of all, I think we should rework tcp_send_head like this:

tcp_send_head()
{
#ifdef CONFIG_SECURITY_TEMPESTA
	if (sock_flag(sk, SOCK_TEMPESTA)) {
		/* Use our data structure. */
	} else {
		/* Use the traditional send queue. */
	}
#else
	/* Use the traditional send queue. */
#endif
}

tcp_send_head is called from several places in the TCP code, so we should patch the whole function, not just replace it in tcp_write_xmit.

Then we should check all the places in the Linux kernel code where tcp_send_head is called and patch them where required.

@EvgeniiMekhanik (Contributor) commented Jul 26, 2023

How it is done in nginx

As I see it, the prioritization model in nginx is very weak. nginx builds a stream dependency tree as described in RFC 7540, but uses it poorly. This is the main function for adding a frame to the send queue:

static ngx_inline void
ngx_http_v2_queue_frame(ngx_http_v2_connection_t *h2c,
    ngx_http_v2_out_frame_t *frame)
{
    ngx_http_v2_out_frame_t  **out;

    for (out = &h2c->last_out; *out; out = &(*out)->next) {

        if ((*out)->blocked || (*out)->stream == NULL) {
            break;
        }

        if ((*out)->stream->node->rank < frame->stream->node->rank
            || ((*out)->stream->node->rank == frame->stream->node->rank
                && (*out)->stream->node->rel_weight
                   >= frame->stream->node->rel_weight))
        {
            break;
        }
    }

    frame->next = *out;
    *out = frame;
}

As we can see, a frame is inserted into the sending list according to the rank (level in the tree) and weight of its stream. But this does not correspond to the RFC (we should not send data for streams that depend on other streams), and in theory insertion can be O(N) if each next frame has a higher priority than the previous one.

How it is done in h2o

The prioritization model in h2o is much more sophisticated. There is a tree of streams (each level of the tree contains an array with 64 entries). A stream is inserted into the tree according to its dependency and its weight. There is also a bit array for quickly finding the highest-priority stream.
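The bit-array trick can be illustrated in a few lines. This is a sketch of the idea, not h2o source: a 64-bit mask marks which of the 64 slots at one tree level hold runnable streams, so finding the highest-priority one is a single count-trailing-zeros instruction, i.e. O(1):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit i of @mask is set when slot i at this tree level has a runnable
 * stream.  Slot 0 is the highest priority, so the lowest set bit wins.
 * Returns the winning slot index, or -1 if no stream is runnable.
 */
static int
first_active_slot(uint64_t mask)
{
	if (!mask)
		return -1;
	return __builtin_ctzll(mask);	/* count trailing zeros (GCC/Clang) */
}
```

Setting and clearing a slot is just `mask |= 1ULL << i` and `mask &= ~(1ULL << i)`, so the whole level update stays O(1) as well.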

Priority struct suggestion

If we can use external code, we could adopt the h2o implementation for our purposes, or use a Weighted Fair Queue (WFQ). We should also use a tree of such WFQs to take stream dependencies into account. Note that RFC 7540 5.3 and RFC 9218 require different approaches. Everything described below applies only to RFC 7540, and I think we should rely on that RFC for now.

Prioritization implementation

According to my previous investigation, h2o uses tcp_info to calculate the available CWND:

*mss = tcpi.tcpi_snd_mss;
*cwnd_size = tcpi.tcpi_snd_cwnd;
*cwnd_avail = tcpi.tcpi_snd_cwnd > tcpi.tcpi_unacked ? tcpi.tcpi_snd_cwnd - tcpi.tcpi_unacked + 2 : 2;

and then

sock->_latency_optimization.suggested_tls_payload_size = calc_suggested_tls_payload_size(sock, mss);
sock->_latency_optimization.suggested_write_size =
            cwnd_avail * (size_t)sock->_latency_optimization.suggested_tls_payload_size;

We should work the same way. Since we deal with stream prioritization and add streams to the special tree, we should add an skb_head pointer to struct stream. Then, in ss_do_send, we should add all skbs to the stream (which is the same as adding them to the priority structure) and call a wrapper around __tcp_push_pending_frames.

I think we can add an additional callback instead of sk_prepare_xmit at the beginning of tcp_write_xmit, which would move skbs with already-built frames from our priority tree to sk_write_queue according to the current cwnd_avail. We could also perform encryption here, using the MSS as a limit.

It seems we should not have any problems in this case: there are a lot of places in the TCP code where sk_write_queue is used, but if there is no skb in this queue, there is no problem. Later, when an ACK arrives, tcp_data_snd_check will be called and __tcp_push_pending_frames will run again.

Why this proposal is better than the previous one

If we change tcp_send_head in tcp_write_xmit to our own callback, we will have several problems. First, there may already be some skbs in the TCP write queue: if we push an skb from our priority tree to the network, we have to update all of those skbs. Also, tcp_write_xmit is called in a loop, so we would push all pending skbs, but we should push only as many skbs as the available CWND allows, no more. If we move skbs from our priority tree to the TCP write queue at the beginning of tcp_write_xmit, we don't break anything.

We should discuss which RFC to support.

@EvgeniiMekhanik (Contributor)

We decided to support RFC 9218, since RFC 7540 stream priorities are deprecated and there is a good chance that modern clients will stop supporting them soon. RFC 9218 defines two priority parameters, urgency and incremental:

  • The urgency (u) parameter is an integer between 0 and 7 inclusive, in descending order of priority.
  • The incremental (i) parameter is a boolean. It indicates whether an HTTP response can be processed incrementally, i.e., provide some meaningful output as chunks of the response arrive. If a client makes concurrent requests with the incremental parameter set to false, there is no benefit in serving responses with the same urgency concurrently, because the client is not going to process those responses incrementally.

I think we should implement an array of eight entries (one per urgency level, 0-7) of stream lists. Each array entry contains two list heads: the first heads the list of incremental streams and the second the list of non-incremental streams. A new stream with the appropriate urgency is appended to the end of the appropriate list. All requests are served in order of their urgency; we don't send requests with urgency 1 while there is any data to send for requests with urgency 0. Non-incremental requests are served one by one, without bandwidth sharing, but we share bandwidth between incremental and non-incremental requests.
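The structure described above can be sketched as follows. This is an illustrative plain-C sketch, not Tempesta FW code; names are hypothetical, and the choice to poll non-incremental streams before incremental ones at equal urgency is just one possible policy:

```c
#include <assert.h>
#include <stddef.h>

/*
 * RFC 9218 scheduler sketch: one slot per urgency level 0..7, each
 * slot holding two FIFO lists, one for incremental and one for
 * non-incremental streams.  The scheduler scans urgencies in
 * ascending order, so no urgency-1 stream is served while any
 * urgency-0 stream has data to send.
 */
#define URGENCY_LEVELS 8	/* u is an integer in 0..7 */

struct sched_stream {
	struct sched_stream	*next;
	int			id;
};

struct urgency_slot {
	struct sched_stream	*incremental;
	struct sched_stream	*nonincremental;
};

struct h2_sched {
	struct urgency_slot	slots[URGENCY_LEVELS];
};

/* Append a stream to the tail of its (urgency, incremental) list. */
static void
sched_add(struct h2_sched *s, struct sched_stream *st, int urgency, int inc)
{
	struct sched_stream **p = inc ? &s->slots[urgency].incremental
				      : &s->slots[urgency].nonincremental;

	while (*p)
		p = &(*p)->next;	/* FIFO within a slot */
	st->next = NULL;
	*p = st;
}

/* Pick the next stream to serve: the lowest non-empty urgency wins. */
static struct sched_stream *
sched_pick(struct h2_sched *s)
{
	int u;

	for (u = 0; u < URGENCY_LEVELS; u++) {
		if (s->slots[u].nonincremental)
			return s->slots[u].nonincremental;
		if (s->slots[u].incremental)
			return s->slots[u].incremental;
	}
	return NULL;
}
```

A non-incremental stream stays at the head of its list until fully sent (served one by one), whereas an incremental stream could be rotated to its list's tail after each quantum to share bandwidth within the slot.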

@krizhanovsky (Contributor, Author)

Could you please check RFC 9218 support for HTTP/2 in nghttp2, Chrome and Firefox? By "check" I mean grepping the source code and checking the issues, if they are open (e.g. https://bugs.chromium.org/p/chromium/issues/detail?id=1404785).

@EvgeniiMekhanik (Contributor)

I checked RFC 9218 support in several projects:
