
HTTP/2 streams prioritization #1196

Open
4 tasks done
krizhanovsky opened this issue Feb 23, 2019 · 9 comments

@krizhanovsky (Contributor) commented Feb 23, 2019

Scope

According to RFC 7540 5.3 and RFC 9218, we should schedule requests, or serve them from the cache, according to their priority and dependencies. Reprioritization must also be implemented. This issue must implement prioritization suitable for both HTTP/2 and HTTP/3.

Note the priority index structure discussed in #1233 (comment). There are several approaches to priority algorithms, and we need a fast O(1) scheduler.

#712 also seems to require WFQ, so the algorithm should probably be generic enough to be applied to both h2 messages and server queueing. Note that the priority tree is volatile and can change during a web page load as new, higher-priority resources are requested.
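For illustration, the WFQ idea mentioned above can be sketched as a virtual-finish-time scheduler. This is a minimal plain-C sketch under stated assumptions; all names are hypothetical and none of this is Tempesta FW code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Minimal Weighted Fair Queuing sketch: each stream accumulates a
 * virtual finish time proportional to bytes sent divided by its
 * weight; the scheduler always picks the stream with the smallest
 * finish time, so heavier streams get a proportionally larger share.
 */
struct wfq_stream {
	unsigned int	weight;		/* relative share, > 0 if active */
	unsigned long	vfinish;	/* virtual finish time */
};

/* Pick the index of the next stream to service, or -1 if none. */
static int
wfq_pick(struct wfq_stream *s, int n)
{
	int i, best = -1;

	for (i = 0; i < n; i++) {
		if (!s[i].weight)
			continue;	/* inactive stream */
		if (best < 0 || s[i].vfinish < s[best].vfinish)
			best = i;
	}
	return best;
}

/* Account @bytes sent on stream @i: heavier streams advance slower. */
static void
wfq_charge(struct wfq_stream *s, int i, unsigned long bytes)
{
	s[i].vfinish += bytes / s[i].weight;
}
```

With weights 1 and 3 and equal-sized sends, the weight-3 stream is picked roughly three times as often, which is the bandwidth-sharing behavior WFQ is meant to give. The linear scan is O(N) here only for brevity; a heap or the h2o-style bit array discussed below makes the pick O(log N) or O(1).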

It seems that, to avoid TCP HoL blocking, we should move HTTP/2 fragmentation into the TLS layer (just as we form TLS records in a TCP callback). See Reorganizing Website Architecture for HTTP/2 and Beyond, slides 15-20.

We need to address equal QoS (root stream prioritization) for different clients. See The Ultimate Guide to HTTP Resource Prioritization at 17:00.

Hooks/an API for server-side prioritization (as extension logic) should be implemented; see Better HTTP/2 Prioritization for a Faster Web.

Security

The HTTP/2 dependency and priority feature is subject to security violations; see the chapter "Victim 3 - Dependency and Priority" in HTTP/2: In-depth analysis of the top four flaws of the next generation web protocol by Imperva. Like nghttp2, we should limit the dependency graph size by MAX_CONCURRENT_STREAMS and throw away old streams on reaching the limit.
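The eviction policy above can be sketched as a bounded FIFO of dependency-graph nodes. This is an illustrative sketch only (not nghttp2 or Tempesta FW code; the limit and names are hypothetical):

```c
#include <assert.h>
#include <string.h>

/*
 * Cap the number of nodes in the dependency graph and evict the
 * oldest stream when a new one would exceed the limit, preventing
 * a client from growing the priority tree without bound.
 */
#define MAX_DEP_STREAMS 4	/* stands in for MAX_CONCURRENT_STREAMS */

struct dep_graph {
	int	ids[MAX_DEP_STREAMS];	/* stream ids, oldest first */
	int	n;
};

/* Add @id to the graph; returns the evicted stream id, or 0 if none. */
static int
dep_graph_add(struct dep_graph *g, int id)
{
	int evicted = 0;

	if (g->n == MAX_DEP_STREAMS) {
		evicted = g->ids[0];	/* drop the oldest stream */
		memmove(g->ids, g->ids + 1, (g->n - 1) * sizeof(int));
		g->n--;
	}
	g->ids[g->n++] = id;
	return evicted;
}
```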

Testing

The feature is relatively complex, so a functional test must be developed along with it. At least the following must be tested:

The solution must also be benchmarked against the CloudFlare, H2O, HAProxy and Nginx implementations with ATF resources in mind (see Of the Utmost Importance: Resource Prioritization in HTTP/3 over QUIC for the methodology and references to benchmarking tools).


@aleksostapenko (Contributor)

Additional discussion regarding possible dependency/priority tree structure is in #1233 (comment).

@krizhanovsky (Contributor, Author)

TCP send queue and frames prioritization

At the moment Tempesta FW pushes all HTTP/2 frames to the TCP send queue, just as it does for HTTP/1 pipelining, so a higher-priority frame can be pushed to the queue long after many low-priority frames at the head of the queue. For prioritization we need to reorder the queue, but reordering introduces unnecessary overhead.

Generic considerations

  1. Prioritization should be done in tcp_write_xmit(), since stream priorities may change while we're processing the TCP send queue.
  2. Due to skb splitting, no skb may carry data from different streams (HTTP responses), so stream prioritization is equivalent to skb prioritization.
  3. We do our best to minimize kernel changes. However, with QUIC from QUIC & HTTP/3 #724, which we are also going to upstream, stream handling should go into the kernel. For this task we should keep as much logic as possible in the Tempesta code base and patch the kernel only minimally, but it would be good to lay the foundation for streams in the kernel for QUIC streams.

Proposal

  1. ss_do_send() should call a new function instead of skb_entail() to operate on the TCP write queue. The function should add an skb to a tree (or another data structure, depending on the scheduling algorithm) of lists, where each list corresponds to a stream (HTTP response).
  2. ss_do_send() should also call a new function instead of tcp_push(), which does work unnecessary for our case (e.g. URG data is never used with HTTP). The function is probably just a wrapper around __tcp_push_pending_frames().
  3. For HTTP/2 sockets, tcp_write_xmit() should call a Tempesta FW prioritization callback instead of tcp_send_head(). The callback will run the scheduling algorithm and return the skb it decides to send now. @EvgeniiMekhanik can this callback also be used in Flow control and pull-model for message forwarding #1394 as the required second callback?
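The proposal above can be sketched in plain userspace C. All names here are hypothetical stand-ins (fake_skb, stream_entail, stream_send_head), not the actual kernel or Tempesta FW API; the point is only the shape of the data structure and the scheduling callback:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Instead of one linear send queue, each stream keeps its own list of
 * buffers (step 1 of the proposal), and a scheduling callback plays
 * the role of tcp_send_head() (step 3): it returns the head buffer of
 * the highest-priority non-empty stream.
 */
struct fake_skb {
	struct fake_skb	*next;
	int		len;
};

struct h2_stream {
	int		prio;	/* smaller value = higher priority */
	struct fake_skb	*head;	/* per-stream send list */
};

/* Stand-in for the skb_entail() replacement: append to the stream. */
static void
stream_entail(struct h2_stream *s, struct fake_skb *skb)
{
	struct fake_skb **p = &s->head;

	while (*p)
		p = &(*p)->next;
	skb->next = NULL;
	*p = skb;
}

/* Stand-in for the tcp_send_head() replacement: run the scheduler. */
static struct fake_skb *
stream_send_head(struct h2_stream *ss, int n)
{
	struct h2_stream *best = NULL;
	int i;

	for (i = 0; i < n; i++) {
		if (!ss[i].head)
			continue;
		if (!best || ss[i].prio < best->prio)
			best = &ss[i];
	}
	return best ? best->head : NULL;
}
```

Because priorities are consulted only at pick time, a reprioritization between two calls simply changes which stream's head is returned next; nothing in the already-queued per-stream lists has to be reordered, which is the advantage over reordering a single send queue.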

@krizhanovsky (Contributor, Author) commented Apr 26, 2023

Comment from #1394:

Don't block the connection; continue to forward HEADERS frames while the flow control limit is exhausted.
Don't truncate or suspend any frames except DATA ones on approaching the flow control limit.

This will be done later in the stream prioritization task.

@EvgeniiMekhanik (Contributor)

We should also rework the tfw_h2_sk_prepare_xmit function into a T_FSM_STATE machine and make it clearer.
We should also send all skbs that correspond to one frame during one loop iteration in tcp_write_xmit.

@EvgeniiMekhanik (Contributor)

First of all, I think we should rework tcp_send_head like this:

tcp_send_head()
{
#ifdef CONFIG_SECURITY_TEMPESTA
	if (sock_flag(sk, SOCK_TEMPESTA)) {
		/* Use our data structure. */
	} else {
		/* Use the traditional send queue. */
	}
#else
	/* Use the traditional send queue. */
#endif
}

tcp_send_head is called from several places in the TCP code, so we should patch the whole function, not just replace it in tcp_write_xmit.

Then we should check all the places in the Linux kernel code where tcp_send_head is called and patch them where required.

@EvgeniiMekhanik (Contributor) commented Jul 26, 2023

How it is done in nginx

As I see it, the prioritization model in nginx is very weak. nginx builds a stream dependency tree as described in RFC 7540, but uses it poorly. This is the main function for adding a frame to the send queue:

static ngx_inline void
ngx_http_v2_queue_frame(ngx_http_v2_connection_t *h2c,
    ngx_http_v2_out_frame_t *frame)
{
    ngx_http_v2_out_frame_t  **out;

    for (out = &h2c->last_out; *out; out = &(*out)->next) {

        if ((*out)->blocked || (*out)->stream == NULL) {
            break;
        }

        if ((*out)->stream->node->rank < frame->stream->node->rank
            || ((*out)->stream->node->rank == frame->stream->node->rank
                && (*out)->stream->node->rel_weight
                   >= frame->stream->node->rel_weight))
        {
            break;
        }
    }

    frame->next = *out;
    *out = frame;
}

As we can see, a frame is inserted into the sending list according to the rank (level in the tree) and weight of its stream. But this does not correspond to the RFC (we should not send data for streams that depend on other streams), and in theory insertion can be O(N) if each next frame has a higher priority than the previous one.

How it is done in h2o

The prioritization model in h2o is much more sophisticated. There is a tree of streams (each level of the tree contains an array with 64 entries). A stream is inserted into the tree according to its dependency and its weight. There is also a bit array for quickly finding the highest-priority stream.
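The bit-array trick can be illustrated in a few lines. This is a sketch of the idea, not h2o source: a 64-bit mask marks which of the 64 slots at one tree level hold runnable streams, so finding the highest-priority one is a single count-trailing-zeros instruction, i.e. O(1):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit i of @mask is set when slot i at this tree level has a runnable
 * stream.  Slot 0 is the highest priority, so the lowest set bit wins.
 * Returns the winning slot index, or -1 if no stream is runnable.
 */
static int
first_active_slot(uint64_t mask)
{
	if (!mask)
		return -1;
	return __builtin_ctzll(mask);	/* count trailing zeros (GCC/Clang) */
}
```

Setting and clearing a slot is just `mask |= 1ULL << i` and `mask &= ~(1ULL << i)`, so the whole level update stays O(1) as well.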

Priority struct suggestion

If we can use external code, we could adopt the h2o implementation for our purposes, or use a Weighted Fair Queue (WFQ). We should also use a tree of such WFQs to take stream dependencies into account. Note that RFC 7540 5.3 and RFC 9218 require different approaches. Everything described below applies only to RFC 7540, and I think we should rely on that RFC for now.

Prioritization implementation

According to my previous investigation, h2o uses tcp_info to calculate the available CWND:

*mss = tcpi.tcpi_snd_mss;
*cwnd_size = tcpi.tcpi_snd_cwnd;
*cwnd_avail = tcpi.tcpi_snd_cwnd > tcpi.tcpi_unacked ? tcpi.tcpi_snd_cwnd - tcpi.tcpi_unacked + 2 : 2;

and then

sock->_latency_optimization.suggested_tls_payload_size = calc_suggested_tls_payload_size(sock, mss);
sock->_latency_optimization.suggested_write_size =
            cwnd_avail * (size_t)sock->_latency_optimization.suggested_tls_payload_size;

We should work the same way. Since we deal with stream prioritization and add streams to the special tree, we should add an skb_head pointer to struct stream. Then, in ss_do_send, we should add all skbs to the stream (which is the same as adding them to the priority structure) and call a wrapper around __tcp_push_pending_frames.

I think we can add an additional callback instead of sk_prepare_xmit at the beginning of tcp_write_xmit, which would move skbs with already-built frames from our priority tree to sk_write_queue according to the current cwnd_avail. We could also perform encryption here, using the MSS as a limit.

It seems we should not have any problems in this case: there are a lot of places in the TCP code where sk_write_queue is used, but if there is no skb in this queue, there is no problem. Later, when an ACK arrives, tcp_data_snd_check will be called and __tcp_push_pending_frames will run again.

Why this proposal is better than the previous one

If we change tcp_send_head in tcp_write_xmit to our own callback, we will have several problems. First, there may already be some skbs in the TCP write queue: if we push an skb from our priority tree to the network, we have to update all of those skbs. Also, tcp_write_xmit is called in a loop, so we would push all pending skbs, but we should push only as many skbs as the available CWND allows, no more. If we move skbs from our priority tree to the TCP write queue at the beginning of tcp_write_xmit, we don't break anything.

We should discuss which RFC to support.

@EvgeniiMekhanik (Contributor)

We decided to support RFC 9218, since RFC 7540 stream priorities are deprecated and there is a good chance that modern clients will stop supporting them soon. RFC 9218 defines two priority parameters, urgency and incremental:

  • The urgency (u) parameter is an integer between 0 and 7 inclusive, in descending order of priority.
  • The incremental (i) parameter is a boolean. It indicates whether an HTTP response can be processed incrementally, i.e., provide some meaningful output as chunks of the response arrive. If a client makes concurrent requests with the incremental parameter set to false, there is no benefit in serving responses with the same urgency concurrently, because the client is not going to process those responses incrementally.

I think we should implement an array of eight entries (one per urgency level, 0-7) of stream lists. Each array entry contains two list heads: the first heads the list of incremental streams and the second the list of non-incremental streams. A new stream with the appropriate urgency is appended to the end of the appropriate list. All requests are served in order of their urgency; we don't send requests with urgency 1 while there is any data to send for requests with urgency 0. Non-incremental requests are served one by one, without bandwidth sharing, but we share bandwidth between incremental and non-incremental requests.
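The structure described above can be sketched as follows. This is an illustrative plain-C sketch, not Tempesta FW code; names are hypothetical, and the choice to poll non-incremental streams before incremental ones at equal urgency is just one possible policy:

```c
#include <assert.h>
#include <stddef.h>

/*
 * RFC 9218 scheduler sketch: one slot per urgency level 0..7, each
 * slot holding two FIFO lists, one for incremental and one for
 * non-incremental streams.  The scheduler scans urgencies in
 * ascending order, so no urgency-1 stream is served while any
 * urgency-0 stream has data to send.
 */
#define URGENCY_LEVELS 8	/* u is an integer in 0..7 */

struct sched_stream {
	struct sched_stream	*next;
	int			id;
};

struct urgency_slot {
	struct sched_stream	*incremental;
	struct sched_stream	*nonincremental;
};

struct h2_sched {
	struct urgency_slot	slots[URGENCY_LEVELS];
};

/* Append a stream to the tail of its (urgency, incremental) list. */
static void
sched_add(struct h2_sched *s, struct sched_stream *st, int urgency, int inc)
{
	struct sched_stream **p = inc ? &s->slots[urgency].incremental
				      : &s->slots[urgency].nonincremental;

	while (*p)
		p = &(*p)->next;	/* FIFO within a slot */
	st->next = NULL;
	*p = st;
}

/* Pick the next stream to serve: the lowest non-empty urgency wins. */
static struct sched_stream *
sched_pick(struct h2_sched *s)
{
	int u;

	for (u = 0; u < URGENCY_LEVELS; u++) {
		if (s->slots[u].nonincremental)
			return s->slots[u].nonincremental;
		if (s->slots[u].incremental)
			return s->slots[u].incremental;
	}
	return NULL;
}
```

A non-incremental stream stays at the head of its list until fully sent (served one by one), whereas an incremental stream could be rotated to its list's tail after each quantum to share bandwidth within the slot.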

@krizhanovsky (Contributor, Author)

Could you please check RFC 9218 support for HTTP/2 in nghttp2, Chrome and Firefox? By "check" I mean grepping the source code and checking the issues, if they are open (e.g. https://bugs.chromium.org/p/chromium/issues/detail?id=1404785).

@EvgeniiMekhanik (Contributor)

I checked RFC 9218 support in several projects:
