Cloning shared skbs #173
Comments
Related to these fixes: #56 (comment) |
In the regular course of action When a client or a back end server sits on the same host as Tempesta, SKBs are transmitted and received via the loopback driver. This means that the same cloned SKBs that are sent out are received and make their way up the stack to Tempesta. In other words, receiving cloned SKBs is a perfectly normal situation in the network stack. The stack is aware of that and knows how to work with it. Note that utilities like

The fix mentioned above makes SKBs shared on their way out of Tempesta to a client or a back end server (

What is unusual about the fix is that it makes cloned SKBs shared, which looks like an oxymoron, as "cloning" and "sharing" are just two different kinds of sharing of SKBs. Usually these two kinds of sharing do not appear together in the same SKB. Having both in the same SKB is totally unconventional.

Tempesta needs to be able to modify data in an SKB, for example to insert or remove HTTP headers. That may happen anywhere in the data, not just in the packet header. Neither cloning nor sharing of SKBs provides that, and only a full SKB copy can provide full protection. Still, SKBs that get to Tempesta are assumed to be in the sole possession of the Linux kernel, so it's assumed that data in those SKBs can be modified. Usually that is the case anyway, but it needs to be stated clearly.

With all of the above, there are several places where Tempesta and the kernel need modifications:
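To make the difference between a "cloned" and a "shared" SKB concrete, here is a minimal userspace sketch. It is a toy model, not the kernel's `struct sk_buff`: the `toy_*` names and the fixed-size buffer are assumptions made for illustration. It only mirrors the idea that sharing adds a holder to one descriptor (as `skb_get()` does), while cloning adds a second descriptor over the same data (as `skb_clone()` does).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: the data buffer that descriptors point at. */
struct toy_data {
    int dataref;              /* number of descriptors using this buffer */
    size_t len;
    unsigned char buf[64];
};

/* Toy model only: the descriptor, loosely analogous to an skb head. */
struct toy_skb {
    int users;                /* number of holders of this descriptor */
    struct toy_data *data;
};

static struct toy_skb *toy_alloc(const char *payload)
{
    struct toy_skb *skb = malloc(sizeof(*skb));
    struct toy_data *d = malloc(sizeof(*d));

    if (!skb || !d) {
        free(skb);
        free(d);
        return NULL;
    }
    d->dataref = 1;
    d->len = strlen(payload);
    if (d->len > sizeof(d->buf))
        d->len = sizeof(d->buf);
    memcpy(d->buf, payload, d->len);
    skb->users = 1;
    skb->data = d;
    return skb;
}

/* "Sharing": the same descriptor gets one more holder. */
static struct toy_skb *toy_get(struct toy_skb *skb)
{
    skb->users++;
    return skb;
}

/* "Cloning": a new descriptor that points at the same data buffer. */
static struct toy_skb *toy_clone(struct toy_skb *skb)
{
    struct toy_skb *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->users = 1;
    c->data = skb->data;
    c->data->dataref++;
    return c;
}

static bool toy_shared(const struct toy_skb *skb)
{
    return skb->users != 1;
}

static bool toy_cloned(const struct toy_skb *skb)
{
    return skb->data->dataref != 1;
}

/* Drop one holder; free the descriptor and, if unused, the data. */
static void toy_put(struct toy_skb *skb)
{
    if (--skb->users > 0)
        return;
    if (--skb->data->dataref == 0)
        free(skb->data);
    free(skb);
}
```

In this model, calling `toy_get()` on the result of `toy_clone()` yields a descriptor for which both `toy_shared()` and `toy_cloned()` are true, which is the unconventional combination described above.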
|
The current solution is to unclone all cloned SKBs that come to Tempesta (see 4971758). That allows Tempesta not to worry about cloned packets at all. Incoming cloned SKBs are unwelcome in the following situations:
The SKBs are almost surely modified for HTTP requests, but at this time HTTP responses are not modified. If performance becomes critical, it may make sense to unclone SKBs only when necessary, i.e. when they are definitely modified in any of the cases above. Also, if cloned SKBs are seen on the way out of Tempesta (should we decide to do it in an "effective way" in the future), the measures described above for SKBs that are both cloned and shared will be necessary, as Tempesta makes SKBs shared on their way out.

The bigger issue remains that an SKB may still be split or merged in the kernel network stack, which may invalidate pointers into the SKB. These pointers refer to every small part of an HTTP message, such as each header field and the body, and they are used later for caching of HTTP messages. A solution needs to be found. |
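The "unclone before modifying" idea can be sketched in userspace as follows. This is a toy model under stated assumptions: the `in_*` types and functions are hypothetical and are not taken from commit 4971758 or from the kernel. It only illustrates the copy-on-write step that the kernel's `skb_unclone()` helper performs for real SKBs: if the data buffer is referenced by more than one descriptor, the descriptor gets a private copy before any byte is changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: a reference-counted data buffer. */
struct in_data {
    int dataref;              /* number of descriptors using this buffer */
    size_t len;
    unsigned char buf[64];
};

/* Toy model only: a descriptor pointing at a data buffer. */
struct in_skb {
    struct in_data *data;
};

static struct in_data *in_data_new(const void *src, size_t len)
{
    struct in_data *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    if (len > sizeof(d->buf))
        len = sizeof(d->buf);
    d->dataref = 1;
    d->len = len;
    memcpy(d->buf, src, len);
    return d;
}

static bool in_cloned(const struct in_skb *skb)
{
    return skb->data->dataref != 1;
}

/*
 * Give the descriptor a private data buffer if the current one is also
 * used by another descriptor. Returns 0 on success, -1 if allocation fails.
 */
static int in_unclone(struct in_skb *skb)
{
    struct in_data *copy;

    if (!in_cloned(skb))
        return 0;
    copy = in_data_new(skb->data->buf, skb->data->len);
    if (!copy)
        return -1;
    skb->data->dataref--;
    skb->data = copy;
    return 0;
}

/* Hypothetical ingress step: unclone first, then modify data in place. */
static int in_rewrite_first_byte(struct in_skb *skb, unsigned char c)
{
    if (in_unclone(skb))
        return -1;
    skb->data->buf[0] = c;
    return 0;
}
```

The cost is one allocation and copy per cloned buffer, which is why uncloning only when a modification is certain may matter if performance becomes critical.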
Actually, there is no problem with the data handled by an skb: only network header pointers are adjusted, while the data is kept untouched. See #122 (comment). sendfile(2) is an example: it can send the same data many times, and surely without mangling the original pages. Meanwhile, skb head data is also kept safe. |
I don't think that's completely true. Below are the cases where Tempesta can lose SKB data or end up with invalid pointers to HTTP message data stored in an SKB.
|
There is also a GFSM issue. Let's assume the HTTP FSM calls another FSM, for example ICAP, and the called FSM returns the POSTPONE code. This means that the called FSM still uses the skb (probably not only the data kept by the skb, but also the skb itself in the case of a TL program or a low-level network classifier), and the lower layer FSM, the caller, must not free the skb. This is not the case for the current Frang implementation, but we must keep it in mind while implementing #102. An skb can be freed when all GFSM and/or TL programs have returned PASS or BLOCK on it. This could be tracked by a bitmask, i.e. the skb can be freed when the POSTPONE bit is zero. Since the network layer is stateless, we can probably agree that low layer network classifiers don't postpone skbs, while the higher layer GFSM users use skb data only, so we can free skbs and get/put the data pages carried by skbs (this was originally @keshonok's proposal). |
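The bitmask tracking mentioned above could look roughly like the sketch below. Everything here is an assumption for illustration: the `tracked_skb` structure, the `V_*` verdict names, and the choice of one bit per consumer (FSM or TL program) are hypothetical and do not come from Tempesta's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical verdicts a consumer can return for an skb. */
enum verdict { V_PASS, V_BLOCK, V_POSTPONE };

/* Hypothetical per-skb tracking state. */
struct tracked_skb {
    uint32_t postponed;       /* bit i set: consumer i still uses the skb */
};

/*
 * Record the verdict of consumer `id` (must be in the range 0..31).
 * Returns true when no consumer has the skb postponed any more,
 * i.e. when it is safe for the lowest layer to free it.
 */
static bool track_verdict(struct tracked_skb *t, unsigned int id,
                          enum verdict v)
{
    uint32_t bit = UINT32_C(1) << id;

    if (v == V_POSTPONE)
        t->postponed |= bit;
    else
        t->postponed &= ~bit;
    return t->postponed == 0;
}
```

The caller would free the skb only when `track_verdict()` returns true. A real implementation would also need to decide how the mask is protected against concurrent updates, which this sketch ignores.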
Also review the patch https://github.com/krizhanovsky/linux-3.10.10-sync_sockets/commit/926bef5ea8b1df0136c6bfae41c50984a14c787a UPD: the patch is part of the #56 (comment) fix. |
|
Actually there is not much we can do: This is a pure Linux kernel issue which should be investigated more. For now 4.1.6 does ever more
The patch 2636242 makes the skb 'shared', i.e. skb_shared() returns true, so tcp_transmit_skb() makes a copy or clone of the skb, which is surely unwanted behavior. Also, for example, packet_rcv() from net/packet/af_packet.c clones skbs using the following pattern:
So the fix introduces a performance problem through unnecessary copying. Also, since we do modify skb data, we should prohibit skb clones altogether in the Tempesta path with something like:
And the patch seems to lead to real skb drops on ingress packets:
The problem should also be debugged, and we should prohibit cloned skbs in the Tempesta path.