New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
L2 rework #10547
L2 rework #10547
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10547 +/- ##
==========================================
+ Coverage 48.2% 48.28% +0.07%
==========================================
Files 279 279
Lines 43312 43316 +4
Branches 10372 10372
==========================================
+ Hits 20878 20913 +35
+ Misses 18288 18252 -36
- Partials 4146 4151 +5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some left-over references to api->send
, and the code doesn't compile:
In file included from /zephyr-path/subsys/net/ip/net_if.c:15:
/zephyr-path/subsys/net/ip/net_if.c: In function ‘init_iface’:
/zephyr-path/subsys/net/ip/net_if.c:246:17: error: ‘const struct net_if_api’ has no member named ‘send’
NET_ASSERT(api->send);
^~
/zephyr-path/include/net/net_core.h:64:9: note: in definition of macro ‘NET_ASSERT’
if (!(cond)) { \
^~~~
make[2]: *** [zephyr/subsys/net/ip/CMakeFiles/subsys__net__ip.dir/build.make:89: zephyr/subsys/net/ip/CMakeFiles/subsys__net__ip.dir/net_if.c.obj] Error 1
make[1]: *** [CMakeFiles/Makefile2:1808: zephyr/subsys/net/ip/CMakeFiles/subsys__net__ip.dir/all] Error 2
Therefore I am unable to test it on sam gmac.
It always amaze me how local sanitycheck running never finds out such obvious thing... |
a68854a
to
c5cd160
Compare
d591f50
to
a5114c3
Compare
c677b87
to
14059f1
Compare
Overall looks good. |
include/net/ethernet.h
Outdated
@@ -329,7 +329,6 @@ struct ethernet_context { | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "net/ethernet: Remove as many ifdef for CONFIG_NET_VLAN as possible" could be sent to upstream separately.
I just tested this on sam e70 with the It is quite weird, your changes should only affect the the TX side in that driver, but zephyr seems to be leaking RX buffers. After pinging zephyr with
And it stops responding to further pings. I am going to dismiss my review for now, as this might be a broader issue. The TX side that you changed here seems to work fine. |
@tgorochowik thanks for retesting this. Looks like I may have messed up your driver because icmp echo replies are just built on rx echo requests packets, so these net_pkt do come for RX reserve. I'll check |
Well, that's bad. That's a hack, and we won't have a clean/reliable stack until we give that "optimization" up. RX packets are for RX, TX for TX. They may come from absolutely different pool implementations. E.g., from hardware-backed pool, allowing only RX or only TX, respectively. |
That has been like that almost since day 1. Actually, this RX/TX differentiation is going to disappear mostly in next to come buffer management rework (using various pools will remain). TX or RX you don't care much: if one of the pool if fully used, the system will eventually be unresponsive (normally, if it's not due to a leak - unlike what I did here - but an overload, timeout will kick in and system will start to response again). So it appears like a hack because of this tx/rx differentiation, but it's actually not a hack. |
Yeah, I know.
Well, user may not care, but hardware definitely does. The processes behind RX and TX are different, and buffer management is part of that process. And there's no need to "help" system become unresponsive by misusing buffers. What should be done with RX buffers is quickly parse/copy data out of it, and get buffer back into the pool to accept new data. Holding onto the RX buffer for longer than needed, misusing it to stuff TX data and putting it into TX queue (which can be arbitrarily long) should be big no-no. Again, IMHO we won't have a decent stack until we accept things above and implement it like that. Otherwise, this is "oh, BTW" note for now, it just your comment triggered things I had in mind for a while. |
Not really. Hardware buffers, or local driver buffers are always unrelated to net_pkt buffers. We always copy (either on tx or rx path, in the drivers). On rx path, if drivers are not able to get a free net_pkt or the necessary net_buf , it will have to drop the frame from its internal buffers. That's fully nominal. |
Ok, so let me reformulate: all the above true if we want [one sweet day] to achieve zero-copy networking. Given that we don't have a reliable networking so far, I'm happy with anything which helps to achieve that first. |
0-copy is a different beast. We'll try to get to that eventually, but we are far from it. |
d4e82fc
to
673ff91
Compare
This is currently unoptimized, as all frags are allocated with relevant ll reserve for such header space. However, this is the first step towards getting rid of that ll reserve concept everywhere. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Current code generating Ethernet header is scattered all over the place, sometimes in functions that are supposed to check something (and not filling the header). Not to say about innefficiency. Src ll address does not need to be set in L2 as net_if.c handles that already. Broadcast dst ll address is the same in ipv4 or ipv6, thus factorizing. In each case, multicast is filled in only at the relevant place. This is the first step towards changing L2 sending logic, when L2 send API function will be the only point of sending. The redirection from driver to L2 again (which finally uses the right device API function to send) it a temporary hack. This simplifies the code but will also enable using statically allocated net_buf and ethernet header payload buffer afterwards. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Currently, first part is done in L2's send, then the next one in ethernet device driver net_if send function. That last one was already moved to a L2 based implementation. Let's just move forward and place the whole logic of the L2's send in that second function. This is the first step, ethernet centric only, to move towards a one-pass sending logic in net stack. In future, net_if's send will disappear. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Now that sending is done at last time, in one pass, no need to go through net_if_send_data here. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Now instead of such path: net_if_send_data -> L2's send -> net_if tx_queue -> net_if_tx -> driver net_if's send It will be: net_if_send_data -> net_if tx_queue -> net_if_tx -> L2's send -> driver net_if's send Only Ethernet is adapted, but 15.4 and bt will follow up. All Ethernet drivers are made compatible with that new scheme also. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
And adapt loopback and slip drivers relevantly Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
As for Ethernet, up to ieee802154 L2's send to actually sent the packet. It's currently unoptimized as 6lo compression, 15.4 fragmentation and so on will reallocate net_buf etc... but it's the first step towards removing ll reserve space and more. Applying changes to Openthread L2 as well. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
This is a much more trivial move than in Ethernet or ieee802154 L2s. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Many tests use either Ethernet or Dummy L2 and as such require modifications towards the driver API on their fake devices. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
And apply that to modem driver setting that pointer to NULL. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Like ARP, GPTP works on top of Ethernet but is considered as a layer 2 protocol, so we cannot set an AF_* family type. Instead let's have a bit telling that current packet is a gptp message (family has to be AF_UNSPEC). The bit is unionized with a TCP related bit: TCP cannot be present on an AF_UNSPEC packet so there will not be any collision. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Instead of redoing what Ethernet L2 already does, let just create the gptp message without any Ethenet header. Which one will be done as sending phase by Ethernet L2 relevantly. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Such sent flag is in a union in net_pkt, shared with a gptp flag. Tweaking it when the family is not AF_INET or AF_INET6 will generate corrupted gptp packets. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
673ff91
to
f8aa817
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, also the latest version fixes memory leak in sam-e70 board
subsys/net/l2/ethernet/arp.c
Outdated
net_pkt_set_vlan_tag(pkt, net_eth_get_vlan_tag(iface)); | ||
|
||
eth = net_eth_fill_header(ctx, pkt, htons(NET_ETH_PTYPE_ARP), | ||
NULL, NULL); | ||
if(!net_eth_fill_header(ctx, pkt, htons(NET_ETH_PTYPE_ARP), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it went to the mainline like that, with missing space after "if"?
@@ -71,7 +71,8 @@ void net_arp_init(void); | |||
*/ | |||
|
|||
#else /* CONFIG_NET_ARP */ | |||
|
|||
#define net_arp_prepare(_kt, _u1, _u2) _kt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice names.
@@ -49,8 +49,9 @@ struct net_l2 { | |||
* This function is used by net core to push a packet to lower layer | |||
* (interface's L2), which in turn might work on the packet relevantly. | |||
* (adding proper header etc...) | |||
* Returns a negative error code, or the number of bytes sent otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Driver patches as included in this commit don't seem to follow "the number of bytes sent otherwise", so stats would be off (unless I'm mixing up all the send()'s around).
|
||
if (net_pkt_family(pkt) != AF_INET6) { | ||
return NET_DROP; | ||
return -EINVAL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe -EPROTONOTSUP or how it's called?
* to be sent but has not reached the driver | ||
* yet. Used only if defined(CONFIG_NET_TCP) | ||
*/ | ||
union { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure bitfield union works as expected? I'd expect compiler bugs.
@tbursztyka, I'm curious how well this was tested. Changes to existing Ethernet drivers are pretty light, whereas this refactor immediately breaks my cute WIP driver #11680, where I took freedom to rely on the fact that any net_buf (except last) has len of even u32_t's. Apparently, any optimization in Zephyr is premature so far ;-). But I wonder if no other driver had anything like that? |
@pfalcon would have been nice to get your review before it got merged. This PR has been public for more than a month. |
Really? In my list it's up since ~ end of June, before you went on leave, though maybe it was just a branch, not PR then. Who'd have known that would be merged last week? ;-) Otherwise, I kinda skip such reviews, because you definitely already heard my comments: this is too complex, can we split it up, etc. I just accept that some changes are necessarily complex, and rely on our discussion in Le Mans that it's otherwise OK. You're absolutely right that I should at least eyeball them for simple issues nonetheless. |
Changing the way packet are sent: instead of going like net_if_send_data -> l2 send -> net_if tx queue -> net_if_tx -> driver send
it will be now:
net_if_send _data -> net_if tx queue -> net_if_tx -> l2 send -> driver send
This will enable removing ll reserve from l2 as well as doing interesting optimization in L2 side.
Now L2 is responsible for unrefing the packet on success. (net_if still keeps that responsibility in case of error) so drivers do not have to care about it.
l2 send does not return a verdict, but a int: 0+ (lenght of sent packet) on succes, negative errno otherwise.
driver API have to provide their own send/tx function, relatively to their respective L2. There is no more the net_if_api send function. That will permit l2 to have whatever signature they need for that function (15.4 has done so for quite a while, but it was thus a hack).
@tgorochowik Could you check if the sam gmac driver still works fine? It's a bit different in there, as it queues tx pkt. I hope I did not break it.
@Vudentz It was relatively simple in bt L2 to do the change, but I could not test it. I believe it's fine though.
All of this is part of the preliminary work to revise net_pkt buffer management totally.